
Ingesting Events

Overview

We strongly recommend leveraging Pay-i as a proxy: with minimal latency overhead, the service automatically handles the complexity of accounting for the different unit types, tracking latency and failures, and other multi-model and multi-modal nuances.

However, there are some cases where Pay-i cannot be used as a proxy, such as when leveraging Custom Resources, or when using Pay-i to track costs from non-GenAI sources.

The Ingest API allows you to provide all of the same information that Pay-i would normally calculate automatically as part of a proxied request. The call to the Ingest API should happen after the request to the Provider has completed, so that your service has all of the data it needs to report.
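As a sketch of that flow, assuming a hypothetical ingest endpoint URL and bearer-token auth scheme (check the API reference for the real values), a non-streaming OpenAI call might be ingested like this:

import time
import requests
from openai import OpenAI

client = OpenAI()

# Time the provider call end to end; Ingest happens only after it completes.
start = time.perf_counter()
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
e2e_ms = int((time.perf_counter() - start) * 1000)

# Report the finished request to Pay-i. The endpoint path and auth header
# below are illustrative assumptions, not documented Pay-i values.
requests.post(
    "https://api.pay-i.com/api/v1/ingest",               # hypothetical URL
    headers={"Authorization": "Bearer <PAYI_API_KEY>"},  # assumed auth scheme
    json={
        "category": "system.openai",
        "resource": "gpt-4o-mini",
        "end_to_end_latency_ms": e2e_ms,
        "http_status_code": 200,
        "units": {
            "text": {
                "input": completion.usage.prompt_tokens,
                "output": completion.usage.completion_tokens,
            }
        },
    },
)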

Important: When working with streaming responses, you must read the stream to the end before ingesting the response. Pay-i needs the complete token information to accurately track usage and calculate costs. If you don't read the entire stream, you'll have incomplete data for ingestion.
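For example, the OpenAI Python SDK can include token usage in the final chunk of a stream; a sketch of reading to the end before building the units to ingest:

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},  # final chunk carries token usage
)

chunks = []
usage = None
for chunk in stream:                  # read the stream to the very end
    if chunk.choices:
        chunks.append(chunk.choices[0].delta.content or "")
    if chunk.usage is not None:       # only populated on the last chunk
        usage = chunk.usage

# Only now is the token information complete and safe to ingest.
units = {
    "text": {"input": usage.prompt_tokens, "output": usage.completion_tokens}
}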

All features supported by the proxy are supported via Ingest, except for blocking Limits, because Pay-i cannot block requests it does not proxy. When using Ingest, Pay-i still calculates costs based on the reported units and associates the data with any provided tags, experiences, and limits for later use.

When the Ingest API is called, an xproxy_result is returned.

Ingest Fields

The Ingest API takes the following inputs to ingest an event:

| # | Input | Description | Body/Header | Required |
|---|-------|-------------|-------------|----------|
| 1 | Category | The Category of the Resource used in the request. | Body | Y |
| 2 | Resource | The request Resource, used to calculate pricing. | Body | Y |
| 3 | Units | The number of input and output units for each of the request's Unit Types. | Body | Y |
| 4 | E2EL | End-to-end latency of the request. | Body | N |
| 5 | TTFT | The time to first token, which is equivalent to the E2EL for non-streaming scenarios. If both E2EL and TTFT are provided, the Inter-Token Latency (ITL) and Output Tokens Per Second (OTPS) are calculated automatically (see the sketch after this table). | Body | N |
| 6 | Provider URI | The endpoint used for the request, shown in DevOps views. | Body | N |
| 7 | Request Prompt | The JSON sent to the Provider as part of the request. If Logging is disabled for this application, this will not be saved. | Body | N |
| 8 | Request Headers | An array of header names and values that were sent to the Provider. | Body | N |
| 9 | Provider Response | The JSON returned from the Provider after the request has completed. If Logging is disabled for this application, this will not be saved. | Body | N |
| 10 | Response Headers | An array of header names and values that were received from the Provider with the response. | Body | N |
| 11 | event_timestamp | The ISO 8601 timestamp of when the request was sent to the provider. If not provided, the current time is used. See the event_timestamp section of this page for details. | Body | N |
| 12 | HTTP Status Code | The status code of the HTTP request to the Provider, e.g., "200" or "400", used for failure tracking. | Body | N |
| 13 | Properties | Coming soon | Body | N |
| 14 | Experience Properties | Coming soon | Body | N |
| 15 | Limit IDs | Comma-separated list of limit-ids to associate with the request. Note that blocking limits are not supported and will result in an error. | Header | N |
| 16 | Request Tags | Comma-separated list of request tags to associate with the request. | Header | N |
| 17 | User ID | The user-id associated with the request. | Header | N |
| 18 | Experience ID | The experience-id associated with the request. | Header | N |
| 19 | Experience Name | The name of the experience-type used in the request. As with proxied requests, if an experience_name is provided and an experience-id is not, one will be generated automatically. | Header | N |
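Pay-i computes ITL and OTPS for you when both latencies are reported. The sketch below uses the standard definitions of these metrics (an assumption, since the exact formulas Pay-i applies are not spelled out here) with the numbers from the example that follows:

# Standard definitions (assumed, not confirmed as Pay-i's exact formulas):
# ITL  = average gap between output tokens after the first one
# OTPS = output tokens generated per second

e2el_ms = 12450        # end-to-end latency from the example below
ttft_ms = 1143         # time to first token
output_tokens = 1746

itl_ms = (e2el_ms - ttft_ms) / (output_tokens - 1)   # ~6.48 ms/token
otps = output_tokens / (e2el_ms / 1000)              # ~140.2 tokens/s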

Ingest Example

{
  "category": "system.openai",
  "resource": "gpt-4o-mini",
  "event_timestamp": "2024-05-13T00:00:00",
  "end_to_end_latency_ms": 12450,
  "time_to_first_token_ms": 1143,
  "http_status_code": 200,
  "provider_uri": "https://api.openai.com/v1/chat/completions",
  "provider_prompt": "{ \"request\": \"Your request JSON here\" }",
  "units": {
    "text": {
      "input": 156,
      "output": 1746
    },
    "text_cache_read": {
      "input": 60,
      "output": 0
    },
    "vision": {
      "input": 3512,
      "output": 0
    }
  },
  "provider_request_headers": {
    "RequestHeader1": [
      "HeaderValue",
      "HeaderValue2"
    ],
    "RequestHeader2": [
      "HeaderValue"
    ]
  },
  "provider_response": [
    "{ \"response\": \"Provider response JSON here\" }"
  ],
  "provider_response_headers": {
    "ResponseHeader1": [
      "HeaderValue",
      "HeaderValue2"
    ],
    "ResponseHeader2": [
      "HeaderValue"
    ]
  },
  "properties": {
    "system.failure": "invalid_json"
  },
  "experience_properties": {
    "system.failure": "failed_customer_expectations"
  }
}
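Inputs 15 through 19 travel as HTTP headers alongside a body like the one above. A sketch, with the endpoint URL and header names as illustrative assumptions rather than confirmed Pay-i values:

import requests

# Minimal body (the full payload above also works here).
body = {
    "category": "system.openai",
    "resource": "gpt-4o-mini",
    "units": {"text": {"input": 156, "output": 1746}},
}

requests.post(
    "https://api.pay-i.com/api/v1/ingest",            # hypothetical URL
    headers={
        "Authorization": "Bearer <PAYI_API_KEY>",     # assumed auth scheme
        "xProxy-Limit-IDs": "monthly-budget",         # header names below are
        "xProxy-Request-Tags": "chatbot,production",  # illustrative assumptions,
        "xProxy-User-ID": "user-1234",                # not confirmed Pay-i names
        "xProxy-Experience-Name": "document-summary",
    },
    json=body,
)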

event_timestamp

The event_timestamp specifies when the ingested event occurred. Specifying an event_timestamp is optional; if not specified, it defaults to the current UTC time (UTC.Now).

An event_timestamp can be specified for any time in the past, and up to 5 minutes in the future to account for timing differences between your service and the Pay-i service.

When ingesting an event in the past, Pay-i will automatically use the price the resource had at that time when calculating the costs of the event. If you are ingesting a request for a custom resource, the appropriate Resource Version is selected automatically.

If the event_timestamp refers to a point in time for which there is no pricing information (e.g., trying to ingest an event for gpt-4 before the model existed), an error will be returned.
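A minimal sketch of producing a valid event_timestamp in Python, honoring the five-minute future window described above:

from datetime import datetime, timedelta, timezone

# Any past time is valid, e.g. when backfilling yesterday's traffic.
yesterday = datetime.now(timezone.utc) - timedelta(days=1)
event_timestamp = yesterday.isoformat()  # ISO 8601, e.g. "2024-05-12T00:00:00+00:00"

# Future timestamps are only accepted up to 5 minutes ahead, so guard
# against client clock drift before sending.
latest_allowed = datetime.now(timezone.utc) + timedelta(minutes=5)
assert yesterday <= latest_allowed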

Bulk Ingest

Pay-i allows you to send thousands of events in a single network request, reducing network overhead. This makes it easy to backfill Pay-i with historical data or to handle high-traffic situations.

If you would like to use this feature, please contact [email protected].


Related APIs