Terms and Glossary
Pay-i provides tools to track, control, and understand costs, performance issues, and failures arising from GenAI consumption. It automatically and accurately calculates all GenAI costs, including in scenarios such as tool use, vision, and streaming, where cost calculation is often difficult. This section describes the high-level concepts used by all Pay-i workflows. To put these concepts in context, we will use the following scenario:
Example Scenario
“Summarization Service” makes a chat completion API call to OpenAI's public SaaS endpoint, using the gpt-4o model, on behalf of a user, 'Jane', to summarize a document. This is one of several API calls required to summarize the document as part of the "Document Summary" feature. The cost of the call accrues toward Jane's $10 monthly 'premium account' limit.
| Pay-i Concept | Example from the Scenario |
|---|---|
| Provider | OpenAI public SaaS endpoint |
| Category | OpenAI |
| Resource | gpt-4o |
| Request | The chat completion API call |
| Request Tag | e.g., "summarize_document" |
| UserID | "Jane" |
| Limit | $10/mo |
| Limit Tag | e.g., "premium" |
| Application | "Summarization Service" |
| Use Case | "Document-Summary" |
All of these terms are further explained on their own pages.
Proxy and Ingest
There are two ways that Pay-i can process such an API call:
1. Pay-i as a Proxy
- An application sends the chat completion API call to Pay-i, which forwards it on to the Provider and then returns the results.
- Pay-i automatically calculates all utilization costs, even when the API does not return usage data.
- Pay-i also augments the returned results with an `xproxy_result` object that reports the exact cost of the call in real time and the status of any related limits.
- If you followed the quickstart, Pay-i is already configured as a proxy.
- Pay-i adds minimal latency overhead (tens of milliseconds), especially compared to the execution time of the GenAI services themselves (thousands of milliseconds). A configuration sketch follows the note below.
Note that Pay-i never stores any data from the API calls it proxies unless you opt in. Any data you choose to store is encrypted at rest and in transit. Contact [email protected] for more information about our security posture.
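To make the proxy configuration concrete, here is a minimal sketch using the OpenAI Python SDK. The base URL and header name below are placeholders, not Pay-i's actual values; use the exact configuration from the quickstart.

```python
import os

from openai import OpenAI

# Placeholder value for illustration only; substitute the real proxy
# endpoint from the Pay-i quickstart.
PAYI_PROXY_BASE_URL = "https://example.invalid/payi/proxy/openai/v1"

# The OpenAI client sends every request to Pay-i, which forwards it to
# OpenAI, calculates the cost, and augments the response with xproxy_result.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=PAYI_PROXY_BASE_URL,
    default_headers={"xProxy-api-key": os.environ["PAYI_API_KEY"]},  # assumed header name
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document: ..."}],
)
print(response.choices[0].message.content)
```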
2. Ingesting Metrics into Pay-i
- Your application sends the chat completion API call to the Provider directly, then reports the token counts and other usage information to Pay-i asynchronously (sketched below).
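As an illustration of the ingest flow, the sketch below calls OpenAI directly and then reports usage to Pay-i. The ingest endpoint, auth header, and payload fields are assumptions for illustration; the Ingest Requests workflow documents the real schema, and production code would report off the request's hot path.

```python
import os

import requests
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

# 1. Call the Provider directly.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document: ..."}],
)

# 2. Report token counts to Pay-i. The endpoint and field names below are
#    placeholders; see the Ingest Requests workflow for the real schema.
usage = response.usage
requests.post(
    "https://example.invalid/payi/ingest/requests",  # placeholder endpoint
    headers={"Authorization": f"Bearer {os.environ['PAYI_API_KEY']}"},  # assumed auth scheme
    json={
        "category": "OpenAI",  # assumed field names
        "resource": "gpt-4o",
        "units": {
            "input_tokens": usage.prompt_tokens,
            "output_tokens": usage.completion_tokens,
        },
    },
    timeout=10,
)
```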
We strongly recommend using Pay-i as a proxy: the service automatically handles the complexity of accounting for different unit types, tracks latency and failures, and manages other multi-model and multi-modal nuances, all with minimal latency overhead.
For simplicity, this documentation assumes that Pay-i is operating as a proxy. However, all of the same features are supported via Ingest, using the Ingest Requests workflow. Further, we've made ingesting data from Managed Resources extremely straightforward using the @ingest decorator (illustrated below).
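As a purely hypothetical sketch of the decorator approach: the import path, decorator name, and parameters below are assumptions for illustration, not Pay-i's actual API; the Decorators guide documents the real usage.

```python
# Hypothetical sketch only: the import path and decorator parameters are
# assumptions; consult the Decorators guide for Pay-i's actual API.
from openai import OpenAI

from payi.decorators import ingest  # assumed import path

client = OpenAI()


@ingest(category="OpenAI", resource="gpt-4o")  # assumed parameters
def summarize_chunk(chunk: str) -> str:
    """Summarize one chunk of a document; usage is reported to Pay-i."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {chunk}"}],
    )
    return response.choices[0].message.content
```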
Integration Guides
Pay-i provides detailed integration guides for both Proxy and Ingest:
- GenAI Provider Configuration - Detailed setup guides for OpenAI, Azure OpenAI, Anthropic, AWS Bedrock, and LangChain
- Decorators - Using Pay-i's Python SDK decorators for both Proxy and Ingest modes