Terms and Glossary
Concepts to know!
Overview
Pay-i provides tools to track, control, and understand costs, performance issues, and failures arising from GenAI consumption. It automatically and accurately calculates all GenAI costs, including scenarios such as tool use, vision, and streaming, where calculation is often difficult. This section describes the high-level concepts used by all Pay-i workflows. To help contextualize the concepts described below, we will use the following scenario:
Example Scenario
“Summarization Service” makes a chat completion API call to OpenAI's public SaaS endpoint, using the gpt-4o model, on behalf of a user, 'Jane', in order to summarize a document. This is one of multiple API calls required to summarize the document as part of the "Document Summary" feature. The expenses of the call accrue towards Jane's $10 monthly 'premium account' limits.
| Pay-i Concept | Scenario Section |
| --- | --- |
| Provider | OpenAI public SaaS endpoint |
| Category | OpenAI |
| Resource | gpt-4o |
| Request | The chat completion API call |
| Request Tag | ex. "summarize_document" |
| UserID | "Jane" |
| Limit | $10/mo |
| Limit Tag | ex. "premium" |
| Application | "Summarization Service" |
| Use Case | "Document-Summary" |
All of these terms are further explained on their own pages.
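As an illustration, the scenario above can be thought of as metadata attached to a single tracked request. The field names in this sketch are hypothetical, chosen only to mirror the concepts in the table; they are not the actual Pay-i SDK schema:

```python
# Hypothetical mapping of the example scenario onto request metadata.
# Field names are illustrative only, not the real Pay-i SDK schema.
scenario_metadata = {
    "provider": "OpenAI public SaaS endpoint",
    "category": "OpenAI",
    "resource": "gpt-4o",
    "request_tags": ["summarize_document"],
    "user_id": "Jane",
    "limit_tags": ["premium"],         # ties into Jane's $10/mo limit
    "application": "Summarization Service",
    "use_case": "Document-Summary",
}

# One document summary may span multiple such requests; each request
# would carry the same use case and user so costs aggregate correctly.
print(scenario_metadata["resource"])  # gpt-4o
```

Annotating every request this way is what lets costs roll up by user, feature, and limit rather than appearing as one undifferentiated provider bill.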
Value and ROI
While tracking costs is valuable, Pay-i goes beyond cost management to help you quantify the business value of your GenAI investments. Learn how Pay-i helps measure and maximize ROI in Value of Pay-i Instrumentation.
Operational Approaches
Pay-i offers flexible integration options to track and manage your GenAI usage. There are two primary operational approaches:
- Direct Provider Call with Telemetry: This is the standard and recommended approach. Your application communicates directly with the GenAI provider as usual. The Pay-i SDK integrates with your provider calls to capture telemetry (usage data, metrics) and submits it to Pay-i via the Ingest API after the provider call completes. This adds zero latency to the provider request itself, but `Block` limits cannot be enforced in real time (only `Allow` limits, used for tracking, are supported).
- Proxy Routing: In this alternative mode, your application routes API calls through Pay-i, which then forwards them to the GenAI provider. Pay-i captures usage data "in-context" during the request flow, automatically calculating costs and token counts. This approach is necessary if you need real-time enforcement of `Block` limits, preventing calls that would exceed a limit, but it adds minimal latency. Configuration involves pointing your provider SDK client to Pay-i endpoints.
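A minimal sketch of the proxy-routing configuration idea, assuming a hypothetical Pay-i proxy base URL and header name (the real values come from the Pay-i documentation and your account). The returned dict is what you would spread into your provider SDK client constructor, e.g. `openai.OpenAI(**config)`:

```python
import os

# Hypothetical endpoint -- substitute the real Pay-i proxy URL.
PAYI_PROXY_BASE_URL = "https://api.example-payi-proxy.com/v1"  # assumption

def build_proxied_client_config(provider_api_key: str, payi_api_key: str) -> dict:
    """Return kwargs for a provider SDK client routed through Pay-i."""
    return {
        # The provider SDK now sends requests here instead of the
        # provider's own endpoint; Pay-i forwards them and records
        # usage in-context, enforcing Block limits before forwarding.
        "base_url": PAYI_PROXY_BASE_URL,
        "api_key": provider_api_key,
        # Hypothetical header name carrying the Pay-i credential.
        "default_headers": {"x-payi-api-key": payi_api_key},
    }

config = build_proxied_client_config(
    os.environ.get("OPENAI_API_KEY", "sk-..."),
    os.environ.get("PAYI_API_KEY", "payi-..."),
)
# e.g. client = openai.OpenAI(**config)
```

The application code that makes completion calls is otherwise unchanged; only the client construction differs between the two operational approaches.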
In both approaches, SDK tools like function decorators or custom headers can facilitate adding business context to the submitted telemetry.
For scenarios where automatic SDK telemetry submission is not used or applicable (like tracking custom resources, integrating non-Python applications, or submitting historical data), Pay-i provides the Manual Event Submission (Ingest API). This allows your application to explicitly call the Ingest API to send event data after the interaction has occurred.
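As a sketch of what manual event submission involves, the snippet below assembles one usage event and shows (commented out) how it would be posted. The endpoint URL and payload field names are assumptions for illustration; consult the Ingest API reference for the actual contract:

```python
import json
from datetime import datetime, timezone
from urllib import request as urlrequest

# Hypothetical ingest endpoint -- substitute the real Pay-i URL.
INGEST_URL = "https://api.example-payi.com/ingest"  # assumption

def build_ingest_event(category: str, resource: str,
                       input_tokens: int, output_tokens: int,
                       user_id: str) -> dict:
    """Assemble one usage event for submission after the interaction."""
    return {
        "category": category,
        "resource": resource,
        "units": {"input": input_tokens, "output": output_tokens},
        "user_id": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

event = build_ingest_event("OpenAI", "gpt-4o", 1200, 300, "Jane")

# Submission itself would be a plain authenticated POST, e.g.:
# req = urlrequest.Request(
#     INGEST_URL,
#     data=json.dumps(event).encode(),
#     headers={"Content-Type": "application/json"},
# )
# urlrequest.urlopen(req)
```

Because the event is just structured data sent after the fact, this path works for non-Python applications, custom resources, or backfilling historical usage.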
For a more detailed technical explanation of these approaches, see the Operational Modes documentation.
Integration Guides
Pay-i provides detailed integration guides for both operational approaches:
- Auto-Instrumentation with GenAI Providers - Setting up automatic API tracking with minimal code changes
- Custom Instrumentation with Pay-i - Adding business context and annotations to your API calls