Terms and Glossary
Concepts to know!
Overview
Pay-i provides tools to track, control, and understand costs, performance issues, and failures arising from GenAI consumption. It automatically and accurately calculates all GenAI costs, including scenarios such as tool use, vision, and streaming, where calculation is often difficult. This section describes the high-level concepts used by all Pay-i workflows. To help contextualize the concepts described below, we will use the following scenario:
Example Scenario
“Summarization Service” makes a chat completion API call to OpenAI's public SaaS endpoint, using the gpt-4o model, on behalf of a user, 'Jane', in order to summarize a document. This is one of multiple API calls required to summarize the document as part of the "Document Summary" feature. The expenses of the call accrue towards Jane's $10 monthly 'premium account' limits.
| Pay-i Concept | Scenario Section |
| --- | --- |
| Provider | OpenAI public SaaS endpoint |
| Category | OpenAI |
| Resource | gpt-4o |
| Request | The chat completion API call |
| Request Tag | ex. "summarize_document" |
| UserID | "Jane" |
| Limit | $10/mo |
| Limit Tag | ex. "premium" |
| Application | "Summarization Service" |
| Use Case | "Document-Summary" |
All of these terms are further explained on their own pages.
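As an illustration, the scenario above can be thought of as metadata attached to a single tracked request. The field names in this sketch are hypothetical, chosen only to mirror the concepts in the table; they are not the actual Pay-i SDK schema:

```python
# Hypothetical mapping of the example scenario onto request metadata.
# Field names are illustrative only, not the real Pay-i SDK schema.
scenario_metadata = {
    "provider": "OpenAI public SaaS endpoint",
    "category": "OpenAI",
    "resource": "gpt-4o",
    "request_tags": ["summarize_document"],
    "user_id": "Jane",
    "limit_tags": ["premium"],         # ties into Jane's $10/mo limit
    "application": "Summarization Service",
    "use_case": "Document-Summary",
}

# One document summary may span multiple such requests; each request
# would carry the same use case and user so costs aggregate correctly.
print(scenario_metadata["resource"])  # gpt-4o
```

Annotating every request this way is what lets costs roll up by user, feature, and limit rather than appearing as one undifferentiated provider bill.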
Value and ROI
While tracking costs is valuable, Pay-i goes beyond cost management to help you quantify the business value of your GenAI investments. Learn how Pay-i helps measure and maximize ROI in Value of Pay-i Instrumentation.
Operational Approaches
Pay-i offers flexible integration options to track and manage your GenAI usage. There are two primary operational approaches:
- Direct Provider Call with Telemetry: This is the standard and recommended approach. Your application communicates directly with the GenAI provider as usual. The Pay-i SDK integrates with your provider calls to capture telemetry (usage data, metrics) and submits it to Pay-i via the Ingest API after the provider call completes. This adds zero latency to the provider request itself, but `Block` limits cannot be enforced in real time (only `Allow` limits, used for tracking, are supported).
- Proxy Routing: In this alternative mode, your application routes API calls through Pay-i, which then forwards them to the GenAI provider. Pay-i captures usage data "in-context" during the request flow, automatically calculating costs and token counts. This approach is necessary if you need real-time enforcement of `Block` limits, preventing calls that would exceed a limit, but it adds minimal latency. Configuration involves pointing your provider SDK client to Pay-i endpoints.
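A minimal sketch of the proxy-routing configuration idea, assuming a hypothetical Pay-i proxy base URL and header name (the real values come from the Pay-i documentation and your account). The returned dict is what you would spread into your provider SDK client constructor, e.g. `openai.OpenAI(**config)`:

```python
import os

# Hypothetical endpoint -- substitute the real Pay-i proxy URL.
PAYI_PROXY_BASE_URL = "https://api.example-payi-proxy.com/v1"  # assumption

def build_proxied_client_config(provider_api_key: str, payi_api_key: str) -> dict:
    """Return kwargs for a provider SDK client routed through Pay-i."""
    return {
        # The provider SDK now sends requests here instead of the
        # provider's own endpoint; Pay-i forwards them and records
        # usage in-context, enforcing Block limits before forwarding.
        "base_url": PAYI_PROXY_BASE_URL,
        "api_key": provider_api_key,
        # Hypothetical header name carrying the Pay-i credential.
        "default_headers": {"x-payi-api-key": payi_api_key},
    }

config = build_proxied_client_config(
    os.environ.get("OPENAI_API_KEY", "sk-..."),
    os.environ.get("PAYI_API_KEY", "payi-..."),
)
# e.g. client = openai.OpenAI(**config)
```

The application code that makes completion calls is otherwise unchanged; only the client construction differs between the two operational approaches.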
In both approaches, SDK tools like function decorators or custom headers can facilitate adding business context to the submitted telemetry.
For scenarios where automatic SDK telemetry submission is not used or applicable (like tracking custom resources, integrating non-Python applications, or submitting historical data), Pay-i provides the Manual Event Submission (Ingest API). This allows your application to explicitly call the Ingest API to send event data after the interaction has occurred.
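As a sketch of what manual event submission involves, the snippet below assembles one usage event and shows (commented out) how it would be posted. The endpoint URL and payload field names are assumptions for illustration; consult the Ingest API reference for the actual contract:

```python
import json
from datetime import datetime, timezone
from urllib import request as urlrequest

# Hypothetical ingest endpoint -- substitute the real Pay-i URL.
INGEST_URL = "https://api.example-payi.com/ingest"  # assumption

def build_ingest_event(category: str, resource: str,
                       input_tokens: int, output_tokens: int,
                       user_id: str) -> dict:
    """Assemble one usage event for submission after the interaction."""
    return {
        "category": category,
        "resource": resource,
        "units": {"input": input_tokens, "output": output_tokens},
        "user_id": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

event = build_ingest_event("OpenAI", "gpt-4o", 1200, 300, "Jane")

# Submission itself would be a plain authenticated POST, e.g.:
# req = urlrequest.Request(
#     INGEST_URL,
#     data=json.dumps(event).encode(),
#     headers={"Content-Type": "application/json"},
# )
# urlrequest.urlopen(req)
```

Because the event is just structured data sent after the fact, this path works for non-Python applications, custom resources, or backfilling historical usage.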
For a more detailed technical explanation of these approaches, see the Operational Modes documentation.
Integration Guides
Pay-i provides detailed integration guides for both operational approaches:
- Auto-Instrumentation with GenAI Providers - Setting up automatic API tracking with minimal code changes
- Custom Instrumentation with Pay-i - Adding business context and annotations to your API calls