# Pay-i Proxy Configuration

## Overview
This guide explains how to configure Pay-i to route your GenAI API calls through its proxy service. While the standard Pay-i Instrumentation is sufficient for most use cases, proxy configuration is required when you need to implement `Block` limits that prevent requests from being sent to providers when a budget is exceeded.
## When to Use Proxy Configuration
Pay-i's proxy configuration should be used when:

- You need to implement `Block` limits to prevent API calls when budgets are exceeded
- You want real-time cost visibility within API responses
- You need to enforce spending limits directly at the API call level

If you're only tracking usage, applying `Allow` limits, or adding business context to your calls, the standard Pay-i Instrumentation approach is recommended, as it's simpler to implement and adds no latency to your requests.
## How it Works
When configured to use Pay-i as a proxy:

- Your application sends API requests to Pay-i instead of directly to the provider
- Pay-i receives the request and checks if any applicable `Block` limits have been exceeded
- If limits are not exceeded, Pay-i forwards the request to the provider
- Pay-i receives the response from the provider, augments it with cost information, and returns it to your application
- If any `Block` limits are exceeded, Pay-i prevents the request from reaching the provider and returns an error response
This approach allows Pay-i to enforce spending constraints in real-time, before costs are incurred.
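As a rough illustration of this flow, the sketch below shows the decision logic in pseudocode. It is not Pay-i's actual implementation; `forward_to_provider` and `calculate_cost` are hypothetical helpers, and the limit fields mirror the response examples later in this guide.

```python
# Conceptual sketch of the proxy's decision flow (not Pay-i's actual code).
def handle_proxied_request(request, applicable_limits):
    # Check every applicable Block limit *before* any cost is incurred.
    for limit in applicable_limits:
        if limit["limit_type"] == "Block" and limit["current"] >= limit["max"]:
            # Budget exhausted: fail fast without reaching the provider.
            return {"error": {"message": f"Limit exceeded: {limit['name']}",
                              "type": "LimitExceeded",
                              "code": 429,
                              "limit_id": limit["limit_id"]}}

    # Within budget: forward the call and compute its cost.
    response = forward_to_provider(request)  # hypothetical helper
    cost = calculate_cost(response)          # hypothetical helper

    # Record usage and augment the response with real-time cost info.
    for limit in applicable_limits:
        limit["current"] += cost
        limit["remaining"] = limit["max"] - limit["current"]
    response["xproxy_result"] = {"cost": cost, "limits": applicable_limits}
    return response
```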
## Configuring Proxy Mode
Setting up proxy mode involves two main steps:

1. Initialize Pay-i instrumentation with proxy mode enabled
2. Configure your provider client to route requests through Pay-i
### 1. Initialize Instrumentation with Proxy Mode
First, initialize Pay-i instrumentation with proxy mode explicitly enabled:

```python
from payi.lib.instrument import payi_instrument

# Initialize Pay-i instrumentation with proxy mode enabled
payi_instrument(config={"proxy": True})
```
### 2. Configure Provider Client
Next, configure your provider client to route requests through Pay-i instead of directly to the provider. Here's an example with OpenAI:
```python
import os
from openai import OpenAI
from payi.lib.helpers import payi_openai_url

# Configure the OpenAI client to use Pay-i as a proxy
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=payi_openai_url(),  # Pay-i's URL instead of OpenAI's
    default_headers={"xProxy-api-key": os.getenv("PAYI_API_KEY")},  # Authenticate with Pay-i
)
```
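Once configured, the client is used exactly as before; only the transport changes. For example, a standard chat completion call (the model name below is just an example) now flows through Pay-i:

```python
# The call itself is unchanged; only the transport now goes through Pay-i.
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```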
For detailed configuration instructions for other providers, see the provider-specific guides. Each provider has unique requirements for authentication and configuration when using proxy mode.
### LangChain Note

**Important:** LangChain integration with Pay-i uses a callback handler approach and does not support proxy configuration or `Block` limits. For LangChain integration, see the LangChain Configuration guide.
## Helper Functions
Pay-i provides URL helper functions to simplify configuration:
| Helper Function | Description |
| --- | --- |
| `payi_openai_url()` | Generates the correct proxy URL for OpenAI |
| `payi_azure_openai_url()` | Generates the correct proxy URL for Azure OpenAI |
| `payi_anthropic_url()` | Generates the correct proxy URL for Anthropic |
| `payi_aws_bedrock_url()` | Generates the correct proxy URL for AWS Bedrock |

These helpers automatically use the `PAYI_BASE_URL` environment variable if set, or default to the standard Pay-i API endpoint.
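For example, you can point every helper at a custom deployment with a single environment variable (the URL below is a placeholder; set the variable before your process starts):

```python
# export PAYI_BASE_URL="https://payi.example.internal"  # placeholder URL
from payi.lib.helpers import payi_openai_url, payi_anthropic_url

print(payi_openai_url())     # rooted at PAYI_BASE_URL if set,
print(payi_anthropic_url())  # otherwise the standard Pay-i endpoint
```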
## Request and Response Differences
When using proxy configuration, requests and responses differ from standard instrumentation:
### Request Headers
Requests must include special headers for the proxy to work correctly:

- `xProxy-api-key`: Your Pay-i API key (required)
- Provider-specific headers (described in the provider-specific guides)

You can still include any business context annotations in `extra_headers`, just as you would with standard instrumentation.
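For instance, the sketch below passes an annotation alongside a proxied call. Note that `xProxy-Example-Tag` is a placeholder name, not a documented header; use the annotation header names from the business context documentation.

```python
# Business context annotations ride along in extra_headers as usual.
# NOTE: "xProxy-Example-Tag" is a placeholder, not a documented header name.
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
    extra_headers={"xProxy-Example-Tag": "support-bot"},
)
```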
### Response Format
The most significant difference is in the response format. With proxy routing, responses are augmented with an `xproxy_result` object that contains real-time information about the request:

```json
{
  "xproxy_result": {
    "cost": 0.00342,
    "limits": [
      {
        "limit_id": "your-limit-id",
        "name": "Example Limit",
        "limit_type": "Allow",
        "current": 0.45,
        "max": 10.0,
        "remaining": 9.55
      }
    ]
  }
}
```
This object provides immediate visibility into:
- The exact cost of the current API call
- The status of any limits applied to the request
- How much budget remains for each limit
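How you read `xproxy_result` in code depends on your SDK. The sketch below assumes the OpenAI Python SDK, whose Pydantic v2 response models typically surface unrecognized fields via `model_extra`; the exact access pattern may vary by SDK version.

```python
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Hello!"}],
)

# Pydantic v2 models usually expose unknown response fields via model_extra;
# fall back to a full dict dump if your SDK version behaves differently.
extra = completion.model_extra or {}
xproxy = extra.get("xproxy_result") or completion.model_dump().get("xproxy_result")

if xproxy:
    print(f"call cost: ${xproxy['cost']}")
    for limit in xproxy.get("limits", []):
        print(f"{limit['name']}: {limit['remaining']} of {limit['max']} remaining")
```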
### Error Handling

When a `Block` limit is exceeded, the proxy returns an error response instead of forwarding the request to the provider:
```json
{
  "error": {
    "message": "Limit exceeded: ProjectBudget",
    "type": "LimitExceeded",
    "code": 429,
    "limit_id": "your-limit-id"
  }
}
```
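Because the proxy returns an HTTP 429, the error surfaces through your provider SDK's normal exception types. A sketch with the OpenAI Python SDK (`APIStatusError` is its general base class for HTTP error responses; the exact subclass raised may vary):

```python
import openai

try:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.APIStatusError as e:
    if e.status_code == 429:
        # A Block limit was exceeded; the body carries Pay-i's error details.
        print(f"Blocked by Pay-i: {e.response.json()}")
    else:
        raise
```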
For details on handling proxy errors, see Handling Errors.
## Important Considerations

### Mixing Warning
IMPORTANT: Do not mix proxy configuration and standard instrumentation for the same API calls. Using both approaches for the same requests will cause double-counting. Configure your application consistently with one approach or the other for a given workflow.
### Performance Impact
Proxy routing adds a small amount of latency (typically 10-50ms) to each request. While this is negligible compared to the typical latency of GenAI API calls (often 1000+ ms), it's something to be aware of in high-performance applications.
## Related Resources
- Auto-Instrumentation - Standard instrumentation approach
- Operational Approaches - Advanced explanation of the underlying technical approaches
- Handling Successes - Working with the `xproxy_result` object
- Handling Errors - Managing error responses in proxy mode