Pay-i Proxy Configuration

Overview

This guide explains how to configure Pay-i to route your GenAI API calls through its proxy service. While the standard Pay-i Instrumentation is sufficient for most use cases, proxy configuration is required when you need to implement Block limits that prevent requests from being sent to providers when a budget is exceeded.

When to Use Proxy Configuration

Pay-i's proxy configuration should be used when:

  • You need to implement Block limits to prevent API calls when budgets are exceeded
  • You want real-time cost visibility within API responses
  • You need to enforce spending limits directly at the API call level

If you're only tracking usage, applying Allow limits, or adding business context to your calls, the standard Pay-i Instrumentation approach is recommended as it's simpler to implement and adds no latency to your requests.

How it Works

When configured to use Pay-i as a proxy:

  1. Your application sends API requests to Pay-i instead of directly to the provider
  2. Pay-i receives the request and checks whether any applicable Block limits are in "overrun" or "blocked" state
  3. If all applicable limits are in "ok" or "exceeded" state (spend <= max), Pay-i forwards the request to the provider
  4. If any Block limit is in "overrun" or "blocked" state (spend > max), Pay-i prevents the request from reaching the provider and returns an error response instead
  5. For forwarded requests, Pay-i receives the response from the provider, augments it with cost information, and returns it to your application

Important: A common misconception is that the "exceeded" state means requests are blocked. This is incorrect. The "exceeded" state only indicates that spending has reached the threshold while remaining at or below the max value. For Block limits, only the "overrun" state (spend > max) or the "blocked" state (applied to subsequent requests after hitting overrun) will prevent requests from reaching the provider.
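
The mapping below restates these state semantics as a plain-Python summary; it is only a reference for the behavior described above, not part of the Pay-i SDK.

# How Block limit states map to proxy behavior, per the description above.
BLOCK_LIMIT_BEHAVIOR = {
    "ok":       "request forwarded to the provider",
    "exceeded": "request forwarded (threshold reached, spend <= max)",
    "overrun":  "request blocked (spend > max)",
    "blocked":  "request blocked (subsequent requests after an overrun)",
}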

This approach allows Pay-i to enforce spending constraints in real-time, before costs are incurred.

Configuring Proxy Mode

Setting up proxy mode involves two main steps:

  1. Initialize Pay-i instrumentation with proxy mode enabled
  2. Configure your provider client to route requests through Pay-i

1. Initialize Instrumentation with Proxy Mode

First, initialize Pay-i instrumentation with proxy mode explicitly enabled:

from payi.lib.instrument import payi_instrument

# Initialize Pay-i instrumentation with proxy mode
payi_instrument(config={"proxy": True})

2. Configure Provider Client

Next, configure your provider client to route requests through Pay-i instead of directly to the provider. Here's an example with OpenAI:

import os
from openai import OpenAI
from payi.lib.helpers import payi_openai_url

# Configure OpenAI client to use Pay-i as a proxy
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=payi_openai_url(),  # Uses Pay-i's URL instead of OpenAI's
    default_headers={"xProxy-api-key": os.getenv("PAYI_API_KEY")}  # Authenticate with Pay-i
)
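
Once the client is configured, requests are made exactly as you would make them against OpenAI directly; only the base URL and headers change. A minimal sketch continuing from the client above (the model name is illustrative):

# A normal OpenAI SDK call. Pay-i receives it first, checks any Block
# limits, and forwards it to OpenAI only if the limits allow it.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hello from the Pay-i proxy example"}],
)
print(response.choices[0].message.content)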

For detailed configuration instructions for other providers, see the provider-specific guides.

Each provider has unique requirements for authentication and configuration when using proxy mode.

LangChain Note

Important: LangChain integration with Pay-i uses a callback handler approach and does not support proxy configuration or Block limits. For LangChain integration, see the LangChain Configuration guide.

Helper Functions

Pay-i provides URL helper functions to simplify configuration:

  • payi_openai_url(): Generates the correct proxy URL for OpenAI
  • payi_azure_openai_url(): Generates the correct proxy URL for Azure OpenAI
  • payi_anthropic_url(): Generates the correct proxy URL for Anthropic
  • payi_aws_bedrock_url(): Generates the correct proxy URL for AWS Bedrock

These helpers automatically use the PAYI_BASE_URL environment variable if set, or default to the standard Pay-i API endpoint.
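
As a quick sketch, the snippet below checks whether PAYI_BASE_URL is set and prints the URLs two of the helpers produce; the exact paths returned may vary with your Pay-i deployment.

import os
from payi.lib.helpers import payi_openai_url, payi_anthropic_url

# The helpers honor PAYI_BASE_URL when it is set and otherwise fall back
# to the standard Pay-i API endpoint.
print(os.getenv("PAYI_BASE_URL", "<not set: default Pay-i endpoint will be used>"))
print(payi_openai_url())     # proxy base URL to pass to an OpenAI client
print(payi_anthropic_url())  # proxy base URL to pass to an Anthropic client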

Request and Response Differences

When using proxy configuration, requests and responses differ from standard instrumentation:

Request Headers

Requests must include special headers for the proxy to work correctly:

  • xProxy-api-key: Your Pay-i API key (required)
  • Provider-specific headers (described in the provider-specific guides)

You can still include any business context annotations in extra_headers like you would with standard instrumentation.
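
As an example, continuing with the OpenAI client configured earlier, per-request headers are attached through the SDK's extra_headers argument. Only the required xProxy-api-key header is shown here; business-context annotation headers are passed the same way and are documented in the annotations guide.

# Per-request headers ride along via extra_headers. Business-context
# annotation headers (see the annotations guide for their names) can be
# added to the same dictionary.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "xProxy-api-key": os.getenv("PAYI_API_KEY"),
    },
)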

Response Format

The most significant difference is in the response format. With proxy routing, responses are augmented with an xproxy_result object that contains real-time information about the request:

{
  "xproxy_result": {
    "cost": 0.00342,
    "limits": [
      {
        "limit_id": "your-limit-id",
        "name": "Example Limit",
        "limit_type": "Allow",
        "current": 0.45,
        "max": 10.0,
        "remaining": 9.55
      }
    ]
  }
}

This object provides immediate visibility into:

  • The exact cost of the current API call
  • The status of any limits applied to the request
  • How much budget remains for each limit
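
How you reach the raw JSON body depends on the provider SDK you are using; as a self-contained sketch, the snippet below simply treats the example body above as a Python dictionary and pulls out the cost and limit figures.

# The parsed response body from the example above.
payload = {
    "xproxy_result": {
        "cost": 0.00342,
        "limits": [
            {
                "limit_id": "your-limit-id",
                "name": "Example Limit",
                "limit_type": "Allow",
                "current": 0.45,
                "max": 10.0,
                "remaining": 9.55,
            }
        ],
    }
}

xproxy = payload["xproxy_result"]
print(f"This call cost: {xproxy['cost']}")
for limit in xproxy["limits"]:
    print(f"{limit['name']}: {limit['current']} of {limit['max']} used, {limit['remaining']} remaining")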

Error Handling

When a Block limit is in "overrun" or "blocked" state (spending has gone over the max value), the proxy returns an error response instead of forwarding the request to the provider:

{
  "error": {
    "message": "Limit exceeded: ProjectBudget",
    "type": "LimitExceeded",
    "code": 429,
    "limit_id": "your-limit-id"
  }
}

For details on handling proxy errors, see Handling Errors.
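
In application code, a blocked request surfaces as an HTTP error raised by the provider SDK. The sketch below continues from the OpenAI client configured earlier; the exact exception class depends on the status code (a 429 is raised as openai.RateLimitError, a subclass of openai.APIStatusError).

import openai

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.APIStatusError as e:
    # The error body carries the limit details shown above.
    print(f"Request blocked with status {e.status_code}")
    print(e.response.json())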

Important Considerations

Mixing Warning

IMPORTANT: Do not mix proxy configuration and standard instrumentation for the same API calls. Using both approaches for the same requests will cause double-counting. Configure your application consistently with one approach or the other for a given workflow.

Performance Impact

Proxy routing adds a small amount of latency (typically 10-50ms) to each request. While this is negligible compared to the typical latency of GenAI API calls (often 1000+ ms), it's something to be aware of in high-performance applications.

Related Resources