Pay-i Proxy Configuration

Overview

This guide explains how to configure Pay-i to route your GenAI API calls through its proxy service. While the standard Pay-i Instrumentation is sufficient for most use cases, proxy configuration is required when you need to implement Block limits that prevent requests from being sent to providers when a budget is exceeded.

When to Use Proxy Configuration

Pay-i's proxy configuration should be used when:

  • You need to implement Block limits to prevent API calls when budgets are exceeded
  • You want real-time cost visibility within API responses
  • You need to enforce spending limits directly at the API call level

If you're only tracking usage, applying Allow limits, or adding business context to your calls, the standard Pay-i Instrumentation approach is recommended as it's simpler to implement and adds no latency to your requests.

How it Works

When configured to use Pay-i as a proxy:

  1. Your application sends API requests to Pay-i instead of directly to the provider
  2. Pay-i receives the request and checks if any applicable Block limits have been exceeded
  3. If limits are not exceeded, Pay-i forwards the request to the provider
  4. Pay-i receives the response from the provider, augments it with cost information, and returns it to your application
  5. If any Block limits are exceeded, Pay-i prevents the request from reaching the provider and returns an error response

This approach allows Pay-i to enforce spending constraints in real-time, before costs are incurred.
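The decision the proxy makes in steps 2-5 can be sketched in plain Python. This is an illustrative model only, not Pay-i's implementation; the limit dictionary shape is an assumption based on the response format shown later in this guide:

```python
def route_request(request, limits, forward):
    """Sketch of the proxy's decision: block, or forward and augment."""
    exceeded = [l for l in limits
                if l["limit_type"] == "Block" and l["current"] >= l["max"]]
    if exceeded:
        # The request never reaches the provider; an error comes back instead.
        return {"error": {"type": "LimitExceeded", "code": 429,
                          "limit_id": exceeded[0]["limit_id"]}}
    response = forward(request)                 # forwarded to the provider
    response["xproxy_result"] = {"cost": 0.0}   # augmented with cost info
    return response
```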

Configuring Proxy Mode

Setting up proxy mode involves two main steps:

  1. Initialize Pay-i instrumentation with proxy mode enabled
  2. Configure your provider client to route requests through Pay-i

1. Initialize Instrumentation with Proxy Mode

First, initialize Pay-i instrumentation with proxy mode explicitly enabled:

from payi.lib.instrument import payi_instrument

# Initialize Pay-i instrumentation with proxy mode
payi_instrument(config={"proxy": True})

2. Configure Provider Client

Next, configure your provider client to route requests through Pay-i instead of directly to the provider. Here's an example with OpenAI:

import os
from openai import OpenAI
from payi.lib.helpers import payi_openai_url

# Configure OpenAI client to use Pay-i as a proxy
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=payi_openai_url(),  # Uses Pay-i's URL instead of OpenAI's
    default_headers={"xProxy-api-key": os.getenv("PAYI_API_KEY")}  # Authenticate with Pay-i
)

For detailed configuration instructions for other providers, see the provider-specific guides. Each provider has unique requirements for authentication and configuration when using proxy mode.

LangChain Note

Important: LangChain integration with Pay-i uses a callback handler approach and does not support proxy configuration or Block limits. For LangChain integration, see the LangChain Configuration guide.

Helper Functions

Pay-i provides URL helper functions to simplify configuration:

Helper Function            Description
payi_openai_url()          Generates the correct proxy URL for OpenAI
payi_azure_openai_url()    Generates the correct proxy URL for Azure OpenAI
payi_anthropic_url()       Generates the correct proxy URL for Anthropic
payi_aws_bedrock_url()     Generates the correct proxy URL for AWS Bedrock

These helpers automatically use the PAYI_BASE_URL environment variable if set, or default to the standard Pay-i API endpoint.
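As an illustration of that resolution order, a helper of this kind might be implemented along the following lines. The default endpoint and route path below are placeholders, not Pay-i's real values:

```python
import os

# Placeholder default; the real helpers fall back to Pay-i's actual endpoint.
_DEFAULT_BASE = "https://payi.example.invalid"

def example_openai_proxy_url():
    """Resolve PAYI_BASE_URL if set, else fall back to a default endpoint."""
    base = os.getenv("PAYI_BASE_URL", _DEFAULT_BASE).rstrip("/")
    return f"{base}/api/v1/proxy/openai/v1"  # route path is an assumption
```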

Request and Response Differences

When using proxy configuration, requests and responses differ from standard instrumentation:

Request Headers

Requests must include special headers for the proxy to work correctly:

  • xProxy-api-key: Your Pay-i API key (required)
  • Provider-specific headers (described in the provider-specific guides)

You can still include business context annotations in extra_headers, just as you would with standard instrumentation.
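One way to combine the required auth header with annotations is to build the header dict once. The annotation header name below ("xProxy-Request-Tags") is a hypothetical placeholder; use the names documented for Pay-i annotations in real code:

```python
import os

def build_proxy_headers(**annotations):
    """Merge the Pay-i proxy auth header with business-context annotations."""
    headers = {"xProxy-api-key": os.getenv("PAYI_API_KEY", "")}
    headers.update(annotations)  # annotation header names are assumptions
    return headers

# Example (hypothetical annotation header):
extra_headers = build_proxy_headers(**{"xProxy-Request-Tags": "checkout,beta"})
```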

Response Format

The most significant difference is in the response format. With proxy routing, responses are augmented with an xproxy_result object that contains real-time information about the request:

{
  "xproxy_result": {
    "cost": 0.00342,
    "limits": [
      {
        "limit_id": "your-limit-id",
        "name": "Example Limit",
        "limit_type": "Allow",
        "current": 0.45,
        "max": 10.0,
        "remaining": 9.55
      }
    ]
  }
}
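Reading those fields out of a response (treated here as a plain dict, as in the JSON above) is straightforward; a minimal sketch:

```python
def summarize_proxy_result(response):
    """Pull cost and per-limit remaining budget from xproxy_result."""
    result = response.get("xproxy_result", {})
    remaining = {l["name"]: l["remaining"] for l in result.get("limits", [])}
    return result.get("cost"), remaining
```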

This object provides immediate visibility into:

  • The exact cost of the current API call
  • The status of any limits applied to the request
  • How much budget remains for each limit

Error Handling

When a Block limit is exceeded, the proxy returns an error response instead of forwarding the request to the provider:

{
  "error": {
    "message": "Limit exceeded: ProjectBudget",
    "type": "LimitExceeded",
    "code": 429,
    "limit_id": "your-limit-id"
  }
}
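Checking a body of this shape might look like the sketch below (error-body parsing only; in practice the provider SDK will surface this as an HTTP error whose body you can inspect):

```python
def blocked_limit_id(error_body):
    """Return the offending limit_id for a Block-limit error, else None."""
    err = error_body.get("error", {})
    if err.get("type") == "LimitExceeded" and err.get("code") == 429:
        return err.get("limit_id")
    return None
```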

For details on handling proxy errors, see Handling Errors.

Important Considerations

Mixing Warning

IMPORTANT: Do not mix proxy configuration and standard instrumentation for the same API calls. Using both approaches for the same requests will cause double-counting. Configure your application consistently with one approach or the other for a given workflow.

Performance Impact

Proxy routing adds a small amount of latency (typically 10-50ms) to each request. While this is negligible compared to the typical latency of GenAI API calls (often 1000+ ms), it's something to be aware of in high-performance applications.

Related Resources