Advanced Concepts
Choose the simple pattern before using the APIs directly
We strongly recommend the simpler approach: specify limit IDs when calling @track and track_context, then inspect the limit state in the resulting xproxy_result.
This guide explains advanced concepts to manage usage limits and budgets in Pay-i using the Python SDK directly. Limits provide fine-grained cost control and budget enforcement for your AI applications. For a conceptual understanding of limits and their role in the Pay-i platform, please refer to the Limits Concepts page.
When to Use Limits
You'll work with limits in the Pay-i Python SDK for several important purposes:
- Budget Management: Enforce spending caps for AI model usage to prevent unexpected costs
- Cost Control: Implement tiered spending thresholds with notification alerts
- Resource Allocation: Distribute AI budgets across teams, projects, or client accounts
- Usage Guardrails: Prevent runaway costs from unexpected usage spikes or misconfigurations
- Compliance: Meet organizational or regulatory requirements for budget enforcement
Common Workflows
Working with the Pay-i limits API involves several common patterns for creating, monitoring, and enforcing spending limits. This section walks you through these workflows with practical code examples.
The examples below demonstrate how to:
- Create a budget limit with appropriate parameters
- Find existing limits by ID or name
- Monitor usage across multiple requests
- Check limit status and react to exceeded limits
- Reset limits at the end of billing periods
For demonstration purposes, these examples use payi.ingest.units() to manually record GenAI usage metrics. The ingest.units() method returns limit status information that your application must check to determine if limits have been exceeded. In typical applications, you would use the SDK's decorators or context managers for more streamlined usage tracking.
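As a minimal sketch of that check, the helper below scans a limits mapping and returns the IDs of limits that have tripped. It rests on assumptions that are not confirmed by this guide: that response.xproxy_result.limits behaves like a mapping from limit ID to an object with a state attribute, and that the state names "exceeded" and "blocked" are representative. The name find_tripped_limits is hypothetical and not part of the SDK, and stand-in objects replace a real response so the sketch runs without credentials.

```python
from types import SimpleNamespace
from typing import Iterable, List, Mapping


def find_tripped_limits(
    limits: Mapping[str, object],
    tripped_states: Iterable[str] = ("exceeded", "blocked"),
) -> List[str]:
    """Return the sorted IDs of limits whose state is in tripped_states.

    ASSUMPTION: each value exposes a `state` string. The default state
    names are illustrative; adjust them to what your SDK version reports.
    """
    wanted = {state.lower() for state in tripped_states}
    return sorted(
        limit_id
        for limit_id, info in limits.items()
        if str(getattr(info, "state", "")).lower() in wanted
    )


# Stand-in for response.xproxy_result.limits (hypothetical shape).
sample_limits = {
    "daily": SimpleNamespace(state="ok"),
    "monthly": SimpleNamespace(state="exceeded"),
}

tripped = find_tripped_limits(sample_limits)
print(tripped)
```

With a real response you would pass response.xproxy_result.limits in place of sample_limits and decide what to do when the returned list is non-empty.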
Note: These examples use the Python SDK's client objects (Payi and AsyncPayi), which provide a resource-based interface to the Pay-i API. For details on client initialization and configuration, see the Pay-i Client Initialization guide.
Note for returning users: If you've previously run through the examples in this guide, you may want to clean up any existing limits before proceeding. This will help prevent errors due to limits with the same name but different parameters, as limit creation is only idempotent when all parameters exactly match.
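One possible way to do that cleanup is sketched below. It uses only the limits.list(limit_name=...) and limits.delete(limit_id=...) calls that appear later in this guide, and takes the client as a parameter. The helper name delete_limits_by_name and the FakeLimits stand-in are hypothetical and exist only so the sketch can run without credentials. It also assumes that the first page of list results contains every match.

```python
from types import SimpleNamespace
from typing import Iterable, List


def delete_limits_by_name(client, names: Iterable[str]) -> List[str]:
    """Delete every limit whose name is in `names`; return the deleted IDs.

    ASSUMPTION: only the first page returned by limits.list() is examined.
    """
    deleted = []
    for name in names:
        page = client.limits.list(limit_name=name)
        for limit in page.items:
            client.limits.delete(limit_id=limit.limit_id)
            deleted.append(limit.limit_id)
    return deleted


class FakeLimits:
    """In-memory stand-in for client.limits, used only to exercise the sketch."""

    def __init__(self, limits):
        self._limits = [*limits]

    def list(self, limit_name):
        matches = [l for l in self._limits if l.limit_name == limit_name]
        return SimpleNamespace(items=matches)

    def delete(self, limit_id):
        self._limits = [l for l in self._limits if l.limit_id != limit_id]


fake_client = SimpleNamespace(limits=FakeLimits([
    SimpleNamespace(limit_name="Monthly Budget", limit_id="monthly"),
    SimpleNamespace(limit_name="Daily Safety Cap", limit_id="daily"),
]))

deleted_ids = delete_limits_by_name(fake_client, ["Monthly Budget", "Missing"])
print(deleted_ids)
```

Against the real service you would pass a Payi() client instead of fake_client, along with the limit names used in this guide.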
Creating a Monthly Budget Limit
When you need to establish a fixed spending cap for your AI usage, creating a monthly budget limit ensures you won't exceed your allocated budget. Note that limit creation is idempotent only when all parameters exactly match - if you attempt to create a limit with the same name but different parameters, the operation will fail with an error:
pip install payi python-dotenv

from payi import Payi
from dotenv import load_dotenv

# Load the Pay-i credentials from a local .env file
load_dotenv()
payi = Payi()

limit_id = "monthly_allow_budget"

# Create a monthly budget limit with "allow" type for use-case level tracking
response = payi.limits.create(
    limit_name="Monthly Budget",
    limit_id=limit_id,
    max=100.0,
    limit_type="allow",
    threshold=0.80
)

Real-World Example: Multi-Layered Budget Control System
Let's walk through a comprehensive example of setting up a tiered limit structure that provides granular cost control for an enterprise AI application. This example demonstrates a common pattern in production environments:
- Budget tracking with non-blocking limits: These monitor usage against targets without disrupting service, allowing you to track progress toward goals and receive notifications
This mixed approach gives you both protection against runaway costs and visibility into usage patterns:
from payi import Payi
from datetime import datetime, timezone
from dotenv import load_dotenv
load_dotenv()
# Initialize the Pay-i client
payi = Payi()
# Step 1: Set up a multi-layered budget control system with different scopes
def setup_budget_control_system():
    """
    Create a sophisticated budget control system with daily, monthly,
    and model-specific limits to provide defense-in-depth cost control.
    This uses a combination of:
    - "allow" limits for tracking usage and goals without blocking requests
    - "block" limits as safety caps against runaway costs
    """
    # Dictionary to store limit IDs
    stored_limit_ids = {}

    # Helper function to ensure a limit exists with exact desired parameters
    # If a limit with same name but different parameters exists, it deletes and recreates it
    def ensure_limit_exists(name, id, max_value, limit_type, threshold_value):
        # First check if limit with this name already exists
        existing_limits = payi.limits.list(limit_name=name)
        if existing_limits.items:
            existing_limit = existing_limits.items[0]
            existing_id = existing_limit.limit_id
            # Check if existing limit has different important parameters
            # If so, we need to delete and recreate it since these can't be updated
            if (existing_limit.limit_type != limit_type or
                    existing_limit.threshold != threshold_value):
                print(f"Found limit '{name}' with different immutable parameters.")
                print(f" Existing: type={existing_limit.limit_type}, threshold={existing_limit.threshold*100:.0f}%")
                print(f" Desired: type={limit_type}, threshold={threshold_value*100:.0f}%")
                print(f" Deleting and recreating limit...")
                # Delete the existing limit
                payi.limits.delete(limit_id=existing_id)
                print(f" Deleted limit: {name} (ID: {existing_id})")
                # Create new limit with desired parameters
                limit_response = payi.limits.create(
                    limit_name=name,
                    limit_id=id,
                    max=max_value,
                    limit_type=limit_type,
                    threshold=threshold_value
                )
                limit_obj = limit_response.limit
                print(f" Created new limit: '{limit_obj.limit_name}' (${limit_obj.max:.2f}, {limit_obj.limit_type} type, {limit_obj.threshold * 100:.0f}% threshold)")
                return limit_obj.limit_id
            else:
                # Limit type and threshold match, we only need to update the max value if it changed
                if existing_limit.max != max_value:
                    print(f"Found limit '{name}' with matching immutable parameters but different max value.")
                    update_response = payi.limits.update(
                        limit_id=existing_id,
                        max=max_value
                    )
                    updated_limit = update_response.limit
                    print(f" Updated max value: ${existing_limit.max:.2f} → ${updated_limit.max:.2f}")
                    return updated_limit.limit_id
                else:
                    print(f"Limit '{name}' already exists with identical parameters. Using existing limit.")
                    return existing_id
        else:
            # Limit doesn't exist, create a new one
            limit_response = payi.limits.create(
                limit_name=name,
                limit_id=id,
                max=max_value,
                limit_type=limit_type,
                threshold=threshold_value
            )
            limit_obj = limit_response.limit
            print(f"Created new limit: '{limit_obj.limit_name}' (${limit_obj.max:.2f}, {limit_obj.limit_type} type, {limit_obj.threshold * 100:.0f}% threshold)")
            return limit_obj.limit_id

    # Create daily usage safety cap
    daily_limit_id = ensure_limit_exists(
        name="Daily Safety Cap",
        id="daily",
        max_value=50.0,
        limit_type="allow",
        threshold_value=0.90
    )
    stored_limit_ids["daily"] = daily_limit_id

    # Create monthly budget tracker
    monthly_limit_id = ensure_limit_exists(
        name="Monthly Budget",
        id="monthly",
        max_value=1000.0,
        limit_type="allow",
        threshold_value=0.80
    )
    stored_limit_ids["monthly"] = monthly_limit_id

    # Create premium model usage tracker
    premium_limit_id = ensure_limit_exists(
        name="Premium Model Usage",
        id="premium",
        max_value=300.0,
        limit_type="allow",
        threshold_value=0.83
    )
    stored_limit_ids["premium"] = premium_limit_id

    return stored_limit_ids

# Step 2: Create a function that uses these limits in requests
def process_ai_request(prompt, model, user_id, limit_ids):
    """
    Process an AI request using the specified model, applying all budget limits.
    Returns both the AI response and budget status information.
    """
    # Use typical token counts for this request
    # In a real app, you might estimate based on the model and prompt
    import random
    input_tokens = random.randint(50, 200)  # Simulate input size
    output_tokens = random.randint(100, 150)  # Simulate output size

    # Apply the daily and monthly limits to every request.
    # As created in Step 1, both use the "allow" type, so they track usage
    # and notify rather than block; a limit created with the "block" type
    # would act as a hard safety cap instead.
    applied_limits = [limit_ids["daily"], limit_ids["monthly"]]

    # For premium models, also track against the premium model budget
    # Define premium models with their correct categories
    premium_model_categories = {
        "claude-opus-4-8": "system.anthropic",
        "gpt-4o": "system.openai",
        "gemini-2.5-pro": "system.google.vertex"
    }
    # Check if the model is in our premium list
    if model in premium_model_categories:
        applied_limits.append(limit_ids["premium"])
        category = premium_model_categories[model]
    else:
        # Default to OpenAI for this example
        category = "system.openai"

    print(f"Processing request with {len(applied_limits)} active limits for {category}/{model}...")

    # Make the request with limits applied
    response = payi.ingest.units(
        category=category,
        resource=model,
        units={"text": {"input": input_tokens, "output": output_tokens}},
        limit_ids=applied_limits,  # Apply all relevant limits
        user_id=user_id
    )

    # Check limit status and handle accordingly
    # We'll collect status information for all limits in this list
    # to return a comprehensive budget status to the calling code
    budget_status = []
    for limit_id in response.xproxy_result.limits:
        # Retrieve the limit using its ID
        limit_response = payi.limits.retrieve(limit_id=limit_id)
        # Access the limit object from the response
        limit = limit_response.limit
        # Get current usage value from the totals.cost.total.base path
        current_usage = limit.totals.cost.total.base
        max_limit = limit.max
        # Check if limit was exceeded
        usage_ratio = current_usage / max_limit
        if usage_ratio >= 1.0:
            budget_status.append({
                "limit_name": limit.limit_name,
                "status": "EXCEEDED",
                "current": current_usage,
                "max": max_limit,
                "percent": usage_ratio * 100  # Include percentage even for exceeded limits
            })
        else:
            usage_pct = usage_ratio * 100
            status = "WARNING" if usage_pct > 75 else "OK"
            budget_status.append({
                "limit_name": limit.limit_name,
                "status": status,
                "current": current_usage,
                "max": max_limit,
                "percent": usage_pct
            })

    return {
        "success": not any(item["status"] == "EXCEEDED" for item in budget_status),
        "budget_status": budget_status,
        "response": response
    }

# Step 3: Example usage in a real application
def main():
    # Step 1: Set up budget control system once and store the IDs
    # In a production app, you would do this setup once and store IDs in a database
    limit_ids = setup_budget_control_system()

    # Process requests with budget enforcement
    user_request = "Write a comprehensive analysis of quantum computing applications in finance"
    result = process_ai_request(
        prompt=user_request,
        model="claude-opus-4-8",  # Premium model
        user_id="user_abc123",
        limit_ids=limit_ids
    )

    # Handle the result
    if result["success"]:
        print("Request processed successfully")
        # Process the model's response
    else:
        print("Request blocked due to budget constraints")
        print("Budget status:")
        for limit in result["budget_status"]:
            if limit["status"] == "EXCEEDED":
                print(f" ❌ {limit['limit_name']}: ${limit['current']:.5f}/${limit['max']:.2f} ({limit['percent']:.1f}%) - LIMIT EXCEEDED")


if __name__ == "__main__":
    main()

Best Practices
When working with limits in the Pay-i Python SDK, consider these best practices:
- Defense in Depth: Implement multiple layers of limits (daily, monthly, per-model) to prevent unexpected cost spikes.
- Set Appropriate Thresholds: Configure notification thresholds (75-90% of the limit) to receive warnings before limits are reached.
- Track with High Precision: Use 5 decimal places for usage costs (e.g., ${usage:.5f}) and 2 decimal places for budget limits (e.g., ${limit:.2f}). This precision is crucial for GenAI micropayments, where even $0.00001 differences matter when accumulated over millions of requests.
- Use Properties for Organization: Apply consistent properties to limits to make them easier to find, filter, and manage.
- Reset on Appropriate Cycles: Establish a clear reset schedule that aligns with your billing or budget cycles.
- Monitor Regularly: Check limit status periodically to track usage patterns and adjust limits proactively.
- Store Dynamically Created Limit IDs: Always store limit IDs that Pay-i creates on behalf of your application. Where possible, prefer the simpler approach of using deterministic, known limit IDs.
- Graceful Degradation: When limits are reached, your agent should adjust accordingly (for example, switch to a cheaper model or exit the use case gracefully).
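Two of these practices, graceful degradation and high-precision reporting, can be sketched in plain Python without calling the SDK. The functions choose_model and format_budget_line, the placeholder model names, and the 0.80 default threshold are all hypothetical choices for illustration, not Pay-i APIs. The usage and maximum values would come from a retrieved limit, as in the example above.

```python
from typing import Optional


def choose_model(
    usage: float,
    max_budget: float,
    preferred: str,
    fallback: str,
    threshold: float = 0.80,
) -> Optional[str]:
    """Pick a model from the share of the budget already used.

    Returns `preferred` below the threshold, `fallback` from the threshold
    up to (but not including) 100%, and None once the budget is used up.
    """
    if max_budget <= 0:
        raise ValueError("max_budget must be positive")
    ratio = usage / max_budget
    if ratio >= 1.0:
        return None
    if ratio >= threshold:
        return fallback
    return preferred


def format_budget_line(name: str, usage: float, max_budget: float) -> str:
    """Format usage with 5 decimal places and the limit with 2."""
    percent = usage / max_budget * 100
    return f"{name}: ${usage:.5f}/${max_budget:.2f} ({percent:.1f}%)"


# Hypothetical values; in practice read them from the retrieved limit.
model = choose_model(
    usage=45.0,
    max_budget=50.0,
    preferred="premium-model",  # placeholder name
    fallback="budget-model",    # placeholder name
)
print(model)
print(format_budget_line("Daily Safety Cap", 12.5, 50.0))
```

A caller would treat a None result as the signal to stop the use case cleanly instead of sending another request.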