Overview

This guide explains how to manage usage limits and budgets in Pay-i using the Python SDK. Limits provide fine-grained cost control and budget enforcement for your AI applications. For a conceptual understanding of limits and their role in the Pay-i platform, please refer to the Limits Concepts page.

When to Use Limits

You'll work with limits in the Pay-i Python SDK for several important purposes:

Budget Management: Enforce spending caps for AI model usage to prevent unexpected costs
Cost Control: Implement tiered spending thresholds with notification alerts
Resource Allocation: Distribute AI budgets across teams, projects, or client accounts
Usage Guardrails: Prevent runaway costs from unexpected usage spikes or misconfigurations
Compliance: Meet organizational or regulatory requirements for budget enforcement

Common Workflows

Working with the Pay-i limits API involves several common patterns for creating, monitoring, and enforcing spending limits. This section walks you through these workflows with practical code examples.

The examples below demonstrate how to:

Create a budget limit with appropriate parameters
Find existing limits by ID or name
Monitor usage across multiple requests
Check limit status and react to exceeded limits
Reset limits at the end of billing periods

For demonstration purposes, these examples use client.ingest.units() to manually record GenAI usage metrics. The ingest.units() method returns limit status information that your application must check to determine if limits have been exceeded. In typical applications, you would use the SDK's decorators or context managers for more streamlined usage tracking.

Note: These examples use the Python SDK's client objects (Payi and AsyncPayi), which provide a resource-based interface to the Pay-i API. For details on client initialization and configuration, see the Pay-i Client Initialization guide.

Note for returning users: If you've previously run through the examples in this guide, you may want to clean up any existing limits before proceeding. This will help prevent errors due to limits with the same name but different parameters, as limit creation is only idempotent when all parameters exactly match.

Creating a Monthly Budget Limit

When you need to establish a fixed spending cap for your AI usage, a monthly budget limit lets you track spending against your allocated budget. The example below uses the "allow" type, which monitors spending without blocking requests; use "block" when requests should be stopped once the limit is reached. Note that limit creation is idempotent only when all parameters exactly match. If you attempt to create a limit with the same name but different parameters, the operation fails with an error:

from payi import Payi

# Initialize the Pay-i client
client = Payi()  # API key will be loaded from PAYI_API_KEY environment variable

# Step 1: Create a monthly budget limit with "allow" type for use-case level tracking
response = client.limits.create(
    limit_name="Monthly Budget",
    max=100.0,           # $100 maximum monthly spend
    limit_type="allow",  # Allow requests even when limit is reached (just for monitoring)
    threshold=0.80       # Get notified at 80% of the limit (threshold must be between 0.75 and 0.99)
)

# Step 2: Store the limit ID for future reference
# You'll need this ID to check status, update, or reset the limit
limit = response.limit
stored_limit_id = limit.limit_id
print(f"Created '{limit.limit_name}' ${limit.max:.2f} {limit.limit_type} limit with {limit.threshold * 100:.0f}% threshold. ID: {stored_limit_id}")
# In a real application, you would store this ID in a database or configuration

# Note: If you run this code multiple times, it will succeed as long as all parameters match.
# The create operation is idempotent when all parameters are identical - it will return the existing limit.
# However, if you try to create a limit with the same name but different parameters (like changing
# limit_type from "allow" to "block"), you'll get a "limit_name_already_exist" error.

Expected output:

Created 'Monthly Budget' $100.00 allow limit with 80% threshold. ID: 2220
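Before calling client.limits.create(), it can help to check the parameters locally. The sketch below is a hypothetical helper (validate_limit_params is not part of the Pay-i SDK) that enforces the 0.75 to 0.99 threshold range mentioned in the example above and returns the dollar amount at which the threshold is reached. The requirement that max be positive is an assumption made for this sketch, not a documented rule.

```python
def validate_limit_params(max_value: float, threshold: float) -> float:
    """Validate limit parameters locally (hypothetical helper, not an SDK call).

    Returns the dollar amount at which the notification threshold is reached.
    """
    # Assumption for this sketch: a budget must be a positive dollar amount.
    if max_value <= 0:
        raise ValueError("max must be a positive dollar amount")
    # Range taken from the create() example: threshold must be between 0.75 and 0.99.
    if not 0.75 <= threshold <= 0.99:
        raise ValueError("threshold must be between 0.75 and 0.99")
    return max_value * threshold


# A $100 limit with an 80% threshold reaches its threshold at $80 of spend.
alert_at = validate_limit_params(100.0, 0.80)
print(f"Threshold reached at ${alert_at:.2f}")
```

Validating locally gives a clearer error message than waiting for the API to reject the request, but the server remains the source of truth for what is accepted.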

Finding Limits By ID or Name

Pay-i offers two distinct ways to access limits, depending on what information you have available:

# Method 1: When you know the limit ID (fastest, most direct)
# Use retrieve() to get a limit by its ID
limit_by_id = client.limits.retrieve(limit_id=stored_limit_id).limit
print(f"Retrieved limit: {limit_by_id.limit_name}")
print(f"Current usage: ${limit_by_id.totals.cost.total.base:.5f} of ${limit_by_id.max:.2f}")

# Method 2: When you know the name but not the ID
# Use list() with limit_name parameter to find by name
limit_by_name = client.limits.list(limit_name="Monthly Budget").items[0]
print(f"Found limit ID: {limit_by_name.limit_id}")
print(f"Current usage: ${limit_by_name.totals.cost.total.base:.5f}")

Expected output:

Retrieved limit: Monthly Budget
Current usage: $0.00000 of $100.00
Found limit ID: 2220
Current usage: $0.00000

Both methods provide access to the same limit properties, so you can work with limits whether you have the ID or just the name.
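Note that indexing .items[0] raises an IndexError when no limit matches the name. A minimal sketch of a guard follows; first_limit_or_none is a hypothetical helper, and the SimpleNamespace objects merely stand in for the limit objects returned by the SDK so that the snippet runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence


def first_limit_or_none(items: Sequence[Any]) -> Optional[Any]:
    """Return the first item of a lookup result, or None when nothing matched."""
    return items[0] if items else None


# Stand-ins for client.limits.list(limit_name=...).items (illustrative data only)
found = [SimpleNamespace(limit_id="2220", limit_name="Monthly Budget")]
missing = []

limit = first_limit_or_none(found)
print(limit.limit_id if limit else "No limit with that name")

limit = first_limit_or_none(missing)
print(limit.limit_id if limit else "No limit with that name")
```

With the real client, you would pass client.limits.list(limit_name="Monthly Budget").items to the helper instead of the stand-in lists.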

Making Requests and Monitoring Usage

After creating a limit, use it to control spending across multiple GenAI requests and monitor accumulated usage:

# Continued from previous example where we created a "Monthly Budget" limit
# We already have the stored_limit_id from the previous step

# Step 3: Record some AI request usage data to accumulate usage (simulating GenAI API calls)
for i in range(3):
    request_prompt = f"Generate a creative story about a robot learning to paint (request {i+1}/3)"
    
    # Track each request against our budget limit
    ingest_response = client.ingest.units(
        category="system.openai",
        resource="gpt-4o-2024-05-13",  # Using specific model version
        units={"text": {"input": 100, "output": 150}},  # Approximately 250 tokens total
        limit_ids=[stored_limit_id],  # Apply our budget limit to this request
        user_id="user_123"  # Optional user attribution
    )
    
    print(f"Request {i+1}/3 processed")

# Step 4: Now let's check if we're approaching our limit after multiple requests
limit_response = client.limits.retrieve(limit_id=stored_limit_id)
limit = limit_response.limit

# Step 5: Calculate usage metrics for reporting
current_usage = limit.totals.cost.total.base
max_limit = limit.max
usage_percentage = (current_usage / max_limit) * 100

# Step 6: Generate human-readable status report
print(f"\nBudget Status: '{limit.limit_name}'")
print(f"Type: {limit.limit_type}, Threshold: {limit.threshold * 100:.0f}%")
print(f"Current usage: ${current_usage:.5f} of ${max_limit:.2f} ({usage_percentage:.1f}%)")

# Step 7: Implement alert logic based on usage percentage
if usage_percentage > 90:
    print("⚠️ CRITICAL: Budget nearly exhausted! Consider increasing limit or restricting usage.")
elif usage_percentage > 75:
    print("⚠️ WARNING: Budget usage high. Plan for potential limit increase if usage continues.")
elif usage_percentage > 50:
    print("ℹ️ NOTICE: Budget usage at midpoint.")
else:
    print("✅ Budget usage within expected range.")

# Step 8: Create a second, more restrictive limit for instance-specific budgeting
# This creates a small $1 limit that will be used to demonstrate limit enforcement
small_limit_response = client.limits.create(
    limit_name="Small Instance Limit",
    max=1.0,             # Just $1 maximum (will be exceeded easily)
    limit_type="block",  # Block requests when limit is reached
    threshold=0.80       # Get notified at 80% of the limit
)
small_limit = small_limit_response.limit
stored_small_limit_id = small_limit.limit_id
print(f"Created '{small_limit.limit_name}': ${small_limit.max:.2f} {small_limit.limit_type} limit with {small_limit.threshold * 100:.0f}% threshold. ID: {stored_small_limit_id}")

# Note: If you get a "limit_name_already_exist" error, it means this limit
# already exists with different parameters. See the cleanup section at the end of this guide.

# Step 9: For multiple limits, use list() to check all active limits at once
all_limits = client.limits.list()
print("\nAll active limits:")
for limit in all_limits:
    current_usage = limit.totals.cost.total.base
    max_limit = limit.max
    usage_pct = (current_usage / max_limit) * 100 if max_limit > 0 else 0
    print(f"  {limit.limit_name}: [Type: {limit.limit_type}, Threshold: {limit.threshold * 100:.0f}%]")
    print(f"    ${current_usage:.5f} of ${max_limit:.2f} ({usage_pct:.1f}%)")

Expected output:

Request 1/3 processed
Request 2/3 processed
Request 3/3 processed

Budget Status: 'Monthly Budget'
Type: allow, Threshold: 80%
Current usage: $0.00525 of $100.00 (0.0%)
✅ Budget usage within expected range.
Created 'Small Instance Limit': $1.00 block limit with 80% threshold. ID: 2221

All active limits:
  Monthly Budget: [Type: allow, Threshold: 80%]
    $0.00525 of $100.00 (0.0%)
  Small Instance Limit: [Type: block, Threshold: 80%]
    $0.00000 of $1.00 (0.0%)
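The usage figures retrieved above can also feed a rough projection of how many more requests fit within the budget. The sketch below is a hypothetical helper that assumes future requests cost about the same as the average so far, which will not hold if token counts or models vary.

```python
import math
from typing import Optional


def requests_remaining(current_usage: float, max_limit: float,
                       requests_so_far: int) -> Optional[int]:
    """Estimate how many more average-cost requests fit under the limit.

    Returns None when there is no usage history to average over.
    """
    if requests_so_far <= 0 or current_usage <= 0:
        return None
    average_cost = current_usage / requests_so_far
    # Clamp at zero so an already-exceeded limit reports 0 rather than a negative count.
    remaining_budget = max(max_limit - current_usage, 0.0)
    return math.floor(remaining_budget / average_cost)


# $2.00 spent over 4 requests against a $10.00 limit: $0.50 per request, $8.00 left.
print(requests_remaining(2.0, 10.0, 4))  # prints 16
```

In the examples above, current_usage would come from limit.totals.cost.total.base and max_limit from limit.max.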

Limit Enforcement

Now that we have set up both a large monitoring limit and a small enforcement limit, let's see how the system handles scenarios where limits are exceeded:

# Building on the previous examples, we now have:
# - stored_limit_id: our main $100 "allow" limit for monitoring
# - stored_small_limit_id: our $1 "block" limit for strict enforcement

import random

print("\nSimulating heavy GenAI usage to test limit enforcement...")

# We'll generate large token counts to quickly exceed our $1 limit
for i in range(5):  # Just a few requests should be enough
    # Use high token counts to accelerate limit consumption
    input_tokens = random.randint(50000, 100000)
    output_tokens = random.randint(5000, 15000)
    
    # Track this usage against both our limits
    ingest_response = client.ingest.units(
        category="system.openai",
        resource="gpt-4o-2024-05-13",
        units={"text": {"input": input_tokens, "output": output_tokens}},
        limit_ids=[stored_limit_id, stored_small_limit_id],
        user_id=f"test_user_{i}"
    )
    # Get the cost from the response
    request_cost = ingest_response.xproxy_result.cost.total.base
    print(f"Request {i+1} cost: ${request_cost:.5f}")
    
    # Check if any limits were exceeded by examining limit states
    overrun_limits = []
    for limit_id, limit_info in ingest_response.xproxy_result.limits.items():
        if limit_info.state == 'overrun':
            overrun_limits.append(limit_id)
    
    if overrun_limits:
        print(f"⚠️ ALERT: The following limits were exceeded: {overrun_limits}")
        
        # In a production application, you would take action here:
        # 1. Notify administrators or users about exceeded limits
        # 2. Temporarily disable certain features
        # 3. Switch to a lower-cost model
        # 4. Stop processing additional requests
        
        # Let's check the status of our small limit
        small_limit = client.limits.retrieve(limit_id=stored_small_limit_id).limit
        usage_ratio = small_limit.totals.cost.total.base / small_limit.max
        print(f"Small limit status: ${small_limit.totals.cost.total.base:.5f} of ${small_limit.max:.2f} ({usage_ratio * 100:.1f}%)")
        print(f"Exceeded by: ${(small_limit.totals.cost.total.base - small_limit.max):.5f}")
        
        # Once a limit is exceeded, you would typically stop processing
        print("Stopping further processing due to exceeded limits")
        break
    else:
        print("All limits within acceptable ranges")

# Let's examine the status of both limits
main_limit = client.limits.retrieve(limit_id=stored_limit_id).limit
small_limit = client.limits.retrieve(limit_id=stored_small_limit_id).limit

print("\nLimit Status Summary:")
main_usage_pct = (main_limit.totals.cost.total.base / main_limit.max) * 100
small_usage_pct = (small_limit.totals.cost.total.base / small_limit.max) * 100
print(f"Main $100 limit: ${main_limit.totals.cost.total.base:.5f} of ${main_limit.max:.2f} ({main_usage_pct:.1f}%)")
print(f"Small $1 limit: ${small_limit.totals.cost.total.base:.5f} of ${small_limit.max:.2f} ({small_usage_pct:.1f}%)")

print("\nThis multi-tiered approach demonstrates Pay-i's flexible limit system:")
print("- The larger 'allow' limit provides overall usage monitoring without disruption")
print("- The smaller 'block' limit provides strict cost control for specific contexts")
print("- By checking limit states in the response, applications can take appropriate actions")

Expected output:

Simulating heavy GenAI usage to test limit enforcement...
Request 1 cost: $0.26589
All limits within acceptable ranges
Request 2 cost: $0.19295
All limits within acceptable ranges
Request 3 cost: $0.29805
All limits within acceptable ranges
Request 4 cost: $0.22512
All limits within acceptable ranges
Request 5 cost: $0.30888
⚠️ ALERT: The following limits were exceeded: ['2221']
Small limit status: $1.29089 of $1.00 (129.1%)
Exceeded by: $0.29089
Stopping further processing due to exceeded limits

Limit Status Summary:
Main $100 limit: $1.29614 of $100.00 (1.3%)
Small $1 limit: $1.29089 of $1.00 (129.1%)

This multi-tiered approach demonstrates Pay-i's flexible limit system:
- The larger 'allow' limit provides overall usage monitoring without disruption
- The smaller 'block' limit provides strict cost control for specific contexts
- By checking limit states in the response, applications can take appropriate actions

Understanding Limit States

Building on our earlier examples where we checked for the 'overrun' state, let's explore all possible limit states in detail. When tracking usage against limits with client.ingest.units(), the response contains valuable state information for each limit. Understanding these states helps you implement sophisticated handling logic:

Important: A common misconception is that requests are blocked when a limit state is "exceeded". This is not correct. The "exceeded" state simply means spending has reached or passed the threshold but is still less than or equal to the max value. Requests are still allowed in the "exceeded" state. Only the "blocked" state (for Block limits) actually prevents requests.

State	Description
`ok`	The limit has not been exceeded; usage is below the threshold. Mathematically: spend < max*threshold
`exceeded`	The limit is over its threshold but less than or equal to its maximum value. This is a warning state, NOT blocked. Mathematically: spend >= max*threshold AND spend <= max
`overrun`	Usage is above the limit's maximum value. Mathematically: spend > max. In ingest mode, this is the state you must check for to detect limits that have gone over their maximum.
`blocked`	The limit was exceeded and directly caused the request to be blocked. This only occurs when a Block limit's spend has already gone over its max (spend > max) and an additional request is attempted. Appears in the `blocked_limit_ids` list in the response.
`blocked_external`	The limit was included in a request that was blocked, but this specific limit was not the cause of the blocking (the request was blocked due to other limits being exceeded).
`failed`	The limit check failed due to an error or exception.
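The three spend-based states in the table can be expressed directly from their formulas. The following is only an illustrative sketch of that arithmetic, not how the service computes states; classify_spend is a hypothetical name, and the blocked, blocked_external, and failed states are omitted because they depend on the outcome of a request rather than on spend alone.

```python
def classify_spend(spend: float, max_value: float, threshold: float) -> str:
    """Map spend to 'ok', 'exceeded', or 'overrun' using the table's formulas."""
    if spend > max_value:
        return "overrun"    # spend > max
    if spend >= max_value * threshold:
        return "exceeded"   # max*threshold <= spend <= max (warning only)
    return "ok"             # spend < max*threshold


# Values taken from the expected outputs earlier in this guide
print(classify_spend(0.00525, 100.0, 0.80))  # prints ok
print(classify_spend(1.29089, 1.0, 0.80))    # prints overrun
print(classify_spend(80.0, 100.0, 0.80))     # prints exceeded
```

Note that spend exactly equal to max is still "exceeded", not "overrun".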

Here's an example of how to check limit states and provide custom handling for each:

# Process a request with multiple limits applied
# Using the limits we created earlier: stored_limit_id (Monthly Budget) and stored_small_limit_id (Small Instance Limit)
ingest_response = client.ingest.units(
    category="system.openai",
    resource="gpt-4-0125-preview",
    units={"text": {"input": 500, "output": 300}},
    limit_ids=[stored_limit_id, stored_small_limit_id]
)

# Check each limit's state: over its maximum (overrun), past its threshold (exceeded), or ok
for limit_id, limit_info in ingest_response.xproxy_result.limits.items():
    # Get full limit details
    limit = client.limits.retrieve(limit_id=limit_id).limit
    limit_name = limit.limit_name
    
    # Handle different limit states
    # Calculate usage percentage for consistent reporting
    usage_ratio = limit.totals.cost.total.base / limit.max
    usage_percentage = usage_ratio * 100
    
    if limit_info.state == 'overrun':
        print(f"⚠️ Limit '{limit_name}' has been exceeded")
        print(f"  Current usage: ${limit.totals.cost.total.base:.5f} of ${limit.max:.2f} ({usage_percentage:.1f}%)")
        # Take action: notify users, restrict access, etc.
        
    elif limit_info.state == 'exceeded':
        print(f"⚠️ Warning: Limit '{limit_name}' has exceeded its threshold")
        print(f"  Current usage: ${limit.totals.cost.total.base:.5f} of ${limit.max:.2f} ({usage_percentage:.1f}%)")
        # Take action: send alerts, prepare for potential limit increase, etc.
        
    elif limit_info.state == 'ok':
        # Limit is in good standing
        print(f"✅ Limit '{limit_name}' is at {usage_percentage:.1f}% usage")

Expected output:

✅ Limit 'Monthly Budget' is at 1.3% usage
⚠️ Limit 'Small Instance Limit' has been exceeded
  Current usage: $1.32389 of $1.00 (132.4%)

This approach lets you implement sophisticated usage policies with different behaviors for each limit state.
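The example above handles three of the six states. One way to cover all of them is a lookup table from state to action, sketched below. The action names and their severity ordering are assumptions chosen for illustration; only the state names come from the table above.

```python
from typing import Iterable

# Hypothetical mapping from limit state to an application-level action
STATE_ACTIONS = {
    "ok": "continue",
    "exceeded": "warn",              # past threshold, requests still allowed
    "overrun": "restrict",           # past max; the application decides what to do
    "blocked": "reject",             # this limit caused the request to be blocked
    "blocked_external": "reject",    # request blocked because of another limit
    "failed": "investigate",         # the limit check itself errored
}

# Assumed ordering from least to most severe
SEVERITY = ["continue", "warn", "investigate", "restrict", "reject"]


def action_for_state(state: str) -> str:
    """Return the action for a state; unknown states are flagged for investigation."""
    return STATE_ACTIONS.get(state, "investigate")


def worst_action(states: Iterable[str]) -> str:
    """Return the most severe action across all limits applied to one request."""
    actions = [action_for_state(state) for state in states]
    return max(actions, key=SEVERITY.index, default="continue")


print(worst_action(["ok", "exceeded"]))   # prints warn
print(worst_action(["ok", "overrun"]))    # prints restrict
```

With the real response, the states would come from limit_info.state for each entry in ingest_response.xproxy_result.limits.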

Updating an Existing Limit

After creating a limit, you may need to adjust its parameters as your usage patterns evolve. The update() method lets you modify an existing limit's properties:

# Get the limit's current details
limit_response = client.limits.retrieve(limit_id=stored_limit_id)
limit = limit_response.limit
print(f"Current limit: {limit.limit_name}, max=${limit.max:.2f}")

# Update the limit with a higher max value
updated_response = client.limits.update(
    limit_id=stored_limit_id,
    max=200.0  # Double the budget
)
updated_limit = updated_response.limit
print(f"Updated limit: {updated_limit.limit_name}, max=${updated_limit.max:.2f}")

# You can also update just the name
renamed_response = client.limits.update(
    limit_id=stored_limit_id,
    limit_name="Quarterly Budget"  # Rename the limit
)
renamed_limit = renamed_response.limit
print(f"Renamed limit: {renamed_limit.limit_name}, max=${renamed_limit.max:.2f}")

Important Note About Updates: The update() method only supports modifying two properties:

limit_name: You can rename a limit
max: You can change the maximum budget value

Properties like limit_type ("block" vs "allow") and threshold cannot be changed after a limit is created. If you need to change these immutable properties, you must create a new limit with the desired configuration and delete the old one.
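Because limit_type and threshold are immutable, a change to a limit is either a no-op, an update, or a delete-and-recreate. The sketch below makes that decision explicit. LimitSpec and plan_limit_change are hypothetical names introduced here; the function only returns the plan and makes no API calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitSpec:
    """Local description of a limit (hypothetical type, mirrors the fields used above)."""
    limit_name: str
    max: float
    limit_type: str
    threshold: float


def plan_limit_change(existing: LimitSpec, desired: LimitSpec) -> str:
    """Return 'recreate', 'update', or 'noop' for moving from existing to desired."""
    # limit_type and threshold cannot be changed with update()
    if (existing.limit_type != desired.limit_type
            or existing.threshold != desired.threshold):
        return "recreate"
    # limit_name and max are the two properties update() supports
    if existing.max != desired.max or existing.limit_name != desired.limit_name:
        return "update"
    return "noop"


current = LimitSpec("Monthly Budget", 100.0, "allow", 0.80)
print(plan_limit_change(current, LimitSpec("Monthly Budget", 200.0, "allow", 0.80)))  # prints update
print(plan_limit_change(current, LimitSpec("Monthly Budget", 100.0, "block", 0.80)))  # prints recreate
```

Keep in mind that recreating a limit produces a new limit ID, so any stored ID must be replaced.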

Resetting a Periodic Budget Limit

When starting a new billing cycle or clearing accumulated usage, you can reset a limit back to zero:

# Continued from previous example

# Step 10: Reset limits at the end of billing periods
def reset_monthly_limits():
    # Get all limits tagged as "monthly"
    monthly_limits = []
    all_limits = client.limits.list()
    
    for limit in all_limits:
        # Check if limit has "monthly" tag (if you use tags to organize limits)
        # Treat a missing or None limit_tags value as "no tags"
        if "monthly" in (getattr(limit, "limit_tags", None) or []):
            monthly_limits.append(limit)
    
    # Step 11: Reset each monthly limit
    for limit in monthly_limits:
        current_usage = limit.totals.cost.total.base
        print(f"Resetting monthly limit: {limit.limit_name} (current usage: ${current_usage:.5f})")
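        client.limits.reset(limit_id=limit.limit_id)
        # reset() returns the limit history, so retrieve the limit again to read its current usage
        new_usage = client.limits.retrieve(limit_id=limit.limit_id).limit.totals.cost.total.base
        print(f"  ✅ Reset complete. New usage: ${new_usage:.5f}")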

# Step 12: Reset our monthly budget limit directly
# Retrieve current limit status to confirm what we're resetting, using the stored limit ID from the previous example
limit = client.limits.retrieve(limit_id=stored_limit_id).limit
current_usage = limit.totals.cost.total.base
max_limit = limit.max
print(f"Preparing to reset limit '{limit.limit_name}'")
print(f"Type: {limit.limit_type}, Threshold: {limit.threshold * 100:.0f}%")
print(f"Current usage: ${current_usage:.5f} of ${max_limit:.2f}")

# Step 13: Perform the reset
reset_result = client.limits.reset(limit_id=stored_limit_id)
reset_limit_history = reset_result.limit_history
print(f"Limit '{reset_limit_history.limit_name}' has been reset")

# To get the current usage, we need to retrieve the limit again
updated_limit = client.limits.retrieve(limit_id=stored_limit_id).limit
new_usage = updated_limit.totals.cost.total.base
print(f"New current value: ${new_usage:.5f}")  # Should be 0

Expected output:

Preparing to reset limit 'Monthly Budget'
Type: allow, Threshold: 80%
Current usage: $1.32914 of $100.00
Limit 'Monthly Budget' has been reset
New current value: $0.00000
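The reset call itself does not decide when a billing period has ended; that scheduling is up to your application. A minimal sketch, assuming billing periods follow calendar months and that you record the date of the last reset yourself (is_new_billing_month is a hypothetical helper):

```python
from datetime import date


def is_new_billing_month(last_reset: date, today: date) -> bool:
    """Return True when today falls in a later calendar month than the last reset."""
    return (today.year, today.month) > (last_reset.year, last_reset.month)


print(is_new_billing_month(date(2024, 5, 31), date(2024, 6, 1)))   # prints True
print(is_new_billing_month(date(2024, 6, 1), date(2024, 6, 30)))   # prints False
```

A scheduled job could call this check once a day and invoke client.limits.reset() only when it returns True.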

Cleaning Up Limits

Important: If you've run through the previous examples, you should run the cleanup code below before proceeding to the Real-World Example in the next section. This will prevent conflicts from having limits with the same names but different parameters, as the examples use similar limit names.

When experimenting with limits or running the examples in this guide multiple times, you may need to delete created limits. Here's how to safely delete limits:

# Cleanup: Find and delete limits created during development
from payi import Payi
client = Payi()

# Option 1: Delete limits by name
def delete_limit_by_name(name):
    limits = client.limits.list(limit_name=name)
    if limits.items:
        limit = limits.items[0]
        print(f"Deleting limit: {limit.limit_name} (ID: {limit.limit_id})")
        client.limits.delete(limit_id=limit.limit_id)
        print("Limit deleted successfully")
    else:
        print(f"No limit found with name: {name}")

# Delete the limits we created in this guide
delete_limit_by_name("Monthly Budget")
delete_limit_by_name("Quarterly Budget")  # Only present if you ran the rename example
delete_limit_by_name("Small Instance Limit")

# Option 2: Delete multiple limits with specific tags
def delete_limits_with_tag(tag):
    count = 0
    for limit in client.limits.list():
        # Treat a missing or None limit_tags value as "no tags"
        if tag in (getattr(limit, "limit_tags", None) or []):
            print(f"Deleting limit: {limit.limit_name} (ID: {limit.limit_id})")
            client.limits.delete(limit_id=limit.limit_id)
            count += 1
    print(f"Deleted {count} limits with tag: {tag}")

# Example: Delete all development or test limits
delete_limits_with_tag("dev")
delete_limits_with_tag("test")

For safety reasons, the SDK does not provide a method to delete all limits at once. This design choice helps prevent accidental deletion of important production limits.
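In the same spirit, you may want to preview which limits a tag-based cleanup would remove before deleting anything. The sketch below is a hypothetical dry-run helper: it only selects limits and never calls the API. The protected_names parameter and the SimpleNamespace stand-ins are assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Sequence


def select_limits_for_deletion(limits: Iterable[Any], tag: str,
                               protected_names: Sequence[str] = ()) -> List[Any]:
    """Return the limits carrying `tag`, skipping any whose name is protected."""
    selected = []
    for limit in limits:
        # A missing or None limit_tags value is treated as "no tags"
        tags = getattr(limit, "limit_tags", None) or []
        if tag in tags and limit.limit_name not in protected_names:
            selected.append(limit)
    return selected


# Stand-ins for the objects yielded by client.limits.list() (illustrative data only)
limits = [
    SimpleNamespace(limit_name="Scratch Limit", limit_tags=["dev"]),
    SimpleNamespace(limit_name="Daily Safety Cap", limit_tags=["dev", "safety"]),
    SimpleNamespace(limit_name="Untagged", limit_tags=None),
]

for limit in select_limits_for_deletion(limits, "dev", protected_names=["Daily Safety Cap"]):
    print(f"Would delete: {limit.limit_name}")
```

Once the preview looks right, the selected limits can be passed to client.limits.delete() one at a time, as in Option 2 above.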

Real-World Example: Multi-Layered Budget Control System

Let's walk through a comprehensive example of setting up a tiered limit structure that provides granular cost control for an enterprise AI application. This example demonstrates a common pattern in production environments:

Safety caps with blocking limits - These prevent unexpected overspending or abuse by hard-blocking requests when thresholds are exceeded
Budget tracking with non-blocking limits - These monitor usage against targets without disrupting service, allowing you to track progress toward goals and receive notifications

This mixed approach gives you both protection against runaway costs and visibility into usage patterns:

from payi import Payi

# Initialize the Pay-i client
client = Payi()

# Step 1: Set up a multi-layered budget control system with different scopes
def setup_budget_control_system():
    """
    Create a sophisticated budget control system with daily, monthly,
    and model-specific limits to provide defense-in-depth cost control.
    
    This uses a combination of:
    - "allow" limits for tracking usage and goals without blocking requests
    - "block" limits as safety caps against runaway costs
    """
    # Dictionary to store limit IDs
    stored_limit_ids = {}
    
    # Helper function to ensure a limit exists with exact desired parameters
    # If a limit with same name but different parameters exists, it deletes and recreates it
    def ensure_limit_exists(name, max_value, limit_type, tags, threshold_value):
        # First check if limit with this name already exists
        existing_limits = client.limits.list(limit_name=name)
        
        if existing_limits.items:
            existing_limit = existing_limits.items[0]
            existing_id = existing_limit.limit_id
            
            # Check if existing limit has different important parameters
            # If so, we need to delete and recreate it since these can't be updated
            if (existing_limit.limit_type != limit_type or
                existing_limit.threshold != threshold_value):
                print(f"Found limit '{name}' with different immutable parameters.")
                print(f"  Existing: type={existing_limit.limit_type}, threshold={existing_limit.threshold*100:.0f}%")
                print(f"  Desired: type={limit_type}, threshold={threshold_value*100:.0f}%")
                print(f"  Deleting and recreating limit...")
                
                # Delete the existing limit
                client.limits.delete(limit_id=existing_id)
                print(f"  Deleted limit: {name} (ID: {existing_id})")
                
                # Create new limit with desired parameters
                limit_response = client.limits.create(
                    limit_name=name,
                    max=max_value,
                    limit_type=limit_type,
                    limit_tags=tags,
                    threshold=threshold_value
                )
                limit_obj = limit_response.limit
                print(f"  Created new limit: '{limit_obj.limit_name}' (${limit_obj.max:.2f}, {limit_obj.limit_type} type, {limit_obj.threshold * 100:.0f}% threshold)")
                return limit_obj.limit_id
            
            else:
                # Limit type and threshold match, we only need to update the max value if it changed
                if existing_limit.max != max_value:
                    print(f"Found limit '{name}' with matching immutable parameters but different max value.")
                    update_response = client.limits.update(
                        limit_id=existing_id,
                        max=max_value
                    )
                    updated_limit = update_response.limit
                    print(f"  Updated max value: ${existing_limit.max:.2f} → ${updated_limit.max:.2f}")
                    return updated_limit.limit_id
                else:
                    print(f"Limit '{name}' already exists with identical parameters. Using existing limit.")
                    return existing_id
        
        else:
            # Limit doesn't exist, create a new one
            limit_response = client.limits.create(
                limit_name=name,
                max=max_value,
                limit_type=limit_type,
                limit_tags=tags,
                threshold=threshold_value
            )
            limit_obj = limit_response.limit
            print(f"Created new limit: '{limit_obj.limit_name}' (${limit_obj.max:.2f}, {limit_obj.limit_type} type, {limit_obj.threshold * 100:.0f}% threshold)")
            return limit_obj.limit_id
    
    # Create daily usage safety cap
    daily_limit_id = ensure_limit_exists(
        name="Daily Safety Cap",
        max_value=50.0,                                  # $50 per day
        limit_type="block",                              # Automatically block requests when exceeded
        tags=["daily", "safety", "operational"],
        threshold_value=0.90                             # Alert at 90% usage
    )
    stored_limit_ids["daily"] = daily_limit_id
    
    # Create monthly budget tracker
    monthly_limit_id = ensure_limit_exists(
        name="Monthly Budget",
        max_value=1000.0,                                # $1000 per month
        limit_type="allow",                              # Allows requests even when limit is reached
        tags=["monthly", "budget", "finance"],
        threshold_value=0.80                             # Alert at 80% usage
    )
    stored_limit_ids["monthly"] = monthly_limit_id
    
    # Create premium model usage tracker
    premium_limit_id = ensure_limit_exists(
        name="Premium Model Usage",
        max_value=300.0,                                 # $300 target for premium models
        limit_type="allow",                              # Allows requests even when exceeded
        tags=["model-specific", "premium", "tracking"],
        threshold_value=0.83                             # Alert at 83% usage
    )
    stored_limit_ids["premium"] = premium_limit_id
    
    return stored_limit_ids

# Step 2: Create a function that uses these limits in requests
def process_ai_request(prompt, model, user_id, limit_ids):
    """
    Process an AI request using the specified model, applying all budget limits.
    Returns both the AI response and budget status information.
    """
    # Use typical token counts for this request
    # In a real app, you might estimate based on the model and prompt
    import random
    input_tokens = random.randint(50, 200)  # Simulate input size
    output_tokens = random.randint(100, 150)  # Simulate output size
    
    # Apply a mix of blocking and non-blocking limits:
    # - Daily safety cap (blocks when exceeded to prevent unexpected costs)
    # - Monthly budget tracker (allows but notifies when exceeding budget targets)
    applied_limits = [limit_ids["daily"], limit_ids["monthly"]]
    
    # For premium models, also track against the premium model budget
    # Define premium models with their correct categories
    premium_model_categories = {
        "claude-3-opus-20240229": "system.anthropic",
        "gpt-4o": "system.openai",
        "gemini-1.5-pro": "system.google.vertex"
    }
    
    # Check if the model is in our premium list
    if model in premium_model_categories:
        applied_limits.append(limit_ids["premium"])
        category = premium_model_categories[model]
    else:
        # Default to OpenAI for this example
        category = "system.openai"
    
    print(f"Processing request with {len(applied_limits)} active limits for {category}/{model}...")
    
    # Make the request with limits applied
    response = client.ingest.units(
        category=category,
        resource=model,
        units={"text": {"input": input_tokens, "output": output_tokens}},
        limit_ids=applied_limits,  # Apply all relevant limits
        user_id=user_id
    )
    
    # Check limit status and handle accordingly
    # We'll collect status information for all limits in this list
    # to return a comprehensive budget status to the calling code
    budget_status = []
    for limit_id in response.xproxy_result.limits:
        # Retrieve the limit using its ID
        limit_response = client.limits.retrieve(limit_id=limit_id)
        
        # Access the limit object from the response
        limit = limit_response.limit
        
        # Get current usage value from the totals.cost.total.base path
        current_usage = limit.totals.cost.total.base
        max_limit = limit.max
        
        # Check if limit was exceeded
        usage_ratio = current_usage / max_limit
        if usage_ratio >= 1.0:
            budget_status.append({
                "limit_name": limit.limit_name,
                "status": "EXCEEDED",
                "current": current_usage,
                "max": max_limit,
                "percent": usage_ratio * 100  # Include percentage even for exceeded limits
            })
        else:
            usage_pct = usage_ratio * 100
            status = "WARNING" if usage_pct > 75 else "OK"
            budget_status.append({
                "limit_name": limit.limit_name,
                "status": status,
                "current": current_usage,
                "max": max_limit,
                "percent": usage_pct
            })
    
    return {
        "success": not any(item["status"] == "EXCEEDED" for item in budget_status),
        "budget_status": budget_status,
        "response": response
    }

# Step 3: Example usage in a real application
def main():
    # Set up the budget control system (Step 1) and keep the returned IDs.
    # In a production app, you would run this setup once and store the IDs in a database.
    limit_ids = setup_budget_control_system()
    
    # Process requests with budget enforcement
    user_request = "Write a comprehensive analysis of quantum computing applications in finance"
    
    result = process_ai_request(
        prompt=user_request,
        model="claude-3-opus-20240229",  # Premium model
        user_id="user_abc123",
        limit_ids=limit_ids
    )
    
    # Handle the result
    if result["success"]:
        print("Request processed successfully")
        # Process the model's response
    else:
        print("Request blocked due to budget constraints")
        print("Budget status:")
        for limit in result["budget_status"]:
            if limit["status"] == "EXCEEDED":
                print(f"  ❌ {limit['limit_name']}: ${limit['current']:.5f}/${limit['max']:.2f} ({limit['percent']:.1f}%) - LIMIT EXCEEDED")

if __name__ == "__main__":
    main()
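
The budget_status list returned by process_ai_request can also be rendered as a full report rather than only the exceeded entries. The following is a minimal sketch, not part of the Pay-i SDK: format_budget_report is a hypothetical helper name, the sample limit names and values are made up for illustration, and the code assumes each entry carries the limit_name, status, current, max, and percent keys built in the example above.

```python
def format_budget_report(budget_status):
    """Return one formatted line per limit (hypothetical helper, not an SDK call)."""
    # Status labels match the ones produced by process_ai_request above.
    markers = {"OK": "OK  ", "WARNING": "WARN", "EXCEEDED": "OVER"}
    lines = []
    for item in budget_status:
        marker = markers.get(item["status"], "????")
        # 5 decimal places for usage, 2 for the configured maximum.
        lines.append(
            f"[{marker}] {item['limit_name']}: "
            f"${item['current']:.5f}/${item['max']:.2f} ({item['percent']:.1f}%)"
        )
    return lines


# Illustrative sample data only; real entries come from process_ai_request.
sample_status = [
    {"limit_name": "Daily Budget", "status": "OK",
     "current": 1.23456, "max": 10.0, "percent": 12.3456},
    {"limit_name": "Premium Models", "status": "EXCEEDED",
     "current": 5.5, "max": 5.0, "percent": 110.0},
]

for line in format_budget_report(sample_status):
    print(line)
```

Returning the lines instead of printing them inside the helper keeps it usable for logs or API responses as well as console output.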

Pagination in List Methods

When using the list() method, the SDK automatically handles pagination for you:

# Automatically iterates through all pages of results
for limit in client.limits.list():
    print(f"Limit: {limit.limit_name}")

You don't need to manage cursors or request pages yourself: the SDK makes additional API calls as needed while you iterate, so even large collections of limits can be processed with an ordinary for loop.

For advanced usage, including cursor and sorting parameters, see the Pagination Parameters documentation.
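
Because list() yields limits one at a time, ordinary iteration patterns apply to it. Below is a minimal sketch of a lookup-by-name helper. find_limit_by_name is a hypothetical function, not an SDK method, and the stand-in objects only mimic the limit_name attribute shown above (plus an assumed limit_id attribute) so the snippet runs without an API key. In an application you would pass client.limits.list() as the iterable.

```python
from types import SimpleNamespace


def find_limit_by_name(limits, name):
    """Return the first limit whose limit_name matches, or None (hypothetical helper)."""
    # next() stops at the first match, so a lazily paginated iterable
    # is only consumed as far as necessary.
    return next((limit for limit in limits if limit.limit_name == name), None)


# Stand-in data for illustration; in practice pass client.limits.list().
fake_limits = [
    SimpleNamespace(limit_name="Daily Budget", limit_id="id-1"),
    SimpleNamespace(limit_name="Monthly Budget", limit_id="id-2"),
]

match = find_limit_by_name(fake_limits, "Monthly Budget")
print(match.limit_id if match else "not found")
```

Stopping at the first match avoids fetching pages that are not needed, which matters more as the number of limits grows.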

Best Practices

When working with limits in the Pay-i Python SDK, consider these best practices:

Defense in Depth: Implement multiple layers of limits (daily, monthly, per-model) to prevent unexpected cost spikes.
Set Appropriate Thresholds: Configure notification thresholds (for example, 75-90% of the limit) so that you receive warnings before a limit is reached.
Track with High Precision: Use 5 decimal places for usage costs (e.g., ${usage:.5f}) and 2 decimal places for budget limits (e.g., ${limit:.2f}). This precision is crucial for GenAI micropayments, where even $0.00001 differences matter when accumulated over millions of requests.
Use Tags for Organization: Apply consistent tags to limits to make them easier to find, filter, and manage.
Reset on Appropriate Cycles: Establish a clear reset schedule that aligns with your billing or budget cycles.
Monitor Regularly: Check limit status periodically to track usage patterns and adjust limits proactively.
Store Limit IDs: Always store limit IDs after creation. While the create operation is idempotent with identical parameters, attempting to recreate with different parameters will fail. Storing IDs ensures reliable access to your limits.
Understand Idempotency: Calling create again with an existing limit name succeeds only if every parameter matches the original. This makes it safe to run initial setup code multiple times, but do not use create as a way to look up a limit by name with partial parameters.
Graceful Degradation: When possible, implement fallback strategies when premium model limits are reached (e.g., switch to a less expensive model).
Keep Historical Data: Before resetting limits, consider logging historical usage for reporting and forecasting.
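
The Graceful Degradation practice above can be sketched as a small model-selection step. Everything in this example is illustrative: choose_model, its parameters, and the model and limit names are assumptions rather than SDK features, and the input is the budget_status list shape returned by process_ai_request earlier in this guide.

```python
def choose_model(budget_status, premium_model, fallback_model, premium_limit_name):
    """Pick the fallback model once the premium limit is exhausted (illustrative only)."""
    for item in budget_status:
        if item["limit_name"] == premium_limit_name and item["status"] == "EXCEEDED":
            return fallback_model
    # Premium limit is absent from the status list or still has headroom.
    return premium_model


# Hypothetical status entries and placeholder model names.
status = [
    {"limit_name": "Premium Models", "status": "EXCEEDED"},
    {"limit_name": "Daily Budget", "status": "OK"},
]

model = choose_model(
    status,
    premium_model="premium-model",
    fallback_model="economy-model",
    premium_limit_name="Premium Models",
)
print(f"Next request will use: {model}")
```

Keeping the decision in one function makes it straightforward to extend later, for example by also falling back when the premium limit is in the WARNING state.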

API Reference

For detailed information on all the methods, parameters, and response types provided by the client.limits resource, please refer to the Python SDK Limits API Reference.

The reference documentation includes:

Complete method signatures with all parameters
Return type structures for all response types
Detailed explanations of parameter behavior
REST API endpoint mappings
Examples for each method

This separate reference guide complements the workflow examples provided in this document, offering a more technical and comprehensive view of the limits API.