
Day 4: Applying Limits with track_context

📋 Overview

Welcome to Day 4 of our 5-day learning path! In Day 3, you learned how to assign specific Use Cases to your GenAI calls using the @track(use_case_name="...") decorator. Today, we'll go a step further and apply spending limits to control your GenAI costs.

Today's Goal: Learn how to apply existing Pay-i spending limits to your GenAI functions using the track_context function.

Why Apply Limits?

Applying spending limits to your GenAI calls provides several important benefits:

  • Cost control: Prevent unexpected or runaway expenditure
  • Budget management: Allocate specific budgets to different features or users
  • Risk mitigation: Protect against potential abuse or bugs
  • Departmental budgeting: Assign spending allowances to different teams or departments
  • Client billing: Enforce spending caps for clients on specific pricing tiers

Recap: Annotations So Far

In previous days, we've covered:

  • Day 1: Basic auto-instrumentation with payi_instrument()
  • Day 2: Adding user context with track_context(user_id="...")
  • Day 3: Assigning Use Cases with @track(use_case_name="...")

Today, we'll extend our knowledge by focusing on the limit_ids parameter used with the track_context function.

🔍 Prerequisites: Creating Limits

Before you can apply limits to your code, those limits must first be created in Pay-i. You can create limits in two ways:

  1. Via the Pay-i Portal: Create and configure limits in the web interface
  2. Via the SDK: Create limits programmatically using client.limits.create()

For this example, let's create two limits using the Python SDK:

import os
from payi import Payi # Use the explicit class name

# Initialize Pay-i client (automatically loads PAYI_API_KEY env var)
payi_client = Payi()

# Create a department-level limit for the marketing team
marketing_limit = payi_client.limits.create(
    limit_name="Marketing Department Budget",
    max=50.0,  # $50 budget
    limit_type="block"  # Block requests when the limit is reached
)
marketing_limit_id = marketing_limit.limit.limit_id  # Save this ID for later use

# Create a per-user budget limit
user_limit = payi_client.limits.create(
    limit_name="Premium User Budget",
    max=10.0,  # $10 per user
    limit_type="allow",  # Log but allow requests that exceed the limit
    threshold=0.9  # Alert at 90% usage
)
user_limit_id = user_limit.limit.limit_id  # Save this ID for later use

print(f"Marketing limit ID: {marketing_limit_id}")
print(f"User limit ID: {user_limit_id}")

In a real application, rather than creating limits every time, you would typically create them once (either via the SDK or the Portal). Then, during application startup, you could fetch all relevant limits and store their IDs for later use. Here's how you might populate a dictionary mapping limit names to IDs:

# Example: Populating a store with all existing limit IDs at startup
limit_ids_store = {}
all_limits = payi_client.limits.list()
for limit in all_limits:
    limit_ids_store[limit.limit_name] = limit.limit_id

print(f"Fetched Limit IDs: {limit_ids_store}") 
# This store now contains IDs for limits created earlier and any others.

The subsequent examples will assume such a limit_ids_store dictionary has been populated and retrieve the necessary IDs from it.

The limit_id values returned by create() or retrieved from list() are unique string identifiers that you'll use in your code to apply these limits to your GenAI calls. The limit_ids_store pattern shown above is a common way to manage these IDs for dynamic application.
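One thing to watch with a name-keyed dictionary: if two limits ever share a name, the plain loop above silently keeps only the last one. The sketch below wraps the same loop in a hypothetical helper, build_limit_store, that raises instead. It relies only on the limit_name and limit_id attributes used above. The SimpleNamespace objects and the "lim_..." IDs are stand-ins made up for illustration, not real Pay-i values.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def build_limit_store(limits: Iterable) -> Dict[str, str]:
    """Map limit_name -> limit_id, refusing to silently overwrite a duplicate name."""
    store: Dict[str, str] = {}
    for limit in limits:
        if limit.limit_name in store:
            raise ValueError(f"Duplicate limit name: {limit.limit_name!r}")
        store[limit.limit_name] = limit.limit_id
    return store


# Stand-in objects for illustration only; in the application you would pass
# the result of payi_client.limits.list() instead.
sample_limits = [
    SimpleNamespace(limit_name="Marketing Department Budget", limit_id="lim_marketing"),
    SimpleNamespace(limit_name="Premium User Budget", limit_id="lim_premium"),
]
limit_ids_store = build_limit_store(sample_limits)
print(limit_ids_store)
```

Whether a duplicate name should be an error or a warning depends on how your team names limits; raising at startup is simply the most visible choice.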

Note: Once created, you can also find these limit IDs in the Pay-i Portal by navigating to the Limits section and viewing the details of each limit.

🔑 Core Concept: Applying Limits with track_context

The primary way to apply spending limits to your GenAI calls is by using the limit_ids parameter within the track_context function. This allows you to dynamically apply limits based on runtime information.

The process involves:

  1. Retrieving the unique ID of a pre-existing limit (created via the Portal or SDK). This is typically done by looking up the limit by its name from a store populated at application startup (see Prerequisites).
  2. Passing this retrieved ID (or a list of IDs) to the limit_ids parameter of track_context.

Here's the core syntax:

from payi.lib.instrument import payi_instrument, track_context
from openai import OpenAI

# Initialize Pay-i (required before using track_context)
payi_instrument()

# Initialize OpenAI client
client = OpenAI()

def generate_content_with_dynamic_limit(prompt, user_id, target_limit_name):
    # Retrieve the limit ID dynamically from the store
    limit_id_to_apply = limit_ids_store.get(target_limit_name)
    
    # Prepare the limit_ids parameter: pass the ID if found, otherwise None
    # Passing None ensures parent context limits are inherited if no specific limit is found
    limit_ids_param = [limit_id_to_apply] if limit_id_to_apply else None
    
    # Apply the limit ID (and user ID) using the context manager
    with track_context(user_id=user_id, limit_ids=limit_ids_param):
        # GenAI calls made within this block will use the dynamic limit ID (if found)
        response = client.chat.completions.create(
            model="gpt-3.5-turbo", # Or any other model
            messages=[{"role": "user", "content": prompt}]
        )
        # Full implementation details are shown in Practical Example 1
    
    return response.choices[0].message.content

result = generate_content_with_dynamic_limit(
    prompt="Explain dynamic limits.",
    user_id="user_789",
    target_limit_name="Premium User Budget" # Or another limit name
)
print(result)

This approach is ideal because the specific limit to apply often depends on runtime variables (like user identity, user tier, request type, etc.).
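The "[limit_id] if limit_id else None" pattern has one quiet failure mode: a misspelled or missing limit name results in a call that runs with no specific limit applied. A minimal sketch of a helper that makes this visible is shown here. The name resolve_limit_ids is hypothetical and not part of the Pay-i SDK; it takes a plain dictionary like limit_ids_store, logs a warning for each name it cannot find, and returns either a list of IDs or None, ready to pass as limit_ids.

```python
import logging
from typing import Dict, List, Optional, Sequence

logger = logging.getLogger(__name__)


def resolve_limit_ids(store: Dict[str, str], names: Sequence[str]) -> Optional[List[str]]:
    """Return the IDs for the given limit names, or None if none are found."""
    found: List[str] = []
    for name in names:
        limit_id = store.get(name)
        if limit_id is None:
            # Surface the gap instead of silently running without the limit
            logger.warning("No limit named %r in the store; skipping it", name)
        else:
            found.append(limit_id)
    return found or None


example_store = {"Premium User Budget": "lim_premium"}  # hypothetical ID
print(resolve_limit_ids(example_store, ["Premium User Budget"]))  # ['lim_premium']
print(resolve_limit_ids(example_store, ["Missing Budget"]))       # None
```

Because the helper accepts several names, the same function can serve cases where more than one limit applies to a call.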

💻 Practical Examples

Example 1: Applying a Dynamic User Limit with track_context

Here's how to apply different user limits based on tier:

import os
from openai import OpenAI
from payi.lib.instrument import track_context

# Configure OpenAI client
client = OpenAI()  # Uses OPENAI_API_KEY environment variable

# (Assumes payi_instrument() has been called and 'limit_ids_store'
# was populated as shown in Prerequisites)
# We'll retrieve the specific limit ID inside the function based on tier

# Apply user-specific limits and select model based on tier
def generate_content_by_tier(prompt, user_id, user_tier):
    # Determine model, system prompt, and limit name based on user tier
    if user_tier == "premium":
        # Premium users get access to more capable models with optional self-set spending caps
        target_limit_name = f"Premium-{user_id}-Budget" # User-specific self-defined budget cap
        model_to_use = "gpt-4o"
        system_prompt = "You are a premium assistant providing detailed insights."
    elif user_tier == "trial":
        # Trial users have mandatory spending limits applied by the system
        target_limit_name = "Trial User Budget" # Standard limit applied to all trial users
        model_to_use = "gpt-3.5-turbo"
        system_prompt = "You are a helpful assistant."
    else: # Standard tier
        # Standard users have a high fixed spending cap. To customize it, they need to upgrade to Premium
        target_limit_name = "Standard Tier Budget"
        model_to_use = "gpt-4"
        system_prompt = "You are a helpful assistant."
        
    # Retrieve the appropriate limit ID from the store
    limit_id_to_apply = limit_ids_store.get(target_limit_name)
    
    # Prepare the limit_ids parameter
    limit_ids_param = [limit_id_to_apply] if limit_id_to_apply else None
    
    # Apply user ID and the tier-specific limit ID
    with track_context(user_id=user_id, limit_ids=limit_ids_param):
        response = client.chat.completions.create(
            model=model_to_use,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ]
        )
    return response.choices[0].message.content

# Generate content for a premium user (GPT-4o with optional self-set spending cap)
premium_content = generate_content_by_tier(
    prompt="Write a detailed analysis of recent AI developments",
    user_id="user_premium_123",
    user_tier="premium"
)
print(f"Premium content (GPT-4o): {premium_content[:50]}...")

# Generate content for a standard user (GPT-4 with tier-fixed spending cap)
standard_content = generate_content_by_tier(
    prompt="Explain the significance of recent AI developments",
    user_id="user_standard_456",
    user_tier="standard"
)
print(f"Standard content (GPT-4): {standard_content[:50]}...")

# Generate content for a trial user (GPT-3.5-turbo with mandatory spending limit)
trial_content = generate_content_by_tier(
    prompt="Summarize the main points about AI developments",
    user_id="user_trial_789",
    user_tier="trial"
)
print(f"Trial content (GPT-3.5-turbo): {trial_content[:50]}...")

This approach determines the appropriate limit name (a per-user "Premium-{user_id}-Budget", "Trial User Budget", or "Standard Tier Budget") based on the user_tier, retrieves the corresponding ID from the limit_ids_store, and applies it dynamically using track_context. It also selects a different model and system prompt based on the tier. As a result, each user's spending is tracked against their tier-specific limit. If no limit with the chosen name exists in the store, limit_ids is None and no tier-specific limit is applied to the call.
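As the number of tiers grows, the if/elif chain in generate_content_by_tier can become hard to maintain. One alternative, sketched here as a suggestion rather than a Pay-i convention, is to keep the per-tier settings in a lookup table. TierConfig, TIER_CONFIGS, and settings_for are hypothetical names; the limit names, models, and prompts are copied from the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TierConfig:
    limit_name_template: str  # may contain a {user_id} placeholder
    model: str
    system_prompt: str


TIER_CONFIGS = {
    "premium": TierConfig(
        "Premium-{user_id}-Budget",
        "gpt-4o",
        "You are a premium assistant providing detailed insights.",
    ),
    "trial": TierConfig("Trial User Budget", "gpt-3.5-turbo", "You are a helpful assistant."),
    "standard": TierConfig("Standard Tier Budget", "gpt-4", "You are a helpful assistant."),
}


def settings_for(user_tier: str, user_id: str) -> Tuple[str, str, str]:
    """Return (limit_name, model, system_prompt); unknown tiers fall back to standard."""
    config = TIER_CONFIGS.get(user_tier, TIER_CONFIGS["standard"])
    limit_name = config.limit_name_template.format(user_id=user_id)
    return limit_name, config.model, config.system_prompt


print(settings_for("premium", "user_premium_123"))
print(settings_for("trial", "user_trial_789"))
```

The returned limit name can then be looked up in limit_ids_store exactly as before; only the selection logic changes.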

Example 2: Combining Use Cases, User IDs, and Multiple Limits

You can combine @track for use cases with track_context for user IDs and multiple dynamic limits:

import os
from openai import OpenAI
from payi.lib.instrument import track, track_context

# Configure OpenAI client
client = OpenAI()  # Uses OPENAI_API_KEY environment variable

# Static annotation for use case only
@track(
    use_case_name="customer_support"  # Associate with the customer support use case
)
def generate_support_response(ticket_id, question, user_id, department, user_tier):
    # Dynamic tracking for user ID and limit IDs
    # Only retrieve relevant limits based on actual conditions
    limit_ids_to_apply = []
    
    # Apply different department limits based on which department the request is for
    if department == "Marketing":
        marketing_limit_id = limit_ids_store.get("Marketing Department Budget")
        if marketing_limit_id:
            limit_ids_to_apply.append(marketing_limit_id)
    elif department == "Sales":
        sales_limit_id = limit_ids_store.get("Sales Team Budget")
        if sales_limit_id:
            limit_ids_to_apply.append(sales_limit_id)
    elif department == "Engineering":
        eng_limit_id = limit_ids_store.get("Engineering Department Budget")
        if eng_limit_id:
            limit_ids_to_apply.append(eng_limit_id)
    elif department == "Support":
        # Support department might have a higher limit for better customer service
        support_limit_id = limit_ids_store.get("Customer Support Budget")
        if support_limit_id:
            limit_ids_to_apply.append(support_limit_id)
    
    # Only get trial user limit if the user is on trial tier
    if user_tier == "trial":
        trial_limit_id = limit_ids_store.get("Trial User Budget")
        if trial_limit_id:
            limit_ids_to_apply.append(trial_limit_id)
    
    # Pass the list if it's not empty, otherwise pass None to inherit parent limits
    limit_ids_param = limit_ids_to_apply if limit_ids_to_apply else None

    with track_context(
        user_id=user_id,               # Track which user this is for
        limit_ids=limit_ids_param      # Apply dynamic limits (if any found)
    ):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful customer support assistant."},
                {"role": "user", "content": f"Ticket ID: {ticket_id}\nQuestion: {question}"}
            ]
        )
    return response.choices[0].message.content

# Generate a support response (will dynamically apply appropriate limits based on department and user tier)
support_reply = generate_support_response(
    ticket_id="TKT-12345",
    question="How do I reset my password?",
    user_id="customer_67890",
    department="Marketing",
    user_tier="trial"
)
print(f"Support response: {support_reply}")

In this example:

  1. The function is associated with the customer_support use case via @track.
  2. The department budget limit ID ("Marketing Department Budget") is retrieved and applied dynamically via track_context.
  3. Because the user is on the trial tier, the tier budget limit ID ("Trial User Budget") is retrieved and also applied dynamically via track_context.
  4. The user ID is recorded via track_context.

This layered approach gives you maximum visibility and control over your GenAI spending.

✅ Verification: Checking Limit Application

To verify that your limits are being applied correctly:

  1. Run your application and make several AI calls that should be tracked against your limits.
  2. Log in to developer.pay-i.com.
  3. Navigate to your application dashboard.
  4. Click on Limits in the left sidebar.
  5. You'll see a list of all your limits with their current usage and status.

For limits that have exceeded their threshold but not the maximum, you'll see yellow warning indicators. For limits that have reached their maximum (in "Block" mode), you'll see that requests have been blocked.
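The states described above follow from two values on each limit: its max and its threshold. The sketch below shows that arithmetic, assuming the threshold is a fraction of max (as the threshold=0.9 and "Alert at 90% usage" comment in Prerequisites suggests). The function name and status labels are illustrative only, not Pay-i identifiers, and Pay-i's exact boundary handling may differ.

```python
def limit_status(spent: float, max_value: float, threshold: float) -> str:
    """Classify usage against a limit: 'ok', 'warning', or 'reached'."""
    if spent >= max_value:
        return "reached"
    if spent >= max_value * threshold:
        return "warning"
    return "ok"


# A $10 limit with a 90% threshold warns from $9.00 onward
print(limit_status(4.0, 10.0, 0.9))   # ok
print(limit_status(9.5, 10.0, 0.9))   # warning
print(limit_status(10.0, 10.0, 0.9))  # reached
```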

You can click on any limit to view detailed information about the requests that have been tracked against it, including timestamps, costs, and associated use cases and users.

➡️ Next Steps

Congratulations! You've learned how to apply spending limits dynamically to your GenAI calls using the track_context function.

Tomorrow in Day 5, we'll bring everything together and explore more advanced patterns for combining all the techniques we've learned throughout this learning path.

💡 Additional Resources