Instrumenting a Web Server
Overview
Web servers handle many concurrent requests, each on behalf of a different user, use case, or workflow. This makes them fundamentally different from scripts or notebooks, where a single execution context exists for the lifetime of the process. By default, payi_instrument() enables global instrumentation, which automatically tracks every GenAI call made in the process under a shared context. This works well for single-threaded and multi-threaded applications where you control the flow of execution from start to finish, but in a web server there is no single context that makes sense for all requests: each HTTP request carries its own state, including its user, use case, and tracking metadata.
To handle this, you disable global instrumentation at startup and instead instrument each request handler individually using @track or track_context(). This ensures that each HTTP request establishes its own isolated Pay-i context, and concurrent requests never interfere with each other.
Prerequisites
- A Pay-i Application and API key (see Pay-i API Keys)
- Python 3.9 or later
- The payi and openai packages installed

pip install payi openai flask

Step 1: Initialize Pay-i at startup
Call payi_instrument() once when the application starts, with global_instrumentation set to False. This sets up the SDK and patches the provider clients without creating a global tracking context.
from payi.lib.instrument import payi_instrument

payi_instrument(
    config={
        "global_instrumentation": False,
    }
)

payi_instrument() must be called before creating any provider clients. Clients created before instrumentation are not patched, and their calls will not be tracked.
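Why ordering matters can be illustrated with a self-contained sketch. Everything below (FakeClient, instrument(), calls) is a hypothetical stand-in, not the Pay-i implementation; it models one plausible mechanism, wrapping a client's methods at construction time, under which only clients created after instrumentation are tracked:

```python
# Hypothetical sketch, NOT the Pay-i implementation: instrument() replaces the
# client class's __init__ so that only clients constructed *afterwards* get a
# wrapped, tracked completion method.
calls = []  # stands in for the tracking backend

class FakeClient:
    def complete(self, prompt):
        return f"echo: {prompt}"

def instrument():
    original_init = FakeClient.__init__

    def patched_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        original = self.complete

        def tracked(prompt):
            calls.append(prompt)      # record the call for tracking
            return original(prompt)

        self.complete = tracked       # wrap this instance's method

    FakeClient.__init__ = patched_init

early = FakeClient()   # created BEFORE instrument(): never wrapped
instrument()
late = FakeClient()    # created AFTER instrument(): wrapped

early.complete("a")    # returns normally, but is invisible to tracking
late.complete("b")     # tracked
print(calls)           # only "b" was recorded
```

The real SDK patches provider clients rather than a toy class, but the ordering constraint is the same: construct clients only after payi_instrument() has run.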
Step 2: Instrument each endpoint
With global instrumentation disabled, you use @track or track_context() in each request handler to establish a per-request Pay-i context. All GenAI calls made within that scope are tracked against the parameters you provide — use case, user, limits, and properties.
The example below is a Flask application that serves as a simple document generator with two endpoints:
- Create document — accepts a prompt, calls an LLM to generate a short poem, and returns a document ID.
- Append to document — accepts a document ID and text, calls an LLM to append the text while keeping the writing voice consistent, and stores the result.
Full example
import uuid

from flask import Flask, request, jsonify
from openai import OpenAI

from payi.lib.instrument import payi_instrument, track, track_context

# --- Initialize Pay-i before creating any clients ---
payi_instrument(
    config={
        "global_instrumentation": False,
    }
)

app = Flask(__name__)
client = OpenAI()


@app.route("/documents", methods=["POST"])
@track(use_case_name="create_document")
def create_document():
    data = request.json
    prompt = data["prompt"]

    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[
            {"role": "system", "content": "You are a poet. Write a short poem based on the user's prompt."},
            {"role": "user", "content": prompt},
        ],
    )
    poem = response.choices[0].message.content

    document_id = str(uuid.uuid4())
    # store_document(document_id, poem)

    return jsonify({"document_id": document_id, "content": poem})


@app.route("/documents/<document_id>/append", methods=["POST"])
def append_document(document_id):
    data = request.json
    text_to_append = data["text"]

    # validate_document_id(document_id)

    with track_context(use_case_name="append_document", use_case_id=document_id):
        # existing_content = get_document(document_id)
        existing_content = "(existing document content)"

        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are editing an existing document. Append the new text to the document "
                        "while ensuring the writing voice and style remain consistent throughout. "
                        "Return the full updated document."
                    ),
                },
                {"role": "user", "content": f"Existing document:\n{existing_content}\n\nText to append:\n{text_to_append}"},
            ],
        )
        updated_content = response.choices[0].message.content

        # store_document(document_id, updated_content)

    return jsonify({"document_id": document_id, "content": updated_content})


if __name__ == "__main__":
    app.run(debug=True)

How the two endpoints differ
The two endpoints demonstrate different instrumentation approaches, each suited to a different situation.
Create document: @track
The create_document endpoint uses the @track decorator because the tracking parameters are known at definition time and apply to the entire function. Pay-i automatically generates a use_case_id for each invocation, so every document creation is tracked as a distinct Instance of the create_document use case.
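The per-invocation behavior can be mimicked with a hypothetical stand-in decorator. track_sketch below is invented for illustration and is not part of the Pay-i SDK; it only shows the pattern of minting a fresh instance id on every call:

```python
# Sketch, NOT the Pay-i implementation: a decorator that, like @track,
# establishes a fresh per-invocation context with an auto-generated id.
import functools
import uuid
from contextvars import ContextVar

current_instance: ContextVar[str] = ContextVar("current_instance")

def track_sketch(use_case_name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # New id on every call: each invocation is a distinct instance.
            instance_id = f"{use_case_name}:{uuid.uuid4()}"
            token = current_instance.set(instance_id)
            try:
                return fn(*args, **kwargs)
            finally:
                current_instance.reset(token)
        return wrapper
    return decorator

@track_sketch("create_document")
def create_document():
    return current_instance.get()

first, second = create_document(), create_document()
print(first != second)  # the two invocations got different instance ids
```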
@app.route("/documents", methods=["POST"])
@track(use_case_name="create_document")
def create_document():
    # All GenAI calls in this function are tracked under "create_document"
    ...

Append document: track_context()
The append_document endpoint uses track_context() because the use_case_id comes from the request path and is only available at runtime. By passing the document ID as the use_case_id, all append operations on the same document are grouped under one use case instance, allowing you to see the total cost of building that document over time.
def append_document(document_id):
    with track_context(use_case_name="append_document", use_case_id=document_id):
        # All GenAI calls in this block are tracked under "append_document"
        # with the document_id as the instance ID
        ...

Why global instrumentation is disabled for web servers
When "config": { "global_instrumentation": True } (the default), payi_instrument() creates a process-wide context that is automatically applied to every GenAI call. This is convenient for scripts where every call shares the same context, but it causes problems in web servers:
- No shared context exists. Each HTTP request has its own user, use case, and parameters. A global context would either apply incorrect metadata to requests or provide no useful attribution.
- Concurrent requests would collide. If one request sets a global use_case_id and another request reads it, the tracking data becomes incorrect.
Specifying "config": { "global_instrumentation": False } disables the automatic context. You then use @track or track_context() to create an isolated context for each request, ensuring accurate attribution even under concurrent load.
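The isolation guarantee can be sketched with Python's standard contextvars module, a common building block for per-request context managers. This illustrates the general principle rather than Pay-i internals; use_case_id, handle_request, and results below are invented names:

```python
# Illustration of per-request isolation: a ContextVar set in one thread
# (one "request") is invisible to other threads, unlike a plain global.
import threading
from contextvars import ContextVar

use_case_id: ContextVar[str] = ContextVar("use_case_id")
results = {}

def handle_request(doc_id):
    # Each simulated request sets its own value; other requests never see it.
    use_case_id.set(doc_id)
    results[doc_id] = use_case_id.get()

threads = [
    threading.Thread(target=handle_request, args=(f"doc-{i}",))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # every request read back exactly the id it set
```

With a shared global variable instead of a context-scoped value, the three writes would race and requests could read each other's ids, which is precisely the collision described above.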