Historical Data Backfill
Overview
This guide covers strategies for importing large volumes of historical GenAI usage data into Pay-i. These approaches are typically used during onboarding periods or when migrating from other tracking systems to ensure complete visibility of your past usage patterns.
Historical data backfill allows you to:
- Bulk-import past GenAI usage data that wasn't tracked in real time
- Migrate large datasets from another analytics system
- Establish baseline usage patterns for forecasting and budgeting
- Create a complete historical record for compliance or analysis
Note: This page focuses specifically on importing large datasets of historical records. For submitting individual historical events or small batches, please refer to the Manual Event Submission (Ingest API) documentation.
Methods
Bulk Ingest API
The /requests/bulk-ingest endpoint is the primary method for importing large historical datasets into Pay-i. This specialized API is designed to handle high-volume data imports efficiently by:
- Processing thousands of events in a single network request
- Reducing API call overhead compared to individual event submission
- Supporting the same data structure and capabilities as the standard Ingest API
- Automatically calculating historical pricing based on event timestamps
The Bulk Ingest API is not enabled by default; contact [email protected] to request access. The Pay-i team can configure your account for bulk ingestion and advise on the optimal data format and import strategy for your specific scenario.
For complete API reference documentation, see the Bulk Ingest API Documentation.
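As a rough sketch of this workflow, the Python below batches historical events and POSTs them to the /requests/bulk-ingest endpoint with the requests library. The base URL, authentication header, and request body shape (a JSON array of events) are assumptions for illustration only; confirm the exact format with the Pay-i team when they enable bulk ingestion for your account.

```python
"""Sketch: submitting historical events in batches to the Bulk Ingest API.

Assumptions (verify with the Pay-i team): the base URL, the auth header,
and the body being a JSON array of event objects are illustrative only.
"""
import requests

PAYI_BASE_URL = "https://api.pay-i.com"   # assumed base URL
PAYI_API_KEY = "YOUR_PAYI_API_KEY"        # assumed auth scheme


def bulk_ingest(events, batch_size=1000):
    """POST historical events to /requests/bulk-ingest in batches."""
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        response = requests.post(
            f"{PAYI_BASE_URL}/requests/bulk-ingest",
            headers={"Authorization": f"Bearer {PAYI_API_KEY}"},
            json=batch,   # assumed shape: a JSON array of ingest events
            timeout=60,
        )
        response.raise_for_status()


# Each event uses the same fields as the standard Ingest API.
events = [
    {
        "category": "system.openai",
        "resource": "gpt-4",
        "event_timestamp": "2024-02-15T00:00:00Z",
        "units": {"text": {"input": 1200, "output": 450}},
    },
    # ... more historical events
]
bulk_ingest(events)
```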
Manual Event Submission (for smaller sets)
For smaller datasets or individual historical records, the standard Manual Event Submission (Ingest API) approach is more appropriate. This method uses the event_timestamp parameter to specify when each event occurred, ensuring accurate historical tracking; an example follows the list below.
The standard Ingest API is suitable for:
- One-off historical events
- Small batches of records (dozens to hundreds)
- Regular submission of recent historical data
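As a rough sketch of this approach, the following Python submits a small batch of historical events one at a time, setting event_timestamp on each. The base URL, endpoint path, and authentication header are placeholders rather than confirmed API details; the event fields mirror the example later in this guide.

```python
"""Sketch: submitting a small batch of historical events, one at a time,
through the standard Ingest API. Base URL, endpoint path, and auth header
are placeholders; only the event fields come from this guide."""
import requests

PAYI_BASE_URL = "https://api.pay-i.com"   # assumed base URL
INGEST_PATH = "/requests/ingest"          # hypothetical path; see the Ingest API reference
PAYI_API_KEY = "YOUR_PAYI_API_KEY"        # assumed auth scheme

historical_events = [
    {
        "category": "system.openai",
        "resource": "gpt-4",
        # event_timestamp records when the call actually happened, so
        # Pay-i applies the pricing that was in effect at that time.
        "event_timestamp": "2024-02-15T00:00:00Z",
        "units": {"text": {"input": 1200, "output": 450}},
    },
    # ... a few dozen more records
]

for event in historical_events:
    response = requests.post(
        f"{PAYI_BASE_URL}{INGEST_PATH}",
        headers={"Authorization": f"Bearer {PAYI_API_KEY}"},
        json=event,
        timeout=30,
    )
    response.raise_for_status()
```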
Considerations
When performing historical data backfill, keep these important factors in mind:
Event Timestamp Usage
The event_timestamp field is critical for accurate historical pricing. Pay-i will automatically use the resource pricing that was in effect at the time specified by this timestamp.
```json
{
  "category": "system.openai",
  "resource": "gpt-4",
  "event_timestamp": "2024-02-15T00:00:00Z",
  "units": {
    "text": {
      "input": 1200,
      "output": 450
    }
  }
  // Additional fields...
}
```
If you attempt to backfill data with a timestamp that predates a resource's availability (e.g., trying to backfill gpt-4 usage from before its release), Pay-i will return an error.
Data Preparation
Before initiating a large-scale data backfill:
- Ensure your historical data contains all required fields (category, resource, units)
- Map your historical usage to the appropriate Pay-i resources and categories
- Convert any custom metrics to Pay-i's units structure
- Validate your data format with a small test batch (see the sketch after this list)
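As a rough illustration of these preparation steps, the sketch below maps records from a hypothetical analytics export into the Pay-i event shape and validates a small test batch before a full import. The source field names (model_name, prompt_tokens, completion_tokens, used_at) are invented for the example; only the Pay-i event fields (category, resource, event_timestamp, units) come from this guide.

```python
"""Sketch: mapping exported usage records into the Pay-i event shape and
validating a small test batch before a full import. The source field names
are hypothetical examples of an existing analytics export."""
from datetime import datetime, timezone

REQUIRED_FIELDS = ("category", "resource", "units")


def to_payi_event(record: dict) -> dict:
    """Convert one exported usage record into an Ingest API event."""
    return {
        "category": "system.openai",           # map per provider/category
        "resource": record["model_name"],      # e.g. "gpt-4"
        "event_timestamp": record["used_at"],  # ISO 8601, UTC
        "units": {
            "text": {
                "input": record["prompt_tokens"],
                "output": record["completion_tokens"],
            }
        },
    }


def validate(event: dict) -> None:
    """Check required fields and reject future-dated timestamps."""
    missing = [f for f in REQUIRED_FIELDS if f not in event]
    if missing:
        raise ValueError(f"event missing required fields: {missing}")
    ts = datetime.fromisoformat(event["event_timestamp"].replace("Z", "+00:00"))
    if ts > datetime.now(timezone.utc):
        raise ValueError("event_timestamp is in the future")


exported_records = [
    {"model_name": "gpt-4", "prompt_tokens": 1200,
     "completion_tokens": 450, "used_at": "2024-02-15T00:00:00Z"},
]

# Validate a small test batch before running the full backfill.
for event in (to_payi_event(r) for r in exported_records):
    validate(event)
```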
Related Resources
- Manual Event Submission (Ingest API) - For submitting individual historical events
- Bulk Ingest API Documentation - Complete API reference for bulk data submission