Limits (Continued)

Limits are used to control your GenAI traffic and costs.

Overview

Limits can be created either via the API or in the Pay-i UI.

After you create a limit, the API or Pay-i UI provides a limit-id, which you use when making requests. Use the optional xProxy-Limit-IDs header to specify any number of limit-ids as a comma-separated list. When you make a request with one or more limit-ids, those limits are evaluated to determine whether their configuration allows the request to proceed.
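
As a minimal sketch of the header described above, the helper below joins limit-ids into a comma-separated xProxy-Limit-IDs value. The function name `limit_ids_header` and the example ids are hypothetical; only the header name and the comma-separated format come from this guide.

```python
def limit_ids_header(limit_ids):
    """Return a headers dict carrying the optional xProxy-Limit-IDs header.

    `limit_ids` is any iterable of limit-id strings. An empty dict is
    returned when there are none, since the header is optional.
    """
    cleaned = [lid.strip() for lid in limit_ids if lid and lid.strip()]
    for lid in cleaned:
        if "," in lid:
            # A comma inside an id would be read as a separator.
            raise ValueError(f"limit-id must not contain a comma: {lid!r}")
    if not cleaned:
        return {}
    return {"xProxy-Limit-IDs": ",".join(cleaned)}


# Hypothetical ids; real ones come from the API or the Pay-i UI.
headers = {"Content-Type": "application/json"}
headers.update(limit_ids_header(["limit-id-a", "limit-id-b"]))
print(headers["xProxy-Limit-IDs"])  # limit-id-a,limit-id-b
```

Returning an empty dict when no ids are given keeps the call site uniform: the result can always be merged into the request headers.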

If a limit blocks the request, then a call to the Provider is not made (and thus no cost is incurred), and the resulting xproxy_result object will indicate which limit prevented the call from proceeding. If none of the limits block the request, the request becomes a normal API call to the Provider.

After a request is completed, the total cost of that request is added to each limit’s totals. The resulting xproxy_result object will also detail the state of each limit, and which, if any, are over their thresholds and/or maximums.
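
The sketch below shows one way client code might summarize per-limit states after a request. It assumes a plain `{limit_id: state}` mapping as a stand-in for the per-limit detail in xproxy_result; the actual shape of that object is not described in this section, so the mapping, the function name `summarize_limit_states`, and the bucket names are all hypothetical. The state strings are the ones listed under Limit States.

```python
def summarize_limit_states(states_by_limit):
    """Group limit-ids by what their reported state implies.

    Assumption: `states_by_limit` is a plain {limit_id: state} mapping
    standing in for xproxy_result; the real object's shape may differ.
    """
    buckets = {
        "exceeded": "over_threshold",  # past the Risk Threshold, under max
        "overrun": "over_max",         # past the max
        "blocked": "blocked_by",       # this limit stopped the request
    }
    summary = {"over_threshold": [], "over_max": [], "blocked_by": []}
    for limit_id, state in sorted(states_by_limit.items()):
        bucket = buckets.get(state)
        if bucket:
            summary[bucket].append(limit_id)
    return summary


example = {"limit-a": "ok", "limit-b": "exceeded", "limit-c": "overrun"}
print(summarize_limit_states(example))
# {'over_threshold': ['limit-b'], 'over_max': ['limit-c'], 'blocked_by': []}
```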

Limit Creation

Limits are created with the following characteristics:

  1. name
    1. A friendly name used to visualize the limit in the Pay-i UI.
  2. max
    1. The maximum spend in USD.
    2. Any spend in excess of the max is tracked and reported in the Pay-i dashboard and in the limit API as Limit Overrun.
      1. To see how it is tracked in the API, see Understanding Limit Details.
  3. threshold
    1. The threshold is expressed as a fraction of the limit's max, between 0.75 and 0.99 (75% to 99%).
    2. The Risk Threshold is the threshold multiplied by the max.
      1. For example, with a $10.00 max and a threshold of 0.8:
        1. Risk Threshold = $10.00 * 0.8 = $8.00.
    3. If a threshold is not specified, it defaults to 1.0, meaning the Risk Threshold is equal to the max.
    4. To see how Risk Threshold is used, see Limit States.
  4. limit_type
    1. The limit_type defines what happens when spend exceeds the limit's max.
    2. There are two limit types: Allow and Block.
      1. Allow limits will allow all requests to proceed, even if the max has been reached or exceeded.
      2. Block limits will prevent requests from proceeding if the max has been reached or exceeded.
        1. The request which causes the limit to reach or exceed its max may carry Limit Overrun costs.
        2. All requests made after the limit has reached or exceeded the maximum will be blocked.
  5. Tags
    1. See Limit Tags.
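
The characteristics above can be modeled locally as a small record, mainly to show how the Risk Threshold is derived. This is an illustrative sketch, not the Pay-i API: the class `LimitConfig`, the field name `max_usd` (standing in for max), and the validation rules are assumptions based on a reading of the list above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LimitConfig:
    """Hypothetical local model of a limit's characteristics."""
    name: str
    max_usd: Decimal                      # "max": maximum spend in USD
    limit_type: str                       # "Allow" or "Block"
    threshold: Decimal = Decimal("1.0")   # default when not specified
    tags: tuple = ()

    def __post_init__(self):
        if self.limit_type not in ("Allow", "Block"):
            raise ValueError(f"unknown limit_type: {self.limit_type!r}")
        # Assumption: an explicit threshold lies in 0.75..0.99, and the
        # default of 1.0 is also accepted.
        in_range = Decimal("0.75") <= self.threshold <= Decimal("0.99")
        if not (in_range or self.threshold == Decimal("1.0")):
            raise ValueError(f"threshold out of range: {self.threshold}")

    @property
    def risk_threshold(self):
        # Risk Threshold = threshold * max, rounded to cents.
        return (self.max_usd * self.threshold).quantize(Decimal("0.01"))


limit = LimitConfig("demo", Decimal("10.00"), "Block", Decimal("0.8"))
print(limit.risk_threshold)  # 8.00
```

Decimal is used rather than float so that dollar amounts such as $10.00 * 0.8 come out as exactly $8.00.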

Limit States

The Pay-i API returns a state object in the xproxy_result for each limit used in a request. Your code can use these states to choose appropriate actions that reduce the risk of spend exceeding the limit's max, such as switching to a cheaper end-to-end solution, stopping further requests, or up-selling your user.

This section describes the possible values for the state object and when they occur.

| State | Description |
|---|---|
| ok | The amount spent is still below the limit's Risk Threshold. |
| exceeded | The amount spent has exceeded the limit's Risk Threshold but is less than the limit's max. |
| overrun | The amount of spend has exceeded the limit's max. For an Allow limit, all further requests will result in the overrun limit state. For a Block limit, only the request which causes the spend to exceed the limit's max will have the overrun limit state. All further requests will result in the blocked limit state. |
| blocked | This limit has blocked the request. |
| blocked_external | blocked_external can only appear when multiple limits are used for a single request, and at least one of them is of type Block. The example following this table illustrates when blocked_external is used. |

Example of blocked_external:

  • Limit A, type: Allow
  • Limit B, type: Block

A request is made declaring both limits, A and B. Limit A allows the request. Limit B blocks the request. The state of the request for Limit B is blocked. The state of the request for Limit A is blocked_external, indicating the request was blocked by another limit that was part of the same request.

All reported limit states are stored and can be reviewed with the Get Limit Details API. See the Understanding Limit Details section for more information.
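
The blocked_external rule in the Limit A / Limit B example can be sketched as a small function. The name `resolve_request_states` and its input, a mapping from limit name to whether that limit would block the request, are hypothetical; this models the rule as described in this section, not the Pay-i implementation.

```python
def resolve_request_states(would_block):
    """Return per-limit states for a request that at least one limit blocks.

    `would_block` maps each declared limit to True if that limit blocks
    the request. Returns None when no limit blocks, in which case the
    states are decided by spend (ok, exceeded, or overrun) instead.
    """
    if not any(would_block.values()):
        return None
    return {
        limit: "blocked" if blocks else "blocked_external"
        for limit, blocks in would_block.items()
    }


# Limit A (Allow) lets the request through; Limit B (Block) stops it.
print(resolve_request_states({"A": False, "B": True}))
# {'A': 'blocked_external', 'B': 'blocked'}
```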

Limit State Diagram

* For Block limits, the request which causes the spend to exceed the limit's max will have the overrun state. All further requests will have the blocked state.

Limit State Examples

Allow

  • max = $10.00
  • threshold = 0.8
  • Risk Threshold = $8.00
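
| # | Spend (Before) | Request Cost | Spend (After) | State | Overrun |
|---|---|---|---|---|---|
| 1 | $7.80 | $0.19 | $7.99 | ok | $0.00 |
| 2 | $7.99 | $2.00 | $9.99 | exceeded | $0.00 |
| 3 | $9.99 | $0.30 | $10.29 | overrun | $0.29 |
| 4 | $10.29 | $0.50 | $10.79 | overrun | $0.79 |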

Block

  • max = $10.00
  • threshold = 0.8
  • Risk Threshold = $8.00
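
| # | Spend (Before) | Request Cost | Spend (After) | State | Overrun |
|---|---|---|---|---|---|
| 1 | $7.80 | $0.19 | $7.99 | ok | $0.00 |
| 2 | $7.99 | $2.00 | $9.99 | exceeded | $0.00 |
| 3 | $9.99 | $0.30 | $10.29 | overrun | $0.29 |
| 4 | $10.29 | N/A | $10.29 | blocked | $0.29 |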
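
The Allow and Block examples above can be reproduced with a short local simulation. This is a sketch of the rules as described in this guide, not Pay-i's implementation: the function `simulate` is hypothetical, and the behavior when spend lands exactly on the Risk Threshold or the max is an assumption, since the text does not spell out those boundaries.

```python
from decimal import Decimal


def simulate(limit_type, max_usd, threshold, start_spend, costs):
    """Return (spend_after, state, overrun) for each request in `costs`."""
    risk_threshold = max_usd * threshold
    spend = start_spend
    rows = []
    for cost in costs:
        if limit_type == "Block" and spend >= max_usd:
            # Max already reached: the request is blocked, no cost is added.
            state = "blocked"
        else:
            spend += cost
            # Assumption: boundaries at exact equality are not specified.
            if spend > max_usd:
                state = "overrun"
            elif spend < risk_threshold:
                state = "ok"
            else:
                state = "exceeded"
        overrun = max(spend - max_usd, Decimal("0.00"))
        rows.append((spend, state, overrun))
    return rows


costs = [Decimal("0.19"), Decimal("2.00"), Decimal("0.30"), Decimal("0.50")]
for limit_type in ("Allow", "Block"):
    print(limit_type)
    rows = simulate(limit_type, Decimal("10.00"), Decimal("0.8"),
                    Decimal("7.80"), costs)
    for spend, state, overrun in rows:
        print(f"  ${spend} {state} overrun=${overrun}")
```

The two runs differ only on the fourth request: the Allow limit records $10.79 with a $0.79 overrun, while the Block limit stays at $10.29 and reports blocked.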


Related APIs