1000 requests per day or 60 requests per minute. By implementing rate limits, you can prevent abuse while protecting your resources from being overwhelmed by excessive traffic.
Why Rate Limit
- Prevent abuse of the API: Limit the total requests a user can make in a given period to control cost.
- Protect resources from excessive traffic: Maintain availability for all users.
- Control operational cost: Limit the total number of requests sent and total cost.
- Comply with third-party API usage policies: Each model provider has its own rate limit for your key. Helicone’s rate limit is bounded by your provider’s policy.
Quick Start
Set up rate limiting by adding the Helicone-RateLimit-Policy header to your requests:
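A minimal sketch of this step in Python. Only the Helicone-RateLimit-Policy header name comes from this page. The Helicone-Auth header, the placeholder key, and the policy value `100;w=3600` (100 requests per hour, written in an assumed `quota;w=window` form) are illustrative assumptions, and `with_rate_limit` is a hypothetical helper.

```python
def with_rate_limit(headers: dict[str, str], policy: str) -> dict[str, str]:
    """Return a copy of `headers` with the rate limit policy header added."""
    merged = dict(headers)
    merged["Helicone-RateLimit-Policy"] = policy
    return merged


# Assumption: requests are authenticated to Helicone with a Helicone-Auth header.
base_headers = {"Helicone-Auth": "Bearer <HELICONE_API_KEY>"}

# Assumed policy syntax: 100 requests per 3600-second window.
request_headers = with_rate_limit(base_headers, "100;w=3600")
print(request_headers)
```

The resulting dictionary can be passed as extra headers on whichever HTTP or SDK client sends your requests through Helicone.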
Configuration Reference
The Helicone-RateLimit-Policy header uses this format:
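As a rough sketch, assume the policy is a semicolon-separated string of the form `[quota];w=[time_window];u=[unit];s=[segment]`. The `u=` and `s=` keys appear in the parameter descriptions on this page, while the `w=` key and the `;` separator are assumptions. `build_policy` is a hypothetical helper, not part of any Helicone SDK.

```python
from __future__ import annotations


def build_policy(quota: int, window_seconds: int,
                 unit: str = "request", segment: str | None = None) -> str:
    """Build a policy string in the assumed form quota;w=window;u=unit;s=segment."""
    if window_seconds < 60:
        raise ValueError("time window must be at least 60 seconds")
    if unit not in ("request", "cents"):
        raise ValueError("unit must be 'request' or 'cents'")

    parts = [str(quota), f"w={window_seconds}"]
    if unit != "request":    # request is the default unit, so it is left out
        parts.append(f"u={unit}")
    if segment is not None:  # no segment means a global limit
        parts.append(f"s={segment}")
    return ";".join(parts)


print(build_policy(1000, 86400, segment="user"))  # 1000;w=86400;s=user
print(build_policy(500, 3600, unit="cents"))      # 500;w=3600;u=cents
```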
Parameters
- Maximum number of requests (or cost in cents) allowed within the time window. Example: 1000 for 1000 requests.
- Time window in seconds. Minimum is 60 seconds. Example: 3600 for 1 hour, 86400 for 1 day.
- Unit type: request (default) or cents for cost-based limiting. Example: u=cents to limit by spending instead of request count.
- Segment type: user for per-user limits, or a custom property name for per-property limits. Omit for global limits. Example: s=user or s=organization.
Rate Limiting Scopes
Helicone supports three types of rate limiting based on who or what you want to limit:
Global Rate Limiting
Applies the same limit across all requests using your API key. Use case: “Limit my entire application to 10,000 requests per hour”
Per-User Rate Limiting
Applies separate limits for each user ID. Use case: “Each user can make 1,000 requests per day”
Per-Property Rate Limiting
Applies separate limits for each custom property value. Use case: “Each organization can make 5,000 requests per hour”
Common Use Cases
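The scopes described above, together with cost-based limiting, could be expressed roughly as follows. This is a sketch under stated assumptions: the policy strings use an assumed `quota;w=window;u=unit;s=segment` syntax, the cost-based quota is a made-up value, the `Helicone-Property-<name>` header naming for custom properties is an assumption, and `scoped_headers` is a hypothetical helper.

```python
from __future__ import annotations

# Policy strings in the assumed form quota;w=window;u=unit;s=segment.
POLICIES = {
    "global": "10000;w=3600",                          # whole application, per hour
    "per_user": "1000;w=86400;s=user",                 # each user, per day
    "per_organization": "5000;w=3600;s=organization",  # each property value, per hour
    "cost_per_user": "500;w=86400;u=cents;s=user",     # made-up: 500 cents per user per day
}


def scoped_headers(scope: str, user_id: str | None = None,
                   properties: dict[str, str] | None = None) -> dict[str, str]:
    """Build request headers for one of the example policies above."""
    policy = POLICIES[scope]
    headers = {"Helicone-RateLimit-Policy": policy}
    if "s=user" in policy.split(";"):
        # Per-user limits need the Helicone-User-Id header on every request.
        if not user_id:
            raise ValueError("per-user policies need a user_id")
        headers["Helicone-User-Id"] = user_id
    for name, value in (properties or {}).items():
        # Assumed header naming for custom properties.
        headers[f"Helicone-Property-{name}"] = value
    return headers


print(scoped_headers("global"))
print(scoped_headers("per_user", user_id="user-123"))
print(scoped_headers("per_organization", properties={"organization": "acme"}))
```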
Global Application Limits
Limit your entire application’s usage:
Per-User Limits
Limit each user individually:
Per-user rate limiting requires the Helicone-User-Id header. See User Metrics for more details.
Cost-Based Limits
Limit by spending instead of request count:
Custom Property Limits
Limit by custom properties like organization or tier:
Extracting Rate Limit Response Headers
Extracting the headers allows you to test your rate limit policy in a local environment before deploying to production. If your rate limit policy is active, the following headers will be returned:
- Helicone-RateLimit-Limit: The quota for the number of requests allowed in the time window.
- Helicone-RateLimit-Policy: The active rate limit policy.
- Helicone-RateLimit-Remaining: The remaining quota in the current window.
If a request is rate-limited, a 429 rate limit error will be returned.
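One way to inspect these headers in a local test is sketched below. It assumes the limit and remaining values arrive as plain integer strings. The sample values are invented for illustration, and `read_rate_limit` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    limited: bool          # True when the response status was 429
    limit: int | None      # Helicone-RateLimit-Limit
    remaining: int | None  # Helicone-RateLimit-Remaining
    policy: str | None     # Helicone-RateLimit-Policy


def read_rate_limit(status_code: int, headers: dict[str, str]) -> RateLimitStatus:
    """Extract the rate limit headers from a response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lower = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> int | None:
        value = lower.get(name, "").strip()
        return int(value) if value.isdigit() else None

    return RateLimitStatus(
        limited=(status_code == 429),
        limit=as_int("helicone-ratelimit-limit"),
        remaining=as_int("helicone-ratelimit-remaining"),
        policy=lower.get("helicone-ratelimit-policy"),
    )


# Invented response values, for illustration only.
status = read_rate_limit(200, {
    "Helicone-RateLimit-Limit": "1000",
    "Helicone-RateLimit-Remaining": "999",
    "Helicone-RateLimit-Policy": "1000;w=86400;s=user",
})
print(status.limited, status.remaining)  # False 999
```

If all three fields come back as None, the policy header was most likely not applied to the request.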
Latency Considerations
Using rate limits adds a small amount of latency to your requests. This feature is deployed with Cloudflare’s key-value data store, which is a low-latency service that stores data in a small number of centralized data centers and caches that data in Cloudflare’s data centers after access. The added latency is minimal compared to multi-second OpenAI requests.
Coming Soon
- Token-based rate limiting - Limit by number of tokens instead of just request count or cost
- Multiple rate limit policies - Apply multiple rate limiting criteria to a single request (e.g., limit by both request count AND cost simultaneously)
Need more help?
Additional questions or feedback? Reach out to help@helicone.ai or schedule a call with us.