429 (Too Many Requests), 500 (Internal Server Error), or 503 (Service Unavailable).
Why Use Retries
- Handle rate limits gracefully - Automatically retry when you hit provider rate limits
- Overcome temporary failures - Recover from transient network issues or server overload
- Improve reliability - Increase the success rate of your LLM requests without manual intervention
If you’re using the AI Gateway, automatic failover is usually better than retries. However, retries are ideal when you must use a specific provider endpoint (e.g., EU-hosted models for compliance, fine-tuned models, or region-specific deployments).
How It Works
Helicone uses exponential backoff to space out retry attempts. This strategy:
- Starts with a short delay (default 1 second)
- Doubles the wait time after each failed attempt
- Caps the maximum wait time (default 10 seconds)
- Prevents overwhelming the server while maximizing success chances
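The schedule described above can be sketched in a few lines. This is a minimal illustration, not Helicone's actual implementation: the function name `backoff_delays` and its parameters are hypothetical, and it assumes the delay is simply the minimum delay multiplied by the factor once per failed attempt, capped at the maximum, with no jitter.

```python
from __future__ import annotations


def backoff_delays(retries: int, factor: float = 2.0,
                   min_ms: int = 1000, max_ms: int = 10000) -> list[int]:
    """Return the assumed wait time, in milliseconds, before each retry."""
    delays = []
    for attempt in range(retries):
        # Grow the delay geometrically, then cap it at the maximum.
        delay = min_ms * factor ** attempt
        delays.append(int(min(delay, max_ms)))
    return delays


print(backoff_delays(5))  # [1000, 2000, 4000, 8000, 10000]
```

With the defaults mentioned above (1 second start, doubling, 10 second cap), the fifth retry would be held at the cap instead of waiting 16 seconds.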
Quick Start
To enable automatic retries, add the Helicone-Retry-Enabled: true header to your requests:
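A minimal sketch of what that looks like when building request headers in Python. The helper `with_retries` and the placeholder base headers are hypothetical; only the `Helicone-Retry-Enabled` header comes from this page.

```python
from __future__ import annotations


def with_retries(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the request headers with Helicone retries enabled."""
    return {**headers, "Helicone-Retry-Enabled": "true"}


# Hypothetical base headers; replace the placeholder with your own credentials.
headers = with_retries({
    "Authorization": "Bearer <PROVIDER_API_KEY>",
    "Content-Type": "application/json",
})
print(headers["Helicone-Retry-Enabled"])  # true
```

The resulting dictionary can be passed to whichever HTTP client or SDK already routes your requests through Helicone.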
Each retry attempt is logged separately in Helicone, allowing you to track retry patterns and success rates.
Configuration
Customize retry behavior with these optional headers:
- Maximum number of retry attempts. Set to “0” to disable retries for specific requests. Example: "5" for up to 5 retries.
- Exponential backoff multiplier. Controls how quickly the delay increases between retries. Example: "2" doubles the wait time after each attempt.
- Minimum delay between retries in milliseconds. Example: "1000" for a 1 second minimum wait.
- Maximum delay between retries in milliseconds, regardless of exponential growth. Example: "10000" caps the wait time at 10 seconds.
All header values must be strings. Numbers should be quoted: "Helicone-Retry-Num": "3", not "Helicone-Retry-Num": 3.
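One way to avoid the unquoted-number mistake is to build the headers through a small helper that converts every value to a string. The sketch below is illustrative: `retry_headers` is a hypothetical helper, and while `Helicone-Retry-Enabled` and `Helicone-Retry-Num` appear on this page, the names used for the multiplier and the two delay bounds are assumptions that should be checked against the current Helicone header reference.

```python
from __future__ import annotations


def retry_headers(num: int = 3, factor: float = 2,
                  min_timeout_ms: int = 1000,
                  max_timeout_ms: int = 10000) -> dict[str, str]:
    """Build retry headers, converting every numeric value to a string."""
    if num < 0:
        raise ValueError("num must be zero or greater")
    if min_timeout_ms > max_timeout_ms:
        raise ValueError("min_timeout_ms must not exceed max_timeout_ms")
    return {
        "Helicone-Retry-Enabled": "true",
        "Helicone-Retry-Num": str(num),
        # The three header names below are assumed, not taken from this page.
        "Helicone-Retry-Factor": str(factor),
        "Helicone-Retry-Min-Timeout": str(min_timeout_ms),
        "Helicone-Retry-Max-Timeout": str(max_timeout_ms),
    }


headers = retry_headers(num=5)
print(headers["Helicone-Retry-Num"])  # 5
```

Because every value passes through `str()`, a caller can work with ordinary integers and still send correctly quoted header values.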
Common Use Cases
- EU-Hosted Model for GDPR Compliance
- Fine-Tuned Model on Specific Provider
- Custom Provider Endpoint
Retry Triggers
Helicone automatically retries requests that fail with these status codes:
- 429 - Rate limit exceeded
- 500 - Internal server error
- 502 - Bad gateway
- 503 - Service unavailable
- 504 - Gateway timeout
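If you want to mirror this list in your own code, for example to tell retried failures apart from permanent ones when reviewing logs, a small predicate is enough. The names `RETRYABLE_STATUS_CODES` and `is_retryable` are hypothetical; the status codes are the five listed above.

```python
# Status codes that trigger a retry, as listed above.
RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


def is_retryable(status_code: int) -> bool:
    """Return True if a response with this status code would be retried."""
    return status_code in RETRYABLE_STATUS_CODES


print(is_retryable(503))  # True
print(is_retryable(400))  # False
```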
Need more help?
Additional questions or feedback? Reach out to help@helicone.ai or schedule a call with us.