Configure Helicone to automatically retry failed LLM requests, overcoming rate limits and server issues using intelligent exponential backoff.
Retrying requests is a common best practice when dealing with overloaded servers or hitting rate limits. These issues typically manifest as HTTP status codes 429 (Too Many Requests) and 500 (Internal Server Error). For more information on error codes, see the OpenAI API error codes documentation.
Learn About Exponential Backoff
To effectively deal with retries, we use a strategy called exponential backoff. Exponential backoff involves increasing the wait time between retries exponentially, which helps to spread out the request load and gives the server a chance to recover. This is done by multiplying the wait time by a factor (default is 2) for each subsequent retry.
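The wait-time growth described above can be sketched as a small helper. The default minimum and maximum timeouts shown here (1,000 ms and 10,000 ms) are illustrative assumptions, not Helicone's documented defaults; only the factor of 2 comes from the text.

```python
def backoff_delay(attempt, factor=2, min_timeout=1000, max_timeout=10000):
    """Wait time in milliseconds before retry `attempt` (0-indexed).

    The wait grows by `factor` on each retry and is capped at `max_timeout`.
    """
    return min(min_timeout * factor ** attempt, max_timeout)

# With factor=2 the waits double each time: 1000, 2000, 4000, 8000 ms,
# then stay capped at 10000 ms.
delays = [backoff_delay(n) for n in range(5)]
```
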
To get started, set `Helicone-Retry-Enabled` to `true`.
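A minimal sketch of the headers you would attach to a proxied request. The `Helicone-Auth` header and placeholder key are assumptions based on the usual Helicone proxy setup; substitute your own integration details.

```python
# Headers to enable Helicone's automatic retries on a request.
# Note that the value is the string "true", not a boolean.
headers = {
    "Helicone-Auth": "Bearer <your-helicone-api-key>",  # assumed auth header
    "Helicone-Retry-Enabled": "true",
}
```
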
You can customize the behavior of the retries feature by setting additional headers in your request.
| Parameter | Description |
|---|---|
| `helicone-retry-num` | Number of retries |
| `helicone-retry-factor` | The exponential backoff factor used to increase the wait time between subsequent retries. The default is 2. |
| `helicone-retry-min-timeout` | Minimum timeout (in milliseconds) between retries |
| `helicone-retry-max-timeout` | Maximum timeout (in milliseconds) between retries |
Header values must be strings. For example, `"helicone-retry-num": "3"`.
Need more help?
Additional questions or feedback? Reach out to help@helicone.ai or schedule a call with us.