Currently, only BYOK (Bring Your Own Keys) and passthrough routing are supported. Pass-through billing (PTB) is coming soon.
The Problem
Using LLMs in production means dealing with:- Provider outages that break your app
- Rate limits that block your users
- Regional restrictions that limit availability
- Vendor lock-in that prevents optimization
The Solution
Provider routing gives you access to the same model across multiple providers. When OpenAI goes down, your app automatically switches to Azure or AWS Bedrock. When you hit rate limits, traffic flows to another provider. All without changing your code.How It Works
1
You request a model
Your app asks for
gpt-4o-mini
just like normal2
Gateway finds providers
Consults the Model Registry to find all providers offering this model
3
Smart routing
Applies sorting algorithm (cheapest first) then attempts providers
4
Automatic failover
If a provider fails, instantly tries the next one
You request a model
The simplest approach lets the gateway handle everything:The gateway only tries providers where you’ve configured API keys. See Provider Setup to add your keys.
Routing Options
Format:
model: "gpt-4o-mini"
Best for:- Maximum uptime in production
- Automatic cost optimization
- Zero-config reliability
How the Gateway Finds Models
The Model Registry is our source of truth for which providers support which models. This powers intelligent routing.Two Ways to Access Models
Option 1: Passthrough Billing (PTB) - Coming Soon
Use Helicone’s API keys in supported regions. Zero configuration required.Option 2: Your Own Keys (BYOK)
Add your provider keys in Provider Settings. The gateway uses YOUR keys for all requests.When you add a provider deployment, ALL models and regions that provider supports become available through your deployment. PTB fallback for reliability is coming soon.
Passthrough Routing (Unknown Models)
The gateway forwards ANY model/provider combination, even if not in our registry:Smart Routing Algorithm
When multiple deployments are available, the gateway intelligently selects which to use:Routing Priority
- Your deployments (BYOK) - Always tried first
- PTB endpoints - Automatic fallback for reliability (coming soon)
Selection Logic
Within each priority level, we:- Sort by cost - Cheapest deployments first
- Load balance - If costs are equal or unknown, randomly distribute requests
- Your cheapest deployment (e.g., Brazil if cheaper)
- Your other deployments (e.g., US)
- Helicone PTB endpoints (coming soon)
Failover Triggers
The gateway automatically tries the next provider when encountering these errors:Error | Description |
---|---|
429 | Rate limit errors |
401 | Authentication errors |
400 | Context length errors |
408 | Timeout errors |
500+ | Server errors |
The gateway only attempts providers where you have configured API keys (BYOK). When Pass-through Billing (PTB) launches, the gateway will automatically try Helicone’s API keys as a fallback.