Supports BYOK (Bring Your Own Keys), passthrough routing, and pass-through billing (PTB). To enable PTB with Helicone’s API keys, request access in Settings → Credits.
The Problem
Using LLMs in production means dealing with:- Provider outages that break your app
- Rate limits that block your users
- Regional restrictions that limit availability
- Vendor lock-in that prevents optimization
The Solution
Provider routing gives you access to the same model across multiple providers. When OpenAI goes down, your app automatically switches to Azure or AWS Bedrock. When you hit rate limits, traffic flows to another provider. All without changing your code.How It Works
1
You request a model
Your app asks for
gpt-4o-mini
just like normal2
Gateway finds providers
Consults the Model Registry to find all providers offering this model
3
Smart routing
Applies sorting algorithm (cheapest first) then attempts providers
4
Automatic failover
If a provider fails, instantly tries the next one
You request a model
The simplest approach lets the gateway handle everything:The gateway only tries providers where you’ve configured API keys. See Provider Setup to add your keys.
Routing Options
- Automatic
- Provider Specific
- Custom Deployment
- Fallback Chain
Format:
model: "gpt-4o-mini"
Best for:- Maximum uptime in production
- Automatic cost optimization
- Zero-config reliability
How the Gateway Finds Models
The Model Registry is our source of truth for which providers support which models. This powers intelligent routing.Two Ways to Access Models
Option 1: Passthrough Billing (PTB)
Use Helicone’s API keys in supported regions. Zero configuration required - just request access in Settings → Credits.Option 2: Your Own Keys (BYOK)
Add your provider keys in Provider Settings. The gateway uses YOUR keys for all requests.When you add a provider deployment, ALL models and regions that provider supports become available through your deployment.
Passthrough Routing (Unknown Models)
The gateway forwards ANY model/provider combination, even if not in our registry:Smart Routing Algorithm
When multiple deployments are available, the gateway intelligently selects which to use:Routing Priority
- Your deployments (BYOK) - Always tried first
- PTB endpoints - Automatic fallback for reliability
Selection Logic
Within each priority level, we:- Sort by cost - Cheapest deployments first
- Load balance - If costs are equal or unknown, randomly distribute requests
- Your cheapest deployment (e.g., Brazil if cheaper)
- Your other deployments (e.g., US)
- Helicone PTB endpoints
Failover Triggers
The gateway automatically tries the next provider when encountering these errors:Error | Description |
---|---|
429 | Rate limit errors |
401 | Authentication errors |
400 | Context length errors |
408 | Timeout errors |
500+ | Server errors |
The gateway attempts providers where you have configured API keys (BYOK) first, then falls back to Helicone’s API keys via Pass-through Billing (PTB) if enabled.