Provider Fallback
Automatic failover between LLM providers for high availability.
Overview
AI Gateway automatically fails over to backup providers when your primary provider has errors. Configure fallback chains to ensure your AI features stay online even when providers have outages.
How It Works
Request → Primary Provider
↓ (error)
Fallback 1
↓ (error)
Fallback 2
↓ (error)
Return error to client
When the primary provider returns an error (5xx, timeout, rate limit), AI Gateway automatically retries with the next configured provider.
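The failover chain above can be sketched as a simple loop. This is illustrative only (the Gateway does this server-side); `callProvider` is a hypothetical stand-in for your per-provider client:

```javascript
// Try each provider in order; return the first successful response.
// `callProvider` is a hypothetical per-provider client function.
async function completeWithFallback(request, providers, callProvider) {
  let lastError;
  for (const provider of providers) {
    try {
      return await callProvider(provider, request);
    } catch (err) {
      lastError = err; // 5xx, timeout, or rate limit: try the next provider
    }
  }
  throw lastError; // every provider failed: surface the final error to the client
}
```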
Configuration
Dashboard Setup
- Navigate to AI Gateway Settings
- Go to the Fallback tab
- Configure model mappings:
| Primary Model | Fallback 1 | Fallback 2 |
|---|---|---|
| gpt-4o | claude-3-5-sonnet | gemini-1.5-pro |
| gpt-3.5-turbo | claude-3-haiku | - |
- Click Save
Fallback Triggers
AI Gateway triggers fallback on:
| Error Type | Example | Triggers Fallback |
|---|---|---|
| Server Error | 500, 502, 503 | Yes |
| Rate Limit | 429 | Yes |
| Timeout | No response in 30s | Yes |
| Auth Error | 401, 403 | No |
| Bad Request | 400 | No |
Auth and bad request errors don't trigger fallback because they indicate configuration problems with your request or credentials, which retrying against a different provider won't fix.
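The trigger table above can be expressed as a small predicate. A sketch only; timeouts carry no HTTP status, so this example represents them as `null`:

```javascript
// Decide whether an error should trigger fallback, per the table above.
// Timeouts have no HTTP status; this sketch represents them as null.
function shouldFallback(status) {
  if (status === null) return true; // timeout: try the next provider
  if (status === 429) return true;  // rate limited
  if (status >= 500) return true;   // server error (500, 502, 503, ...)
  return false;                     // 400/401/403: configuration issue, don't fall back
}
```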
Model Mapping
Map models across providers for seamless failover:
Equivalent Models
| Use Case | OpenAI | Anthropic | Google |
|---|---|---|---|
| High capability | gpt-4o | claude-3-5-sonnet | gemini-1.5-pro |
| Fast & cheap | gpt-3.5-turbo | claude-3-haiku | gemini-1.5-flash |
| Reasoning | o1 | claude-sonnet-4 | - |
Recommended Chains
Production (high availability):
gpt-4o → claude-3-5-sonnet → gemini-1.5-pro
Cost-optimized:
gpt-3.5-turbo → claude-3-haiku → gemini-1.5-flash
Quality-optimized:
gpt-4o → claude-3-opus → gpt-4-turbo
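The recommended chains could be represented as a model-to-fallbacks mapping. The shape below is illustrative, not the Gateway's actual config schema:

```javascript
// Illustrative mapping only; the Gateway's real config schema may differ.
const fallbackChains = {
  'gpt-4o': ['claude-3-5-sonnet', 'gemini-1.5-pro'],       // production
  'gpt-3.5-turbo': ['claude-3-haiku', 'gemini-1.5-flash'], // cost-optimized
};

// Resolve the full provider chain for an incoming model name.
function chainFor(model) {
  return [model, ...(fallbackChains[model] ?? [])];
}
```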
Behavior
Request Translation
When falling back to a different provider, AI Gateway automatically translates the request format:
// Original request (OpenAI format)
{
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7
}
// Automatically translated for Anthropic fallback
{
model: 'claude-3-5-sonnet',
system: 'You are helpful.',
messages: [
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7
}
Response Normalization
Fallback responses are normalized to OpenAI format:
// Always returns OpenAI-compatible response
{
id: 'chatcmpl-xxx',
object: 'chat.completion',
model: 'claude-3-5-sonnet', // Actual model used
choices: [{
message: { role: 'assistant', content: '...' },
finish_reason: 'stop'
}],
usage: {
prompt_tokens: 50,
completion_tokens: 100,
total_tokens: 150
}
}
Fallback Headers
Check which provider served your request:
X-Provider: anthropic # Provider that responded
X-Fallback-Used: true # Fallback was triggered
X-Primary-Error: rate_limited # Why primary failed
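You can inspect these headers client-side to log when a fallback occurred. A minimal sketch using the header names documented above:

```javascript
// Log fallback details from a Gateway response, using the headers above.
function logFallback(response) {
  if (response.headers.get('X-Fallback-Used') === 'true') {
    console.warn(
      `Fallback to ${response.headers.get('X-Provider')}: ` +
      `primary failed with ${response.headers.get('X-Primary-Error')}`
    );
    return true;
  }
  return false;
}
```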
Advanced Configuration
Timeout Settings
Configure timeout before fallback:
| Setting | Default | Description |
|---|---|---|
| Request Timeout | 30s | Time to wait for provider response |
| Fallback Delay | 100ms | Delay before trying fallback |
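On the client side, you may want your own timeout to match the Gateway's 30s default. A sketch using `AbortSignal.timeout` (Node 17.3+ or a modern browser):

```javascript
// Mirror the Gateway's 30s request timeout on the client side.
// Requires AbortSignal.timeout (Node 17.3+ or a modern browser).
async function callWithTimeout(url, body, timeoutMs = 30_000) {
  return fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(timeoutMs), // rejects with TimeoutError on expiry
  });
}
```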
Retry Logic
Before falling back, AI Gateway retries the primary provider:
- First attempt: Immediate
- Retry 1: After 1s
- Retry 2: After 2s
- Fallback: After all retries fail
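The retry schedule above amounts to an immediate attempt followed by fixed backoff waits, sketched here as a generic helper:

```javascript
// Immediate attempt, then retries after the given delays (1s, 2s above).
async function withRetries(fn, delaysMs = [1000, 2000]) {
  try {
    return await fn(); // first attempt: immediate
  } catch (err) {
    for (const delay of delaysMs) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      try {
        return await fn();
      } catch (e) {
        err = e;
      }
    }
    throw err; // all retries failed: caller proceeds to fallback
  }
}
```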
Circuit Breaker
When a provider consistently fails, it's temporarily disabled:
- Threshold: 5 failures in 1 minute
- Duration: 30 seconds
- Recovery: Gradual traffic increase
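A minimal circuit breaker matching the numbers above might look like this. Note the real Gateway ramps traffic back gradually; this sketch simply reopens fully after the cooldown:

```javascript
// Open after 5 failures within 60s; stay open for 30s, then allow traffic.
// (The Gateway's real recovery is gradual; this sketch reopens fully.)
class CircuitBreaker {
  constructor(threshold = 5, windowMs = 60_000, openMs = 30_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.openMs = openMs;
    this.failures = [];
    this.openedAt = null;
  }
  recordFailure(now = Date.now()) {
    // Keep only failures inside the rolling window, then add this one.
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.threshold) this.openedAt = now;
  }
  allows(now = Date.now()) {
    if (this.openedAt !== null && now - this.openedAt < this.openMs) return false;
    this.openedAt = null; // cooldown elapsed: close the circuit
    return true;
  }
}
```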
Monitoring Fallbacks
Dashboard Metrics
View fallback statistics in the dashboard:
- Fallback Rate: Percentage of requests using fallback
- Primary Provider Health: Success rate by provider
- Fallback Reasons: Distribution of error types
Alerts
Set up alerts for fallback events:
- Go to Settings > Alerts
- Create alert for "Fallback Rate > 10%"
- Configure notification (email, Slack, webhook)
Best Practices
1. Always Configure Fallbacks
Don't rely on a single provider for production:
✗ gpt-4o (no fallback)
✓ gpt-4o → claude-3-5-sonnet → gemini-1.5-pro
2. Match Capabilities
Ensure fallback models can handle your use case:
// If using function calling, fallback must support it
gpt-4o (functions) → claude-3-5-sonnet (tools)
// If using vision, fallback must support it
gpt-4o (vision) → gpt-3.5-turbo (no vision) ✗
gpt-4o (vision) → gemini-1.5-pro (vision) ✓
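One way to enforce this is a capability check over the whole chain. The capability table below is a hypothetical example; verify each model's actual feature support against its provider's documentation:

```javascript
// Hypothetical capability table; confirm against each provider's docs.
const capabilities = {
  'gpt-4o': ['tools', 'vision'],
  'claude-3-5-sonnet': ['tools', 'vision'],
  'gemini-1.5-pro': ['tools', 'vision'],
};

// A chain supports a feature only if every model in it does.
function chainSupports(chain, feature) {
  return chain.every((model) => (capabilities[model] ?? []).includes(feature));
}
```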
3. Test Fallbacks
Periodically test your fallback chain:
// Force fallback for testing
const response = await fetch('https://api.transactional.dev/ai/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.GATEWAY_API_KEY}`,
'X-Force-Fallback': 'true', // Testing header
},
body: JSON.stringify({...}),
});
4. Monitor Costs
Fallback models may have different pricing. Monitor cost impact:
| Provider | Cost per 1M tokens |
|---|---|
| gpt-4o | $5.00 (input) |
| claude-3-5-sonnet | $3.00 (input) |
| gemini-1.5-pro | $1.25 (input) |
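Given the table's input prices and your observed fallback rate, you can estimate the blended input cost. Prices below mirror the table and are illustrative; check current provider pricing:

```javascript
// Estimate blended input cost per 1M tokens at a given fallback rate.
// Prices mirror the table above; verify against current provider pricing.
const inputPricePer1M = {
  'gpt-4o': 5.0,
  'claude-3-5-sonnet': 3.0,
  'gemini-1.5-pro': 1.25,
};

function blendedInputCost(primary, fallback, fallbackRate) {
  return (
    inputPricePer1M[primary] * (1 - fallbackRate) +
    inputPricePer1M[fallback] * fallbackRate
  );
}
```

For example, a 10% fallback rate from gpt-4o to claude-3-5-sonnet slightly lowers the blended input cost, since the fallback here is cheaper; the reverse chain would raise it.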
Limitations
- Streaming: Fallback only occurs if primary fails before streaming starts
- Stateful requests: Multi-turn conversations may have context differences
- Features: Some features (specific tool formats) may not translate perfectly
Next Steps
- Caching - Combine with caching for reliability
- Rate Limiting - Understand rate limits
- Cost Tracking - Monitor fallback costs