Provider Fallback
Automatic failover between LLM providers for high availability.
Overview
AI Gateway automatically fails over to backup providers when your primary provider has errors. Configure fallback chains to ensure your AI features stay online even when providers have outages.
How It Works
Request → Primary Provider
↓ (error)
Fallback 1
↓ (error)
Fallback 2
↓ (error)
Return error to client
When the primary provider returns an error (5xx, timeout, rate limit), AI Gateway automatically retries with the next configured provider.
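The failover chain above can be sketched as a simple loop. This is illustrative only (the Gateway does this server-side); `callProvider` is a hypothetical stand-in for your per-provider client:

```javascript
// Try each provider in order; return the first successful response.
// `callProvider` is a hypothetical per-provider client function.
async function completeWithFallback(request, providers, callProvider) {
  let lastError;
  for (const provider of providers) {
    try {
      return await callProvider(provider, request);
    } catch (err) {
      lastError = err; // 5xx, timeout, or rate limit: try the next provider
    }
  }
  throw lastError; // every provider failed: surface the final error to the client
}
```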
Configuration
Dashboard Setup
- Navigate to AI Gateway Settings
- Go to the Fallback tab
- Configure model mappings:
| Primary Model | Fallback 1 | Fallback 2 |
|---|---|---|
| gpt-4o | claude-3-5-sonnet | gemini-1.5-pro |
| gpt-3.5-turbo | claude-3-haiku | - |
- Click Save
Fallback Triggers
AI Gateway triggers fallback on:
| Error Type | Example | Triggers Fallback |
|---|---|---|
| Server Error | 500, 502, 503 | Yes |
| Rate Limit | 429 | Yes |
| Timeout | No response in 30s | Yes |
| Auth Error | 401, 403 | No |
| Bad Request | 400 | No |
Auth and bad request errors don't trigger fallback because they indicate configuration problems with your request or credentials, which retrying against a different provider won't fix.
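The trigger table above can be expressed as a small predicate. A sketch only; timeouts carry no HTTP status, so this example represents them as `null`:

```javascript
// Decide whether an error should trigger fallback, per the table above.
// Timeouts have no HTTP status; this sketch represents them as null.
function shouldFallback(status) {
  if (status === null) return true; // timeout: try the next provider
  if (status === 429) return true;  // rate limited
  if (status >= 500) return true;   // server error (500, 502, 503, ...)
  return false;                     // 400/401/403: configuration issue, don't fall back
}
```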
Model Mapping
Map models across providers for seamless failover:
Equivalent Models
| Use Case | OpenAI | Anthropic | Google |
|---|---|---|---|
| High capability | gpt-4o | claude-3-5-sonnet | gemini-1.5-pro |
| Fast & cheap | gpt-3.5-turbo | claude-3-haiku | gemini-1.5-flash |
| Reasoning | o1 | claude-sonnet-4 | - |
Recommended Chains
Production (high availability):
gpt-4o → claude-3-5-sonnet → gemini-1.5-pro
Cost-optimized:
gpt-3.5-turbo → claude-3-haiku → gemini-1.5-flash
Quality-optimized:
gpt-4o → claude-3-opus → gpt-4-turbo
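The recommended chains could be represented as a model-to-fallbacks mapping. The shape below is illustrative, not the Gateway's actual config schema:

```javascript
// Illustrative mapping only; the Gateway's real config schema may differ.
const fallbackChains = {
  'gpt-4o': ['claude-3-5-sonnet', 'gemini-1.5-pro'],       // production
  'gpt-3.5-turbo': ['claude-3-haiku', 'gemini-1.5-flash'], // cost-optimized
};

// Resolve the full provider chain for an incoming model name.
function chainFor(model) {
  return [model, ...(fallbackChains[model] ?? [])];
}
```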
Behavior
Request Translation
When falling back to a different provider, AI Gateway automatically translates the request format:
// Original request (OpenAI format)
{
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7
}
// Automatically translated for Anthropic fallback
{
model: 'claude-3-5-sonnet',
system: 'You are helpful.',
messages: [
{ role: 'user', content: 'Hello!' }
],
temperature: 0.7
}
Response Normalization
Fallback responses are normalized to OpenAI format:
// Always returns OpenAI-compatible response
{
id: 'chatcmpl-xxx',
object: 'chat.completion',
model: 'claude-3-5-sonnet', // Actual model used
choices: [{
message: { role: 'assistant', content: '...' },
finish_reason: 'stop'
}],
usage: {
prompt_tokens: 50,
completion_tokens: 100,
total_tokens: 150
}
}
Fallback Headers
Check which provider served your request:
X-Provider: anthropic # Provider that responded
X-Fallback-Used: true # Fallback was triggered
X-Primary-Error: rate_limited # Why primary failed
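You can inspect these headers client-side to log when a fallback occurred. A minimal sketch using the header names documented above:

```javascript
// Log fallback details from a Gateway response, using the headers above.
function logFallback(response) {
  if (response.headers.get('X-Fallback-Used') === 'true') {
    console.warn(
      `Fallback to ${response.headers.get('X-Provider')}: ` +
      `primary failed with ${response.headers.get('X-Primary-Error')}`
    );
    return true;
  }
  return false;
}
```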
Advanced Configuration
Timeout Settings
Configure timeout before fallback:
| Setting | Default | Description |
|---|---|---|
| Request Timeout | 30s | Time to wait for provider response |
| Fallback Delay | 100ms | Delay before trying fallback |
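On the client side, you may want your own timeout to match the Gateway's 30s default. A sketch using `AbortSignal.timeout` (Node 17.3+ or a modern browser):

```javascript
// Mirror the Gateway's 30s request timeout on the client side.
// Requires AbortSignal.timeout (Node 17.3+ or a modern browser).
async function callWithTimeout(url, body, timeoutMs = 30_000) {
  return fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(timeoutMs), // rejects with TimeoutError on expiry
  });
}
```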
Retry Logic
Before falling back, AI Gateway retries the primary provider:
- First attempt: Immediate
- Retry 1: After 1s
- Retry 2: After 2s
- Fallback: After all retries fail
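The retry schedule above amounts to an immediate attempt followed by fixed backoff waits, sketched here as a generic helper:

```javascript
// Immediate attempt, then retries after the given delays (1s, 2s above).
async function withRetries(fn, delaysMs = [1000, 2000]) {
  try {
    return await fn(); // first attempt: immediate
  } catch (err) {
    for (const delay of delaysMs) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      try {
        return await fn();
      } catch (e) {
        err = e;
      }
    }
    throw err; // all retries failed: caller proceeds to fallback
  }
}
```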
Circuit Breaker
When a provider consistently fails, it's temporarily disabled:
- Threshold: 5 failures in 1 minute
- Duration: 30 seconds
- Recovery: Gradual traffic increase
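A minimal circuit breaker matching the numbers above might look like this. Note the real Gateway ramps traffic back gradually; this sketch simply reopens fully after the cooldown:

```javascript
// Open after 5 failures within 60s; stay open for 30s, then allow traffic.
// (The Gateway's real recovery is gradual; this sketch reopens fully.)
class CircuitBreaker {
  constructor(threshold = 5, windowMs = 60_000, openMs = 30_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.openMs = openMs;
    this.failures = [];
    this.openedAt = null;
  }
  recordFailure(now = Date.now()) {
    // Keep only failures inside the rolling window, then add this one.
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.threshold) this.openedAt = now;
  }
  allows(now = Date.now()) {
    if (this.openedAt !== null && now - this.openedAt < this.openMs) return false;
    this.openedAt = null; // cooldown elapsed: close the circuit
    return true;
  }
}
```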
Monitoring Fallbacks
Dashboard Metrics
View fallback statistics in the dashboard:
- Fallback Rate: Percentage of requests using fallback
- Primary Provider Health: Success rate by provider
- Fallback Reasons: Distribution of error types
Alerts
Set up alerts for fallback events:
- Go to Settings > Alerts
- Create alert for "Fallback Rate > 10%"
- Configure notification (email, Slack, webhook)
Best Practices
1. Always Configure Fallbacks
Don't rely on a single provider for production:
✗ gpt-4o (no fallback)
✓ gpt-4o → claude-3-5-sonnet → gemini-1.5-pro
2. Match Capabilities
Ensure fallback models can handle your use case:
// If using function calling, fallback must support it
gpt-4o (functions) → claude-3-5-sonnet (tools)
// If using vision, fallback must support it
gpt-4o (vision) → gpt-3.5-turbo (no vision) ✗
gpt-4o (vision) → gemini-1.5-pro (vision) ✓
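One way to enforce this is a capability check over the whole chain. The capability table below is a hypothetical example; verify each model's actual feature support against its provider's documentation:

```javascript
// Hypothetical capability table; confirm against each provider's docs.
const capabilities = {
  'gpt-4o': ['tools', 'vision'],
  'claude-3-5-sonnet': ['tools', 'vision'],
  'gemini-1.5-pro': ['tools', 'vision'],
};

// A chain supports a feature only if every model in it does.
function chainSupports(chain, feature) {
  return chain.every((model) => (capabilities[model] ?? []).includes(feature));
}
```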
3. Test Fallbacks
Periodically test your fallback chain:
// Force fallback for testing
const response = await fetch('https://api.transactional.dev/ai/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.GATEWAY_API_KEY}`,
'X-Force-Fallback': 'true', // Testing header
},
body: JSON.stringify({...}),
});
4. Monitor Costs
Fallback models may have different pricing. Monitor cost impact:
| Provider | Cost per 1M tokens |
|---|---|
| gpt-4o | $5.00 (input) |
| claude-3-5-sonnet | $3.00 (input) |
| gemini-1.5-pro | $1.25 (input) |
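Given the table's input prices and your observed fallback rate, you can estimate the blended input cost. Prices below mirror the table and are illustrative; check current provider pricing:

```javascript
// Estimate blended input cost per 1M tokens at a given fallback rate.
// Prices mirror the table above; verify against current provider pricing.
const inputPricePer1M = {
  'gpt-4o': 5.0,
  'claude-3-5-sonnet': 3.0,
  'gemini-1.5-pro': 1.25,
};

function blendedInputCost(primary, fallback, fallbackRate) {
  return (
    inputPricePer1M[primary] * (1 - fallbackRate) +
    inputPricePer1M[fallback] * fallbackRate
  );
}
```

For example, a 10% fallback rate from gpt-4o to claude-3-5-sonnet slightly lowers the blended input cost, since the fallback here is cheaper; the reverse chain would raise it.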
Limitations
- Streaming: Fallback only occurs if primary fails before streaming starts
- Stateful requests: Multi-turn conversations may have context differences
- Features: Some features (specific tool formats) may not translate perfectly
Next Steps
- Caching - Combine with caching for reliability
- Rate Limiting - Understand rate limits
- Cost Tracking - Monitor fallback costs