Provider Fallback

Automatic failover between LLM providers for high availability.

Overview

AI Gateway automatically fails over to backup providers when your primary provider returns errors. Configure fallback chains to keep your AI features online even during provider outages.

How It Works

Request → Primary Provider
              ↓ (error)
          Fallback 1
              ↓ (error)
          Fallback 2
              ↓ (error)
          Return error to client

When the primary provider returns an error (5xx, timeout, rate limit), AI Gateway automatically retries with the next configured provider.
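The failover loop above can be sketched as follows. This is illustrative, not the gateway's actual implementation; the `callProvider` callback and its result shape are assumptions for the example.

```typescript
// Sketch of the fallback chain: try each provider in order, return the
// first success, and surface the last error if every provider fails.
type ProviderResult = { ok: boolean; status: number; body?: string };

async function completeWithFallback(
  providers: string[],
  callProvider: (provider: string) => Promise<ProviderResult>,
): Promise<ProviderResult> {
  let last: ProviderResult = { ok: false, status: 503 };
  for (const provider of providers) {
    last = await callProvider(provider);
    if (last.ok) return last; // first success wins
    // 5xx, 429, and timeouts fall through to the next provider in the chain
  }
  return last; // all providers failed: return the final error to the client
}
```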

Configuration

Dashboard Setup

  1. Navigate to AI Gateway Settings
  2. Go to the Fallback tab
  3. Configure model mappings:
| Primary Model | Fallback 1 | Fallback 2 |
| --- | --- | --- |
| gpt-4o | claude-3-5-sonnet | gemini-pro |
| gpt-3.5-turbo | claude-3-haiku | - |
  4. Click Save

Fallback Triggers

AI Gateway triggers fallback on:

| Error Type | Example | Triggers Fallback |
| --- | --- | --- |
| Server Error | 500, 502, 503 | Yes |
| Rate Limit | 429 | Yes |
| Timeout | No response in 30s | Yes |
| Auth Error | 401, 403 | No |
| Bad Request | 400 | No |

Auth and bad request errors don't trigger fallback because they indicate configuration issues that a different provider cannot fix.
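The trigger table can be summarized as a single classification function. A minimal sketch, with one assumption: a timeout with no response is represented here as status `0`.

```typescript
// Decide whether an HTTP status from the primary provider should trigger
// fallback, following the trigger table above.
function triggersFallback(status: number): boolean {
  if (status === 429) return true;  // rate limit
  if (status >= 500) return true;   // server errors (500, 502, 503, ...)
  if (status === 0) return true;    // timeout / no response (assumed encoding)
  return false;                     // 400/401/403 are config issues: fail fast
}
```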

Model Mapping

Map models across providers for seamless failover:

Equivalent Models

| Use Case | OpenAI | Anthropic | Google |
| --- | --- | --- | --- |
| High capability | gpt-4o | claude-3-5-sonnet | gemini-1.5-pro |
| Fast & cheap | gpt-3.5-turbo | claude-3-haiku | gemini-1.5-flash |
| Reasoning | o1 | claude-sonnet-4 | - |

Production (high availability):

gpt-4o → claude-3-5-sonnet → gemini-1.5-pro

Cost-optimized:

gpt-3.5-turbo → claude-3-haiku → gemini-1.5-flash

Quality-optimized:

gpt-4o → claude-3-opus → gpt-4-turbo
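In code, the chains above are just an ordered mapping from primary model to fallbacks. The field layout below is a hypothetical shape for illustration; the real dashboard stores this mapping for you.

```typescript
// Hypothetical representation of two of the fallback chains above.
const fallbackChains: Record<string, string[]> = {
  // production (high availability)
  'gpt-4o': ['claude-3-5-sonnet', 'gemini-1.5-pro'],
  // cost-optimized
  'gpt-3.5-turbo': ['claude-3-haiku', 'gemini-1.5-flash'],
};

// Full ordered chain for a request: primary first, then its fallbacks.
function chainFor(model: string): string[] {
  return [model, ...(fallbackChains[model] ?? [])];
}
```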

Behavior

Request Translation

When falling back to a different provider, AI Gateway automatically translates the request format:

// Original request (OpenAI format)
{
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' }
  ],
  temperature: 0.7
}
 
// Automatically translated for Anthropic fallback
{
  model: 'claude-3-5-sonnet',
  system: 'You are helpful.',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
  temperature: 0.7
}
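The translation shown above can be sketched as a small function: the `system` message is lifted into Anthropic's top-level `system` field and the remaining messages pass through. The field names follow the two public APIs; the function itself is illustrative, not the gateway's code.

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Translate an OpenAI-format chat request into Anthropic's format,
// remapping the model to the configured fallback.
function toAnthropic(
  req: { model: string; messages: Msg[]; temperature?: number },
  fallbackModel: string,
): { model: string; system?: string; messages: Msg[]; temperature?: number } {
  const system = req.messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n');
  return {
    model: fallbackModel,
    ...(system ? { system } : {}),
    messages: req.messages.filter((m) => m.role !== 'system'),
    temperature: req.temperature,
  };
}
```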

Response Normalization

Fallback responses are normalized to OpenAI format:

// Always returns OpenAI-compatible response
{
  id: 'chatcmpl-xxx',
  object: 'chat.completion',
  model: 'claude-3-5-sonnet',  // Actual model used
  choices: [{
    message: { role: 'assistant', content: '...' },
    finish_reason: 'stop'
  }],
  usage: {
    prompt_tokens: 50,
    completion_tokens: 100,
    total_tokens: 150
  }
}

Fallback Headers

Check which provider served your request:

X-Provider: anthropic           # Provider that responded
X-Fallback-Used: true           # Fallback was triggered
X-Primary-Error: rate_limited   # Why primary failed
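A small helper can pull these headers off a response for logging or metrics. This is a sketch using only the three headers documented above.

```typescript
// Extract fallback metadata from a gateway response's headers.
function fallbackInfo(headers: Headers): {
  fellBack: boolean;
  provider: string | null;
  primaryError: string | null;
} {
  return {
    fellBack: headers.get('X-Fallback-Used') === 'true',
    provider: headers.get('X-Provider'),
    primaryError: headers.get('X-Primary-Error'),
  };
}
```

For example, after `const res = await fetch(...)`, call `fallbackInfo(res.headers)` and alert when `fellBack` is true.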

Advanced Configuration

Timeout Settings

Configure timeout before fallback:

| Setting | Default | Description |
| --- | --- | --- |
| Request Timeout | 30s | Time to wait for provider response |
| Fallback Delay | 100ms | Delay before trying fallback |

Retry Logic

Before falling back, AI Gateway retries the primary provider:

  1. First attempt: Immediate
  2. Retry 1: After 1s
  3. Retry 2: After 2s
  4. Fallback: After all retries fail
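The schedule above (immediate, then 1s, then 2s, then fall back) can be expressed as a lookup. A minimal sketch; the gateway's internal scheduling may differ.

```typescript
// Delay before retry `attempt` (0-based) against the primary provider.
// Returns null once retries are exhausted, signalling "fall back now".
function retryDelayMs(attempt: number): number | null {
  const delays = [0, 1000, 2000]; // immediate, after 1s, after 2s
  return attempt < delays.length ? delays[attempt] : null;
}
```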

Circuit Breaker

When a provider consistently fails, it's temporarily disabled:

  • Threshold: 5 failures in 1 minute
  • Duration: 30 seconds
  • Recovery: Gradual traffic increase
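A toy circuit breaker matching those numbers looks like this (open after 5 failures inside a 1-minute window, stay open for 30 seconds). The gradual-traffic recovery phase is omitted for brevity, and this is a sketch rather than the gateway's implementation.

```typescript
class CircuitBreaker {
  private failures: number[] = [];     // timestamps (ms) of recent failures
  private openedAt: number | null = null;

  constructor(
    private threshold = 5,             // failures before opening
    private windowMs = 60_000,         // 1-minute failure window
    private cooldownMs = 30_000,       // 30-second open duration
  ) {}

  recordFailure(now = Date.now()): void {
    // Drop failures that fell out of the rolling window, then count this one.
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.threshold) this.openedAt = now;
  }

  // While open, the provider is skipped entirely and traffic goes to fallbacks.
  isOpen(now = Date.now()): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.cooldownMs) {
      this.openedAt = null;            // cooldown elapsed: close again
      this.failures = [];
      return false;
    }
    return true;
  }
}
```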

Monitoring Fallbacks

Dashboard Metrics

View fallback statistics in the dashboard:

  • Fallback Rate: Percentage of requests using fallback
  • Primary Provider Health: Success rate by provider
  • Fallback Reasons: Distribution of error types

Alerts

Set up alerts for fallback events:

  1. Go to Settings > Alerts
  2. Create alert for "Fallback Rate > 10%"
  3. Configure notification (email, Slack, webhook)

Best Practices

1. Always Configure Fallbacks

Don't rely on a single provider for production:

✗ gpt-4o (no fallback)
✓ gpt-4o → claude-3-5-sonnet → gemini-pro

2. Match Capabilities

Ensure fallback models can handle your use case:

// If using function calling, fallback must support it
gpt-4o (functions) → claude-3-5-sonnet (tools)

// If using vision, fallback must support it
gpt-4o (vision) → claude-3-opus (no vision) ✗
gpt-4o (vision) → gemini-1.5-pro (vision) ✓

3. Test Fallbacks

Periodically test your fallback chain:

// Force fallback for testing
const response = await fetch('https://api.transactional.dev/ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
    'X-Force-Fallback': 'true',  // Testing header
  },
  body: JSON.stringify({
    // Minimal example payload
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

4. Monitor Costs

Fallback models may have different pricing. Monitor cost impact:

| Provider | Cost per 1M tokens |
| --- | --- |
| gpt-4o | $5.00 (input) |
| claude-3-5-sonnet | $3.00 (input) |
| gemini-1.5-pro | $1.25 (input) |
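A back-of-envelope estimate of fallback cost impact, using only the input prices from the table above (output pricing omitted for simplicity):

```typescript
// USD per 1M input tokens, from the pricing table above.
const inputPricePerMTok: Record<string, number> = {
  'gpt-4o': 5.0,
  'claude-3-5-sonnet': 3.0,
  'gemini-1.5-pro': 1.25,
};

// Input-side cost of a request served by `model`.
function inputCostUSD(model: string, promptTokens: number): number {
  return ((inputPricePerMTok[model] ?? 0) * promptTokens) / 1_000_000;
}
```

For example, a 1M-token workload that falls back from gpt-4o to claude-3-5-sonnet costs $3.00 instead of $5.00 on the input side.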

Limitations

  • Streaming: Fallback only occurs if primary fails before streaming starts
  • Stateful requests: Multi-turn conversations may have context differences
  • Features: Some features (specific tool formats) may not translate perfectly

Next Steps