Your AI Should Never Go Down. Here's How to Set Up Fallback Routing.
Configure fallback chains across LLM providers so your AI features stay up when any single provider goes down. Includes a TypeScript implementation with health checks and cost-aware routing.
Transactional Team
Jan 26, 2026
10 min read
On March 12, 2024, the OpenAI API went down for 3 hours. On January 25, 2025, Anthropic had a 90-minute outage. Google's Vertex AI has had multiple incidents. Every major LLM provider has gone down, and every one of them will go down again.
If your production AI feature depends on a single provider, it goes down with them. This is one of the most common failure patterns in production AI, and the fix is straightforward: fallback routing.
What You Will Learn
- Why single-provider AI is fragile
- How to implement a fallback router with health checks
- Latency-based and cost-aware routing strategies
- A complete TypeScript implementation you can adapt
Why Fallback Routing Matters
- 99.99% uptime with a two-provider fallback
- 43 hours of annual downtime at 99.5% uptime (single provider)
- 94% of failed requests auto-recovered
- 3 routing strategies: priority, latency, and cost
The Problem with Single-Provider AI
When you hardcode `openai.chat.completions.create()` everywhere, you have a single point of failure. An outage, rate limit, or even a slow response from that one provider means your entire AI feature is down.
The numbers are not great. Based on public status page data, each major provider has 99.5% to 99.9% uptime. That sounds high until you calculate the downtime: 99.9% uptime is 8.7 hours of downtime per year, and 99.5% is 43 hours. Chain two independent providers at 99.5% each, though, and both have to be down at the same moment for a request to fail: 0.5% × 0.5% = 0.0025% unavailability, which is 99.9975% combined availability.
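You can sanity-check that math in a few lines of TypeScript:

```typescript
// Availability arithmetic: uptime fraction -> downtime hours per year
const HOURS_PER_YEAR = 24 * 365; // 8,760

const downtimeHours = (uptime: number) => (1 - uptime) * HOURS_PER_YEAR;

console.log(downtimeHours(0.999).toFixed(2)); // "8.76" hours/year at 99.9%
console.log(downtimeHours(0.995).toFixed(1)); // "43.8" hours/year at 99.5%

// Two independent providers fail only when both are down at once
const combined = 1 - (1 - 0.995) ** 2;
console.log((combined * 100).toFixed(4)); // "99.9975"
```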
Step 1: Define the Provider Interface
First, create a uniform interface across providers. Every LLM provider has a slightly different API, but they all do the same thing: take messages in, return a completion out.
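A minimal sketch of those shared types follows; the exact fields are a judgment call, as long as every adapter accepts the same `LLMRequest` and returns the same `LLMResponse`:

```typescript
// types.ts
interface LLMProvider {
  name: string;           // key into the ADAPTERS map
  models: string[];
  priority: number;       // lower = preferred under 'priority' routing
  costMultiplier: number; // relative cost, used by 'cost' routing
  maxRetries: number;
  timeoutMs: number;
}

interface LLMMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface LLMRequest {
  messages: LLMMessage[];
  maxTokens?: number;
  tier?: 'reasoning' | 'standard' | 'fast'; // capability tier; see Model Mapping below
}

interface LLMResponse {
  provider: string;
  model: string;
  content: string;
  latencyMs: number;
}

// One adapter per provider, all with the same signature
type Adapter = (request: LLMRequest) => Promise<LLMResponse>;
```

Step 2: Write the Provider Adapters

Each adapter translates the uniform request into one provider's native call. Here is a sketch of an Anthropic adapter against its public Messages API, with error handling trimmed; an OpenAI adapter looks the same against `/v1/chat/completions`:

```typescript
// adapters.ts
const ADAPTERS: Record<string, Adapter> = {
  anthropic: async (request) => {
    const start = Date.now();
    const model = 'claude-sonnet-4-6'; // or resolve via the tier; see Model Mapping
    // Anthropic takes the system prompt as a top-level field, not a message
    const system = request.messages.find((m) => m.role === 'system')?.content;
    const res = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': process.env.ANTHROPIC_API_KEY!,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model,
        max_tokens: request.maxTokens ?? 1024,
        system,
        messages: request.messages.filter((m) => m.role !== 'system'),
      }),
    });
    // Embed the status code so the router's isRateLimitError() can spot a 429
    if (!res.ok) throw new Error(`anthropic: HTTP ${res.status}`);
    const data = await res.json();
    return {
      provider: 'anthropic',
      model,
      content: data.content[0].text,
      latencyMs: Date.now() - start,
    };
  },
  // openai: async (request) => { ...same shape against /v1/chat/completions... },
};
```

Step 3: Track Provider Health

The router needs to know when to stop sending traffic to a failing provider and when to let it back in. A minimal version is a consecutive-failure circuit breaker with a cooldown, plus a rolling latency average to support latency-based routing. The thresholds below (3 consecutive failures, 60-second cooldown) are illustrative starting points, not prescriptions:

```typescript
// health.ts
class ProviderHealthChecker {
  private failures = new Map<string, number>();       // consecutive failures
  private unhealthyUntil = new Map<string, number>(); // cooldown deadline (epoch ms)
  private latency = new Map<string, number>();        // rolling average latency

  private static readonly FAILURE_THRESHOLD = 3;
  private static readonly COOLDOWN_MS = 60_000;

  constructor(private providers: LLMProvider[]) {}

  isHealthy(name: string): boolean {
    return Date.now() >= (this.unhealthyUntil.get(name) ?? 0);
  }

  recordSuccess(name: string, latencyMs: number): void {
    this.failures.set(name, 0);
    // Exponential moving average: cheap to maintain, weighted toward recent calls
    const prev = this.latency.get(name) ?? latencyMs;
    this.latency.set(name, 0.8 * prev + 0.2 * latencyMs);
  }

  recordFailure(name: string): void {
    const count = (this.failures.get(name) ?? 0) + 1;
    this.failures.set(name, count);
    if (count >= ProviderHealthChecker.FAILURE_THRESHOLD) {
      // Trip the breaker: skip this provider until the cooldown expires
      this.unhealthyUntil.set(
        name,
        Date.now() + ProviderHealthChecker.COOLDOWN_MS
      );
      this.failures.set(name, 0);
    }
  }

  getLatency(name: string): number {
    return this.latency.get(name) ?? 0; // unseen providers sort first
  }
}
```

Step 4: Build the Router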
The router tries providers in order, falling back to the next one on failure:
```typescript
// router.ts
class FallbackRouter {
  private providers: LLMProvider[];
  private health: ProviderHealthChecker;
  private strategy: 'priority' | 'latency' | 'cost';

  constructor(
    providers: LLMProvider[],
    strategy: 'priority' | 'latency' | 'cost' = 'priority'
  ) {
    this.providers = providers;
    this.health = new ProviderHealthChecker(providers);
    this.strategy = strategy;
  }

  async complete(request: LLMRequest): Promise<LLMResponse> {
    const orderedProviders = this.getProviderOrder();
    let lastError: Error | null = null;

    for (const provider of orderedProviders) {
      if (!this.health.isHealthy(provider.name)) {
        continue; // skip unhealthy providers
      }

      const adapter = ADAPTERS[provider.name];
      if (!adapter) continue;

      for (let attempt = 0; attempt < provider.maxRetries; attempt++) {
        try {
          const response = await Promise.race([
            adapter(request),
            timeout(provider.timeoutMs),
          ]);
          this.health.recordSuccess(provider.name, response.latencyMs);
          return response;
        } catch (error) {
          lastError = error as Error;

          // Rate limit errors: do not retry, move to the next provider
          // (the failure is recorded once, below the retry loop)
          if (isRateLimitError(error)) {
            break;
          }

          // Timeout or server error: retry with exponential backoff
          if (attempt < provider.maxRetries - 1) {
            await delay(Math.pow(2, attempt) * 1000);
          }
        }
      }

      this.health.recordFailure(provider.name);
    }

    throw new Error(
      `All providers failed. Last error: ${lastError?.message}`
    );
  }

  private getProviderOrder(): LLMProvider[] {
    const healthy = this.providers.filter((p) =>
      this.health.isHealthy(p.name)
    );

    switch (this.strategy) {
      case 'latency':
        return healthy.sort(
          (a, b) =>
            this.health.getLatency(a.name) - this.health.getLatency(b.name)
        );
      case 'cost':
        return healthy.sort(
          (a, b) => a.costMultiplier - b.costMultiplier
        );
      case 'priority':
      default:
        return healthy.sort((a, b) => a.priority - b.priority);
    }
  }
}

function timeout(ms: number): Promise<never> {
  return new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Request timed out')), ms)
  );
}

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

function isRateLimitError(error: unknown): boolean {
  if (error instanceof Error) {
    return (
      error.message.includes('429') ||
      error.message.includes('rate limit')
    );
  }
  return false;
}
```
Step 5: Configure and Use
```typescript
// usage.ts
const router = new FallbackRouter(
  [
    {
      name: 'anthropic',
      models: ['claude-sonnet-4-6'],
      priority: 1,
      costMultiplier: 1.0,
      maxRetries: 2,
      timeoutMs: 30_000,
    },
    {
      name: 'openai',
      models: ['gpt-4o'],
      priority: 2,
      costMultiplier: 0.8,
      maxRetries: 2,
      timeoutMs: 30_000,
    },
  ],
  'priority' // try anthropic first, fall back to openai
);

// Use it exactly like a single provider
const response = await router.complete({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain dependency injection.' },
  ],
  maxTokens: 500,
});

console.log(`Response from ${response.provider} (${response.model})`);
console.log(`Latency: ${response.latencyMs}ms`);
console.log(response.content);
```
Model Mapping
Different providers have different model strengths. Map equivalent capability tiers:
```typescript
const MODEL_EQUIVALENTS: Record<string, Record<string, string>> = {
  // Tier: best reasoning
  reasoning: {
    anthropic: 'claude-opus-4-6',
    openai: 'gpt-4.1',
  },
  // Tier: fast and capable
  standard: {
    anthropic: 'claude-sonnet-4-6',
    openai: 'gpt-4o',
  },
  // Tier: fast and cheap
  fast: {
    anthropic: 'claude-haiku-4-5',
    openai: 'gpt-4o-mini',
  },
};

// When falling back, use the equivalent model tier
function getModelForProvider(tier: string, provider: string): string {
  return MODEL_EQUIVALENTS[tier]?.[provider] ?? 'gpt-4o-mini';
}
```
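So a standard-tier request stays in the same quality band wherever it lands:

```typescript
// A 'standard'-tier request resolves to the equivalent model on each provider
console.log(getModelForProvider('standard', 'anthropic')); // claude-sonnet-4-6
console.log(getModelForProvider('standard', 'openai'));    // gpt-4o
console.log(getModelForProvider('standard', 'unknown'));   // gpt-4o-mini (default)
```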
Routing Strategies Compared
| Strategy | Best For | Trade-off |
| --- | --- | --- |
| Priority | Consistent behavior, preferred model quality | May not use the cheapest or fastest option |
| Latency | User-facing features needing fast responses | May route to more expensive providers |
| Cost | Background processing, batch jobs | May sacrifice speed for savings |
For most production applications, we recommend priority routing with cost as the tiebreaker. You get consistent model behavior with automatic fallback when things break.
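In the router above, that is a one-line change to the priority comparator. A sketch:

```typescript
// Drop-in comparator for the 'priority' case in getProviderOrder():
// lower priority wins, and cost breaks ties between equal priorities
function byPriorityThenCost(a: LLMProvider, b: LLMProvider): number {
  return a.priority - b.priority || a.costMultiplier - b.costMultiplier;
}

// In getProviderOrder():  return healthy.sort(byPriorityThenCost);
```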
The Takeaway
Single-provider AI is a single point of failure. A fallback router with health checks takes a few hundred lines of code and gives you near-perfect uptime. The pattern is simple: define adapters, track health, route in order, fall back on failure.
If you want this managed, Transactional's AI Gateway provides multi-provider routing, automatic fallbacks, and real-time health monitoring out of the box. But the router above will get you to 99.99% availability on its own.