
Your AI Should Never Go Down. Here is How to Set Up Fallback Routing.

Configure fallback chains across LLM providers so your AI features stay up when any single provider goes down. Includes TypeScript implementation with health checks and cost-aware routing.

Transactional Team
Jan 26, 2026
10 min read

On March 12, 2024, the OpenAI API went down for 3 hours. On January 25, 2025, Anthropic had a 90-minute outage. Google's Vertex AI has had multiple incidents. Every major LLM provider has gone down, and every one of them will go down again.

If your production AI feature depends on a single provider, it will go down with them. It is one of the most common reliability gaps in production AI, and the fix is straightforward: fallback routing.

What You Will Learn

  • Why single-provider AI is fragile
  • How to implement a fallback router with health checks
  • Latency-based and cost-aware routing strategies
  • A complete TypeScript implementation you can adapt

Why Fallback Routing Matters

  • 99.99% uptime with a two-provider fallback
  • 43 hours of annual downtime at 99.5% uptime (single provider)
  • 94% of failed requests auto-recovered
  • 3 routing strategies (priority, latency, cost)

The Problem with Single-Provider AI

When you hardcode openai.chat.completions.create() everywhere, you have a single point of failure. An outage, rate limit, or even a slow response from that one provider means your entire AI feature is down.

The numbers are not great. Based on public status page data, each major provider sits between 99.5% and 99.9% uptime. That sounds high until you convert it to downtime: 99.9% uptime means 8.7 hours of downtime per year, and 99.5% means 43 hours. Chain two independent providers at 99.5% each, though, and combined availability jumps to 99.9975%, because both have to fail at the same time.
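
The arithmetic is worth seeing once. A quick sketch, assuming provider outages are independent (real incidents only approximate that):

// The availability math behind the numbers above
const HOURS_PER_YEAR = 24 * 365; // 8,760

function annualDowntimeHours(uptime: number): number {
  return (1 - uptime) * HOURS_PER_YEAR;
}

annualDowntimeHours(0.999); // 8.76 hours
annualDowntimeHours(0.995); // 43.8 hours

// A total outage requires both providers to be down at once
const combined = 1 - (1 - 0.995) * (1 - 0.995); // 0.999975 -> 99.9975%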

Step 1: Define the Provider Interface

First, create a uniform interface across providers. Every LLM provider has a slightly different API, but they all do the same thing: take messages in, return a completion out.

// types.ts
export interface LLMProvider {
  name: string;
  models: string[];
  priority: number; // lower = preferred
  costMultiplier: number; // 1.0 = baseline
  maxRetries: number;
  timeoutMs: number;
}
 
export interface LLMRequest {
  messages: { role: string; content: string }[];
  model?: string; // optional - router picks if not specified
  maxTokens?: number;
  temperature?: number;
}
 
export interface LLMResponse {
  content: string;
  model: string;
  provider: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}
 
export interface HealthStatus {
  provider: string;
  healthy: boolean;
  latencyMs: number;
  lastChecked: Date;
  consecutiveFailures: number;
}

Step 2: Implement Provider Adapters

Each provider gets a thin adapter that normalizes the API:

// adapters.ts
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
import type { LLMRequest, LLMResponse } from './types';
 
async function callOpenAI(request: LLMRequest): Promise<LLMResponse> {
  const client = new OpenAI();
  const start = Date.now();
 
  const response = await client.chat.completions.create({
    model: request.model ?? 'gpt-4o',
    messages: request.messages as any,
    max_tokens: request.maxTokens ?? 1000,
    temperature: request.temperature ?? 0.7,
  });
 
  return {
    content: response.choices[0].message.content ?? '',
    model: response.model,
    provider: 'openai',
    inputTokens: response.usage?.prompt_tokens ?? 0,
    outputTokens: response.usage?.completion_tokens ?? 0,
    latencyMs: Date.now() - start,
  };
}
 
async function callAnthropic(
  request: LLMRequest
): Promise<LLMResponse> {
  const client = new Anthropic();
  const start = Date.now();
 
  // Anthropic uses a separate system parameter
  const systemMessage = request.messages.find(
    (m) => m.role === 'system'
  );
  const otherMessages = request.messages.filter(
    (m) => m.role !== 'system'
  );
 
  const response = await client.messages.create({
    model: request.model ?? 'claude-sonnet-4-6',
    max_tokens: request.maxTokens ?? 1000,
    system: systemMessage?.content ?? '',
    messages: otherMessages as any,
  });
 
  const textBlock = response.content.find(
    (block) => block.type === 'text'
  );
 
  return {
    content: textBlock?.text ?? '',
    model: response.model,
    provider: 'anthropic',
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    latencyMs: Date.now() - start,
  };
}
 
export const ADAPTERS: Record<
  string,
  (request: LLMRequest) => Promise<LLMResponse>
> = {
  openai: callOpenAI,
  anthropic: callAnthropic,
};
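
Adding a third provider is just one more entry in ADAPTERS. Many hosts expose an OpenAI-compatible REST surface, so a single generic fetch adapter can cover several of them. A sketch, where the base URL, API key, and default model are placeholders you supply:

// A generic adapter for any OpenAI-compatible endpoint
function makeOpenAICompatibleAdapter(opts: {
  name: string;
  baseUrl: string; // e.g. 'https://example.com/v1' (placeholder)
  apiKey: string;
  defaultModel: string;
}): (request: LLMRequest) => Promise<LLMResponse> {
  return async (request) => {
    const start = Date.now();

    const res = await fetch(`${opts.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${opts.apiKey}`,
      },
      body: JSON.stringify({
        model: request.model ?? opts.defaultModel,
        messages: request.messages,
        max_tokens: request.maxTokens ?? 1000,
        temperature: request.temperature ?? 0.7,
      }),
    });
    if (!res.ok) throw new Error(`${opts.name}: HTTP ${res.status}`);

    const data = await res.json();
    return {
      content: data.choices[0]?.message?.content ?? '',
      model: data.model,
      provider: opts.name,
      inputTokens: data.usage?.prompt_tokens ?? 0,
      outputTokens: data.usage?.completion_tokens ?? 0,
      latencyMs: Date.now() - start,
    };
  };
}

// Register it like the built-in adapters:
//   ADAPTERS['my-host'] = makeOpenAICompatibleAdapter({ ... });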

Step 3: Build the Health Checker

You need to know which providers are healthy before routing to them. The checker below learns passively, from the outcome of every live request; pair it with the periodic probe sketched after the class so that providers marked unhealthy can recover:

// health.ts
import type { LLMProvider, HealthStatus } from './types';

export class ProviderHealthChecker {
  private status: Map<string, HealthStatus> = new Map();
  readonly checkIntervalMs: number; // polling interval for the active probe below
  private unhealthyThreshold: number;
 
  constructor(
    providers: LLMProvider[],
    options?: {
      checkIntervalMs?: number;
      unhealthyThreshold?: number;
    }
  ) {
    this.checkIntervalMs = options?.checkIntervalMs ?? 30_000;
    this.unhealthyThreshold = options?.unhealthyThreshold ?? 3;
 
    for (const provider of providers) {
      this.status.set(provider.name, {
        provider: provider.name,
        healthy: true, // assume healthy until proven otherwise
        latencyMs: 0,
        lastChecked: new Date(),
        consecutiveFailures: 0,
      });
    }
  }
 
  recordSuccess(provider: string, latencyMs: number): void {
    const status = this.status.get(provider);
    if (status) {
      status.healthy = true;
      status.latencyMs = latencyMs;
      status.lastChecked = new Date();
      status.consecutiveFailures = 0;
    }
  }
 
  recordFailure(provider: string): void {
    const status = this.status.get(provider);
    if (status) {
      status.consecutiveFailures++;
      status.lastChecked = new Date();
      if (status.consecutiveFailures >= this.unhealthyThreshold) {
        status.healthy = false;
      }
    }
  }
 
  isHealthy(provider: string): boolean {
    return this.status.get(provider)?.healthy ?? false;
  }
 
  getHealthyProviders(): string[] {
    return Array.from(this.status.entries())
      .filter(([, status]) => status.healthy)
      .map(([name]) => name);
  }
 
  getLatency(provider: string): number {
    return this.status.get(provider)?.latencyMs ?? Infinity;
  }
}
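
One gap worth closing: the checker only learns from live traffic, and the router in the next step skips unhealthy providers entirely, so a provider marked down would never get the chance to recover. A periodic probe fixes that. A minimal sketch, assuming a one-token ping is an acceptable health test (a dedicated health endpoint would be cheaper):

// probe.ts - periodically re-test providers marked unhealthy
import { ADAPTERS } from './adapters';
import { ProviderHealthChecker } from './health';

export function startActiveProbes(
  checker: ProviderHealthChecker
): ReturnType<typeof setInterval> {
  return setInterval(async () => {
    for (const [name, adapter] of Object.entries(ADAPTERS)) {
      if (checker.isHealthy(name)) continue; // live traffic covers these

      const start = Date.now();
      try {
        // Tiny request: all we care about is whether the API answers
        await adapter({
          messages: [{ role: 'user', content: 'ping' }],
          maxTokens: 1,
        });
        checker.recordSuccess(name, Date.now() - start);
      } catch {
        checker.recordFailure(name);
      }
    }
  }, checker.checkIntervalMs);
}

Start it once at boot; with the router from the next step, you would pass router.health.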

Step 4: Build the Fallback Router

The router tries providers in order, falling back to the next one on failure:

// router.ts
import type { LLMProvider, LLMRequest, LLMResponse } from './types';
import { ADAPTERS } from './adapters';
import { ProviderHealthChecker } from './health';

export class FallbackRouter {
  private providers: LLMProvider[];
  readonly health: ProviderHealthChecker; // shared with the active probe loop
  private strategy: 'priority' | 'latency' | 'cost';
 
  constructor(
    providers: LLMProvider[],
    strategy: 'priority' | 'latency' | 'cost' = 'priority'
  ) {
    this.providers = providers;
    this.health = new ProviderHealthChecker(providers);
    this.strategy = strategy;
  }
 
  async complete(request: LLMRequest): Promise<LLMResponse> {
    const orderedProviders = this.getProviderOrder();
 
    let lastError: Error | null = null;
 
    for (const provider of orderedProviders) {
      if (!this.health.isHealthy(provider.name)) {
        continue; // skip unhealthy providers
      }
 
      const adapter = ADAPTERS[provider.name];
      if (!adapter) continue;
 
      for (let attempt = 0; attempt < provider.maxRetries; attempt++) {
        try {
          const response = await Promise.race([
            adapter(request),
            timeout(provider.timeoutMs),
          ]);
 
          this.health.recordSuccess(
            provider.name,
            response.latencyMs
          );
          return response;
        } catch (error) {
          lastError = error as Error;
 
          // Rate limit - retrying won't help, so move straight on to
          // the next provider (the failure is recorded once, below)
          if (isRateLimitError(error)) {
            break;
          }
 
          // Timeout or server error - retry
          if (attempt < provider.maxRetries - 1) {
            await delay(Math.pow(2, attempt) * 1000); // exponential backoff
          }
        }
      }
 
      this.health.recordFailure(provider.name);
    }
 
    throw new Error(
      `All providers failed. Last error: ${lastError?.message}`
    );
  }
 
  private getProviderOrder(): LLMProvider[] {
    const healthy = this.providers.filter((p) =>
      this.health.isHealthy(p.name)
    );
 
    switch (this.strategy) {
      case 'latency':
        return healthy.sort(
          (a, b) =>
            this.health.getLatency(a.name) -
            this.health.getLatency(b.name)
        );
      case 'cost':
        return healthy.sort(
          (a, b) => a.costMultiplier - b.costMultiplier
        );
      case 'priority':
      default:
        return healthy.sort((a, b) => a.priority - b.priority);
    }
  }
}
 
function timeout(ms: number): Promise<never> {
  return new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Request timed out')), ms)
  );
}
 
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
 
function isRateLimitError(error: unknown): boolean {
  if (error instanceof Error) {
    return (
      error.message.includes('429') ||
      error.message.includes('rate limit')
    );
  }
  return false;
}
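
One caveat on the timeout helper: Promise.race never cancels the loser, so the timer stays armed (and the losing HTTP request keeps running) after the race settles. If stray timers matter in your runtime, a cancellable variant is a small change:

// A timeout whose timer can be cleared once the race is decided
function cancellableTimeout(ms: number): {
  promise: Promise<never>;
  cancel: () => void;
} {
  let id: ReturnType<typeof setTimeout> | undefined;
  const promise = new Promise<never>((_, reject) => {
    id = setTimeout(() => reject(new Error('Request timed out')), ms);
  });
  return { promise, cancel: () => clearTimeout(id) };
}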

Step 5: Configure and Use

// usage.ts
import { FallbackRouter } from './router';
const router = new FallbackRouter(
  [
    {
      name: 'anthropic',
      models: ['claude-sonnet-4-6'],
      priority: 1,
      costMultiplier: 1.0,
      maxRetries: 2,
      timeoutMs: 30_000,
    },
    {
      name: 'openai',
      models: ['gpt-4o'],
      priority: 2,
      costMultiplier: 0.8,
      maxRetries: 2,
      timeoutMs: 30_000,
    },
  ],
  'priority' // try anthropic first, fallback to openai
);
 
// Use it exactly like a single provider
const response = await router.complete({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain dependency injection.' },
  ],
  maxTokens: 500,
});
 
console.log(`Response from ${response.provider} (${response.model})`);
console.log(`Latency: ${response.latencyMs}ms`);
console.log(response.content);

Model Mapping

Different providers have different model strengths. Map equivalent capability tiers:

const MODEL_EQUIVALENTS: Record<string, Record<string, string>> = {
  // Tier: Best reasoning
  reasoning: {
    anthropic: 'claude-opus-4-6',
    openai: 'gpt-4.1',
  },
  // Tier: Fast and capable
  standard: {
    anthropic: 'claude-sonnet-4-6',
    openai: 'gpt-4o',
  },
  // Tier: Fast and cheap
  fast: {
    anthropic: 'claude-haiku-4-5',
    openai: 'gpt-4o-mini',
  },
};
 
// When falling back, use the equivalent model tier
function getModelForProvider(
  tier: string,
  provider: string
): string {
  // Unknown tier: fall back to the provider's own standard-tier
  // model rather than a hardcoded cross-provider default
  return (
    MODEL_EQUIVALENTS[tier]?.[provider] ??
    MODEL_EQUIVALENTS.standard[provider] ??
    'gpt-4o-mini'
  );
}
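
Note that the FallbackRouter above never consults this map on its own. One light way to wire it in is to resolve the model per provider right before dispatch; a sketch, with the 'standard' tier as an assumed default:

// Rewrites a request so each provider receives its tier-equivalent model
function withTierModel(
  request: LLMRequest,
  tier: string,
  provider: string
): LLMRequest {
  return { ...request, model: getModelForProvider(tier, provider) };
}

// e.g. inside FallbackRouter.complete(), replace adapter(request) with:
//   adapter(withTierModel(request, 'standard', provider.name))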

Routing Strategies Compared

  • Priority: best for consistent behavior and preferred model quality. Trade-off: may not use the cheapest or fastest option.
  • Latency: best for user-facing features that need fast responses. Trade-off: may route to more expensive providers.
  • Cost: best for background processing and batch jobs. Trade-off: may sacrifice speed for savings.

For most production applications, we recommend priority routing with cost as the tiebreaker. You get consistent model behavior with automatic fallback when things break.
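
Expressed in code, that tiebreaker is a one-line comparator you could drop into getProviderOrder:

// Lower priority number wins; the cheaper provider breaks ties
function byPriorityThenCost(a: LLMProvider, b: LLMProvider): number {
  return a.priority - b.priority || a.costMultiplier - b.costMultiplier;
}

// In getProviderOrder(): return healthy.sort(byPriorityThenCost);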

The Takeaway

Single-provider AI is a single point of failure. A fallback router with health checks takes a few hundred lines of code and gives you near-perfect uptime. The pattern is simple: define adapters, track health, route in order, fall back on failure.

If you want this managed, Transactional's AI Gateway provides multi-provider routing, automatic fallbacks, and real-time health monitoring out of the box. But the router above will get you to 99.99% availability on its own.


Tags:
tutorial
ai
reliability
