
Your AI Should Never Go Down. Here is How to Set Up Fallback Routing.

Configure fallback chains across LLM providers so your AI features stay up when any single provider goes down. Includes TypeScript implementation with health checks and cost-aware routing.

Transactional Team
Jan 26, 2026
10 min read

On March 12, 2024, the OpenAI API went down for 3 hours. On January 25, 2025, Anthropic had a 90-minute outage. Google's Vertex AI has had multiple incidents. Every major LLM provider has gone down, and every one of them will go down again.

If your production AI feature depends on a single provider, it will go down with them. It is one of the most common reliability gaps in production AI, and the fix is straightforward: fallback routing.

What You Will Learn

  • Why single-provider AI is fragile
  • How to implement a fallback router with health checks
  • Latency-based and cost-aware routing strategies
  • A complete TypeScript implementation you can adapt

Why Fallback Routing Matters

  • 99.99% uptime with a two-provider fallback
  • 43 hours of annual downtime at 99.5% uptime (single provider)
  • 94% of failed requests auto-recovered
  • 3 routing strategies (priority, latency, cost)

The Problem with Single-Provider AI

When you hardcode openai.chat.completions.create() everywhere, you have a single point of failure. An outage, rate limit, or even a slow response from that one provider means your entire AI feature is down.

The numbers are not great. Based on public status page data, each major provider sits between 99.5% and 99.9% uptime. That sounds high until you convert it to downtime: 99.9% uptime means 8.7 hours of downtime per year, and 99.5% means 43 hours. Chain two independent providers at 99.5% each, though, and combined availability jumps to 99.9975%, because both have to fail at the same time.
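
The arithmetic is worth seeing once. A quick sketch, assuming provider outages are independent (real incidents only approximate that):

// The availability math behind the numbers above
const HOURS_PER_YEAR = 24 * 365; // 8,760

function annualDowntimeHours(uptime: number): number {
  return (1 - uptime) * HOURS_PER_YEAR;
}

annualDowntimeHours(0.999); // 8.76 hours
annualDowntimeHours(0.995); // 43.8 hours

// A total outage requires both providers to be down at once
const combined = 1 - (1 - 0.995) * (1 - 0.995); // 0.999975 -> 99.9975%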

Step 1: Define the Provider Interface

First, create a uniform interface across providers. Every LLM provider has a slightly different API, but they all do the same thing: take messages in, return a completion out.

// types.ts
export interface LLMProvider {
  name: string;
  models: string[];
  priority: number; // lower = preferred
  costMultiplier: number; // 1.0 = baseline
  maxRetries: number;
  timeoutMs: number;
}
 
export interface LLMRequest {
  messages: { role: string; content: string }[];
  model?: string; // optional - router picks if not specified
  maxTokens?: number;
  temperature?: number;
}
 
export interface LLMResponse {
  content: string;
  model: string;
  provider: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}
 
export interface HealthStatus {
  provider: string;
  healthy: boolean;
  latencyMs: number;
  lastChecked: Date;
  consecutiveFailures: number;
}

Step 2: Implement Provider Adapters

Each provider gets a thin adapter that normalizes the API:

// adapters.ts
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';
import type { LLMRequest, LLMResponse } from './types';
 
async function callOpenAI(request: LLMRequest): Promise<LLMResponse> {
  const client = new OpenAI();
  const start = Date.now();
 
  const response = await client.chat.completions.create({
    model: request.model ?? 'gpt-4o',
    messages: request.messages as any,
    max_tokens: request.maxTokens ?? 1000,
    temperature: request.temperature ?? 0.7,
  });
 
  return {
    content: response.choices[0].message.content ?? '',
    model: response.model,
    provider: 'openai',
    inputTokens: response.usage?.prompt_tokens ?? 0,
    outputTokens: response.usage?.completion_tokens ?? 0,
    latencyMs: Date.now() - start,
  };
}
 
async function callAnthropic(
  request: LLMRequest
): Promise<LLMResponse> {
  const client = new Anthropic();
  const start = Date.now();
 
  // Anthropic uses a separate system parameter
  const systemMessage = request.messages.find(
    (m) => m.role === 'system'
  );
  const otherMessages = request.messages.filter(
    (m) => m.role !== 'system'
  );
 
  const response = await client.messages.create({
    model: request.model ?? 'claude-sonnet-4-6',
    max_tokens: request.maxTokens ?? 1000,
    system: systemMessage?.content ?? '',
    messages: otherMessages as any,
  });
 
  const textBlock = response.content.find(
    (block) => block.type === 'text'
  );
 
  return {
    content: textBlock?.text ?? '',
    model: response.model,
    provider: 'anthropic',
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    latencyMs: Date.now() - start,
  };
}
 
export const ADAPTERS: Record<
  string,
  (request: LLMRequest) => Promise<LLMResponse>
> = {
  openai: callOpenAI,
  anthropic: callAnthropic,
};
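
Adding a third provider is just one more entry in ADAPTERS. Many hosts expose an OpenAI-compatible REST surface, so a single generic fetch adapter can cover several of them. A sketch, where the base URL, API key, and default model are placeholders you supply:

// A generic adapter for any OpenAI-compatible endpoint
function makeOpenAICompatibleAdapter(opts: {
  name: string;
  baseUrl: string; // e.g. 'https://example.com/v1' (placeholder)
  apiKey: string;
  defaultModel: string;
}): (request: LLMRequest) => Promise<LLMResponse> {
  return async (request) => {
    const start = Date.now();

    const res = await fetch(`${opts.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${opts.apiKey}`,
      },
      body: JSON.stringify({
        model: request.model ?? opts.defaultModel,
        messages: request.messages,
        max_tokens: request.maxTokens ?? 1000,
        temperature: request.temperature ?? 0.7,
      }),
    });
    if (!res.ok) throw new Error(`${opts.name}: HTTP ${res.status}`);

    const data = await res.json();
    return {
      content: data.choices[0]?.message?.content ?? '',
      model: data.model,
      provider: opts.name,
      inputTokens: data.usage?.prompt_tokens ?? 0,
      outputTokens: data.usage?.completion_tokens ?? 0,
      latencyMs: Date.now() - start,
    };
  };
}

// Register it like the built-in adapters:
//   ADAPTERS['my-host'] = makeOpenAICompatibleAdapter({ ... });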

Step 3: Build the Health Checker

You need to know which providers are healthy before routing to them. The checker below learns passively, from the outcome of every live request; pair it with the periodic probe sketched after the class so that providers marked unhealthy can recover:

// health.ts
import type { LLMProvider, HealthStatus } from './types';

export class ProviderHealthChecker {
  private status: Map<string, HealthStatus> = new Map();
  readonly checkIntervalMs: number; // polling interval for the active probe below
  private unhealthyThreshold: number;
 
  constructor(
    providers: LLMProvider[],
    options?: {
      checkIntervalMs?: number;
      unhealthyThreshold?: number;
    }
  ) {
    this.checkIntervalMs = options?.checkIntervalMs ?? 30_000;
    this.unhealthyThreshold = options?.unhealthyThreshold ?? 3;
 
    for (const provider of providers) {
      this.status.set(provider.name, {
        provider: provider.name,
        healthy: true, // assume healthy until proven otherwise
        latencyMs: 0,
        lastChecked: new Date(),
        consecutiveFailures: 0,
      });
    }
  }
 
  recordSuccess(provider: string, latencyMs: number): void {
    const status = this.status.get(provider);
    if (status) {
      status.healthy = true;
      status.latencyMs = latencyMs;
      status.lastChecked = new Date();
      status.consecutiveFailures = 0;
    }
  }
 
  recordFailure(provider: string): void {
    const status = this.status.get(provider);
    if (status) {
      status.consecutiveFailures++;
      status.lastChecked = new Date();
      if (status.consecutiveFailures >= this.unhealthyThreshold) {
        status.healthy = false;
      }
    }
  }
 
  isHealthy(provider: string): boolean {
    return this.status.get(provider)?.healthy ?? false;
  }
 
  getHealthyProviders(): string[] {
    return Array.from(this.status.entries())
      .filter(([, status]) => status.healthy)
      .map(([name]) => name);
  }
 
  getLatency(provider: string): number {
    return this.status.get(provider)?.latencyMs ?? Infinity;
  }
}
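
One gap worth closing: the checker only learns from live traffic, and the router in the next step skips unhealthy providers entirely, so a provider marked down would never get the chance to recover. A periodic probe fixes that. A minimal sketch, assuming a one-token ping is an acceptable health test (a dedicated health endpoint would be cheaper):

// probe.ts - periodically re-test providers marked unhealthy
import { ADAPTERS } from './adapters';
import { ProviderHealthChecker } from './health';

export function startActiveProbes(
  checker: ProviderHealthChecker
): ReturnType<typeof setInterval> {
  return setInterval(async () => {
    for (const [name, adapter] of Object.entries(ADAPTERS)) {
      if (checker.isHealthy(name)) continue; // live traffic covers these

      const start = Date.now();
      try {
        // Tiny request: all we care about is whether the API answers
        await adapter({
          messages: [{ role: 'user', content: 'ping' }],
          maxTokens: 1,
        });
        checker.recordSuccess(name, Date.now() - start);
      } catch {
        checker.recordFailure(name);
      }
    }
  }, checker.checkIntervalMs);
}

Start it once at boot; with the router from the next step, you would pass router.health.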

Step 4: Build the Fallback Router

The router tries providers in order, falling back to the next one on failure:

// router.ts
import type { LLMProvider, LLMRequest, LLMResponse } from './types';
import { ADAPTERS } from './adapters';
import { ProviderHealthChecker } from './health';

export class FallbackRouter {
  private providers: LLMProvider[];
  readonly health: ProviderHealthChecker; // shared with the active probe loop
  private strategy: 'priority' | 'latency' | 'cost';
 
  constructor(
    providers: LLMProvider[],
    strategy: 'priority' | 'latency' | 'cost' = 'priority'
  ) {
    this.providers = providers;
    this.health = new ProviderHealthChecker(providers);
    this.strategy = strategy;
  }
 
  async complete(request: LLMRequest): Promise<LLMResponse> {
    const orderedProviders = this.getProviderOrder();
 
    let lastError: Error | null = null;
 
    for (const provider of orderedProviders) {
      if (!this.health.isHealthy(provider.name)) {
        continue; // skip unhealthy providers
      }
 
      const adapter = ADAPTERS[provider.name];
      if (!adapter) continue;
 
      for (let attempt = 0; attempt < provider.maxRetries; attempt++) {
        try {
          const response = await Promise.race([
            adapter(request),
            timeout(provider.timeoutMs),
          ]);
 
          this.health.recordSuccess(
            provider.name,
            response.latencyMs
          );
          return response;
        } catch (error) {
          lastError = error as Error;
 
          // Rate limit - retrying won't help, so move straight on to
          // the next provider (the failure is recorded once, below)
          if (isRateLimitError(error)) {
            break;
          }
 
          // Timeout or server error - retry
          if (attempt < provider.maxRetries - 1) {
            await delay(Math.pow(2, attempt) * 1000); // exponential backoff
          }
        }
      }
 
      this.health.recordFailure(provider.name);
    }
 
    throw new Error(
      `All providers failed. Last error: ${lastError?.message}`
    );
  }
 
  private getProviderOrder(): LLMProvider[] {
    const healthy = this.providers.filter((p) =>
      this.health.isHealthy(p.name)
    );
 
    switch (this.strategy) {
      case 'latency':
        return healthy.sort(
          (a, b) =>
            this.health.getLatency(a.name) -
            this.health.getLatency(b.name)
        );
      case 'cost':
        return healthy.sort(
          (a, b) => a.costMultiplier - b.costMultiplier
        );
      case 'priority':
      default:
        return healthy.sort((a, b) => a.priority - b.priority);
    }
  }
}
 
function timeout(ms: number): Promise<never> {
  return new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Request timed out')), ms)
  );
}
 
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
 
function isRateLimitError(error: unknown): boolean {
  if (error instanceof Error) {
    return (
      error.message.includes('429') ||
      error.message.includes('rate limit')
    );
  }
  return false;
}
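
One caveat on the timeout helper: Promise.race never cancels the loser, so the timer stays armed (and the losing HTTP request keeps running) after the race settles. If stray timers matter in your runtime, a cancellable variant is a small change:

// A timeout whose timer can be cleared once the race is decided
function cancellableTimeout(ms: number): {
  promise: Promise<never>;
  cancel: () => void;
} {
  let id: ReturnType<typeof setTimeout> | undefined;
  const promise = new Promise<never>((_, reject) => {
    id = setTimeout(() => reject(new Error('Request timed out')), ms);
  });
  return { promise, cancel: () => clearTimeout(id) };
}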

Step 5: Configure and Use

// usage.ts
import { FallbackRouter } from './router';
const router = new FallbackRouter(
  [
    {
      name: 'anthropic',
      models: ['claude-sonnet-4-6'],
      priority: 1,
      costMultiplier: 1.0,
      maxRetries: 2,
      timeoutMs: 30_000,
    },
    {
      name: 'openai',
      models: ['gpt-4o'],
      priority: 2,
      costMultiplier: 0.8,
      maxRetries: 2,
      timeoutMs: 30_000,
    },
  ],
  'priority' // try anthropic first, fallback to openai
);
 
// Use it exactly like a single provider
const response = await router.complete({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain dependency injection.' },
  ],
  maxTokens: 500,
});
 
console.log(`Response from ${response.provider} (${response.model})`);
console.log(`Latency: ${response.latencyMs}ms`);
console.log(response.content);

Model Mapping

Different providers have different model strengths. Map equivalent capability tiers:

const MODEL_EQUIVALENTS: Record<string, Record<string, string>> = {
  // Tier: Best reasoning
  reasoning: {
    anthropic: 'claude-opus-4-6',
    openai: 'gpt-4.1',
  },
  // Tier: Fast and capable
  standard: {
    anthropic: 'claude-sonnet-4-6',
    openai: 'gpt-4o',
  },
  // Tier: Fast and cheap
  fast: {
    anthropic: 'claude-haiku-4-5',
    openai: 'gpt-4o-mini',
  },
};
 
// When falling back, use the equivalent model tier
function getModelForProvider(
  tier: string,
  provider: string
): string {
  // Unknown tier: fall back to the provider's own standard-tier
  // model rather than a hardcoded cross-provider default
  return (
    MODEL_EQUIVALENTS[tier]?.[provider] ??
    MODEL_EQUIVALENTS.standard[provider] ??
    'gpt-4o-mini'
  );
}
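
Note that the FallbackRouter above never consults this map on its own. One light way to wire it in is to resolve the model per provider right before dispatch; a sketch, with the 'standard' tier as an assumed default:

// Rewrites a request so each provider receives its tier-equivalent model
function withTierModel(
  request: LLMRequest,
  tier: string,
  provider: string
): LLMRequest {
  return { ...request, model: getModelForProvider(tier, provider) };
}

// e.g. inside FallbackRouter.complete(), replace adapter(request) with:
//   adapter(withTierModel(request, 'standard', provider.name))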

Routing Strategies Compared

  • Priority: best for consistent behavior and preferred model quality. Trade-off: may not use the cheapest or fastest option.
  • Latency: best for user-facing features that need fast responses. Trade-off: may route to more expensive providers.
  • Cost: best for background processing and batch jobs. Trade-off: may sacrifice speed for savings.

For most production applications, we recommend priority routing with cost as the tiebreaker. You get consistent model behavior with automatic fallback when things break.
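
Expressed in code, that tiebreaker is a one-line comparator you could drop into getProviderOrder:

// Lower priority number wins; the cheaper provider breaks ties
function byPriorityThenCost(a: LLMProvider, b: LLMProvider): number {
  return a.priority - b.priority || a.costMultiplier - b.costMultiplier;
}

// In getProviderOrder(): return healthy.sort(byPriorityThenCost);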

The Takeaway

Single-provider AI is a single point of failure. A fallback router with health checks takes a few hundred lines of code and gives you near-perfect uptime. The pattern is simple: define adapters, track health, route in order, fall back on failure.

If you want this managed, Transactional's AI Gateway provides multi-provider routing, automatic fallbacks, and real-time health monitoring out of the box. But the router above will get you to 99.99% availability on its own.


Tags:
tutorial
ai
reliability
