We Built One API for All LLM Providers. Introducing AI Gateway.
Meet AI Gateway: a single unified API that routes to 13+ LLM providers with automatic failover, cost tracking, semantic caching, rate limiting, and prompt management.
Transactional Team
Jan 10, 2026
7 min read
4,000 Lines of Glue Code
A typical production stack talks to multiple LLM providers. OpenAI for GPT-4o. Anthropic for Claude. Google for Gemini. Mistral for cheap classification tasks. Each provider has its own SDK, its own request format, its own error codes, and its own retry logic.
The glue code adds up fast. Teams commonly report thousands of lines spread across dozens of files. Multiple streaming implementations. Multiple ways to handle rate limits. Multiple billing dashboards to check every morning.
Every new model means another 200 lines of provider-specific integration code. Every provider outage sends teams scrambling to shift traffic by hand. Every time finance asks "how much are we spending on AI?", the answer takes hours to compile.
AI Gateway was built to make all of that disappear.
AI Gateway at a Glance
13+ LLM Providers Supported
3-8ms Added Latency per Request
94% Failed Requests Auto-Recovered
60-80% Cache Hit Rate (Classification)
What AI Gateway Does
AI Gateway is a unified API that sits between your application and every LLM provider. You send requests in one format. The gateway handles routing, translation, failover, caching, rate limiting, and cost tracking.
One endpoint. One SDK. One dashboard.
Your Application
|
v
AI Gateway (single endpoint)
|
├── Schema Normalization
├── Intelligent Routing
├── Semantic Caching
├── Rate Limiting
├── Cost Tracking
├── Prompt Management
|
v
┌─────────┬───────────┬────────┬─────────┬──────────┐
│ OpenAI │ Anthropic │ Google │ Mistral │ 9 more │
└─────────┴───────────┴────────┴─────────┴──────────┘
Unified API
Every request uses the same format regardless of the target provider. We chose the OpenAI chat completion schema as the canonical format because it is the most widely adopted. If you have used the OpenAI SDK, you already know our API.
import Transactional from "@transactional/sdk";

const client = new Transactional({ apiKey: "tx_live_..." });

// Route to any provider with a single API call
const response = await client.ai.chat({
  model: "anthropic/claude-sonnet-4-20250514",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain connection pooling in PostgreSQL." }
  ],
  max_tokens: 1024,
  temperature: 0.7
});

// Same format regardless of provider
console.log(response.choices[0].message.content);
console.log(response.usage.total_tokens);
console.log(response.cost_usd); // Real-time cost
Switch from OpenAI to Claude by changing one string. No SDK swap. No schema rewrite. No deployment.
Automatic Failover
Define fallback chains. When a provider returns a 5xx, hits a rate limit, or times out, the gateway automatically retries with the next provider. No code changes required.
// Configure fallback chains in the dashboard or via API
const response = await client.ai.chat({
  model: "openai/gpt-4o",
  messages: [...],
  fallbacks: [
    "anthropic/claude-sonnet-4-20250514",
    "google/gemini-2.0-flash"
  ],
  timeout_ms: 10000 // Per-provider timeout
});
In production, 94% of failed requests recover automatically on a secondary provider. Your users never notice.
Cost Tracking
Every request logs its token usage and cost in real time. We maintain per-model pricing tables updated weekly. No more scraping four billing dashboards.
The cost dashboard breaks down spending by model, endpoint, API key, and time period. Set budget alerts. Get notified before a runaway pipeline burns through your monthly allocation.
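As a sketch of what querying that data programmatically could look like, here is a hypothetical cost report. The client.ai.costs.list method and its parameters are illustrative assumptions, not a documented SDK surface.

import Transactional from "@transactional/sdk";

const client = new Transactional({ apiKey: "tx_live_..." });

// Hypothetical cost query: spend per model over the last 30 days.
// Method name and parameters are illustrative only.
const report = await client.ai.costs.list({
  group_by: "model",
  period: "30d"
});

for (const row of report.items) {
  console.log(`${row.model}: $${row.cost_usd.toFixed(2)} (${row.total_tokens} tokens)`);
}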
Semantic Caching
Exact-match caching catches identical requests. Semantic caching goes further -- it uses embeddings to identify semantically similar requests and serves cached responses.
"What is the capital of France?" and "What's France's capital city?" hit the same cache entry. Classification pipelines see 60-80% cache hit rates. Chat applications see 5-15%. Either way, it is free money.
Cache is scoped per API key. One customer's data never leaks into another's.
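Per-request cache configuration might look like the following sketch. The cache option, its similarity_threshold and ttl_seconds fields, and the response.cached flag are assumptions for illustration.

// Hypothetical per-request cache settings; field names are illustrative.
const response = await client.ai.chat({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  cache: {
    mode: "semantic",           // "exact" | "semantic" | "off"
    similarity_threshold: 0.95, // similarity required for a cache hit
    ttl_seconds: 3600           // how long entries stay valid
  }
});

console.log(response.cached); // true if served from cache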
Rate Limiting
Set per-key, per-model, or per-endpoint rate limits. The gateway enforces them before requests reach the provider, so you never waste tokens on requests that would have been throttled anyway.
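A per-key limit might be configured along these lines; the client.ai.rateLimits.create method and its field names are assumptions for this sketch.

// Hypothetical rate-limit configuration; method and fields are illustrative.
await client.ai.rateLimits.create({
  api_key: "tx_live_...",
  model: "openai/gpt-4o",
  requests_per_minute: 600,
  tokens_per_minute: 200_000
});

// Requests over the limit are rejected at the gateway with a
// normalized rate_limited error, before any provider tokens are spent.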
Prompt Management
Version your prompts. A/B test them. Roll back to a previous version in one click. Track which prompt version produced which outputs. Stop storing prompts in application code where they get lost in git history.
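Referencing a managed prompt from code could look like this sketch; the prompt option with its id, version, and variables fields is an illustrative assumption.

// Hypothetical versioned-prompt reference; the prompt option is illustrative.
const response = await client.ai.chat({
  model: "anthropic/claude-sonnet-4-20250514",
  prompt: {
    id: "support-triage",  // prompt managed in the dashboard
    version: "v12",        // pin a version, or omit to use the latest
    variables: { product: "AI Gateway" }
  },
  // The managed prompt supplies the system message; live turns go here.
  messages: [{ role: "user", content: "My requests keep timing out." }]
});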
Architecture Overview
The gateway adds 3-8ms of latency per request. That overhead is almost entirely serialization -- translating between the canonical format and each provider's native format.
Streaming is fully supported. We normalize every provider's streaming protocol into OpenAI-compatible Server-Sent Events. Anthropic's typed events, Google's aggregated chunks, Mistral's format -- all normalized into a consistent stream your client code can consume with one parser.
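Assuming the SDK mirrors the OpenAI streaming interface, with a stream: true flag returning an async iterable of deltas, consuming a stream could look like this:

// Assumes an OpenAI-style streaming interface; the exact shape is illustrative.
const stream = await client.ai.chat({
  model: "google/gemini-2.0-flash",
  messages: [{ role: "user", content: "Summarize the CAP theorem." }],
  stream: true
});

// One parser for every provider: chunks arrive as
// OpenAI-compatible deltas regardless of the upstream format.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}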
Error codes are normalized too. OpenAI's 429, Anthropic's overloaded, Google's 503 -- all mapped to a consistent rate_limited error code with the same response structure.
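In practice, that means one error branch covers every provider. A minimal sketch, assuming the SDK throws typed errors carrying the normalized code (the retry_after_ms field is an illustrative assumption):

try {
  await client.ai.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello" }]
  });
} catch (err: any) {
  // OpenAI's 429, Anthropic's overloaded, Google's 503 all surface
  // the same way, so a single retry branch handles every provider.
  if (err.code === "rate_limited") {
    console.warn(`Throttled; retry after ${err.retry_after_ms} ms`); // field name assumed
  } else {
    throw err;
  }
}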
13 Providers, One Integration
Currently supported providers:
OpenAI (GPT-4o, GPT-4o-mini, o1, o3)
Anthropic (Claude Opus, Sonnet, Haiku)
Google (Gemini 2.0 Flash, Pro)
Mistral (Large, Small, Codestral)
DeepSeek (V3, R1)
Meta (Llama 3.3)
Cohere (Command R+)
And 6 more, with new providers added monthly
Getting Started in 5 Minutes
1. Create a Transactional account and grab your API key.
2. Add your provider API keys in the AI Gateway settings.
3. Install the SDK: npm install @transactional/sdk
4. Replace your provider-specific calls with the unified API (a before/after sketch follows below).
5. Watch requests flow through the dashboard.
That is it. No infrastructure to deploy. No proxy to maintain. No YAML to configure.
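For step 4, the swap is mostly mechanical. The before half below uses the real OpenAI SDK; the after half is the unified call from the earlier examples.

import OpenAI from "openai";
import Transactional from "@transactional/sdk";

// Before: provider-specific call through the OpenAI SDK
const openai = new OpenAI({ apiKey: "sk-..." });
const before = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }]
});

// After: the same request through the gateway's unified API
const client = new Transactional({ apiKey: "tx_live_..." });
const after = await client.ai.chat({
  model: "openai/gpt-4o", // provider-prefixed model string
  messages: [{ role: "user", content: "Hello" }]
});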
The Real Win
The technical features matter, but the real win is operational. When a provider has an outage, teams using AI Gateway do not need to notice. Failover kicks in, traffic shifts to the next provider, and the dashboard shows exactly what happened.
When finance asks about AI spend, the answer takes 10 seconds instead of hours. When a developer wants to try a new model, they change one string instead of writing a new integration.
Thousands of lines of glue code reduced to zero. That is what AI Gateway does.