We Built One API for All LLM Providers. Introducing AI Gateway.
Meet AI Gateway: a single unified API that routes to 13+ LLM providers with automatic failover, cost tracking, semantic caching, rate limiting, and prompt management.
Transactional Team
Jan 10, 2026
7 min read
4,000 Lines of Glue Code
A typical production stack talks to multiple LLM providers. OpenAI for GPT-4o. Anthropic for Claude. Google for Gemini. Mistral for cheap classification tasks. Each provider has its own SDK, its own request format, its own error codes, and its own retry logic.
The glue code adds up fast. Teams commonly report thousands of lines spread across dozens of files. Multiple streaming implementations. Multiple ways to handle rate limits. Multiple billing dashboards to check every morning.
Every new model means another 200 lines of provider-specific integration code. Every provider outage sends teams scrambling to shift traffic by hand. Every time finance asks "how much are we spending on AI?", the answer takes hours to compile.
AI Gateway was built to make all of that disappear.
AI Gateway at a Glance
13+ LLM Providers Supported
3-8ms Added Latency per Request
94% Failed Requests Auto-Recovered
60-80% Cache Hit Rate (Classification)
What AI Gateway Does
AI Gateway is a unified API that sits between your application and every LLM provider. You send requests in one format. The gateway handles routing, translation, failover, caching, rate limiting, and cost tracking.
One endpoint. One SDK. One dashboard.
Your Application
|
v
AI Gateway (single endpoint)
|
├── Schema Normalization
├── Intelligent Routing
├── Semantic Caching
├── Rate Limiting
├── Cost Tracking
├── Prompt Management
|
v
┌─────────┬───────────┬────────┬─────────┬──────────┐
│ OpenAI │ Anthropic │ Google │ Mistral │ 9 more │
└─────────┴───────────┴────────┴─────────┴──────────┘
Unified API
Every request uses the same format regardless of the target provider. We chose the OpenAI chat completion schema as the canonical format because it is the most widely adopted. If you have used the OpenAI SDK, you already know our API.
import Transactional from "@transactional/sdk";

const client = new Transactional({ apiKey: "tx_live_..." });

// Route to any provider with a single API call
const response = await client.ai.chat({
  model: "anthropic/claude-sonnet-4-20250514",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain connection pooling in PostgreSQL." }
  ],
  max_tokens: 1024,
  temperature: 0.7
});

// Same format regardless of provider
console.log(response.choices[0].message.content);
console.log(response.usage.total_tokens);
console.log(response.cost_usd); // Real-time cost
Switch from OpenAI to Claude by changing one string. No SDK swap. No schema rewrite. No deployment.
Automatic Failover
Define fallback chains. When a provider returns a 5xx, hits a rate limit, or times out, the gateway automatically retries with the next provider. No code changes required.
// Configure fallback chains in the dashboard or via API
const response = await client.ai.chat({
  model: "openai/gpt-4o",
  messages: [...],
  fallbacks: [
    "anthropic/claude-sonnet-4-20250514",
    "google/gemini-2.0-flash"
  ],
  timeout_ms: 10000 // Per-provider timeout
});
In production, 94% of failed requests recover automatically on a secondary provider. Your users never notice.
Cost Tracking
Every request logs its token usage and cost in real time. We maintain per-model pricing tables updated weekly. No more scraping four billing dashboards.
The cost dashboard breaks down spending by model, endpoint, API key, and time period. Set budget alerts. Get notified before a runaway pipeline burns through your monthly allocation.
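As a sketch of what querying that data programmatically could look like, here is a hypothetical cost report. The client.ai.costs.list method and its parameters are illustrative assumptions, not a documented SDK surface.

import Transactional from "@transactional/sdk";

const client = new Transactional({ apiKey: "tx_live_..." });

// Hypothetical cost query: spend per model over the last 30 days.
// Method name and parameters are illustrative only.
const report = await client.ai.costs.list({
  group_by: "model",
  period: "30d"
});

for (const row of report.items) {
  console.log(`${row.model}: $${row.cost_usd.toFixed(2)} (${row.total_tokens} tokens)`);
}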
Semantic Caching
Exact-match caching catches identical requests. Semantic caching goes further -- it uses embeddings to identify semantically similar requests and serves cached responses.
"What is the capital of France?" and "What's France's capital city?" hit the same cache entry. Classification pipelines see 60-80% cache hit rates. Chat applications see 5-15%. Either way, it is free money.
Cache is scoped per API key. One customer's data never leaks into another's.
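Per-request cache configuration might look like the following sketch. The cache option, its similarity_threshold and ttl_seconds fields, and the response.cached flag are assumptions for illustration.

// Hypothetical per-request cache settings; field names are illustrative.
const response = await client.ai.chat({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  cache: {
    mode: "semantic",           // "exact" | "semantic" | "off"
    similarity_threshold: 0.95, // similarity required for a cache hit
    ttl_seconds: 3600           // how long entries stay valid
  }
});

console.log(response.cached); // true if served from cache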
Rate Limiting
Set per-key, per-model, or per-endpoint rate limits. The gateway enforces them before requests reach the provider, so you never waste tokens on requests that would have been throttled anyway.
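A per-key limit might be configured along these lines; the client.ai.rateLimits.create method and its field names are assumptions for this sketch.

// Hypothetical rate-limit configuration; method and fields are illustrative.
await client.ai.rateLimits.create({
  api_key: "tx_live_...",
  model: "openai/gpt-4o",
  requests_per_minute: 600,
  tokens_per_minute: 200_000
});

// Requests over the limit are rejected at the gateway with a
// normalized rate_limited error, before any provider tokens are spent.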
Prompt Management
Version your prompts. A/B test them. Roll back to a previous version in one click. Track which prompt version produced which outputs. Stop storing prompts in application code where they get lost in git history.
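Referencing a managed prompt from code could look like this sketch; the prompt option with its id, version, and variables fields is an illustrative assumption.

// Hypothetical versioned-prompt reference; the prompt option is illustrative.
const response = await client.ai.chat({
  model: "anthropic/claude-sonnet-4-20250514",
  prompt: {
    id: "support-triage",  // prompt managed in the dashboard
    version: "v12",        // pin a version, or omit to use the latest
    variables: { product: "AI Gateway" }
  },
  // The managed prompt supplies the system message; live turns go here.
  messages: [{ role: "user", content: "My requests keep timing out." }]
});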
Architecture Overview
The gateway adds 3-8ms of latency per request. That overhead is almost entirely serialization -- translating between the canonical format and each provider's native format.
Streaming is fully supported. We normalize every provider's streaming protocol into OpenAI-compatible Server-Sent Events. Anthropic's typed events, Google's aggregated chunks, Mistral's format -- all normalized into a consistent stream your client code can consume with one parser.
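Assuming the SDK mirrors the OpenAI streaming interface, with a stream: true flag returning an async iterable of deltas, consuming a stream could look like this:

// Assumes an OpenAI-style streaming interface; the exact shape is illustrative.
const stream = await client.ai.chat({
  model: "google/gemini-2.0-flash",
  messages: [{ role: "user", content: "Summarize the CAP theorem." }],
  stream: true
});

// One parser for every provider: chunks arrive as
// OpenAI-compatible deltas regardless of the upstream format.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}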
Error codes are normalized too. OpenAI's 429, Anthropic's overloaded, Google's 503 -- all mapped to a consistent rate_limited error code with the same response structure.
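In practice, that means one error branch covers every provider. A minimal sketch, assuming the SDK throws typed errors carrying the normalized code (the retry_after_ms field is an illustrative assumption):

try {
  await client.ai.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello" }]
  });
} catch (err: any) {
  // OpenAI's 429, Anthropic's overloaded, Google's 503 all surface
  // the same way, so a single retry branch handles every provider.
  if (err.code === "rate_limited") {
    console.warn(`Throttled; retry after ${err.retry_after_ms} ms`); // field name assumed
  } else {
    throw err;
  }
}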
13 Providers, One Integration
Currently supported providers:
OpenAI (GPT-4o, GPT-4o-mini, o1, o3)
Anthropic (Claude Opus, Sonnet, Haiku)
Google (Gemini 2.0 Flash, Pro)
Mistral (Large, Small, Codestral)
DeepSeek (V3, R1)
Meta (Llama 3.3)
Cohere (Command R+)
And 6 more, with new providers added monthly
Getting Started in 5 Minutes
1. Create a Transactional account and grab your API key.
2. Add your provider API keys in the AI Gateway settings.
3. Install the SDK: npm install @transactional/sdk
4. Replace your provider-specific calls with the unified API (a before/after sketch follows below).
5. Watch requests flow through the dashboard.
That is it. No infrastructure to deploy. No proxy to maintain. No YAML to configure.
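For step 4, the swap is mostly mechanical. The before half below uses the real OpenAI SDK; the after half is the unified call from the earlier examples.

import OpenAI from "openai";
import Transactional from "@transactional/sdk";

// Before: provider-specific call through the OpenAI SDK
const openai = new OpenAI({ apiKey: "sk-..." });
const before = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }]
});

// After: the same request through the gateway's unified API
const client = new Transactional({ apiKey: "tx_live_..." });
const after = await client.ai.chat({
  model: "openai/gpt-4o", // provider-prefixed model string
  messages: [{ role: "user", content: "Hello" }]
});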
The Real Win
The technical features matter, but the real win is operational. When a provider has an outage, teams using AI Gateway do not need to notice. Failover kicks in, traffic shifts to the next provider, and the dashboard shows exactly what happened.
When finance asks about AI spend, the answer takes 10 seconds instead of hours. When a developer wants to try a new model, they change one string instead of writing a new integration.
Thousands of lines of glue code reduced to zero. That is what AI Gateway does.