Cost Tracking
Monitor and manage your LLM spending with real-time cost analytics.
Overview
AI Gateway tracks the cost of every LLM request in real time. In the dashboard, you can view spending by model, user, time period, and more.
How Costs Are Calculated
Cost is calculated from actual token usage, where each price is the per-token rate (the listed price per 1M tokens divided by 1,000,000):
Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
Example
For a GPT-4o request:
- Input: 500 tokens × $2.50/1M = $0.00125
- Output: 200 tokens × $10.00/1M = $0.002
- Total: $0.00325
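The arithmetic above can be sketched as a small helper. This is an illustrative sketch, not the gateway's implementation: the names `ModelPricing`, `CostBreakdown`, and `estimateCost` are hypothetical, and the prices are the GPT-4o figures from the example.

```typescript
// Hypothetical types for illustration; prices are in USD per 1M tokens.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface CostBreakdown {
  inputCost: number;
  outputCost: number;
  totalCost: number;
}

const TOKENS_PER_MILLION = 1_000_000;

// Cost = (input tokens x input price) + (output tokens x output price),
// with the per-1M list prices converted to per-token rates.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing,
): CostBreakdown {
  const inputCost = (inputTokens * pricing.inputPerMillion) / TOKENS_PER_MILLION;
  const outputCost = (outputTokens * pricing.outputPerMillion) / TOKENS_PER_MILLION;
  return { inputCost, outputCost, totalCost: inputCost + outputCost };
}

// GPT-4o prices taken from the example above.
const gpt4o: ModelPricing = { inputPerMillion: 2.5, outputPerMillion: 10 };
const cost = estimateCost(500, 200, gpt4o);
console.log(cost.totalCost.toFixed(5)); // "0.00325"
```

Rounding with `toFixed` is only for display; floating-point sums can carry tiny errors, so compare costs with a tolerance rather than strict equality.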
Viewing Costs
Dashboard Overview
Navigate to AI Gateway > Analytics to see:
- Total Spend: Current month's total
- Cost by Model: Breakdown by model
- Cost by Day: Daily spending trend
- Top Users: Highest spending users
Per-Request Costs
Every response includes the cost of the request in an x_cost object:
{
"id": "chatcmpl-123",
"usage": {
"prompt_tokens": 500,
"completion_tokens": 200,
"total_tokens": 700
},
"x_cost": {
"input_cost": 0.00125,
"output_cost": 0.002,
"total_cost": 0.00325,
"currency": "USD"
}
}
Response Headers
X-Cost-Input: 0.00125
X-Cost-Output: 0.002
X-Cost-Total: 0.00325
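If you prefer to read the cost from the headers rather than the response body, a parser along these lines may help. It is a minimal sketch: `HeaderReader`, `RequestCost`, and `readCostHeaders` are hypothetical names, and the only thing taken from this page is the three header names. The `HeaderReader` shape is chosen so that the `headers` object of a `fetch` response can be passed in directly.

```typescript
// Minimal shape shared by fetch's Headers object and simple test stubs.
interface HeaderReader {
  get(name: string): string | null;
}

interface RequestCost {
  input: number;
  output: number;
  total: number;
}

// Returns null when any cost header is missing or not numeric.
function readCostHeaders(headers: HeaderReader): RequestCost | null {
  const input = Number.parseFloat(headers.get("X-Cost-Input") ?? "");
  const output = Number.parseFloat(headers.get("X-Cost-Output") ?? "");
  const total = Number.parseFloat(headers.get("X-Cost-Total") ?? "");
  if ([input, output, total].some((value) => Number.isNaN(value))) {
    return null;
  }
  return { input, output, total };
}

// Stub that mimics case-insensitive header lookup, using the values shown above.
const sampleHeaders: Record<string, string> = {
  "x-cost-input": "0.00125",
  "x-cost-output": "0.002",
  "x-cost-total": "0.00325",
};
const reader: HeaderReader = {
  get: (headerName) => sampleHeaders[headerName.toLowerCase()] ?? null,
};
const parsed = readCostHeaders(reader);
console.log(parsed);
```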
Cost Breakdown
By Model
| Model | Input ($/1M) | Output ($/1M) | Monthly Spend |
|---|---|---|---|
| gpt-4o | $2.50 | $10.00 | $1,234.56 |
| gpt-4o-mini | $0.15 | $0.60 | $456.78 |
| claude-3-5-sonnet | $3.00 | $15.00 | $789.00 |
By Time Period
View costs over different periods:
- Today
- This week
- This month
- Last 30 days
- Custom range
By User
To track per-user spending, include the user field in your requests:
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
user: 'user-123', // Track by user
});
Budget Alerts
Set up alerts that fire when spending exceeds a threshold.
Configuring Alerts
- Go to Settings > Alerts
- Click Add Alert
- Configure:
  - Type: Budget Alert
  - Threshold: for example, $100 (daily) or $1,000 (monthly)
  - Channel: Email, Slack, or Webhook
Alert Types
| Alert | Trigger |
|---|---|
| Daily Spend | Exceeds daily budget |
| Monthly Spend | Exceeds monthly budget |
| Cost Spike | 50%+ increase from average |
| Model Cost | Specific model exceeds threshold |
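To make the triggers in the table concrete, here is a rough sketch of how a budget check and a cost-spike check could be evaluated on your own data. It rests on assumptions: this page does not say which averaging window the gateway uses, so the sketch averages whatever trailing daily totals you pass in, and `exceedsBudget` and `isCostSpike` are hypothetical helpers, not gateway APIs.

```typescript
// Daily or monthly budget alert: fires once spend goes over the budget.
function exceedsBudget(spend: number, budget: number): boolean {
  return spend > budget;
}

// Cost spike alert: today's spend is at least `threshold` (0.5 = 50%)
// above the average of the previous days' spend.
// Assumption: the baseline is a simple mean of the supplied daily totals.
function isCostSpike(
  todaySpend: number,
  previousDailySpend: number[],
  threshold = 0.5,
): boolean {
  if (previousDailySpend.length === 0) {
    return false; // no baseline to compare against
  }
  const sum = previousDailySpend.reduce((acc, value) => acc + value, 0);
  const average = sum / previousDailySpend.length;
  if (average === 0) {
    return todaySpend > 0;
  }
  return (todaySpend - average) / average >= threshold;
}

const trailingDays = [80, 100, 120]; // average is 100
console.log(isCostSpike(150, trailingDays)); // true: exactly 50% above average
console.log(isCostSpike(149, trailingDays)); // false
console.log(exceedsBudget(101, 100)); // true
```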
Cost Optimization
1. Use Caching
Enable caching so that repeated requests are served from the cache instead of being billed:
| Metric | Before Cache | After Cache |
|---|---|---|
| Requests | 100,000 | 100,000 |
| Cache Hit Rate | 0% | 60% |
| Billable Requests | 100,000 | 40,000 |
| Cost Savings | - | 60% |
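The table's figures follow from simple arithmetic, sketched below. The sketch assumes, as the table implies, that cache hits are not billed and that cached and uncached requests cost the same on average; `cacheSavings` is a hypothetical helper for estimation only.

```typescript
interface CacheEstimate {
  billableRequests: number;
  savingsFraction: number; // 0.6 means 60% saved
}

// Estimates billable requests and savings for a given cache hit rate (0..1).
function cacheSavings(totalRequests: number, cacheHitRate: number): CacheEstimate {
  const billableRequests = Math.round(totalRequests * (1 - cacheHitRate));
  const savingsFraction =
    totalRequests === 0 ? 0 : 1 - billableRequests / totalRequests;
  return { billableRequests, savingsFraction };
}

const estimate = cacheSavings(100_000, 0.6);
console.log(estimate.billableRequests); // 40000
console.log((estimate.savingsFraction * 100).toFixed(0) + "%"); // "60%"
```

In practice, savings depend on which requests hit the cache: if long, expensive prompts are cached more often than short ones, the cost savings will exceed the hit rate, and vice versa.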
2. Choose the Right Model
Match model capability to task requirements:
| Task | Recommended Model | Input Cost ($/1M tokens) |
|---|---|---|
| Simple Q&A | gpt-3.5-turbo | $0.50 |
| Complex reasoning | gpt-4o | $2.50 |
| Quick responses | claude-3-haiku | $0.25 |
3. Optimize Prompts
Reduce token usage with efficient prompts:
// Verbose (150 tokens)
const verbose = `Please analyze the following text and provide a
comprehensive summary that covers all the main points, themes,
and important details. Make sure to include...`;
// Concise (20 tokens)
const concise = `Summarize the key points of this text:`;
4. Set Max Tokens
Limit output length:
await openai.chat.completions.create({
model: 'gpt-4o',
max_tokens: 500, // Limit output
messages: [...],
});
5. Monitor Expensive Requests
Identify and optimize high-cost requests:
- Go to Analytics > Requests
- Sort by Cost (descending)
- Analyze expensive requests
- Optimize prompts or model selection
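If you also log per-request costs in your own application (for example, from the x_cost object in each response), the same "sort by cost" review can be done in code. This is a sketch under that assumption; the `RequestRecord` shape and `mostExpensive` helper are hypothetical and not part of the gateway.

```typescript
// Hypothetical record you might store for each request.
interface RequestRecord {
  id: string;
  model: string;
  totalCost: number; // USD
}

// Returns the `limit` most expensive requests, highest cost first,
// without mutating the input array.
function mostExpensive(records: RequestRecord[], limit: number): RequestRecord[] {
  return [...records]
    .sort((a, b) => b.totalCost - a.totalCost)
    .slice(0, limit);
}

const logged: RequestRecord[] = [
  { id: "req-1", model: "gpt-4o", totalCost: 0.00325 },
  { id: "req-2", model: "gpt-4o-mini", totalCost: 0.0002 },
  { id: "req-3", model: "gpt-4o", totalCost: 0.012 },
];
const topTwo = mostExpensive(logged, 2);
console.log(topTwo.map((record) => record.id)); // ["req-3", "req-1"]
```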
Export Cost Data
Export cost data for accounting:
CSV Export
- Go to Analytics
- Click Export > CSV
- Select date range
- Download file
API Export
curl https://api.transactional.dev/ai-gateway/costs \
-H "Authorization: Bearer $GATEWAY_API_KEY" \
-G --data-urlencode "from=2024-01-01" \
--data-urlencode "to=2024-01-31" \
--data-urlencode "format=csv"
Cost Reports
Weekly Summary
Receive weekly cost summaries via email:
- Go to Settings > Notifications
- Enable Weekly Cost Summary
- Select recipients
Custom Reports
Create custom reports:
- Go to Analytics > Reports
- Click New Report
- Configure:
  - Metrics (cost, tokens, requests)
  - Grouping (model, user, day)
  - Filters (date range, models)
  - Schedule (daily, weekly, monthly)
Cost by Provider
When fallback is enabled, the cost of a request depends on which provider ends up serving it:
| Request Path | Cost |
|---|---|
| OpenAI (primary) | $0.0025 |
| Anthropic (fallback) | $0.0030 |
| Google (fallback 2) | $0.0015 |
Track fallback costs separately in analytics.
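For a rough view of the same split outside the dashboard, per-request costs can be grouped by the provider that served them. The sketch below assumes you record the serving provider alongside each request's cost; the `provider` field and `costByProvider` helper are hypothetical, and the amounts reuse the illustrative per-request costs from the table.

```typescript
// Hypothetical record: which provider served the request and what it cost.
interface ProviderCost {
  provider: string;
  cost: number; // USD
}

// Sums cost per provider so primary and fallback spend can be compared.
function costByProvider(records: ProviderCost[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const { provider, cost } of records) {
    totals.set(provider, (totals.get(provider) ?? 0) + cost);
  }
  return totals;
}

const served: ProviderCost[] = [
  { provider: "openai", cost: 0.0025 },
  { provider: "openai", cost: 0.0025 },
  { provider: "anthropic", cost: 0.003 }, // served by the fallback
];
const providerTotals = costByProvider(served);
for (const [provider, total] of providerTotals) {
  console.log(`${provider}: $${total.toFixed(4)}`);
}
```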
Billing
How Billing Works
- AI Gateway tracks costs in real time
- No markup on provider costs
- Pay only for actual usage
- Monthly invoice with detailed breakdown
Invoice Details
Your invoice includes:
- Total requests
- Token usage by model
- Cost by provider
- Cache savings
Next Steps
- Caching - Reduce costs with caching
- Rate Limiting - Control usage
- Analytics Dashboard - View your costs