Cost Tracking

Monitor and manage your LLM spending with real-time cost analytics.

Overview

AI Gateway tracks the cost of every LLM request in real-time. View spending by model, user, time period, and more in the dashboard.

How Costs Are Calculated

Cost is calculated from the actual token usage of each request. Prices are per-token rates, which providers usually quote per 1M tokens:

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Example

For a GPT-4o request:

  • Input: 500 tokens × $2.50/1M = $0.00125
  • Output: 200 tokens × $10.00/1M = $0.002
  • Total: $0.00325
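
The arithmetic above can be sketched in a few lines of TypeScript. This is only an illustration of the formula, not AI Gateway code: the `estimateCost` function and the `PRICES_PER_MILLION` table are hypothetical names, and the prices are the example figures from this guide.

```typescript
// Example prices in USD per 1M tokens, taken from the tables in this guide.
// They are illustrative only; real provider prices change over time.
const PRICES_PER_MILLION: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
};

interface CostEstimate {
  inputCost: number;
  outputCost: number;
  totalCost: number;
}

// Cost = (input tokens x input price) + (output tokens x output price),
// with each per-1M price divided by 1,000,000 to get a per-token rate.
function estimateCost(
  model: string,
  inputTokens: number,
  outputTokens: number
): CostEstimate {
  const price = PRICES_PER_MILLION[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  const inputCost = (inputTokens * price.input) / 1e6;
  const outputCost = (outputTokens * price.output) / 1e6;
  return { inputCost, outputCost, totalCost: inputCost + outputCost };
}
```

Calling `estimateCost("gpt-4o", 500, 200)` reproduces the example: roughly $0.00125 for input, $0.002 for output, and $0.00325 in total (subject to floating-point rounding).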

Viewing Costs

Dashboard Overview

Navigate to AI Gateway > Analytics to see:

  • Total Spend: Current month's total
  • Cost by Model: Breakdown by model
  • Cost by Day: Daily spending trend
  • Top Users: Highest spending users

Per-Request Costs

Every response includes the request's cost in an x_cost object, alongside the usual usage fields:

{
  "id": "chatcmpl-123",
  "usage": {
    "prompt_tokens": 500,
    "completion_tokens": 200,
    "total_tokens": 700
  },
  "x_cost": {
    "input_cost": 0.00125,
    "output_cost": 0.002,
    "total_cost": 0.00325,
    "currency": "USD"
  }
}

Response Headers

The cost is also reported in the response headers:

X-Cost-Input: 0.00125
X-Cost-Output: 0.002
X-Cost-Total: 0.00325
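
A client can read these headers without parsing the response body. The sketch below assumes only the three header names shown above. The `HeaderReader` interface and the `readCostHeaders` function are hypothetical helpers, written so that anything with a `get` method (such as the `Headers` object from `fetch`) can be passed in.

```typescript
// Minimal shape of a header container; fetch's Headers object satisfies it.
interface HeaderReader {
  get(name: string): string | null | undefined;
}

interface RequestCost {
  input: number;
  output: number;
  total: number;
}

// Returns the parsed cost, or null if any header is missing or not numeric.
function readCostHeaders(headers: HeaderReader): RequestCost | null {
  const parse = (name: string): number | null => {
    const raw = headers.get(name);
    if (raw === null || raw === undefined || raw.trim() === "") {
      return null;
    }
    const value = Number(raw);
    return Number.isFinite(value) ? value : null;
  };

  const input = parse("X-Cost-Input");
  const output = parse("X-Cost-Output");
  const total = parse("X-Cost-Total");
  if (input === null || output === null || total === null) {
    return null;
  }
  return { input, output, total };
}
```

Returning `null` rather than throwing keeps cost logging from breaking a request path when a response arrives without cost headers.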

Cost Breakdown

By Model

Model              Input ($/1M)  Output ($/1M)  Monthly Spend
gpt-4o             $2.50         $10.00         $1,234.56
gpt-4o-mini        $0.15         $0.60          $456.78
claude-3-5-sonnet  $3.00         $15.00         $789.00

By Time Period

View costs over different periods:

  • Today
  • This week
  • This month
  • Last 30 days
  • Custom range

By User

To track per-user spending, include the user field in each request:

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  user: 'user-123',  // Track by user
});

Budget Alerts

Set up alerts that fire when spending exceeds a threshold.

Configuring Alerts

  1. Go to Settings > Alerts
  2. Click Add Alert
  3. Configure:
    • Type: Budget Alert
    • Threshold: for example, $100 (daily) or $1,000 (monthly)
    • Channel: Email, Slack, Webhook

Alert Types

Alert          Trigger
Daily Spend    Exceeds daily budget
Monthly Spend  Exceeds monthly budget
Cost Spike     50%+ increase from average
Model Cost     Specific model exceeds threshold
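
To make the Cost Spike trigger concrete, here is a minimal sketch of a "50%+ increase from average" check. How AI Gateway computes the average (window length, weighting) is not described here, so this version assumes a plain mean over whatever daily totals you pass in. `isCostSpike` is a hypothetical name.

```typescript
// Returns true when today's spend is at least `threshold` (0.5 = 50%)
// above the mean of the previous daily totals.
// Assumption: a simple arithmetic mean; the gateway's own window may differ.
function isCostSpike(
  todaySpend: number,
  previousDailySpend: number[],
  threshold: number = 0.5
): boolean {
  if (previousDailySpend.length === 0) {
    return false; // no baseline to compare against
  }
  const sum = previousDailySpend.reduce((acc, value) => acc + value, 0);
  const average = sum / previousDailySpend.length;
  if (average <= 0) {
    // Any spend after a zero baseline counts as a spike.
    return todaySpend > 0;
  }
  return (todaySpend - average) / average >= threshold;
}
```

With a baseline of $100 per day, a $150 day triggers the check and a $149 day does not.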

Cost Optimization

1. Use Caching

Enable caching to reduce duplicate requests:

Metric             Before Cache  After Cache
Requests           100,000       100,000
Cache Hit Rate     0%            60%
Billable Requests  100,000       40,000
Cost Savings       -             60%
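
The table follows from simple arithmetic, sketched below. It assumes, as the table does, that cache hits are not billed and that requests cost about the same on average, so the fraction of requests saved equals the fraction of cost saved. `cacheSavings` is a hypothetical helper.

```typescript
interface CacheSavings {
  billableRequests: number;
  savedFraction: number; // 0.6 means 60% saved
}

// Assumption: cache hits cost nothing and all requests cost the same on average.
function cacheSavings(totalRequests: number, cacheHitRate: number): CacheSavings {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  const billableRequests = Math.round(totalRequests * (1 - cacheHitRate));
  const savedFraction =
    totalRequests === 0 ? 0 : 1 - billableRequests / totalRequests;
  return { billableRequests, savedFraction };
}
```

For 100,000 requests at a 60% hit rate, this gives 40,000 billable requests and savings of about 60%. In practice the saving depends on which requests are cached, since long prompts cost more than short ones.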

2. Choose the Right Model

Match model capability to task requirements:

Task               Recommended Model  Input Cost ($/1M tokens)
Simple Q&A         gpt-3.5-turbo      $0.50
Complex reasoning  gpt-4o             $2.50
Quick responses    claude-3-haiku     $0.25

3. Optimize Prompts

Reduce token usage with efficient prompts:

// Verbose (150 tokens)
const verbose = `Please analyze the following text and provide a
comprehensive summary that covers all the main points, themes,
and important details. Make sure to include...`;
 
// Concise (20 tokens)
const concise = `Summarize the key points of this text:`;

4. Set Max Tokens

Limit output length:

await openai.chat.completions.create({
  model: 'gpt-4o',
  max_tokens: 500,  // Limit output
  messages: [...],
});
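
Because max_tokens caps the output length, it also caps the output cost of a request. The sketch below computes that upper bound from the cost formula earlier in this guide. `worstCaseRequestCost` is a hypothetical helper, and the prices are passed in per 1M tokens.

```typescript
// Upper bound on a single request's cost when max_tokens is set.
// Prices are in USD per 1M tokens.
function worstCaseRequestCost(
  inputTokens: number,
  maxTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number
): number {
  const inputCost = (inputTokens * inputPricePerMillion) / 1e6;
  // The model can emit at most maxTokens output tokens.
  const maxOutputCost = (maxTokens * outputPricePerMillion) / 1e6;
  return inputCost + maxOutputCost;
}
```

With the gpt-4o example prices ($2.50 input, $10.00 output), a 500-token prompt with max_tokens set to 500 costs at most about $0.00625.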

5. Monitor Expensive Requests

Identify and optimize high-cost requests:

  1. Go to Analytics > Requests
  2. Sort by Cost (descending)
  3. Analyze expensive requests
  4. Optimize prompts or model selection

Export Cost Data

Export cost data for accounting:

CSV Export

  1. Go to Analytics
  2. Click Export > CSV
  3. Select date range
  4. Download file

API Export

curl https://api.transactional.dev/ai-gateway/costs \
  -H "Authorization: Bearer $GATEWAY_API_KEY" \
  -G --data-urlencode "from=2024-01-01" \
        --data-urlencode "to=2024-01-31" \
        --data-urlencode "format=csv"
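
The same request can be built from TypeScript. The sketch below only constructs the URL with the three query parameters from the curl command. `buildCostExportUrl` is a hypothetical helper, and the YYYY-MM-DD check is an assumption based on the dates shown above.

```typescript
const COSTS_ENDPOINT = "https://api.transactional.dev/ai-gateway/costs";

// Builds the export URL used by the curl example.
// Assumption: dates are YYYY-MM-DD, as in the example; only "csv" is shown
// as a format in this guide.
function buildCostExportUrl(from: string, to: string, format: string = "csv"): string {
  const datePattern = /^\d{4}-\d{2}-\d{2}$/;
  if (!datePattern.test(from) || !datePattern.test(to)) {
    throw new Error("Dates must be in YYYY-MM-DD format");
  }
  const params: Array<[string, string]> = [
    ["from", from],
    ["to", to],
    ["format", format],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${COSTS_ENDPOINT}?${query}`;
}
```

Pass the resulting URL to your HTTP client together with the `Authorization: Bearer` header from the curl example.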

Cost Reports

Weekly Summary

Receive weekly cost summaries via email:

  1. Go to Settings > Notifications
  2. Enable Weekly Cost Summary
  3. Select recipients

Custom Reports

Create custom reports:

  1. Go to Analytics > Reports
  2. Click New Report
  3. Configure:
    • Metrics (cost, tokens, requests)
    • Grouping (model, user, day)
    • Filters (date range, models)
  4. Schedule (daily, weekly, monthly)

Cost by Provider

When fallback is enabled, the cost of a request depends on which provider ends up serving it:

Request Path          Cost
OpenAI (primary)      $0.0025
Anthropic (fallback)  $0.0030
Google (fallback 2)   $0.0015

Track fallback costs separately in analytics.
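
If you also keep your own request log, the same breakdown can be computed client-side. The sketch below groups cost by provider. The `RequestRecord` shape and the `costByProvider` function are hypothetical and do not reflect the gateway's export schema.

```typescript
// Hypothetical record shape for a locally kept request log.
interface RequestRecord {
  provider: string; // e.g. "OpenAI", "Anthropic", "Google"
  cost: number;     // total cost of the request in USD
}

// Sums request cost per provider.
function costByProvider(records: RequestRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    const current = totals.get(record.provider) ?? 0;
    totals.set(record.provider, current + record.cost);
  }
  return totals;
}
```

Comparing the fallback providers' totals with the primary's shows how much of your spend comes from fallback traffic.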

Billing

How Billing Works

  • AI Gateway tracks costs in real-time
  • No markup on provider costs
  • Pay only for actual usage
  • Monthly invoice with detailed breakdown

Invoice Details

Your invoice includes:

  • Total requests
  • Token usage by model
  • Cost by provider
  • Cache savings
