Cost Analysis
Analyzing and optimizing LLM costs with Observability.
Overview
Observability provides detailed cost tracking for every LLM call. Understand where your money goes and find optimization opportunities.
Viewing Costs
Dashboard Overview
Navigate to Observability > Analytics to see:
- Total Cost: Sum for selected period
- Cost Trend: Daily/hourly breakdown
- Cost by Model: Model comparison
- Top Traces: Highest cost traces
Cost Breakdown
View costs by different dimensions:
By Model
| Model | Requests | Input Tokens | Output Tokens | Cost |
|---|---|---|---|---|
| gpt-4o | 10,000 | 2M | 1M | $150 |
| claude-3-5-sonnet | 5,000 | 1M | 500K | $55 |
| gpt-3.5-turbo | 20,000 | 1.5M | 750K | $22 |
By User
| User | Requests | Tokens | Cost |
|---|---|---|---|
| user-123 | 500 | 250K | $5 |
| user-456 | 300 | 150K | $3 |
| user-789 | 200 | 100K | $2 |
By Trace Name
| Trace | Requests | Avg Tokens | Total Cost |
|---|---|---|---|
| chat-completion | 15,000 | 800 | $120 |
| summarize | 5,000 | 2,500 | $62 |
| code-review | 2,000 | 3,000 | $45 |
Cost Calculation
Per-Request Cost
Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
Example for gpt-4o:
- Input: 1,000 tokens × $2.50/1M = $0.0025
- Output: 500 tokens × $10.00/1M = $0.005
- Total: $0.0075
Model Pricing Reference
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| o1 | $15.00 | $60.00 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-opus | $15.00 | $75.00 |
| claude-3-haiku | $0.25 | $1.25 |
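The per-request formula can be sketched in code as follows. This is a minimal illustration, not part of any SDK: `PRICING` and `estimateCost` are hypothetical names, and the prices are copied from the table above and may be out of date.

```javascript
// Prices in USD per 1M tokens, copied from the pricing table above.
// Treat them as illustrative: provider pricing changes over time.
const PRICING = new Map([
  ['gpt-4o', { input: 2.5, output: 10.0 }],
  ['gpt-4o-mini', { input: 0.15, output: 0.6 }],
  ['claude-3-5-sonnet', { input: 3.0, output: 15.0 }],
]);

// Hypothetical helper: applies
// Cost = (Input Tokens x Input Price) + (Output Tokens x Output Price)
function estimateCost(model, inputTokens, outputTokens) {
  const price = PRICING.get(model);
  if (!price) {
    throw new Error(`No pricing configured for model: ${model}`);
  }
  const inputCost = (inputTokens * price.input) / 1_000_000;
  const outputCost = (outputTokens * price.output) / 1_000_000;
  return inputCost + outputCost;
}

// Reproduces the gpt-4o example: 1,000 input tokens and 500 output tokens.
console.log(estimateCost('gpt-4o', 1000, 500).toFixed(4)); // "0.0075"
```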
Finding Cost Anomalies
High-Cost Traces
Identify expensive traces:
- Go to Traces
- Sort by Cost (descending)
- Review top traces for optimization
Cost Spikes
Detect unusual spending:
- View Cost Trend chart
- Look for spikes
- Drill into specific time periods
- Identify root cause (traffic spike, new feature, bug)
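If you export the daily cost series, a simple way to flag spikes programmatically is to compare each day against a trailing average. The sketch below assumes a hypothetical input shape (`{ date, cost }` per day); the 7-day window and 2x factor are arbitrary starting points, not product defaults.

```javascript
// Hypothetical helper: flags days whose cost exceeds `factor` times the
// average of the preceding `windowSize` days.
function findCostSpikes(dailyCosts, windowSize = 7, factor = 2) {
  const spikes = [];
  for (let i = windowSize; i < dailyCosts.length; i++) {
    const previous = dailyCosts.slice(i - windowSize, i);
    const baseline = previous.reduce((sum, day) => sum + day.cost, 0) / windowSize;
    if (baseline > 0 && dailyCosts[i].cost > baseline * factor) {
      spikes.push({ date: dailyCosts[i].date, cost: dailyCosts[i].cost, baseline });
    }
  }
  return spikes;
}

// Made-up sample data: seven ordinary days, one expensive day, one ordinary day.
const sampleCosts = [100, 105, 98, 102, 99, 101, 103, 250, 104].map(
  (cost, i) => ({ date: `day-${i + 1}`, cost })
);
console.log(findCostSpikes(sampleCosts).map((spike) => spike.date)); // only day-8 is flagged
```

Note that the day after a spike is compared against a baseline that includes the spike itself, so a sustained increase is flagged only on its first days.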
Inefficient Patterns
Find waste:
-- High token count, low value
Traces with > 10K tokens but < 5s duration
-- Repeated identical requests (cache misses)
Same input, multiple requests, no caching
-- Expensive model for simple tasks
gpt-4o used for simple classification
Cost Optimization Strategies
1. Model Selection
Use the right model for the job:
| Task | Recommended | Cost |
|---|---|---|
| Complex reasoning | gpt-4o | $$$ |
| General chat | gpt-4o-mini | $ |
| Simple tasks | gpt-3.5-turbo | $ |
| Fast responses | claude-3-haiku | $ |
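One way to apply this table in application code is a small routing function that picks the model from a task label. This is a sketch under assumptions: the task labels, `MODEL_BY_TASK`, `selectModel`, and the fallback model are all hypothetical choices, not part of Observability or any SDK.

```javascript
// Task-to-model mapping taken from the table above.
// The task label strings are hypothetical; use whatever labels your app has.
const MODEL_BY_TASK = new Map([
  ['complex-reasoning', 'gpt-4o'],
  ['general-chat', 'gpt-4o-mini'],
  ['simple-task', 'gpt-3.5-turbo'],
  ['fast-response', 'claude-3-haiku'],
]);

// Assumption: unknown task types fall back to a low-cost general model.
const DEFAULT_MODEL = 'gpt-4o-mini';

function selectModel(taskType) {
  return MODEL_BY_TASK.get(taskType) ?? DEFAULT_MODEL;
}

console.log(selectModel('complex-reasoning')); // "gpt-4o"
console.log(selectModel('something-else'));    // "gpt-4o-mini"
```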
2. Prompt Optimization
Reduce token usage:
// Verbose (200 tokens)
const verbosePrompt = `Please analyze the following text and provide
a comprehensive summary that covers all the main points,
themes, and important details. Make sure to include...`;
// Optimized (50 tokens)
const optimizedPrompt = `Summarize the key points of this text:`;
3. Enable Caching
Cache identical requests:
// Without caching: Every request costs money
// With caching: Identical requests are free
// Enable in AI Gateway Settings
// A 50% cache hit rate saves roughly 50% of cost,
// assuming cached and uncached requests cost about the same
4. Limit Output Length
Set appropriate max_tokens:
// Don't pay for more than you need
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  max_tokens: 500, // Limit output
  messages: [...],
});
5. Use Streaming Wisely
Streaming doesn't reduce token usage on its own, but it:
- Improves perceived latency
- Allows early termination of responses you no longer need
- Provides a better user experience
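Early termination can be sketched as follows. To stay runnable without network access, the example uses a stand-in async generator (`fakeStream`, hypothetical) instead of a real SDK stream, and `readUntil` is likewise an illustrative helper. Whether stopping early reduces billed output tokens depends on the provider and on the client aborting the underlying request, so treat that as an assumption to verify.

```javascript
// Stand-in for a streaming LLM response: yields text chunks one at a time.
async function* fakeStream(chunks) {
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Hypothetical helper: accumulates chunks until `shouldStop` returns true,
// then stops consuming the stream.
async function readUntil(stream, shouldStop) {
  let text = '';
  for await (const chunk of stream) {
    text += chunk;
    if (shouldStop(text)) {
      break; // leaving the loop closes the async iterator
    }
  }
  return text;
}

// Stop as soon as the first full sentence has arrived.
readUntil(fakeStream(['Hello', ' world.', ' More', ' text.']), (text) => text.includes('.'))
  .then((text) => console.log(text)); // "Hello world."
```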
6. Batch Processing
Combine multiple queries:
// Instead of 10 requests for 10 questions
// Batch into 1 request
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: `Answer these questions:\n${questions.join('\n')}`
  }],
});
Budget Management
Setting Budgets
Configure spending limits:
- Go to Settings > Budgets
- Set limits:
- Daily budget: $100
- Monthly budget: $2,000
- Per-user limit: $10
Budget Alerts
Get notified before you exceed a budget:
- Go to Settings > Alerts
- Create budget alert:
- 80% of daily budget
- 90% of monthly budget
Budget Actions
What happens when a budget is reached:
- Alert only: Send notification
- Soft limit: Alert + warning to users
- Hard limit: Block requests
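The three actions can be modeled as a small decision function. This is a sketch only: `evaluateBudget`, the mode strings, and the returned shape are hypothetical, and it is an assumption that a hard limit also sends a notification.

```javascript
// Hypothetical helper: decides what to do with a request given current spend.
// Modes mirror the list above: 'alert-only', 'soft-limit', 'hard-limit'.
function evaluateBudget(spend, limit, mode) {
  if (spend < limit) {
    return { allowed: true, notify: false, warnUser: false };
  }
  switch (mode) {
    case 'alert-only':
      return { allowed: true, notify: true, warnUser: false };
    case 'soft-limit':
      return { allowed: true, notify: true, warnUser: true };
    case 'hard-limit':
      // Assumption: blocking a request also triggers a notification.
      return { allowed: false, notify: true, warnUser: false };
    default:
      throw new Error(`Unknown budget mode: ${mode}`);
  }
}

console.log(evaluateBudget(120, 100, 'hard-limit').allowed); // false
console.log(evaluateBudget(80, 100, 'hard-limit').allowed);  // true
```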
Cost Reports
Automated Reports
Receive regular cost summaries:
- Go to Settings > Reports
- Enable Weekly Cost Report
- Select recipients
- Choose delivery channel (email, Slack)
Custom Reports
Create detailed cost analysis:
- Go to Analytics > Reports
- Click Create Report
- Configure:
- Time period
- Dimensions (model, user, trace)
- Metrics (cost, tokens, requests)
- Export as CSV/PDF
Report Contents
Weekly Cost Report - Jan 15-21, 2024
Summary:
- Total Cost: $1,234.56
- Change from last week: +12%
- Total Requests: 150,000
- Avg Cost/Request: $0.008
Top Models by Cost:
1. gpt-4o: $800 (65%)
2. claude-3-5-sonnet: $300 (24%)
3. gpt-3.5-turbo: $134 (11%)
Top Users by Cost:
1. user-enterprise-1: $200
2. user-enterprise-2: $150
...
Cost Optimization Opportunities:
- 20% of gpt-4o requests could use gpt-4o-mini
- Cache hit rate is 30% (target: 50%)
- 500 identical requests not cached
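The summary figures in the sample report follow from simple arithmetic. A minimal sketch, where `summarizeCosts` is a hypothetical helper and the inputs are the sample values above:

```javascript
// Hypothetical helper: derives average cost per request and each model's
// share of total cost, sorted from most to least expensive.
function summarizeCosts(totalCost, totalRequests, costByModel) {
  const avgCostPerRequest = totalCost / totalRequests;
  const shares = Object.entries(costByModel)
    .map(([model, cost]) => ({
      model,
      cost,
      sharePct: Math.round((cost / totalCost) * 100),
    }))
    .sort((a, b) => b.cost - a.cost);
  return { avgCostPerRequest, shares };
}

const summary = summarizeCosts(1234.56, 150000, {
  'gpt-4o': 800,
  'claude-3-5-sonnet': 300,
  'gpt-3.5-turbo': 134,
});
console.log(summary.avgCostPerRequest.toFixed(3));  // "0.008"
console.log(summary.shares.map((s) => s.sharePct)); // 65, 24, 11
```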
Cost Attribution
By Feature
Tag traces for feature-level cost tracking:
const trace = obs.trace({
  name: 'chat',
  tags: ['feature:chat-v2', 'team:product'],
});
View costs by tag in analytics.
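If you work with exported trace data instead of the analytics view, the same feature-level rollup can be approximated in code. The sketch below assumes a hypothetical export shape in which each trace carries `tags` and a numeric `cost`; `costByTag` is an illustrative helper, not an Observability API.

```javascript
// Hypothetical helper: sums trace cost per tag for tags starting with `prefix`.
function costByTag(traces, prefix) {
  const totals = new Map();
  for (const trace of traces) {
    for (const tag of trace.tags ?? []) {
      if (tag.startsWith(prefix)) {
        totals.set(tag, (totals.get(tag) ?? 0) + trace.cost);
      }
    }
  }
  return totals;
}

// Made-up sample traces for illustration.
const sampleTraces = [
  { tags: ['feature:chat-v2', 'team:product'], cost: 0.5 },
  { tags: ['feature:chat-v2'], cost: 0.25 },
  { tags: ['feature:search', 'team:product'], cost: 1 },
];
console.log(costByTag(sampleTraces, 'feature:'));
// feature:chat-v2 totals 0.75, feature:search totals 1
```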
By Customer
Track per-customer costs:
const trace = obs.trace({
  name: 'api-request',
  userId: customerId,
  metadata: {
    customerTier: 'enterprise',
    billingCode: 'CUST-123',
  },
});
Export for billing:
curl https://api.transactional.dev/observability/costs/export \
-H "Authorization: Bearer pk_xxx" \
-H "Content-Type: application/json" \
-d '{"groupBy": "userId", "period": "2024-01"}'
Next Steps
- Metrics - All metrics explained
- Performance - Latency optimization
- Caching - Enable caching