Overview

Observability provides detailed cost tracking for every LLM call. Understand where your money goes and find optimization opportunities.

Viewing Costs

Dashboard Overview

Navigate to Observability > Analytics to see:

Total Cost: Sum for selected period
Cost Trend: Daily/hourly breakdown
Cost by Model: Model comparison
Top Traces: Highest cost traces

Cost Breakdown

View costs by different dimensions (the figures below are illustrative):

By Model

Model	Requests	Input Tokens	Output Tokens	Cost
gpt-4o	10,000	2M	1M	$150
claude-3-5-sonnet	5,000	1M	500K	$55
gpt-3.5-turbo	20,000	1.5M	750K	$22

By User

User	Requests	Tokens	Cost
user-123	500	250K	$5
user-456	300	150K	$3
user-789	200	100K	$2

By Trace Name

Trace	Requests	Avg Tokens	Total Cost
chat-completion	15,000	800	$120
summarize	5,000	2,500	$62
code-review	2,000	3,000	$45
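
The same breakdowns can be reproduced offline from exported call records. The following is a minimal sketch, assuming a hypothetical `CallRecord` shape; the field names are illustrative and are not the documented export schema.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface CallRecord {
  model: string;
  userId: string;
  traceName: string;
  cost: number; // USD
}

interface Bucket {
  requests: number;
  cost: number;
}

// Sum request count and cost for each distinct value of the chosen dimension.
function costBy(
  records: CallRecord[],
  dimension: 'model' | 'userId' | 'traceName',
): Map<string, Bucket> {
  const buckets = new Map<string, Bucket>();
  for (const record of records) {
    const key = record[dimension];
    const bucket = buckets.get(key) ?? { requests: 0, cost: 0 };
    bucket.requests += 1;
    bucket.cost += record.cost;
    buckets.set(key, bucket);
  }
  return buckets;
}
```

Calling `costBy(records, 'model')` yields one bucket per model; passing `'userId'` or `'traceName'` gives the other two views.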

Cost Calculation

Per-Request Cost

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Prices here are per token: divide the listed per-1M-token price by 1,000,000.

Example for gpt-4o:

Input: 1,000 tokens × $2.50/1M = $0.0025
Output: 500 tokens × $10.00/1M = $0.005
Total: $0.0075
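
The arithmetic above can be expressed as a small helper. This is a sketch only; `requestCost` and its parameter names are illustrative and not part of any SDK.

```typescript
// Prices are quoted per 1M tokens, so divide by 1,000,000 to get a per-token rate.
const TOKENS_PER_MILLION = 1_000_000;

function requestCost(
  inputTokens: number,
  outputTokens: number,
  inputPricePerMillion: number,
  outputPricePerMillion: number,
): number {
  const inputCost = (inputTokens * inputPricePerMillion) / TOKENS_PER_MILLION;
  const outputCost = (outputTokens * outputPricePerMillion) / TOKENS_PER_MILLION;
  return inputCost + outputCost;
}

// The gpt-4o example: 1,000 input tokens at $2.50/1M and 500 output tokens at $10.00/1M.
const exampleCost = requestCost(1_000, 500, 2.5, 10.0); // approximately 0.0075
```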

Model Pricing Reference

Model	Input ($/1M)	Output ($/1M)
gpt-4o	$2.50	$10.00
gpt-4o-mini	$0.15	$0.60
gpt-4-turbo	$10.00	$30.00
o1	$15.00	$60.00
claude-3-5-sonnet	$3.00	$15.00
claude-3-opus	$15.00	$75.00
claude-3-haiku	$0.25	$1.25

Finding Cost Anomalies

High-Cost Traces

Identify expensive traces:

Go to Traces
Sort by Cost (descending)
Review top traces for optimization

Cost Spikes

Detect unusual spending:

View Cost Trend chart
Look for spikes
Drill into specific time periods
Identify root cause (traffic spike, new feature, bug)
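
If you export the daily cost series, spike detection can also be automated. The sketch below assumes a simple rule chosen for illustration: flag a day when its cost exceeds a multiple of the average of the preceding days. The `DailyCost` shape, the 7-day window, and the 2x factor are all assumptions to tune, not product defaults.

```typescript
interface DailyCost {
  date: string; // e.g. '2024-01-15'
  cost: number; // USD
}

// Flag days whose cost exceeds `factor` times the mean of the preceding `window` days.
// Days without a full preceding window are skipped.
function findCostSpikes(series: DailyCost[], window = 7, factor = 2): DailyCost[] {
  const spikes: DailyCost[] = [];
  for (let i = window; i < series.length; i++) {
    const previous = series.slice(i - window, i);
    const baseline = previous.reduce((sum, day) => sum + day.cost, 0) / window;
    if (baseline > 0 && series[i].cost > factor * baseline) {
      spikes.push(series[i]);
    }
  }
  return spikes;
}
```

A trailing average keeps the baseline local, so gradual growth is not flagged while a sudden jump is.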

Inefficient Patterns

Find waste:

-- High token count, low value
Traces with > 10K tokens but < 5s duration
 
-- Repeated identical requests (cache misses)
Same input, multiple requests, no caching
 
-- Expensive model for simple tasks
gpt-4o used for simple classification
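
The repeated-request pattern is straightforward to check in exported logs. A minimal sketch, assuming a hypothetical `RequestLog` shape with the model, the raw input, and the cost of each request:

```typescript
// Hypothetical log shape; field names are assumptions for illustration.
interface RequestLog {
  model: string;
  input: string;
  cost: number; // USD
}

interface DuplicateGroup {
  model: string;
  input: string;
  count: number;
  wastedCost: number; // cost of every request after the first
}

// Group requests by (model, input) and report groups that occur more than once,
// most expensive first.
function findDuplicateRequests(logs: RequestLog[]): DuplicateGroup[] {
  const groups = new Map<string, DuplicateGroup>();
  for (const log of logs) {
    // JSON.stringify of the pair gives an unambiguous composite key.
    const key = JSON.stringify([log.model, log.input]);
    const group = groups.get(key);
    if (group === undefined) {
      groups.set(key, { model: log.model, input: log.input, count: 1, wastedCost: 0 });
    } else {
      group.count += 1;
      group.wastedCost += log.cost; // a cache hit would have avoided this spend
    }
  }
  return Array.from(groups.values())
    .filter((group) => group.count > 1)
    .sort((a, b) => b.wastedCost - a.wastedCost);
}
```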

Cost Optimization Strategies

1. Model Selection

Use the right model for the job:

Task	Recommended	Cost
Complex reasoning	gpt-4o	$$$
General chat	gpt-4o-mini	$
Simple tasks	gpt-3.5-turbo	$
Fast responses	claude-3-haiku	$
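
One way to apply this table in code is a lookup keyed by task type. The sketch below mirrors the recommendations above; the `TaskType` labels and the `selectModel` helper are hypothetical names introduced for illustration.

```typescript
// Task labels are assumptions; map your own task categories as needed.
type TaskType = 'complex-reasoning' | 'general-chat' | 'simple' | 'fast-response';

// Mirrors the recommendation table; adjust to your own quality requirements.
const MODEL_FOR_TASK: Record<TaskType, string> = {
  'complex-reasoning': 'gpt-4o',
  'general-chat': 'gpt-4o-mini',
  'simple': 'gpt-3.5-turbo',
  'fast-response': 'claude-3-haiku',
};

function selectModel(task: TaskType): string {
  return MODEL_FOR_TASK[task];
}
```

Centralizing the mapping means a model swap is a one-line change rather than a search across call sites.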

2. Prompt Optimization

Reduce token usage:

// Verbose (about 200 tokens)
const verbosePrompt = `Please analyze the following text and provide
a comprehensive summary that covers all the main points,
themes, and important details. Make sure to include...`;

// Optimized (about 50 tokens)
const optimizedPrompt = `Summarize the key points of this text:`;

3. Enable Caching

Cache identical requests:

// Without caching: every request is billed
// With caching: repeated identical requests are served from the cache at no model cost

// Enable in AI Gateway Settings
// A 50% cache hit rate saves roughly 50% of cost when requests cost about the same
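
To make the idea concrete, here is an illustrative in-memory cache keyed on the model and prompt. It is a sketch of the concept only, not the AI Gateway's implementation; the `ResponseCache` class and its methods are hypothetical.

```typescript
// Illustrative in-memory cache; the gateway's own cache is enabled in settings
// and its internals are not described here.
class ResponseCache {
  private readonly entries = new Map<string, string>();
  private hits = 0;
  private misses = 0;

  // Identical (model, prompt) pairs map to the same key.
  private static key(model: string, prompt: string): string {
    return JSON.stringify([model, prompt]);
  }

  // Returns the cached response, or undefined on a miss.
  get(model: string, prompt: string): string | undefined {
    const cached = this.entries.get(ResponseCache.key(model, prompt));
    if (cached === undefined) {
      this.misses += 1;
    } else {
      this.hits += 1;
    }
    return cached;
  }

  set(model: string, prompt: string, response: string): void {
    this.entries.set(ResponseCache.key(model, prompt), response);
  }

  // Fraction of lookups served from the cache (0 when nothing was looked up).
  hitRate(): number {
    const lookups = this.hits + this.misses;
    return lookups === 0 ? 0 : this.hits / lookups;
  }
}
```

In use, call `get` before the model request and `set` after a miss; `hitRate` then gives the figure to compare against your target.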

4. Limit Output Length

Set appropriate max_tokens:

// Don't pay for more than you need
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  max_tokens: 500,  // Cap the output at 500 tokens
  messages,         // your conversation messages
});

5. Use Streaming Wisely

Streaming does not reduce token usage by itself, but it:

Improves perceived latency
Allows early termination, so you can stop a response you no longer need
Provides a better user experience

6. Batch Processing

Combine multiple queries:

// Instead of 10 requests for 10 questions
// Batch into 1 request
 
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: `Answer these questions:\n${questions.join('\n')}`
  }],
});
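
Very large batches can exceed context limits or make answers hard to match back to questions. A minimal sketch of two helpers that may help, both hypothetical names: `chunk` caps the batch size and `buildBatchPrompt` numbers the questions.

```typescript
// Split a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Number each question so answers can be matched back to it.
function buildBatchPrompt(questions: string[]): string {
  const numbered = questions.map((question, index) => `${index + 1}. ${question}`);
  return `Answer each question, numbering your answers to match:\n${numbered.join('\n')}`;
}
```

Each batch from `chunk(questions, 10)` can then be sent as one request with `buildBatchPrompt(batch)` as the user message.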

Budget Management

Setting Budgets

Configure spending limits:

Go to Settings > Budgets
Set limits:
- Daily budget: $100
- Monthly budget: $2,000
- Per-user limit: $10

Budget Alerts

Get notified before exceeding:

Go to Settings > Alerts
Create budget alert:
- 80% of daily budget
- 90% of monthly budget

Budget Actions

What happens when budget is reached:

Alert only: Send notification
Soft limit: Alert + warning to users
Hard limit: Block requests
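
The three behaviors can be summarized as a decision function. This is a sketch under stated assumptions: the mode names, the `evaluateBudget` helper, and the choice to send notifications from the alert threshold onward in every mode are illustrative, not the product's documented logic.

```typescript
type BudgetMode = 'alert-only' | 'soft-limit' | 'hard-limit';

interface BudgetDecision {
  allowRequest: boolean;
  notify: boolean;   // send a notification to the configured recipients
  warnUser: boolean; // surface a warning to the end user
}

// `alertThreshold` is the fraction of the budget at which alerts fire (0.8 = 80%).
function evaluateBudget(
  spent: number,
  limit: number,
  mode: BudgetMode,
  alertThreshold = 0.8,
): BudgetDecision {
  const reached = spent >= limit;
  const nearing = spent >= limit * alertThreshold;
  return {
    allowRequest: !(reached && mode === 'hard-limit'), // only hard limits block
    notify: nearing,                                   // assumption: all modes notify
    warnUser: reached && mode === 'soft-limit',
  };
}
```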

Cost Reports

Automated Reports

Receive regular cost summaries:

Go to Settings > Reports
Enable Weekly Cost Report
Select recipients
Choose format (email, Slack)

Custom Reports

Create detailed cost analysis:

Go to Analytics > Reports
Click Create Report
Configure:
- Time period
- Dimensions (model, user, trace)
- Metrics (cost, tokens, requests)
Export as CSV/PDF

Report Contents

Weekly Cost Report - Jan 15-21, 2024

Summary:
- Total Cost: $1,234.56
- Change from last week: +12%
- Total Requests: 150,000
- Avg Cost/Request: $0.008

Top Models by Cost:
1. gpt-4o: $800 (65%)
2. claude-3-5-sonnet: $300 (24%)
3. gpt-3.5-turbo: $134 (11%)

Top Users by Cost:
1. user-enterprise-1: $200
2. user-enterprise-2: $150
...

Cost Optimization Opportunities:
- 20% of gpt-4o requests could use gpt-4o-mini
- Cache hit rate is 30% (target: 50%)
- 500 identical requests not cached
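
The summary figures in a report like this are simple derived values. As a sketch, assuming a hypothetical `WeeklyTotals` shape for the raw totals, they could be computed as follows:

```typescript
// Hypothetical input shape; field names are assumptions for illustration.
interface WeeklyTotals {
  totalCost: number; // USD
  totalRequests: number;
  costByModel: Record<string, number>;
}

function averageCostPerRequest(totals: WeeklyTotals): number {
  return totals.totalRequests === 0 ? 0 : totals.totalCost / totals.totalRequests;
}

// Percentage change relative to the previous period, e.g. 12 for +12%.
// `previous` must be non-zero.
function percentChange(current: number, previous: number): number {
  return ((current - previous) * 100) / previous;
}

// Each model's share of total cost, rounded to a whole-number percentage.
function modelShares(totals: WeeklyTotals): Record<string, number> {
  const shares: Record<string, number> = {};
  for (const model of Object.keys(totals.costByModel)) {
    shares[model] = Math.round((totals.costByModel[model] / totals.totalCost) * 100);
  }
  return shares;
}
```

With the sample report's numbers, $1,234.56 over 150,000 requests gives about $0.008 per request, and $800 of gpt-4o spend rounds to a 65% share.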

Cost Attribution

By Feature

Tag traces for feature-level cost tracking:

const trace = obs.trace({
  name: 'chat',
  tags: ['feature:chat-v2', 'team:product'],
});

View costs by tag in analytics.
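
If you export tagged traces, the same roll-up can be done in code. A minimal sketch, assuming tags follow the `key:value` convention used above and a hypothetical `TaggedTrace` shape:

```typescript
// Hypothetical trace shape; field names are assumptions for illustration.
interface TaggedTrace {
  tags: string[]; // 'key:value' strings, e.g. 'feature:chat-v2'
  cost: number;   // USD
}

// Sum cost per value of one tag key; key 'feature' groups by 'chat-v2' and so on.
function costByTag(traces: TaggedTrace[], key: string): Map<string, number> {
  const prefix = `${key}:`;
  const totals = new Map<string, number>();
  for (const trace of traces) {
    for (const tag of trace.tags) {
      if (tag.startsWith(prefix)) {
        const value = tag.slice(prefix.length);
        totals.set(value, (totals.get(value) ?? 0) + trace.cost);
      }
    }
  }
  return totals;
}
```

Calling `costByTag(traces, 'team')` on the same data gives the per-team view without re-tagging anything.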

By Customer

Track per-customer costs:

const trace = obs.trace({
  name: 'api-request',
  userId: customerId,
  metadata: {
    customerTier: 'enterprise',
    billingCode: 'CUST-123',
  },
});

Export for billing:

curl https://api.transactional.dev/observability/costs/export \
  -H "Authorization: Bearer pk_xxx" \
  -H "Content-Type: application/json" \
  -d '{"groupBy": "userId", "period": "2024-01"}'

Next Steps

Metrics - All metrics explained
Performance - Latency optimization
Caching - Enable caching
