
Cost Analysis

Analyzing and optimizing LLM costs with Observability.

Overview

Observability provides detailed cost tracking for every LLM call. Understand where your money goes and find optimization opportunities.

Viewing Costs

Dashboard Overview

Navigate to Observability > Analytics to see:

  • Total Cost: Sum for selected period
  • Cost Trend: Daily/hourly breakdown
  • Cost by Model: Model comparison
  • Top Traces: Highest cost traces

Cost Breakdown

View costs by different dimensions:

By Model

| Model | Requests | Input Tokens | Output Tokens | Cost |
|---|---|---|---|---|
| gpt-4o | 10,000 | 2M | 1M | $150 |
| claude-3-5-sonnet | 5,000 | 1M | 500K | $55 |
| gpt-3.5-turbo | 20,000 | 1.5M | 750K | $22 |

By User

| User | Requests | Tokens | Cost |
|---|---|---|---|
| user-123 | 500 | 250K | $5 |
| user-456 | 300 | 150K | $3 |
| user-789 | 200 | 100K | $2 |

By Trace Name

| Trace | Requests | Avg Tokens | Total Cost |
|---|---|---|---|
| chat-completion | 15,000 | 800 | $120 |
| summarize | 5,000 | 2,500 | $62 |
| code-review | 2,000 | 3,000 | $45 |
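
The breakdowns above are all the same aggregation applied to a different dimension. A minimal sketch of that grouping, assuming you have exported per-request records yourself: the `RequestRecord` shape, its field names, and the sample values are hypothetical and not part of the Observability API.

```typescript
// Hypothetical shape of an exported request record; field names are assumptions.
interface RequestRecord {
  model: string;
  userId: string;
  traceName: string;
  tokens: number;
  costUsd: number;
}

interface GroupTotals {
  requests: number;
  tokens: number;
  costUsd: number;
}

// Sum requests, tokens, and cost for each distinct value of one dimension.
function groupCosts(
  records: RequestRecord[],
  dimension: "model" | "userId" | "traceName",
): Map<string, GroupTotals> {
  const totals = new Map<string, GroupTotals>();
  for (const record of records) {
    const key = record[dimension];
    const current = totals.get(key) ?? { requests: 0, tokens: 0, costUsd: 0 };
    current.requests += 1;
    current.tokens += record.tokens;
    current.costUsd += record.costUsd;
    totals.set(key, current);
  }
  return totals;
}

// Made-up sample data for illustration only.
const sampleRecords: RequestRecord[] = [
  { model: "gpt-4o", userId: "user-123", traceName: "chat-completion", tokens: 800, costUsd: 0.008 },
  { model: "gpt-4o", userId: "user-456", traceName: "summarize", tokens: 2500, costUsd: 0.0125 },
  { model: "gpt-3.5-turbo", userId: "user-123", traceName: "chat-completion", tokens: 800, costUsd: 0.001 },
];

const byModel = groupCosts(sampleRecords, "model");
const byUser = groupCosts(sampleRecords, "userId");
```

Passing `"traceName"` as the dimension gives the per-trace view in the same way.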

Cost Calculation

Per-Request Cost

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Example for gpt-4o:

  • Input: 1,000 tokens × $2.50/1M = $0.0025
  • Output: 500 tokens × $10.00/1M = $0.005
  • Total: $0.0075
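
The formula can be written as a small helper. This is a sketch for checking figures by hand, not the dashboard's own implementation; `ModelPrice` and `requestCostUsd` are names chosen here for illustration.

```typescript
// Prices are expressed in USD per 1M tokens, as in the gpt-4o example.
interface ModelPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const TOKENS_PER_MILLION = 1_000_000;

// Cost = (input tokens x input price) + (output tokens x output price)
function requestCostUsd(
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice,
): number {
  const inputCost = (inputTokens * price.inputPerMillion) / TOKENS_PER_MILLION;
  const outputCost = (outputTokens * price.outputPerMillion) / TOKENS_PER_MILLION;
  return inputCost + outputCost;
}

const gpt4oPrice: ModelPrice = { inputPerMillion: 2.5, outputPerMillion: 10.0 };
const exampleCost = requestCostUsd(1000, 500, gpt4oPrice);
console.log(exampleCost.toFixed(4)); // prints "0.0075"
```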

Model Pricing Reference

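| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| o1 | $15.00 | $60.00 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-opus | $15.00 | $75.00 |
| claude-3-haiku | $0.25 | $1.25 |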

Finding Cost Anomalies

High-Cost Traces

Identify expensive traces:

  1. Go to Traces
  2. Sort by Cost (descending)
  3. Review top traces for optimization

Cost Spikes

Detect unusual spending:

  1. View Cost Trend chart
  2. Look for spikes
  3. Drill into specific time periods
  4. Identify root cause (traffic spike, new feature, bug)
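
If you export the daily totals behind the Cost Trend chart, a simple rule can flag candidate spikes for you. The sketch below flags any day whose cost exceeds a multiple of the average of the preceding days. The `DailyCost` shape, the 3-day window, the 2x factor, and the sample figures are all assumptions to tune, not product defaults.

```typescript
interface DailyCost {
  date: string;
  costUsd: number;
}

// Flag a day when its cost exceeds `factor` times the mean of the
// preceding `window` days. The first `window` days are never flagged.
function findCostSpikes(
  series: DailyCost[],
  window = 3,
  factor = 2,
): DailyCost[] {
  const spikes: DailyCost[] = [];
  for (let i = window; i < series.length; i++) {
    const previous = series.slice(i - window, i);
    const baseline =
      previous.reduce((sum, day) => sum + day.costUsd, 0) / window;
    if (baseline > 0 && series[i].costUsd > factor * baseline) {
      spikes.push(series[i]);
    }
  }
  return spikes;
}

// Made-up sample data for illustration only.
const dailyCosts: DailyCost[] = [
  { date: "2024-01-15", costUsd: 100 },
  { date: "2024-01-16", costUsd: 110 },
  { date: "2024-01-17", costUsd: 90 },
  { date: "2024-01-18", costUsd: 260 },
  { date: "2024-01-19", costUsd: 105 },
];

const spikes = findCostSpikes(dailyCosts);
console.log(spikes.map((day) => day.date)); // prints [ '2024-01-18' ]
```

A flagged day is only a starting point: you still need to drill into that period to tell a traffic spike from a bug.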

Inefficient Patterns

Find waste:

  • High token count, low value: traces with more than 10K tokens but less than 5s duration
  • Repeated identical requests (cache misses): the same input sent in multiple requests with no caching
  • Expensive model for simple tasks: for example, gpt-4o used for simple classification
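
Two of these patterns are easy to check mechanically on exported trace data. This is a rough sketch: the `TraceSummary` shape is hypothetical, and using a token threshold as a stand-in for "simple task" is an assumption, since only you know which requests are actually simple.

```typescript
// Hypothetical summary of a trace; field names are assumptions.
interface TraceSummary {
  id: string;
  model: string;
  input: string;
  totalTokens: number;
}

// Group traces by (model, input) and keep the groups seen more than once.
// These are candidates for caching.
function findRepeatedRequests(traces: TraceSummary[]): Map<string, string[]> {
  const idsByKey = new Map<string, string[]>();
  for (const trace of traces) {
    const key = JSON.stringify([trace.model, trace.input]);
    const ids = idsByKey.get(key) ?? [];
    ids.push(trace.id);
    idsByKey.set(key, ids);
  }
  const repeated = new Map<string, string[]>();
  idsByKey.forEach((ids, key) => {
    if (ids.length > 1) repeated.set(key, ids);
  });
  return repeated;
}

// Flag small requests that were sent to an expensive model.
function findOversizedModelUse(
  traces: TraceSummary[],
  expensiveModels: Set<string>,
  maxTokens: number,
): TraceSummary[] {
  return traces.filter(
    (trace) => expensiveModels.has(trace.model) && trace.totalTokens <= maxTokens,
  );
}

// Made-up sample data for illustration only.
const sampleTraces: TraceSummary[] = [
  { id: "t1", model: "gpt-4o", input: "Classify: spam or not?", totalTokens: 40 },
  { id: "t2", model: "gpt-4o", input: "Classify: spam or not?", totalTokens: 40 },
  { id: "t3", model: "gpt-4o", input: "Review this pull request", totalTokens: 12000 },
];

const repeatedRequests = findRepeatedRequests(sampleTraces);
const oversized = findOversizedModelUse(sampleTraces, new Set(["gpt-4o"]), 100);
```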

Cost Optimization Strategies

1. Model Selection

Use the right model for the job:

| Task | Recommended | Cost |
|---|---|---|
| Complex reasoning | gpt-4o | $$$ |
| General chat | gpt-4o-mini | $ |
| Simple tasks | gpt-3.5-turbo | $ |
| Fast responses | claude-3-haiku | $ |
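
One way to apply the table in code is a single lookup that every call site goes through, so a model can be swapped in one place. A minimal sketch: the `TaskKind` labels and `selectModel` are names invented here, and the mapping simply mirrors the recommendations above.

```typescript
type TaskKind =
  | "complex-reasoning"
  | "general-chat"
  | "simple-task"
  | "fast-response";

// Mirrors the recommendation table; adjust to your own quality and latency needs.
const MODEL_BY_TASK: Record<TaskKind, string> = {
  "complex-reasoning": "gpt-4o",
  "general-chat": "gpt-4o-mini",
  "simple-task": "gpt-3.5-turbo",
  "fast-response": "claude-3-haiku",
};

function selectModel(task: TaskKind): string {
  return MODEL_BY_TASK[task];
}

const classificationModel = selectModel("simple-task");
console.log(classificationModel); // prints "gpt-3.5-turbo"
```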

2. Prompt Optimization

Reduce token usage:

// Verbose (200 tokens)
const verbosePrompt = `Please analyze the following text and provide
a comprehensive summary that covers all the main points,
themes, and important details. Make sure to include...`;

// Optimized (50 tokens)
const optimizedPrompt = `Summarize the key points of this text:`;

3. Enable Caching

Cache identical requests:

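// Without caching: every request is billed by the provider
// With caching: repeated identical requests are served from the cache at no model cost

// Enable in AI Gateway Settings
// A 50% cache hit rate saves roughly 50% of cost, assuming cached
// and uncached requests cost about the same on average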
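
To turn a hit rate into a dollar estimate, multiply it by what the traffic would have cost without a cache. The sketch below is back-of-the-envelope arithmetic under the assumption that cached and uncached requests cost about the same on average. The function names and sample numbers are illustrative.

```typescript
// Fraction of requests served from the cache (0 when there is no traffic).
function cacheHitRate(cacheHits: number, totalRequests: number): number {
  return totalRequests === 0 ? 0 : cacheHits / totalRequests;
}

// Estimated savings, assuming cached and uncached requests have similar cost.
function estimatedSavingsUsd(uncachedCostUsd: number, hitRate: number): number {
  return uncachedCostUsd * hitRate;
}

const hitRate = cacheHitRate(5000, 10000);            // 0.5
const savings = estimatedSavingsUsd(1000, hitRate);   // 500
console.log(`Estimated savings: $${savings.toFixed(2)}`); // prints "Estimated savings: $500.00"
```

If your repeated requests are mostly short prompts, actual savings will be lower than this estimate; if they are mostly long ones, higher.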

4. Limit Output Length

Set appropriate max_tokens:

// Don't pay for more than you need
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  max_tokens: 500,  // Limit output
  messages: [...],
});

5. Use Streaming Wisely

Streaming doesn't reduce token usage on its own, but it:

  • Improves perceived latency
  • Allows early termination
  • Provides a better user experience

6. Batch Processing

Combine multiple queries:

// Instead of 10 requests for 10 questions
// Batch into 1 request
 
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: `Answer these questions:\n${questions.join('\n')}`
  }],
});

Budget Management

Setting Budgets

Configure spending limits:

  1. Go to Settings > Budgets
  2. Set limits:
    • Daily budget: $100
    • Monthly budget: $2,000
    • Per-user limit: $10

Budget Alerts

Get notified before you exceed a budget:

  1. Go to Settings > Alerts
  2. Create budget alert:
    • 80% of daily budget
    • 90% of monthly budget

Budget Actions

Choose what happens when a budget is reached:

  • Alert only: Send notification
  • Soft limit: Alert + warning to users
  • Hard limit: Block requests
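
The three actions can be read as a small decision table. The sketch below models them for illustration only and is not the product's implementation: the mode names, the `BudgetDecision` shape, and the assumption that every mode sends a notification once the alert threshold is crossed are all choices made here.

```typescript
type BudgetMode = "alert-only" | "soft-limit" | "hard-limit";

interface BudgetDecision {
  notify: boolean;        // send a notification to the budget owner
  warnUsers: boolean;     // show a warning to end users
  blockRequests: boolean; // reject further requests
}

function decideBudgetAction(
  spendUsd: number,
  budgetUsd: number,
  mode: BudgetMode,
  alertThreshold = 0.8, // e.g. alert at 80% of the budget
): BudgetDecision {
  const nearing = spendUsd >= alertThreshold * budgetUsd;
  const reached = spendUsd >= budgetUsd;
  return {
    notify: nearing,
    warnUsers: reached && mode === "soft-limit",
    blockRequests: reached && mode === "hard-limit",
  };
}

const underBudget = decideBudgetAction(50, 100, "hard-limit");
const nearBudget = decideBudgetAction(85, 100, "hard-limit");
const softOver = decideBudgetAction(120, 100, "soft-limit");
const hardOver = decideBudgetAction(100, 100, "hard-limit");
```

Here `nearBudget` only notifies, `softOver` notifies and warns users, and `hardOver` notifies and blocks requests.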

Cost Reports

Automated Reports

Receive regular cost summaries:

  1. Go to Settings > Reports
  2. Enable Weekly Cost Report
  3. Select recipients
  4. Choose format (email, Slack)

Custom Reports

Create detailed cost analysis:

  1. Go to Analytics > Reports
  2. Click Create Report
  3. Configure:
    • Time period
    • Dimensions (model, user, trace)
    • Metrics (cost, tokens, requests)
  4. Export as CSV/PDF

Report Contents

Weekly Cost Report - Jan 15-21, 2024

Summary:
- Total Cost: $1,234.56
- Change from last week: +12%
- Total Requests: 150,000
- Avg Cost/Request: $0.008

Top Models by Cost:
1. gpt-4o: $800 (65%)
2. claude-3-5-sonnet: $300 (24%)
3. gpt-3.5-turbo: $134 (11%)

Top Users by Cost:
1. user-enterprise-1: $200
2. user-enterprise-2: $150
...

Cost Optimization Opportunities:
- 20% of gpt-4o requests could use gpt-4o-mini
- Cache hit rate is 30% (target: 50%)
- 500 identical requests not cached

Cost Attribution

By Feature

Tag traces for feature-level cost tracking:

const trace = obs.trace({
  name: 'chat',
  tags: ['feature:chat-v2', 'team:product'],
});

View costs by tag in analytics.

By Customer

Track per-customer costs:

const trace = obs.trace({
  name: 'api-request',
  userId: customerId,
  metadata: {
    customerTier: 'enterprise',
    billingCode: 'CUST-123',
  },
});

Export for billing:

curl https://api.transactional.dev/observability/costs/export \
  -H "Authorization: Bearer pk_xxx" \
  -H "Content-Type: application/json" \
  -d '{"groupBy": "userId", "period": "2024-01"}'
