Key Metrics
Understanding the key metrics in Observability analytics.
Overview
Observability tracks key metrics to help you understand your AI application's performance, cost, and reliability.
Volume Metrics
Trace Count
Total number of traces in a time period.
Use cases:
- Monitor usage trends
- Identify traffic spikes
- Capacity planning
Calculation: Count of all traces started in the period.
Request Rate
Traces per minute/hour/day.
Use cases:
- Real-time monitoring
- Rate limit planning
- Anomaly detection
Calculation: Trace count / Time period
Generation Count
Number of LLM calls.
Use cases:
- Track API usage
- Cost attribution
- Optimization opportunities
Token Metrics
Prompt Tokens
Tokens in LLM inputs.
Use cases:
- Prompt optimization
- Context management
- Cost analysis
Breakdown by:
- Model
- Trace name
- Time period
Completion Tokens
Tokens in LLM outputs.
Use cases:
- Output length analysis
- Cost control
- Quality monitoring
Total Tokens
Sum of prompt + completion tokens.
Formula: totalTokens = promptTokens + completionTokens
Tokens per Request
Average tokens per generation.
Formula: avgTokens = totalTokens / generationCount
Benchmarks:
- Chat: 500-1500 tokens
- Summarization: 1000-3000 tokens
- Code generation: 1000-4000 tokens
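The two token formulas above can be sketched in a few lines. The generation records and their token counts are illustrative, not a real API response shape:

```python
# Hypothetical generation records; field names are illustrative only.
generations = [
    {"prompt_tokens": 820, "completion_tokens": 410},
    {"prompt_tokens": 1200, "completion_tokens": 300},
    {"prompt_tokens": 560, "completion_tokens": 240},
]

# totalTokens = promptTokens + completionTokens, summed over all generations
total_tokens = sum(g["prompt_tokens"] + g["completion_tokens"] for g in generations)

# avgTokens = totalTokens / generationCount
avg_tokens = total_tokens / len(generations)

print(total_tokens)  # 3530
```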
Cost Metrics
Total Cost
Sum of all LLM costs.
Calculation:
cost = Σ(promptTokens × inputPrice + completionTokens × outputPrice)
Cost per Request
Average cost per generation.
Formula: avgCost = totalCost / generationCount
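The total-cost and cost-per-request formulas above can be sketched as follows. The per-million-token prices and generation data are illustrative assumptions; real prices vary by provider and change over time:

```python
# Illustrative prices in USD per 1M tokens -- NOT current provider pricing.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def generation_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # cost = promptTokens x inputPrice + completionTokens x outputPrice
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# Hypothetical generations: (model, prompt_tokens, completion_tokens)
generations = [
    ("gpt-4o", 1000, 500),
    ("gpt-4o-mini", 2000, 800),
]

total_cost = sum(generation_cost(m, pt, ct) for m, pt, ct in generations)
avg_cost = total_cost / len(generations)  # avgCost = totalCost / generationCount
```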
Cost by Model
Breakdown of cost by model:
| Model | Requests | Tokens | Cost | % of Total |
|---|---|---|---|---|
| gpt-4o | 10,000 | 5M | $100 | 60% |
| claude-3-5-sonnet | 5,000 | 2M | $50 | 30% |
| gpt-3.5-turbo | 20,000 | 3M | $15 | 10% |
Cost Trend
Cost over time:
- Daily trend
- Weekly comparison
- Month-over-month growth
Latency Metrics
Duration
Time from trace start to end.
Percentiles:
- p50: Median latency
- p95: 95th percentile
- p99: 99th percentile
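The percentiles above can be computed from raw trace durations; a minimal sketch using the nearest-rank method on a hypothetical latency sample:

```python
import math

def percentile(values: list[float], pct: float) -> float:
    # Nearest-rank method: the smallest value covering pct% of the sample.
    ordered = sorted(values)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical trace durations in seconds
latencies_s = [0.9, 1.1, 1.3, 1.4, 1.5, 1.7, 2.0, 2.4, 3.8, 5.2]

p50 = percentile(latencies_s, 50)  # median
p95 = percentile(latencies_s, 95)
p99 = percentile(latencies_s, 99)
```

Note that p95 and p99 are dominated by the slowest requests, which is why they matter more than the average for user-facing latency targets.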
Benchmarks:
| Model | p50 | p95 |
|---|---|---|
| gpt-4o | 1.5s | 4s |
| gpt-4o-mini | 0.8s | 2s |
| claude-3-5-sonnet | 1.2s | 3s |
Time to First Token (TTFT)
For streaming responses, the time until the first token arrives.
Use cases:
- User experience optimization
- Perceived performance
- Streaming configuration
Generation Latency
Time for individual LLM calls.
Breakdown:
- Network time
- Queue time (provider)
- Generation time
Error Metrics
Error Rate
Percentage of failed traces.
Formula: errorRate = errorCount / totalCount × 100
Threshold guidelines:
- < 1%: Healthy
- 1-5%: Investigate
- > 5%: Critical
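The error-rate formula and threshold guidelines above can be combined into a small classifier; the function names are illustrative:

```python
def error_rate(error_count: int, total_count: int) -> float:
    # errorRate = errorCount / totalCount x 100
    return error_count / total_count * 100 if total_count else 0.0

def classify(rate_pct: float) -> str:
    # Thresholds follow the guidelines: <1% healthy, 1-5% investigate, >5% critical.
    if rate_pct < 1:
        return "Healthy"
    if rate_pct <= 5:
        return "Investigate"
    return "Critical"

status = classify(error_rate(80, 1_000))  # 8.0% error rate
```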
Error Count
Absolute number of errors.
Breakdown by:
- Error type
- Model
- Time
Error Types
Common error categories:
| Error | Description | Action |
|---|---|---|
| Rate Limit | Provider rate limit hit | Add fallback, increase limits |
| Timeout | Request exceeded timeout | Increase timeout, optimize prompt |
| Invalid Request | Bad request format | Fix request validation |
| Context Length | Exceeded context limit | Reduce prompt size |
| Content Filter | Content policy violation | Review content |
Cache Metrics
Cache Hit Rate
Percentage of requests served from cache.
Formula: hitRate = cacheHits / totalRequests × 100
Target: > 50% for cacheable workloads
Cache Savings
Cost saved due to caching.
Formula: savings = cacheHits × avgRequestCost
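The hit-rate and savings formulas above can be sketched together; the request counts and average cost are hypothetical:

```python
def cache_hit_rate(cache_hits: int, total_requests: int) -> float:
    # hitRate = cacheHits / totalRequests x 100
    return cache_hits / total_requests * 100 if total_requests else 0.0

def cache_savings(cache_hits: int, avg_request_cost: float) -> float:
    # Each hit avoids one full-price LLM call: savings = cacheHits x avgRequestCost
    return cache_hits * avg_request_cost

# Hypothetical numbers: 6,000 hits out of 10,000 requests at $0.004/request
hit_rate = cache_hit_rate(6_000, 10_000)  # 60.0 -- above the 50% target
savings = cache_savings(6_000, 0.004)     # roughly $24 saved
```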
Cache Size
Total cached responses.
Considerations:
- Storage cost
- TTL settings
- Invalidation needs
Session Metrics
Sessions Count
Number of unique sessions.
Traces per Session
Average traces in a session.
Formula: avgTraces = totalTraces / sessionCount
Session Duration
Time from first to last trace.
Session Cost
Total cost per session.
User Metrics
Active Users
Unique userIds in the period.
Requests per User
Average requests per user.
Cost per User
Average cost per user.
Use cases:
- Usage-based billing
- Abuse detection
- User segmentation
Model Metrics
Model Distribution
Percentage of requests by model:
- gpt-4o: 45%
- claude-3-5-sonnet: 30%
- gpt-3.5-turbo: 25%
Model Performance
Compare models:
| Model | Avg Latency | Avg Cost | Error Rate |
|---|---|---|---|
| gpt-4o | 1.5s | $0.008 | 0.5% |
| claude-3-5-sonnet | 1.2s | $0.006 | 0.3% |
Setting Up Alerts
Create alerts for key metrics:
- Go to Settings > Alerts
- Click Create Alert
- Configure:
  - Metric: e.g., Error Rate
  - Condition: e.g., > 5%
  - Window: e.g., 5 minutes
  - Channel: Email, Slack, Webhook
Recommended alerts:
- Error rate > 5%
- Cost per hour > $100
- p95 latency > 10s
- Cache hit rate < 30%
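The recommended alert thresholds above can be expressed as simple rules. This is a local evaluation sketch with illustrative metric names, not the platform's actual alerting API:

```python
# Hypothetical rules mirroring the recommended alerts; names are illustrative.
ALERT_RULES = [
    ("error_rate_pct", lambda v: v > 5),
    ("cost_per_hour_usd", lambda v: v > 100),
    ("p95_latency_s", lambda v: v > 10),
    ("cache_hit_rate_pct", lambda v: v < 30),
]

def fired_alerts(metrics: dict) -> list[str]:
    # Return the names of all rules whose threshold is breached.
    return [name for name, breached in ALERT_RULES
            if name in metrics and breached(metrics[name])]

# Hypothetical metric snapshot for one evaluation window
snapshot = {"error_rate_pct": 6.2, "cost_per_hour_usd": 40.0,
            "p95_latency_s": 3.1, "cache_hit_rate_pct": 22.0}

alerts = fired_alerts(snapshot)
```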
Next Steps
- Cost Analysis - Detailed cost breakdown
- Performance - Latency optimization
- Dashboard - Visualization