Key Metrics

Understanding the key metrics in Observability analytics.

Overview

Observability tracks key metrics to help you understand your AI application's performance, cost, and reliability.

Volume Metrics

Trace Count

Total number of traces in a time period.

Use cases:

  • Monitor usage trends
  • Identify traffic spikes
  • Capacity planning

Calculation: Count of all traces started in the period.

Request Rate

Traces per minute/hour/day.

Use cases:

  • Real-time monitoring
  • Rate limit planning
  • Anomaly detection

Calculation: Trace count / Time period
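The request-rate calculation above can be sketched in a few lines (the function name and numbers are illustrative):

```python
# Request rate: traces per unit of time over a window.
def request_rate(trace_count: int, period_minutes: float) -> float:
    """Traces per minute over the given period."""
    return trace_count / period_minutes

# e.g. 1,800 traces observed over a 60-minute window -> 30 traces/minute
rate = request_rate(1800, 60)
```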

Generation Count

Number of LLM calls.

Use cases:

  • Track API usage
  • Cost attribution
  • Optimization opportunities

Token Metrics

Prompt Tokens

Tokens in LLM inputs.

Use cases:

  • Prompt optimization
  • Context management
  • Cost analysis

Breakdown by:

  • Model
  • Trace name
  • Time period

Completion Tokens

Tokens in LLM outputs.

Use cases:

  • Output length analysis
  • Cost control
  • Quality monitoring

Total Tokens

Sum of prompt + completion tokens.

Formula: totalTokens = promptTokens + completionTokens

Tokens per Request

Average tokens per generation.

Formula: avgTokens = totalTokens / generationCount

Benchmarks:

  • Chat: 500-1500 tokens
  • Summarization: 1000-3000 tokens
  • Code generation: 1000-4000 tokens
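The token formulas above (total tokens and tokens per request) can be sketched over a list of generations; the record shape and numbers are illustrative:

```python
# Token metrics over a set of generations; field names are illustrative.
generations = [
    {"prompt_tokens": 900, "completion_tokens": 300},
    {"prompt_tokens": 1100, "completion_tokens": 500},
]

prompt_tokens = sum(g["prompt_tokens"] for g in generations)          # 2000
completion_tokens = sum(g["completion_tokens"] for g in generations)  # 800
total_tokens = prompt_tokens + completion_tokens                      # 2800
avg_tokens = total_tokens / len(generations)                          # 1400.0
```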

Cost Metrics

Total Cost

Sum of all LLM costs.

Calculation:

cost = Σ(promptTokens × inputPrice + completionTokens × outputPrice)

Cost per Request

Average cost per generation.

Formula: avgCost = totalCost / generationCount
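The two cost formulas above can be applied per generation and summed; the per-token prices below are made-up rates for the arithmetic, not real provider pricing:

```python
# Total cost = sum over generations of
#   promptTokens * inputPrice + completionTokens * outputPrice
PRICES = {  # (input_price, output_price) per token -- illustrative, not real
    "gpt-4o": (2.5e-06, 1.0e-05),
}

generations = [
    {"model": "gpt-4o", "prompt_tokens": 1000, "completion_tokens": 200},
    {"model": "gpt-4o", "prompt_tokens": 3000, "completion_tokens": 800},
]

total_cost = sum(
    g["prompt_tokens"] * PRICES[g["model"]][0]
    + g["completion_tokens"] * PRICES[g["model"]][1]
    for g in generations
)                                        # 0.0045 + 0.0155 = 0.02
avg_cost = total_cost / len(generations)  # 0.01 per generation
```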

Cost by Model

Breakdown of cost by model:

Model             | Requests | Tokens | Cost | % of Total
gpt-4o            | 10,000   | 5M     | $100 | 60%
claude-3-5-sonnet | 5,000    | 2M     | $50  | 30%
gpt-3.5-turbo     | 20,000   | 3M     | $15  | 10%

Cost Trend

Cost over time:

  • Daily trend
  • Weekly comparison
  • Month-over-month growth

Latency Metrics

Duration

Time from trace start to end.

Percentiles:

  • p50: Median latency
  • p95: 95th percentile
  • p99: 99th percentile
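The percentiles above can be computed from a sample of trace durations; this is a minimal nearest-rank sketch (production systems typically use interpolated percentiles over much larger samples):

```python
# Nearest-rank percentile over a sorted sample of durations (seconds).
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0-100) of a non-empty sample."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

durations = [0.8, 1.1, 1.3, 1.5, 1.7, 2.0, 2.4, 3.1, 4.2, 9.8]
p50 = percentile(durations, 50)
p95 = percentile(durations, 95)
p99 = percentile(durations, 99)
```

Note how p95 and p99 are dominated by the slow outlier (9.8s) while p50 stays near the typical request, which is why alerting on tail percentiles catches problems that averages hide.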

Benchmarks:

Model             | p50  | p95
gpt-4o            | 1.5s | 4s
gpt-4o-mini       | 0.8s | 2s
claude-3-5-sonnet | 1.2s | 3s

Time to First Token (TTFT)

For streaming: time until first token arrives.

Use cases:

  • User experience optimization
  • Perceived performance
  • Streaming configuration

Generation Latency

Time for individual LLM calls.

Breakdown:

  • Network time
  • Queue time (provider)
  • Generation time

Error Metrics

Error Rate

Percentage of failed traces.

Formula: errorRate = errorCount / totalCount × 100

Threshold guidelines:

  • < 1%: Healthy
  • 1-5%: Investigate
  • > 5%: Critical
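The error-rate formula and the threshold guidelines can be sketched together (names and sample counts are illustrative):

```python
# Error rate as a percentage of all traces, plus threshold classification.
def error_rate(error_count: int, total_count: int) -> float:
    return error_count / total_count * 100 if total_count else 0.0

def classify(rate_pct: float) -> str:
    if rate_pct < 1:
        return "healthy"
    if rate_pct <= 5:
        return "investigate"
    return "critical"

rate = error_rate(12, 400)  # 3.0 (%)
status = classify(rate)     # "investigate"
```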

Error Count

Absolute number of errors.

Breakdown by:

  • Error type
  • Model
  • Time

Error Types

Common error categories:

Error           | Description              | Action
Rate Limit      | Provider rate limit hit  | Add fallback, increase limits
Timeout         | Request exceeded timeout | Increase timeout, optimize prompt
Invalid Request | Bad request format       | Fix request validation
Context Length  | Exceeded context limit   | Reduce prompt size
Content Filter  | Content policy violation | Review content

Cache Metrics

Cache Hit Rate

Percentage of requests served from cache.

Formula: hitRate = cacheHits / totalRequests × 100

Target: > 50% for cacheable workloads

Cache Savings

Cost saved due to caching.

Formula: savings = cacheHits × avgRequestCost
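The hit-rate and savings formulas above can be combined in a short sketch; the counts and the average request cost are illustrative:

```python
# Cache hit rate and estimated cost savings.
cache_hits = 600
total_requests = 1000
avg_request_cost = 0.004  # dollars per request -- illustrative

hit_rate = cache_hits / total_requests * 100  # 60.0 (%), above the 50% target
savings = cache_hits * avg_request_cost       # ~2.40 dollars saved
```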

Cache Size

Total cached responses.

Considerations:

  • Storage cost
  • TTL settings
  • Invalidation needs

Session Metrics

Sessions Count

Number of unique sessions.

Traces per Session

Average traces in a session.

Formula: avgTraces = totalTraces / sessionCount
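The session metrics above can be derived from trace records that carry a session identifier; the record shape is illustrative:

```python
# Traces-per-session and per-session cost from tagged trace records.
traces = [
    {"session_id": "a", "cost": 0.002},
    {"session_id": "a", "cost": 0.003},
    {"session_id": "b", "cost": 0.004},
]

sessions = {t["session_id"] for t in traces}
avg_traces = len(traces) / len(sessions)  # 3 traces / 2 sessions = 1.5

session_cost: dict[str, float] = {}
for t in traces:
    session_cost[t["session_id"]] = session_cost.get(t["session_id"], 0.0) + t["cost"]
```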

Session Duration

Time from first to last trace.

Session Cost

Total cost per session.

User Metrics

Active Users

Unique userIds in the period.

Requests per User

Average requests per user.

Cost per User

Average cost per user.

Use cases:

  • Usage-based billing
  • Abuse detection
  • User segmentation

Model Metrics

Model Distribution

Percentage of requests by model:

gpt-4o: 45%
claude-3-5-sonnet: 30%
gpt-3.5-turbo: 25%

Model Performance

Compare models:

Model             | Avg Latency | Avg Cost | Error Rate
gpt-4o            | 1.5s        | $0.008   | 0.5%
claude-3-5-sonnet | 1.2s        | $0.006   | 0.3%

Setting Up Alerts

Create alerts for key metrics:

  1. Go to Settings > Alerts
  2. Click Create Alert
  3. Configure:
    • Metric: e.g., Error Rate
    • Condition: e.g., > 5%
    • Window: e.g., 5 minutes
    • Channel: Email, Slack, Webhook

Recommended alerts:

  • Error rate > 5%
  • Cost per hour > $100
  • p95 latency > 10s
  • Cache hit rate < 30%
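The recommended alert conditions above can be evaluated locally against a metrics snapshot; this is a sketch of the logic, not the product's alerting API, and all field names and values are illustrative:

```python
# Evaluate the recommended alert thresholds against a metrics snapshot.
metrics = {
    "error_rate_pct": 6.2,
    "cost_per_hour_usd": 42.0,
    "p95_latency_s": 11.5,
    "cache_hit_rate_pct": 55.0,
}

ALERTS = [
    ("Error rate > 5%",      lambda m: m["error_rate_pct"] > 5),
    ("Cost per hour > $100", lambda m: m["cost_per_hour_usd"] > 100),
    ("p95 latency > 10s",    lambda m: m["p95_latency_s"] > 10),
    ("Cache hit rate < 30%", lambda m: m["cache_hit_rate_pct"] < 30),
]

fired = [name for name, check in ALERTS if check(metrics)]
# fired -> ["Error rate > 5%", "p95 latency > 10s"]
```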

Next Steps