
Track Your LLM Costs in Real-Time Before They Surprise You

Set up real-time cost tracking for LLM API calls with token counting, dashboards, alert thresholds, and budget controls. Practical TypeScript examples included.

Transactional Team
Jan 24, 2026
8 min read

A common scenario: a team gets a $14,000 bill from OpenAI when they had budgeted $2,000. The culprit is a single endpoint passing entire conversation histories as context on every request, and nobody notices until the invoice arrives.

LLM costs are fundamentally different from traditional API costs. A single request can cost anywhere from $0.001 to $3.00 depending on the model, input size, and output length. Without real-time tracking, you are flying blind.

What You Will Learn

  • How to count tokens and calculate costs per provider
  • Building middleware for automatic cost tracking
  • Setting up alert thresholds and budget controls
  • Creating a cost dashboard with useful breakdowns

LLM Cost per 1M Input Tokens by Provider (Early 2026)

  • Claude Opus 4: $15.00
  • Claude Sonnet 4: $3.00
  • GPT-4o: $2.50
  • GPT-4.1: $2.00
  • Gemini 2.5 Pro: $1.25
  • Claude Haiku 4.5: $0.80
  • GPT-4o-mini: $0.15

Cost Rates Per Provider

Every provider charges differently. Here are the current rates for popular models as of early 2026:

// costs.ts - Cost per 1M tokens (input / output)
const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  // OpenAI
  'gpt-4o': { input: 2.50, output: 10.00 },
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
  'gpt-4.1': { input: 2.00, output: 8.00 },
  'gpt-4.1-mini': { input: 0.40, output: 1.60 },
 
  // Anthropic
  'claude-opus-4-6': { input: 15.00, output: 75.00 },
  'claude-sonnet-4-6': { input: 3.00, output: 15.00 },
  'claude-haiku-4-5': { input: 0.80, output: 4.00 },
 
  // Google
  'gemini-2.5-pro': { input: 1.25, output: 10.00 },
  'gemini-2.5-flash': { input: 0.15, output: 0.60 },
};
 
function calculateCost(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const rates = MODEL_COSTS[model];
  if (!rates) return 0; // unknown model: log this rather than silently undercounting
 
  const inputCost = (inputTokens / 1_000_000) * rates.input;
  const outputCost = (outputTokens / 1_000_000) * rates.output;
  return inputCost + outputCost;
}
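To see how the surprise-invoice scenario from the intro happens in practice, here is a quick sketch comparing a lean prompt against one that replays the full conversation history, using gpt-4o's rates from the table above (the token counts are illustrative):

```typescript
// Sketch: how replayed conversation history inflates per-request cost.
// Rates are gpt-4o's from the table above ($ per 1M tokens).
const GPT4O_INPUT_PER_M = 2.5;
const GPT4O_OUTPUT_PER_M = 10.0;

function requestCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * GPT4O_INPUT_PER_M +
    (outputTokens / 1_000_000) * GPT4O_OUTPUT_PER_M
  );
}

// Lean request: 1.5k input tokens, 500 output tokens -> under a cent
const lean = requestCost(1_500, 500);
// Same 500-token answer, but with 80k tokens of replayed history -> ~23x the cost
const bloated = requestCost(80_000, 500);
```

Multiply that 23x factor across thousands of requests per day and a $2,000 budget becomes a $14,000 invoice without any single request looking alarming.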

Counting Tokens Before the Request

You want to estimate costs before sending a request, not just after. For OpenAI models, js-tiktoken mirrors the official tokenizer; Anthropic and Google use their own tokenizers, so treat its counts as rough estimates for those providers:

import { encodingForModel, TiktokenModel } from 'js-tiktoken';
 
function countTokens(text: string, model: string): number {
  const encoding = encodingForModel(model as TiktokenModel);
  return encoding.encode(text).length;
}
 
// Example: estimate cost before calling
const prompt = buildPrompt(userMessage, conversationHistory);
const estimatedInputTokens = countTokens(prompt, 'gpt-4o');
const estimatedCost = calculateCost('gpt-4o', estimatedInputTokens, 500); // assume 500 output tokens
 
if (estimatedCost > MAX_COST_PER_REQUEST) {
  // Truncate context, switch to cheaper model, or reject
  console.warn(`Estimated cost $${estimatedCost.toFixed(4)} exceeds limit`);
}
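One way to act on that warning is a model-fallback chain: try the preferred model first, and drop to a cheaper one when the estimate blows the cap. A minimal sketch, where the ladder order, rates, and cap are illustrative choices:

```typescript
// Sketch: pick the first (most-preferred) model whose estimated cost fits the cap.
// Rates mirror MODEL_COSTS above; the ladder order is an illustrative choice.
const RATES: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};
const MODEL_LADDER = ['gpt-4o', 'gpt-4o-mini'];

function pickModel(
  inputTokens: number,
  expectedOutputTokens: number,
  maxCostUsd: number
): string | null {
  for (const model of MODEL_LADDER) {
    const r = RATES[model];
    const cost =
      (inputTokens / 1_000_000) * r.input +
      (expectedOutputTokens / 1_000_000) * r.output;
    if (cost <= maxCostUsd) return model; // first model that fits wins
  }
  return null; // nothing fits: truncate context or reject the request
}

// 200k input tokens blows a $0.10 cap on gpt-4o, so this falls through to gpt-4o-mini
pickModel(200_000, 500, 0.1);
```

Returning null instead of silently picking the cheapest model keeps the truncate-or-reject decision explicit at the call site.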

Building Cost Tracking Middleware

The most reliable approach is middleware that wraps every LLM call. Here is a pattern that works with any provider:

// llm-tracker.ts
interface LLMUsageRecord {
  timestamp: Date;
  model: string;
  provider: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  endpoint: string;
  userId?: string;
  latencyMs: number;
}
 
class LLMCostTracker {
  private records: LLMUsageRecord[] = [];
  private budgetLimitUsd: number;
  private alertThresholdPct: number;
  private onAlert: (message: string, currentSpend: number) => void;
 
  constructor(options: {
    budgetLimitUsd: number;
    alertThresholdPct?: number;
    onAlert?: (message: string, currentSpend: number) => void;
  }) {
    this.budgetLimitUsd = options.budgetLimitUsd;
    this.alertThresholdPct = options.alertThresholdPct ?? 80;
    this.onAlert = options.onAlert ?? console.warn;
  }
 
  track(record: LLMUsageRecord): void {
    this.records.push(record);
    this.checkBudget();
  }
 
  private checkBudget(): void {
    const totalSpend = this.getTotalSpend();
    const pct = (totalSpend / this.budgetLimitUsd) * 100;
 
    if (pct >= 100) {
      this.onAlert(
        `Budget exceeded: $${totalSpend.toFixed(2)} / $${this.budgetLimitUsd}`,
        totalSpend
      );
    } else if (pct >= this.alertThresholdPct) {
      this.onAlert(
        `Budget warning (${pct.toFixed(0)}%): $${totalSpend.toFixed(2)} / $${this.budgetLimitUsd}`,
        totalSpend
      );
    }
  }
 
  getTotalSpend(): number {
    return this.records.reduce((sum, r) => sum + r.cost, 0);
  }
 
  getSpendByModel(): Record<string, number> {
    return this.records.reduce(
      (acc, r) => {
        acc[r.model] = (acc[r.model] ?? 0) + r.cost;
        return acc;
      },
      {} as Record<string, number>
    );
  }
 
  getSpendByEndpoint(): Record<string, number> {
    return this.records.reduce(
      (acc, r) => {
        acc[r.endpoint] = (acc[r.endpoint] ?? 0) + r.cost;
        return acc;
      },
      {} as Record<string, number>
    );
  }
}

Wrapping API Calls

Create a wrapper that automatically tracks usage from the response:

// wrapper for OpenAI
import OpenAI from 'openai';
 
const client = new OpenAI();
const tracker = new LLMCostTracker({
  budgetLimitUsd: 2000,
  alertThresholdPct: 75,
  onAlert: (msg, spend) => {
    // Send to your alerting system (Slack, PagerDuty, email)
    notifyTeam(msg);
  },
});
 
async function trackedCompletion(
  // non-streaming params, so the response carries a usage object
  params: OpenAI.ChatCompletionCreateParamsNonStreaming,
  endpoint: string
): Promise<OpenAI.ChatCompletion> {
  const start = Date.now();
  const response = await client.chat.completions.create(params);
  const latencyMs = Date.now() - start;
 
  const usage = response.usage;
  if (usage) {
    tracker.track({
      timestamp: new Date(),
      model: params.model,
      provider: 'openai',
      inputTokens: usage.prompt_tokens,
      outputTokens: usage.completion_tokens,
      cost: calculateCost(
        params.model,
        usage.prompt_tokens,
        usage.completion_tokens
      ),
      endpoint,
      latencyMs,
    });
  }
 
  return response;
}

Setting Up Alert Thresholds

Budget alerts need to work at multiple levels:

// alerts.ts
interface AlertConfig {
  // Per-request limits
  maxCostPerRequest: number; // e.g., $0.50
 
  // Hourly limits (catch runaway loops)
  maxCostPerHour: number; // e.g., $50
 
  // Daily limits
  maxCostPerDay: number; // e.g., $200
 
  // Monthly budget
  monthlyBudget: number; // e.g., $2000
 
  // Per-user limits (prevent abuse)
  maxCostPerUserPerDay: number; // e.g., $5
}
 
const DEFAULT_ALERTS: AlertConfig = {
  maxCostPerRequest: 0.50,
  maxCostPerHour: 50,
  maxCostPerDay: 200,
  monthlyBudget: 2000,
  maxCostPerUserPerDay: 5,
};
 
function checkAlerts(
  record: LLMUsageRecord,
  recentRecords: LLMUsageRecord[],
  config: AlertConfig
): string[] {
  const alerts: string[] = [];
 
  // Per-request check
  if (record.cost > config.maxCostPerRequest) {
    alerts.push(
      `Single request cost $${record.cost.toFixed(4)} exceeds limit of $${config.maxCostPerRequest}`
    );
  }
 
  // Hourly check
  const lastHour = recentRecords.filter(
    (r) => r.timestamp > new Date(Date.now() - 3600_000)
  );
  const hourlyCost = lastHour.reduce((s, r) => s + r.cost, 0);
  if (hourlyCost > config.maxCostPerHour) {
    alerts.push(
      `Hourly spend $${hourlyCost.toFixed(2)} exceeds limit of $${config.maxCostPerHour}`
    );
  }
 
  return alerts;
}
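The per-user cap from AlertConfig is not exercised in checkAlerts above. A sketch of that check, assuming records carry the optional userId field from LLMUsageRecord:

```typescript
// Sketch: enforce the per-user daily cap from AlertConfig.
// Assumes records carry the optional userId field defined earlier.
interface UsageLite {
  timestamp: Date;
  cost: number;
  userId?: string;
}

function checkUserDailyLimit(
  userId: string,
  recentRecords: UsageLite[],
  maxCostPerUserPerDay: number
): string | null {
  const dayAgo = Date.now() - 24 * 3600_000;
  const userSpend = recentRecords
    .filter((r) => r.userId === userId && r.timestamp.getTime() > dayAgo)
    .reduce((sum, r) => sum + r.cost, 0);

  if (userSpend > maxCostPerUserPerDay) {
    return `User ${userId} spent $${userSpend.toFixed(2)} in 24h (limit $${maxCostPerUserPerDay})`;
  }
  return null; // under the cap
}
```

In production you would push this filter into a database query rather than scanning records in memory, but the cutoff logic is the same.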

Building a Cost Dashboard

Store usage records in your database and query them for dashboard views:

// Dashboard query examples
 
// Total spend by model (last 30 days)
const spendByModel = await db
  .select({
    model: llmUsage.model,
    totalCost: sql<number>`SUM(cost)`,
    requestCount: sql<number>`COUNT(*)`,
    avgLatency: sql<number>`AVG(latency_ms)`,
  })
  .from(llmUsage)
  .where(gte(llmUsage.timestamp, thirtyDaysAgo))
  .groupBy(llmUsage.model)
  .orderBy(desc(sql`SUM(cost)`));
 
// Cost trend by day
const dailyCosts = await db
  .select({
    date: sql<string>`DATE(timestamp)`,
    cost: sql<number>`SUM(cost)`,
    tokens: sql<number>`SUM(input_tokens + output_tokens)`,
  })
  .from(llmUsage)
  .where(gte(llmUsage.timestamp, thirtyDaysAgo))
  .groupBy(sql`DATE(timestamp)`)
  .orderBy(sql`DATE(timestamp)`);
 
// Top expensive endpoints
const costByEndpoint = await db
  .select({
    endpoint: llmUsage.endpoint,
    totalCost: sql<number>`SUM(cost)`,
    avgCost: sql<number>`AVG(cost)`,
    p99Cost: sql<number>`PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY cost)`,
  })
  .from(llmUsage)
  .where(gte(llmUsage.timestamp, thirtyDaysAgo))
  .groupBy(llmUsage.endpoint)
  .orderBy(desc(sql`SUM(cost)`));

The dashboard should show four things at a glance: current month spend vs budget, daily trend, breakdown by model, and top endpoints by cost.
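Those views can be assembled into a single summary payload for the frontend. One possible shape, with illustrative field names (the daily-trend series would come from the dailyCosts query above):

```typescript
// Sketch: aggregate raw usage rows into the at-a-glance dashboard views.
// Field names are illustrative, not a fixed API.
interface UsageRow {
  timestamp: Date;
  model: string;
  endpoint: string;
  cost: number;
}

interface DashboardSummary {
  monthSpend: number;
  budget: number;
  budgetUsedPct: number;
  byModel: Record<string, number>;
  byEndpoint: Record<string, number>;
}

function buildSummary(rows: UsageRow[], budget: number): DashboardSummary {
  const byModel: Record<string, number> = {};
  const byEndpoint: Record<string, number> = {};
  let monthSpend = 0;

  for (const r of rows) {
    monthSpend += r.cost;
    byModel[r.model] = (byModel[r.model] ?? 0) + r.cost;
    byEndpoint[r.endpoint] = (byEndpoint[r.endpoint] ?? 0) + r.cost;
  }

  return {
    monthSpend,
    budget,
    budgetUsedPct: (monthSpend / budget) * 100,
    byModel,
    byEndpoint,
  };
}
```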

Hard Budget Controls

Alerts are not enough. You need circuit breakers that actually stop spending:

async function trackedCompletionWithLimits(
  params: OpenAI.ChatCompletionCreateParamsNonStreaming,
  endpoint: string
): Promise<OpenAI.ChatCompletion> {
  // Check budget before making the call
  const currentMonthSpend = await getMonthlySpend();
  if (currentMonthSpend >= MONTHLY_BUDGET) {
    throw new BudgetExceededError(
      `Monthly budget of $${MONTHLY_BUDGET} exhausted. Current: $${currentMonthSpend.toFixed(2)}`
    );
  }
 
  // Estimate and check per-request limit
  const estimatedCost = estimateRequestCost(params);
  if (estimatedCost > MAX_COST_PER_REQUEST) {
    // Try to reduce cost: truncate context or use cheaper model
    params = reduceCost(params);
  }
 
  return trackedCompletion(params, endpoint);
}
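BudgetExceededError and getMonthlySpend are left for you to implement. A minimal sketch of the error class: a distinct type lets callers separate budget failures from provider errors and degrade gracefully instead of surfacing a raw 500.

```typescript
// Sketch: the error type the circuit breaker above throws.
class BudgetExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BudgetExceededError';
    // keep instanceof working when transpiling to ES5
    Object.setPrototypeOf(this, BudgetExceededError.prototype);
  }
}

// Callers can branch on the type and degrade gracefully:
function handleLLMError(err: unknown): string {
  if (err instanceof BudgetExceededError) {
    return 'budget'; // e.g., serve a cached answer or return HTTP 429
  }
  throw err; // unknown failure: let it propagate
}
```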

The Key Takeaway

LLM cost tracking is not optional. Build it in from day one, before you get the surprise invoice. The implementation is straightforward: count tokens, calculate costs, store records, set alerts, add circuit breakers.

If you want this handled out of the box, Transactional's LLM Observability gives you real-time cost dashboards, per-model breakdowns, and configurable budget alerts without writing the tracking infrastructure yourself. But even if you build it in-house, the patterns above will get you most of the way there.

Sources & References

  1. OpenAI API Pricing (OpenAI)
  2. Anthropic API Pricing (Anthropic)
  3. OpenAI Tokenizer and tiktoken (OpenAI)
  4. Google Gemini API Pricing (Google)

Tags:
tutorial
llm
cost-tracking
