
Track Your LLM Costs in Real-Time Before They Surprise You

Set up real-time cost tracking for LLM API calls with token counting, dashboards, alert thresholds, and budget controls. Practical TypeScript examples included.

Transactional Team
Jan 24, 2026
8 min read

A common scenario: a team gets a $14,000 bill from OpenAI when they had budgeted $2,000. The culprit is a single endpoint passing entire conversation histories as context on every request, and nobody notices until the invoice arrives.

LLM costs are fundamentally different from traditional API costs. A single request can cost anywhere from $0.001 to $3.00 depending on the model, input size, and output length. Without real-time tracking, you are flying blind.

What You Will Learn

  • How to count tokens and calculate costs per provider
  • Building middleware for automatic cost tracking
  • Setting up alert thresholds and budget controls
  • Creating a cost dashboard with useful breakdowns

LLM Cost per 1M Input Tokens by Provider (Early 2026)

  • Claude Opus 4: $15.00
  • Claude Sonnet 4: $3.00
  • GPT-4o: $2.50
  • GPT-4.1: $2.00
  • Gemini 2.5 Pro: $1.25
  • Claude Haiku 4.5: $0.80
  • GPT-4o-mini: $0.15

Cost Rates Per Provider

Every provider charges differently. Here are the current rates for popular models as of early 2026:

// costs.ts - Cost per 1M tokens (input / output)
const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  // OpenAI
  'gpt-4o': { input: 2.50, output: 10.00 },
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
  'gpt-4.1': { input: 2.00, output: 8.00 },
  'gpt-4.1-mini': { input: 0.40, output: 1.60 },
 
  // Anthropic
  'claude-opus-4-6': { input: 15.00, output: 75.00 },
  'claude-sonnet-4-6': { input: 3.00, output: 15.00 },
  'claude-haiku-4-5': { input: 0.80, output: 4.00 },
 
  // Google
  'gemini-2.5-pro': { input: 1.25, output: 10.00 },
  'gemini-2.5-flash': { input: 0.15, output: 0.60 },
};
 
function calculateCost(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const rates = MODEL_COSTS[model];
  if (!rates) return 0; // unknown model: log this rather than silently undercounting
 
  const inputCost = (inputTokens / 1_000_000) * rates.input;
  const outputCost = (outputTokens / 1_000_000) * rates.output;
  return inputCost + outputCost;
}
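To see how the surprise-invoice scenario from the intro happens in practice, here is a quick sketch comparing a lean prompt against one that replays the full conversation history, using gpt-4o's rates from the table above (the token counts are illustrative):

```typescript
// Sketch: how replayed conversation history inflates per-request cost.
// Rates are gpt-4o's from the table above ($ per 1M tokens).
const GPT4O_INPUT_PER_M = 2.5;
const GPT4O_OUTPUT_PER_M = 10.0;

function requestCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * GPT4O_INPUT_PER_M +
    (outputTokens / 1_000_000) * GPT4O_OUTPUT_PER_M
  );
}

// Lean request: 1.5k input tokens, 500 output tokens -> under a cent
const lean = requestCost(1_500, 500);
// Same 500-token answer, but with 80k tokens of replayed history -> ~23x the cost
const bloated = requestCost(80_000, 500);
```

Multiply that 23x factor across thousands of requests per day and a $2,000 budget becomes a $14,000 invoice without any single request looking alarming.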

Counting Tokens Before the Request

You want to estimate costs before sending a request, not just after. For OpenAI models, js-tiktoken mirrors the official tokenizer; Anthropic and Google use their own tokenizers, so treat its counts as rough estimates for those providers:

import { encodingForModel, TiktokenModel } from 'js-tiktoken';
 
function countTokens(text: string, model: string): number {
  const encoding = encodingForModel(model as TiktokenModel);
  return encoding.encode(text).length;
}
 
// Example: estimate cost before calling
const prompt = buildPrompt(userMessage, conversationHistory);
const estimatedInputTokens = countTokens(prompt, 'gpt-4o');
const estimatedCost = calculateCost('gpt-4o', estimatedInputTokens, 500); // assume 500 output tokens
 
if (estimatedCost > MAX_COST_PER_REQUEST) {
  // Truncate context, switch to cheaper model, or reject
  console.warn(`Estimated cost $${estimatedCost.toFixed(4)} exceeds limit`);
}
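One way to act on that warning is a model-fallback chain: try the preferred model first, and drop to a cheaper one when the estimate blows the cap. A minimal sketch, where the ladder order, rates, and cap are illustrative choices:

```typescript
// Sketch: pick the first (most-preferred) model whose estimated cost fits the cap.
// Rates mirror MODEL_COSTS above; the ladder order is an illustrative choice.
const RATES: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};
const MODEL_LADDER = ['gpt-4o', 'gpt-4o-mini'];

function pickModel(
  inputTokens: number,
  expectedOutputTokens: number,
  maxCostUsd: number
): string | null {
  for (const model of MODEL_LADDER) {
    const r = RATES[model];
    const cost =
      (inputTokens / 1_000_000) * r.input +
      (expectedOutputTokens / 1_000_000) * r.output;
    if (cost <= maxCostUsd) return model; // first model that fits wins
  }
  return null; // nothing fits: truncate context or reject the request
}

// 200k input tokens blows a $0.10 cap on gpt-4o, so this falls through to gpt-4o-mini
pickModel(200_000, 500, 0.1);
```

Returning null instead of silently picking the cheapest model keeps the truncate-or-reject decision explicit at the call site.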

Building Cost Tracking Middleware

The most reliable approach is middleware that wraps every LLM call. Here is a pattern that works with any provider:

// llm-tracker.ts
interface LLMUsageRecord {
  timestamp: Date;
  model: string;
  provider: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  endpoint: string;
  userId?: string;
  latencyMs: number;
}
 
class LLMCostTracker {
  private records: LLMUsageRecord[] = [];
  private budgetLimitUsd: number;
  private alertThresholdPct: number;
  private onAlert: (message: string, currentSpend: number) => void;
 
  constructor(options: {
    budgetLimitUsd: number;
    alertThresholdPct?: number;
    onAlert?: (message: string, currentSpend: number) => void;
  }) {
    this.budgetLimitUsd = options.budgetLimitUsd;
    this.alertThresholdPct = options.alertThresholdPct ?? 80;
    this.onAlert = options.onAlert ?? console.warn;
  }
 
  track(record: LLMUsageRecord): void {
    this.records.push(record);
    this.checkBudget();
  }
 
  private checkBudget(): void {
    const totalSpend = this.getTotalSpend();
    const pct = (totalSpend / this.budgetLimitUsd) * 100;
 
    if (pct >= 100) {
      this.onAlert(
        `Budget exceeded: $${totalSpend.toFixed(2)} / $${this.budgetLimitUsd}`,
        totalSpend
      );
    } else if (pct >= this.alertThresholdPct) {
      this.onAlert(
        `Budget warning (${pct.toFixed(0)}%): $${totalSpend.toFixed(2)} / $${this.budgetLimitUsd}`,
        totalSpend
      );
    }
  }
 
  getTotalSpend(): number {
    return this.records.reduce((sum, r) => sum + r.cost, 0);
  }
 
  getSpendByModel(): Record<string, number> {
    return this.records.reduce(
      (acc, r) => {
        acc[r.model] = (acc[r.model] ?? 0) + r.cost;
        return acc;
      },
      {} as Record<string, number>
    );
  }
 
  getSpendByEndpoint(): Record<string, number> {
    return this.records.reduce(
      (acc, r) => {
        acc[r.endpoint] = (acc[r.endpoint] ?? 0) + r.cost;
        return acc;
      },
      {} as Record<string, number>
    );
  }
}

Wrapping API Calls

Create a wrapper that automatically tracks usage from the response:

// wrapper for OpenAI
import OpenAI from 'openai';
 
const client = new OpenAI();
const tracker = new LLMCostTracker({
  budgetLimitUsd: 2000,
  alertThresholdPct: 75,
  onAlert: (msg, spend) => {
    // Send to your alerting system (Slack, PagerDuty, email)
    notifyTeam(msg);
  },
});
 
async function trackedCompletion(
  // non-streaming params, so the response carries a usage object
  params: OpenAI.ChatCompletionCreateParamsNonStreaming,
  endpoint: string
): Promise<OpenAI.ChatCompletion> {
  const start = Date.now();
  const response = await client.chat.completions.create(params);
  const latencyMs = Date.now() - start;
 
  const usage = response.usage;
  if (usage) {
    tracker.track({
      timestamp: new Date(),
      model: params.model,
      provider: 'openai',
      inputTokens: usage.prompt_tokens,
      outputTokens: usage.completion_tokens,
      cost: calculateCost(
        params.model,
        usage.prompt_tokens,
        usage.completion_tokens
      ),
      endpoint,
      latencyMs,
    });
  }
 
  return response;
}

Setting Up Alert Thresholds

Budget alerts need to work at multiple levels:

// alerts.ts
interface AlertConfig {
  // Per-request limits
  maxCostPerRequest: number; // e.g., $0.50
 
  // Hourly limits (catch runaway loops)
  maxCostPerHour: number; // e.g., $50
 
  // Daily limits
  maxCostPerDay: number; // e.g., $200
 
  // Monthly budget
  monthlyBudget: number; // e.g., $2000
 
  // Per-user limits (prevent abuse)
  maxCostPerUserPerDay: number; // e.g., $5
}
 
const DEFAULT_ALERTS: AlertConfig = {
  maxCostPerRequest: 0.50,
  maxCostPerHour: 50,
  maxCostPerDay: 200,
  monthlyBudget: 2000,
  maxCostPerUserPerDay: 5,
};
 
function checkAlerts(
  record: LLMUsageRecord,
  recentRecords: LLMUsageRecord[],
  config: AlertConfig
): string[] {
  const alerts: string[] = [];
 
  // Per-request check
  if (record.cost > config.maxCostPerRequest) {
    alerts.push(
      `Single request cost $${record.cost.toFixed(4)} exceeds limit of $${config.maxCostPerRequest}`
    );
  }
 
  // Hourly check
  const lastHour = recentRecords.filter(
    (r) => r.timestamp > new Date(Date.now() - 3600_000)
  );
  const hourlyCost = lastHour.reduce((s, r) => s + r.cost, 0);
  if (hourlyCost > config.maxCostPerHour) {
    alerts.push(
      `Hourly spend $${hourlyCost.toFixed(2)} exceeds limit of $${config.maxCostPerHour}`
    );
  }
 
  return alerts;
}
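The per-user cap from AlertConfig is not exercised in checkAlerts above. A sketch of that check, assuming records carry the optional userId field from LLMUsageRecord:

```typescript
// Sketch: enforce the per-user daily cap from AlertConfig.
// Assumes records carry the optional userId field defined earlier.
interface UsageLite {
  timestamp: Date;
  cost: number;
  userId?: string;
}

function checkUserDailyLimit(
  userId: string,
  recentRecords: UsageLite[],
  maxCostPerUserPerDay: number
): string | null {
  const dayAgo = Date.now() - 24 * 3600_000;
  const userSpend = recentRecords
    .filter((r) => r.userId === userId && r.timestamp.getTime() > dayAgo)
    .reduce((sum, r) => sum + r.cost, 0);

  if (userSpend > maxCostPerUserPerDay) {
    return `User ${userId} spent $${userSpend.toFixed(2)} in 24h (limit $${maxCostPerUserPerDay})`;
  }
  return null; // under the cap
}
```

In production you would push this filter into a database query rather than scanning records in memory, but the cutoff logic is the same.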

Building a Cost Dashboard

Store usage records in your database and query them for dashboard views:

// Dashboard query examples
 
// Total spend by model (last 30 days)
const spendByModel = await db
  .select({
    model: llmUsage.model,
    totalCost: sql<number>`SUM(cost)`,
    requestCount: sql<number>`COUNT(*)`,
    avgLatency: sql<number>`AVG(latency_ms)`,
  })
  .from(llmUsage)
  .where(gte(llmUsage.timestamp, thirtyDaysAgo))
  .groupBy(llmUsage.model)
  .orderBy(desc(sql`SUM(cost)`));
 
// Cost trend by day
const dailyCosts = await db
  .select({
    date: sql<string>`DATE(timestamp)`,
    cost: sql<number>`SUM(cost)`,
    tokens: sql<number>`SUM(input_tokens + output_tokens)`,
  })
  .from(llmUsage)
  .where(gte(llmUsage.timestamp, thirtyDaysAgo))
  .groupBy(sql`DATE(timestamp)`)
  .orderBy(sql`DATE(timestamp)`);
 
// Top expensive endpoints
const costByEndpoint = await db
  .select({
    endpoint: llmUsage.endpoint,
    totalCost: sql<number>`SUM(cost)`,
    avgCost: sql<number>`AVG(cost)`,
    p99Cost: sql<number>`PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY cost)`,
  })
  .from(llmUsage)
  .where(gte(llmUsage.timestamp, thirtyDaysAgo))
  .groupBy(llmUsage.endpoint)
  .orderBy(desc(sql`SUM(cost)`));

The dashboard should show four things at a glance: current month spend vs budget, daily trend, breakdown by model, and top endpoints by cost.
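Those views can be assembled into a single summary payload for the frontend. One possible shape, with illustrative field names (the daily-trend series would come from the dailyCosts query above):

```typescript
// Sketch: aggregate raw usage rows into the at-a-glance dashboard views.
// Field names are illustrative, not a fixed API.
interface UsageRow {
  timestamp: Date;
  model: string;
  endpoint: string;
  cost: number;
}

interface DashboardSummary {
  monthSpend: number;
  budget: number;
  budgetUsedPct: number;
  byModel: Record<string, number>;
  byEndpoint: Record<string, number>;
}

function buildSummary(rows: UsageRow[], budget: number): DashboardSummary {
  const byModel: Record<string, number> = {};
  const byEndpoint: Record<string, number> = {};
  let monthSpend = 0;

  for (const r of rows) {
    monthSpend += r.cost;
    byModel[r.model] = (byModel[r.model] ?? 0) + r.cost;
    byEndpoint[r.endpoint] = (byEndpoint[r.endpoint] ?? 0) + r.cost;
  }

  return {
    monthSpend,
    budget,
    budgetUsedPct: (monthSpend / budget) * 100,
    byModel,
    byEndpoint,
  };
}
```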

Hard Budget Controls

Alerts are not enough. You need circuit breakers that actually stop spending:

async function trackedCompletionWithLimits(
  params: OpenAI.ChatCompletionCreateParamsNonStreaming,
  endpoint: string
): Promise<OpenAI.ChatCompletion> {
  // Check budget before making the call
  const currentMonthSpend = await getMonthlySpend();
  if (currentMonthSpend >= MONTHLY_BUDGET) {
    throw new BudgetExceededError(
      `Monthly budget of $${MONTHLY_BUDGET} exhausted. Current: $${currentMonthSpend.toFixed(2)}`
    );
  }
 
  // Estimate and check per-request limit
  const estimatedCost = estimateRequestCost(params);
  if (estimatedCost > MAX_COST_PER_REQUEST) {
    // Try to reduce cost: truncate context or use cheaper model
    params = reduceCost(params);
  }
 
  return trackedCompletion(params, endpoint);
}
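BudgetExceededError and getMonthlySpend are left for you to implement. A minimal sketch of the error class: a distinct type lets callers separate budget failures from provider errors and degrade gracefully instead of surfacing a raw 500.

```typescript
// Sketch: the error type the circuit breaker above throws.
class BudgetExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BudgetExceededError';
    // keep instanceof working when transpiling to ES5
    Object.setPrototypeOf(this, BudgetExceededError.prototype);
  }
}

// Callers can branch on the type and degrade gracefully:
function handleLLMError(err: unknown): string {
  if (err instanceof BudgetExceededError) {
    return 'budget'; // e.g., serve a cached answer or return HTTP 429
  }
  throw err; // unknown failure: let it propagate
}
```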

The Key Takeaway

LLM cost tracking is not optional. Build it in from day one, before you get the surprise invoice. The implementation is straightforward: count tokens, calculate costs, store records, set alerts, add circuit breakers.

If you want this handled out of the box, Transactional's LLM Observability gives you real-time cost dashboards, per-model breakdowns, and configurable budget alerts without writing the tracking infrastructure yourself. But even if you build it in-house, the patterns above will get you most of the way there.

Sources & References

  1. OpenAI API Pricing (OpenAI)
  2. Anthropic API Pricing (Anthropic)
  3. OpenAI Tokenizer and tiktoken (OpenAI)
  4. Google Gemini API Pricing (Google)

Tags:
tutorial
llm
cost-tracking
