We Got Tired of Errors Reaching Users First. So We Built Error Tracking.
Error tracking built for AI-native applications. Groups errors by semantic similarity, integrates with LLM traces, and catches AI-specific failure modes that traditional tools miss.
Transactional Team
Jan 14, 2026
7 min read
The Error That Traditional Monitoring Misses
Consider a common scenario: an AI support bot starts telling customers their invoices are overdue when they are not. No errors in the logs. No exceptions thrown. The model returns valid JSON with a 200 status code. It is just wrong.
In many cases, these issues are discovered because a customer reports them -- not because monitoring catches them, not because an error tracker fires, but because a human notices.
This is exactly the kind of failure that error tracking for AI applications has to catch.
Sentry is good. Bugsnag is good. They catch exceptions, group them by stack trace, and show you where things broke. For traditional applications, that works.
But AI applications have failure modes that never throw exceptions:
Hallucinations: The model confidently states something false
Format violations: The model ignores your output schema
Context window overflow: The prompt is silently truncated
Quality degradation: Responses get worse over time without any code change
Cost spikes: A single request burns $2 worth of tokens due to a prompt bug
Stale cache hits: Cached responses become incorrect as underlying data changes
None of these produce a stack trace. None of them trigger a try/catch. All of them break your application.
We built error tracking that catches all of them.
What Makes It Different
AI-Specific Error Types
Beyond standard exceptions, we detect and categorize AI-specific errors:
// Standard errors (what traditional trackers catch)
- RuntimeError, TypeError, NetworkError
- HTTP 4xx/5xx from providers
- Timeout errors

// AI-specific errors (what we add)
- HallucinationDetected    // Response contradicts provided context
- FormatViolation          // Response does not match expected schema
- QualityBelowThreshold    // Quality score dropped below configured minimum
- CostAnomaly              // Request cost exceeds expected range
- TokenLimitApproached     // Prompt is within 10% of context window
- PromptInjectionAttempt   // Detected manipulation in user input
- GroundingFailure         // Response not supported by retrieved documents
Each error type has its own detection logic. Hallucination detection compares the response against the provided context using semantic similarity. Format violations validate against your defined output schema. Cost anomalies flag requests that cost more than 3x the rolling average.
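To make the cost-anomaly rule concrete, here is a minimal sketch of the check described above. The function and parameter names are illustrative, not the product's internals; the point is simply that each request's cost is compared against a rolling average of recent requests.

// Hypothetical sketch of the cost-anomaly rule: flag any request that costs
// more than `multiplier` times the rolling average of recent requests.
function isCostAnomaly(
  requestCostUsd: number,
  recentCostsUsd: number[],  // costs of the last N requests (the rolling window)
  multiplier = 3.0           // matches the 3x default described above
): boolean {
  if (recentCostsUsd.length === 0) return false; // not enough history yet
  const rollingAverage =
    recentCostsUsd.reduce((sum, cost) => sum + cost, 0) / recentCostsUsd.length;
  return requestCostUsd > multiplier * rollingAverage;
}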
Semantic Grouping
Traditional error trackers group errors by stack trace. Two errors with the same stack trace are the same error. Simple.
But AI errors do not have meaningful stack traces. A hallucination and a correct response follow the exact same code path. The stack trace is identical.
We group errors by semantic similarity instead. If 50 users get a hallucinated response about the same topic, those 50 errors are grouped together even though the exact text is different. The grouping considers:
Error type
Model and provider
Prompt version
Semantic similarity of the input
Semantic similarity of the output
This means your error dashboard shows "Support bot hallucinating about billing dates (47 occurrences)" instead of 47 separate error entries with identical stack traces.
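As a simplified sketch of the grouping idea (the types, threshold, and embedding handling here are illustrative assumptions, not the actual grouping implementation): two errors land in the same group when their structured fields match and their inputs and outputs are semantically close.

// Hypothetical sketch: group errors by type, model, prompt version,
// and semantic similarity of input and output embeddings.
interface AIError {
  type: string;           // e.g. "HallucinationDetected"
  model: string;          // e.g. "anthropic/claude-sonnet-4-20250514"
  promptVersion: string;  // e.g. "v4.1"
  inputEmbedding: number[];
  outputEmbedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function sameGroup(a: AIError, b: AIError, threshold = 0.9): boolean {
  return (
    a.type === b.type &&
    a.model === b.model &&
    a.promptVersion === b.promptVersion &&
    cosineSimilarity(a.inputEmbedding, b.inputEmbedding) >= threshold &&
    cosineSimilarity(a.outputEmbedding, b.outputEmbedding) >= threshold
  );
}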
LLM Trace Integration
Every error links directly to its LLM trace. Click an error and you see:
The exact prompt that was sent
The full response that was returned
Token counts and cost
Quality scores across all dimensions
The prompt version that was active
Whether the response was served from cache
This is the context that traditional error trackers cannot provide. When you see a hallucination error, you do not just know it happened -- you see exactly what the model was asked, what it said, and why it was wrong.
import Transactional from "@transactional/sdk";

const client = new Transactional({ apiKey: "tx_live_..." });

// Errors are automatically captured and linked to traces
const response = await client.ai.chat({
  model: "anthropic/claude-sonnet-4-20250514",
  messages: [...],
  validation: {
    schema: myOutputSchema,          // Validate response format
    qualityThreshold: 0.85,          // Flag low-quality responses
    costThreshold: 0.05,             // Flag expensive requests
    groundingContext: retrievedDocs  // Check for hallucinations
  }
});
Real-Time Alerting With Context
Alerts include the context you need to act immediately:
What type of error occurred
How many users are affected
Which model and prompt version
A representative example with full trace
Whether it started after a prompt change or provider issue
Alert channels: email, Slack, PagerDuty, webhooks. Configure severity levels per error type. Hallucinations might be critical. Format violations might be warnings. You decide.
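For instance, a severity and routing map along these lines (the configuration shape below is illustrative, not the documented schema) would page on hallucinations while keeping format violations as Slack warnings:

// Illustrative severity/routing configuration -- field names are hypothetical
const alertConfig = {
  channels: {
    slack: { webhookUrl: "https://hooks.slack.com/..." },
    pagerduty: { routingKey: "..." },
    email: { to: ["oncall@example.com"] },
  },
  severities: {
    HallucinationDetected: { level: "critical", notify: ["pagerduty", "slack"] },
    FormatViolation:       { level: "warning",  notify: ["slack"] },
    CostAnomaly:           { level: "warning",  notify: ["slack", "email"] },
  },
};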
Quick Setup
With AI Gateway (Automatic)
If you use our AI Gateway, error tracking is built in. Enable it in the dashboard and configure your detection thresholds:
// In your AI Gateway settings
{
  errorTracking: {
    enabled: true,
    detectHallucinations: true,
    qualityThreshold: 0.85,
    costAnomalyMultiplier: 3.0,
    formatValidation: true,
    alertChannels: ["slack", "email"]
  }
}
With Direct Provider Calls (SDK)
Add our error tracking middleware to any LLM client:
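The wrapper API is not spelled out in this post, so the following is only a rough sketch of the idea: a hypothetical withErrorTracking helper (the import name and options are assumptions for illustration) wraps an existing provider client so every call is observed and validated without changing how you invoke it.

import OpenAI from "openai";
// Hypothetical wrapper for illustration -- not a documented SDK export
import { withErrorTracking } from "@transactional/sdk";

// Wrap an existing LLM client so failures are reported as grouped,
// trace-linked errors while requests still go straight to the provider.
const openai = withErrorTracking(new OpenAI(), {
  apiKey: "tx_live_...",
  detectHallucinations: true,
  qualityThreshold: 0.85,
});

// Calls are unchanged; the wrapper observes the request and response.
const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize my latest invoice." }],
});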
The Dashboard
The error tracking dashboard is built around three views:
Error Feed shows errors in real-time, grouped semantically. Each group shows the error type, occurrence count, affected users, first/last seen, and status (new, acknowledged, resolved). Filter by error type, model, prompt version, severity, or time range.
Error Detail drills into a specific error group. See every occurrence, the full LLM trace for each, quality scores, and a timeline showing when the error started and how it is trending. If the error correlates with a prompt change or provider issue, we highlight that.
Trends shows error rates over time by type. Overlay with deployment events, prompt version changes, and provider incidents. This is where you see patterns: "hallucination rate jumped 3x after deploying prompt v4.1."
What This Changes
Without AI-aware error tracking, the typical process for finding AI bugs is:
Customer complains
Team searches logs for the request
Someone manually reads the LLM response
Someone decides if it was wrong
Team tries to figure out why
Team greps for similar cases
With AI-native error tracking:
Alert fires with full context
Click through to the trace
See exactly what went wrong and why
Fix the prompt or model configuration
Verify the fix in the quality dashboard
The time from "something is wrong" to "understanding the problem" goes from hours to seconds. The time from "problem understood" to "fix deployed" goes from days to minutes.
For AI applications, the error your users see is not an exception. It is a wrong answer delivered with confidence. Traditional error tracking was never built to catch that. This one was.
Explore the Error Tracking feature page to see the dashboard and start catching AI errors before your users do.