Your AI Agent Will Crash in Production. Plan for It.
Common AI agent failure modes and how to handle them: tool execution failures, context window overflow, infinite loops, and hallucinated function calls. Production-ready error patterns with code.
Transactional Team
Mar 3, 2026
10 min read
Consider a common production scenario: an AI agent works perfectly in testing, then crashes within hours of deployment.
The failure is simple: the agent calls a tool that returns a 50KB JSON response. The agent includes that response in its next prompt, which pushes the conversation past the context window limit. The API returns a 400 error. The agent's error handler retries the same request. Same error. Infinite loop. Dead.
AI agents fail in ways traditional software does not. The failure modes are novel, the debugging is harder, and the consequences range from wasted money to incorrect actions taken on behalf of users. Here is how to plan for it.
What You Will Learn
The five most common AI agent failure modes
Error boundaries that prevent cascading failures
Graceful degradation patterns
Logging and alerting for agent-specific issues
Production-ready code for each pattern
AI Agent Failure Modes in Production
Tool Execution Failures: 38%
Context Window Overflow: 24%
Infinite Loops: 18%
Hallucinated Calls: 12%
Cascading Failures: 8%
Failure Mode 1: Tool Execution Failures
The agent decides to call a tool. The tool fails. Now what?
Most agent frameworks just pass the error back to the LLM and hope it figures out what to do. Sometimes it does. Often it retries the exact same failing call, or hallucinates a workaround that makes things worse.
```typescript
// tool-executor.ts

interface ToolResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

class SafeToolExecutor {
  private maxToolRetries = 2;

  async executeTool(
    toolName: string,
    args: Record<string, unknown>
  ): Promise<ToolResult> {
    for (let attempt = 0; attempt <= this.maxToolRetries; attempt++) {
      try {
        const result = await this.runTool(toolName, args);

        // Validate the result is not too large for context
        const resultSize = JSON.stringify(result).length;
        if (resultSize > 10_000) {
          return {
            success: true,
            data: this.truncateResult(result, 10_000),
          };
        }

        return { success: true, data: result };
      } catch (error) {
        const err = error as Error;

        // Non-retryable errors
        if (this.isNonRetryable(err)) {
          return {
            success: false,
            error: `Tool "${toolName}" failed: ${err.message}. Do not retry this tool.`,
          };
        }

        // Last attempt
        if (attempt === this.maxToolRetries) {
          return {
            success: false,
            error: `Tool "${toolName}" failed after ${attempt + 1} attempts: ${err.message}. Try a different approach.`,
          };
        }

        // Wait before retry
        await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
      }
    }

    return { success: false, error: 'Unexpected execution path' };
  }

  private isNonRetryable(error: Error): boolean {
    return (
      error.message.includes('not found') ||
      error.message.includes('permission denied') ||
      error.message.includes('invalid argument') ||
      error.message.includes('401') ||
      error.message.includes('403') ||
      error.message.includes('404')
    );
  }

  private truncateResult(result: unknown, maxLength: number): unknown {
    const json = JSON.stringify(result);
    if (json.length <= maxLength) return result;

    // For arrays, return first N items
    if (Array.isArray(result)) {
      const truncated = [];
      let currentLength = 2; // []
      for (const item of result) {
        const itemJson = JSON.stringify(item);
        if (currentLength + itemJson.length + 1 > maxLength) break;
        truncated.push(item);
        currentLength += itemJson.length + 1; // +1 for comma
      }
      return truncated;
    }

    // For objects, return a summary
    return {
      _truncated: true,
      _originalSize: json.length,
      _preview: json.substring(0, maxLength),
    };
  }

  private async runTool(
    name: string,
    args: Record<string, unknown>
  ): Promise<unknown> {
    // Your actual tool execution logic
    throw new Error('Not implemented');
  }
}
```
The key decisions: truncate large results before they hit the context window, distinguish retryable from non-retryable errors, and give the LLM clear instructions about what happened.
Failure Mode 2: Context Window Overflow
Every LLM has a context limit. As conversations grow, tool results accumulate, and the context fills up. When you exceed the limit, the API rejects the request.
```typescript
// context-manager.ts

interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  tokens?: number;
}

class ContextManager {
  private maxTokens: number;
  private reservedForResponse: number;
  private messages: Message[] = [];

  constructor(
    maxContextTokens: number,
    reservedForResponse: number = 4096
  ) {
    this.maxTokens = maxContextTokens;
    this.reservedForResponse = reservedForResponse;
  }

  addMessage(message: Message): void {
    message.tokens = this.estimateTokens(message.content);
    this.messages.push(message);
    this.pruneIfNeeded();
  }

  getMessages(): Message[] {
    return this.messages;
  }

  private pruneIfNeeded(): void {
    const totalTokens = this.getTotalTokens();
    const available = this.maxTokens - this.reservedForResponse;
    if (totalTokens <= available) return;

    // Strategy: keep system messages + last N messages,
    // and summarize removed messages
    const systemMessages = this.messages.filter((m) => m.role === 'system');
    const otherMessages = this.messages.filter((m) => m.role !== 'system');

    // Remove oldest non-system messages until under limit
    let currentTokens = systemMessages.reduce(
      (sum, m) => sum + (m.tokens ?? 0),
      0
    );
    const kept: Message[] = [];

    // Work backwards from most recent
    for (let i = otherMessages.length - 1; i >= 0; i--) {
      const msg = otherMessages[i];
      if (currentTokens + (msg.tokens ?? 0) > available) break;
      kept.unshift(msg);
      currentTokens += msg.tokens ?? 0;
    }

    // Add a summary of dropped messages
    const droppedCount = otherMessages.length - kept.length;
    if (droppedCount > 0) {
      const summaryMsg: Message = {
        role: 'system',
        content: `[${droppedCount} earlier messages were pruned from context to stay within limits. The conversation continues from the remaining messages.]`,
      };
      summaryMsg.tokens = this.estimateTokens(summaryMsg.content);
      this.messages = [...systemMessages, summaryMsg, ...kept];
    } else {
      this.messages = [...systemMessages, ...kept];
    }
  }

  private getTotalTokens(): number {
    return this.messages.reduce((sum, m) => sum + (m.tokens ?? 0), 0);
  }

  private estimateTokens(text: string): number {
    // Rough estimate: 1 token per 4 characters
    return Math.ceil(text.length / 4);
  }
}
```
Failure Mode 3: Infinite Loops
The agent gets stuck in a loop: call tool, get result, decide to call the same tool again with the same arguments. This burns tokens and money.
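The simplest defense is to fingerprint each tool call and stop when the agent repeats itself. A minimal sketch of such a detector (the `LoopDetector` class name and its thresholds are illustrative, not from any particular framework):

```typescript
// loop-detector.ts

class LoopDetector {
  private history: string[] = [];

  constructor(
    private maxRepeats = 3, // identical calls allowed before stopping
    private maxTotalCalls = 25 // hard cap on tool calls per task
  ) {}

  // Returns an error string if the agent should stop, null otherwise.
  check(toolName: string, args: Record<string, unknown>): string | null {
    // Fingerprint the call: same tool + same arguments = same signature
    const signature = `${toolName}:${JSON.stringify(args)}`;
    this.history.push(signature);

    if (this.history.length > this.maxTotalCalls) {
      return `Exceeded ${this.maxTotalCalls} tool calls for this task. Stopping.`;
    }

    const repeats = this.history.filter((s) => s === signature).length;
    if (repeats >= this.maxRepeats) {
      return `Tool "${toolName}" was called ${repeats} times with identical arguments. Stopping to avoid a loop.`;
    }

    return null;
  }
}
```

When `check` returns a message, feed it to the LLM as a final observation or abort the run outright; either way the loop is broken before it burns the budget.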
Failure Mode 4: Hallucinated Function Calls
The agent calls a tool that does not exist, or passes arguments in the wrong format. This happens more often than you would think, especially with complex tool schemas.
```typescript
// tool-validator.ts

interface ToolSchema {
  name: string;
  parameters: Record<
    string,
    { type: string; required?: boolean; enum?: string[] }
  >;
}

class ToolValidator {
  private schemas: Map<string, ToolSchema>;

  constructor(schemas: ToolSchema[]) {
    this.schemas = new Map(schemas.map((s) => [s.name, s]));
  }

  validate(
    toolName: string,
    args: Record<string, unknown>
  ): { valid: boolean; error?: string } {
    const schema = this.schemas.get(toolName);

    if (!schema) {
      const available = Array.from(this.schemas.keys()).join(', ');
      return {
        valid: false,
        error: `Tool "${toolName}" does not exist. Available tools: ${available}`,
      };
    }

    // Check required parameters
    for (const [paramName, paramSchema] of Object.entries(schema.parameters)) {
      if (paramSchema.required && !(paramName in args)) {
        return {
          valid: false,
          error: `Missing required parameter "${paramName}" for tool "${toolName}"`,
        };
      }

      if (paramName in args) {
        const value = args[paramName];

        // Type checking
        if (paramSchema.type === 'string' && typeof value !== 'string') {
          return {
            valid: false,
            error: `Parameter "${paramName}" must be a string, got ${typeof value}`,
          };
        }

        // Enum checking
        if (paramSchema.enum && !paramSchema.enum.includes(value as string)) {
          return {
            valid: false,
            error: `Parameter "${paramName}" must be one of: ${paramSchema.enum.join(', ')}`,
          };
        }
      }
    }

    return { valid: true };
  }
}
```
Failure Mode 5: Cascading Failures
One failure triggers another. The agent fails to read a file, so it guesses the contents. The guess is wrong, so it writes incorrect data. The incorrect data causes downstream errors.
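A blunt but effective guard against this is a consecutive-failure budget: after N tool failures in a row, halt the run instead of letting the agent improvise on bad data. A minimal sketch (the `FailureBudget` name and threshold are illustrative):

```typescript
// failure-budget.ts

class FailureBudget {
  private consecutiveFailures = 0;

  constructor(private maxConsecutive = 3) {}

  // Record the outcome of each tool call. Returns true while the
  // run may continue, false once the budget is exhausted.
  record(success: boolean): boolean {
    if (success) {
      this.consecutiveFailures = 0; // any success resets the streak
      return true;
    }
    this.consecutiveFailures++;
    return this.consecutiveFailures < this.maxConsecutive;
  }
}
```

When `record` returns false, stop the agent and surface a degraded response; three failed guesses in a row is a strong signal the agent is compounding an error rather than recovering from it.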
Error Boundaries
Wrap agent execution in error boundaries that limit the blast radius:
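The graceful-degradation handler later in this article calls a `runAgentSafely` wrapper; a minimal sketch of the contract it needs is below. The `Agent` interface here is a stand-in for whatever your framework provides, and the timeout-via-`Promise.race` approach is one way to enforce a hard deadline, not the only one:

```typescript
// agent-boundary.ts

interface Agent {
  run(message: string): Promise<string>;
}

interface AgentRunResult {
  success: boolean;
  response?: string;
  error?: string;
}

interface SafetyOptions {
  maxDurationMs: number;
  onError?: (err: Error) => void;
}

async function runAgentSafely(
  agent: Agent,
  message: string,
  opts: SafetyOptions
): Promise<AgentRunResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Hard timeout: the run loses the race if it exceeds maxDurationMs
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Agent run timed out after ${opts.maxDurationMs}ms`)),
      opts.maxDurationMs
    );
  });

  try {
    const response = await Promise.race([agent.run(message), timeout]);
    return { success: true, response };
  } catch (error) {
    const err = error as Error;
    opts.onError?.(err); // report, but never rethrow past the boundary
    return { success: false, error: err.message };
  } finally {
    clearTimeout(timer); // do not leave the timer pending after the race
  }
}
```

The important property is that nothing escapes: every outcome, including a timeout, becomes a structured result the caller can branch on.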
Graceful Degradation
When the agent fails, do not show users a blank error screen. Fall back to something useful:
```typescript
async function handleUserRequest(
  userId: string,
  message: string
): Promise<string> {
  const result = await runAgentSafely(agent, message, {
    maxDurationMs: 30_000,
    onError: (err) => logAgentError(userId, err),
  });

  if (result.success) {
    return result.response!;
  }

  // Graceful degradation tiers
  if (result.error?.includes('timed out')) {
    return "I'm taking longer than expected to process this. I've saved your request and will follow up shortly.";
  }

  if (result.error?.includes('loop')) {
    return "I got stuck trying to solve this. Let me connect you with a human who can help.";
  }

  // Generic fallback
  return "I wasn't able to complete this request. Here are some things you can try, or I can connect you with support.";
}
```
The Takeaway
AI agents fail in fundamentally different ways than traditional software. Tool failures, context overflow, infinite loops, hallucinated calls, and cascading errors are not edge cases. They are guaranteed to happen in production.
Build the error boundaries from day one: limit tool calls, detect loops, manage context size, validate tool calls, set hard timeouts, and degrade gracefully. The code above handles the five most common failure modes.
If you want agent observability without building the infrastructure, Transactional's Error Tracking provides AI-specific error categorization, loop detection alerts, and cost anomaly monitoring. But the patterns above are the foundation regardless of your tooling.