Tracing AI Agents

Best practices for observing multi-step AI agent workflows.

Overview

AI agents are autonomous systems that use LLMs to plan, execute tools, and iterate toward goals. Proper tracing is essential for debugging, optimization, and understanding agent behavior.

Agent Architecture

User Request
    ↓
[Agent Loop]
├── Think: Plan next action
├── Act: Execute tool/action
├── Observe: Process result
└── Repeat until done
    ↓
Final Response
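The loop above can be sketched in a few lines. This is a minimal, self-contained version with no tracing: `decide` and `tools` are injected stubs standing in for the LLM and the tool registry, not part of any real SDK.

```typescript
// Minimal think → act → observe loop. A decision is either a tool call
// or a 'finish' action carrying the final answer.
interface Decision {
  action: string;                       // tool name, or 'finish'
  args?: Record<string, unknown>;       // tool arguments
  result?: string;                      // final answer when action === 'finish'
}

async function agentLoop(
  goal: string,
  decide: (goal: string, history: string[]) => Promise<Decision>,
  tools: Record<string, (args: Record<string, unknown>) => Promise<string>>,
  maxIterations = 10,
): Promise<string> {
  const history: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const decision = await decide(goal, history);                 // Think
    if (decision.action === 'finish') return decision.result ?? '';
    const observation = await tools[decision.action](decision.args ?? {}); // Act
    history.push(`${decision.action} -> ${observation}`);         // Observe
  }
  throw new Error('max iterations exceeded');                     // budget guard
}
```

The tracing code later on this page wraps each of these three phases in its own span or generation.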

Tracing Agent Loops

Basic Agent Trace

import { getObservability } from '@transactional/observability';
 
async function runAgent(goal: string): Promise<string> {
  const obs = getObservability();
 
  // Create main trace for entire agent run
  const trace = obs.trace({
    name: 'agent-run',
    input: { goal },
    metadata: {
      agentType: 'react',
      maxIterations: 10,
    },
  });
 
  const history: Array<{ action: string; result: unknown }> = [];
  let iteration = 0;
  const maxIterations = 10;
 
  try {
    while (iteration < maxIterations) {
      iteration++;
 
      // Span for this iteration
      const iterationSpan = obs.observation({
        type: 'SPAN',
        name: `iteration-${iteration}`,
        input: { iteration },
      });
 
      // Think: LLM decides next action based on the goal and prior observations
      const prompt = buildPrompt(goal, history);
      const thinkGeneration = obs.generation({
        name: 'think',
        modelName: 'gpt-4o',
        parentObservationId: iterationSpan.id,
        input: { prompt },
      });
 
      const decision = await llm.decide(prompt);
 
      await thinkGeneration.end({
        output: decision,
        promptTokens: decision.usage.prompt_tokens,
        completionTokens: decision.usage.completion_tokens,
      });
 
      // Check if done
      if (decision.action === 'finish') {
        await iterationSpan.end({
          output: { status: 'complete', result: decision.result },
        });
        break;
      }
 
      // Act: Execute tool
      const actionSpan = obs.observation({
        type: 'SPAN',
        name: `action-${decision.action}`,
        parentObservationId: iterationSpan.id,
        input: { tool: decision.action, args: decision.args },
      });
 
      const result = await executeTool(decision.action, decision.args);
 
      await actionSpan.end({
        output: { result },
      });
 
      // Observe: record the result so the next iteration can reason over it
      history.push({ action: decision.action, result });
 
      await iterationSpan.end({
        output: {
          action: decision.action,
          result,
        },
      });
    }
 
    const finalResult = synthesizeResult(history);
 
    await trace.end({
      output: {
        result: finalResult,
        iterations: iteration,
        toolsUsed: getToolsUsed(history),
      },
    });
 
    return finalResult;
  } catch (error) {
    await trace.error(error as Error);
    throw error;
  }
}

Tool Execution Tracking

Individual Tool Traces

async function executeTool(name: string, args: any): Promise<any> {
  const obs = getObservability();
 
  const span = obs.observation({
    type: 'SPAN',
    name: `tool-${name}`,
    input: { tool: name, args },
    metadata: {
      toolCategory: getToolCategory(name),
    },
  });
 
  try {
    const startTime = Date.now();
    const result = await tools[name](args);
    const duration = Date.now() - startTime;
 
    await span.end({
      output: {
        success: true,
        result,
        duration,
      },
    });
 
    return result;
  } catch (error) {
    await span.end({
      output: {
        success: false,
        error: (error as Error).message,
      },
    });
    throw error;
  }
}

Tool Categories

Track tools by category:

const toolCategories = {
  search: ['web_search', 'file_search', 'code_search'],
  action: ['write_file', 'run_code', 'send_email'],
  retrieval: ['read_file', 'fetch_url', 'query_db'],
};
 
const span = obs.observation({
  type: 'SPAN',
  name: `tool-${toolName}`,
  metadata: {
    category: getCategory(toolName),
    isDestructive: isDestructive(toolName),
    requiresApproval: requiresApproval(toolName),
  },
});
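The helpers referenced in that metadata might look like the following sketch. `getCategory` walks the `toolCategories` table above; the destructive-tool set is an illustrative assumption, not a fixed list.

```typescript
// Category lookup backed by the toolCategories table from this page
const toolCategories: Record<string, string[]> = {
  search: ['web_search', 'file_search', 'code_search'],
  action: ['write_file', 'run_code', 'send_email'],
  retrieval: ['read_file', 'fetch_url', 'query_db'],
};

function getCategory(toolName: string): string {
  for (const [category, names] of Object.entries(toolCategories)) {
    if (names.includes(toolName)) return category;
  }
  return 'unknown';
}

// Destructive tools mutate external state; which ones qualify is up to you
const destructiveTools = new Set(['write_file', 'run_code', 'send_email']);
const isDestructive = (name: string): boolean => destructiveTools.has(name);
```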

Reasoning Visualization

Thought Process Tracking

Capture agent reasoning:

const thinkGeneration = obs.generation({
  name: 'think',
  modelName: 'gpt-4o',
  input: {
    systemPrompt,
    userGoal: goal,
    history: formatHistory(history),
    availableTools: tools.map(t => t.name),
  },
  metadata: {
    thoughtType: 'planning',  // or 'reflection', 'correction'
  },
});

Decision Logging

Track decision points:

await thinkGeneration.end({
  output: {
    thought: decision.thought,
    action: decision.action,
    actionInput: decision.actionInput,
    confidence: decision.confidence,
    alternatives: decision.alternatives,  // Other considered actions
  },
});

Multi-Agent Systems

Agent Coordination

Trace multi-agent interactions:

async function runMultiAgent(task: string) {
  const obs = getObservability();
 
  const trace = obs.trace({
    name: 'multi-agent-task',
    input: { task },
    metadata: {
      agents: ['planner', 'executor', 'reviewer'],
    },
  });
 
  // Planner agent
  const plannerSpan = obs.observation({
    type: 'SPAN',
    name: 'agent-planner',
    input: { task },
  });
 
  const plan = await plannerAgent.plan(task);
 
  await plannerSpan.end({
    output: { plan },
  });
 
  // Executor agent
  const executorSpan = obs.observation({
    type: 'SPAN',
    name: 'agent-executor',
    input: { plan },
  });
 
  const result = await executorAgent.execute(plan);
 
  await executorSpan.end({
    output: { result },
  });
 
  // Reviewer agent
  const reviewerSpan = obs.observation({
    type: 'SPAN',
    name: 'agent-reviewer',
    input: { task, result },
  });
 
  const feedback = await reviewerAgent.review(task, result);
 
  await reviewerSpan.end({
    output: { feedback, approved: feedback.approved },
  });
 
  await trace.end({
    output: {
      result,
      feedback,
      approved: feedback.approved,
    },
  });
}

Error Recovery

Tracking Retries

Monitor error recovery:

async function executeWithRetry(action: string, args: any, maxRetries = 3) {
  const obs = getObservability();
 
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const span = obs.observation({
      type: 'SPAN',
      name: `action-attempt-${attempt}`,
      input: { action, args, attempt },
    });
 
    try {
      const result = await executeTool(action, args);
      await span.end({ output: { success: true, result } });
      return result;
    } catch (error) {
      await span.end({
        output: {
          success: false,
          error: (error as Error).message,
          willRetry: attempt < maxRetries,
        },
      });
 
      if (attempt === maxRetries) throw error;
    }
  }
}

Self-Correction

Track when agent corrects itself:

if (toolResult.error) {
  const correctionGeneration = obs.generation({
    name: 'self-correction',
    modelName: 'gpt-4o',
    input: {
      previousAction: lastAction,
      error: toolResult.error,
      goal,
    },
    metadata: {
      correctionType: 'error-recovery',
    },
  });
 
  const correction = await llm.analyzeError(toolResult.error);
 
  await correctionGeneration.end({
    output: {
      analysis: correction.analysis,
      newStrategy: correction.newStrategy,
    },
  });
}

Performance Monitoring

Agent Efficiency Metrics

Track agent performance:

await trace.end({
  output: { result },
  metadata: {
    iterations: iterationCount,
    toolCalls: toolCallCount,
    totalTokens: totalTokens,
    totalCost: totalCost,
    outcome: result ? 'success' : 'failure',
    timeToFirstAction: timeToFirstAction,
    totalDuration: totalDuration,
  },
});
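One way to produce `totalTokens` and `totalCost` is to aggregate the usage recorded on each generation. The per-token prices below are illustrative placeholders, not real model rates.

```typescript
// Aggregate per-call token usage into run-level metrics
interface Usage { promptTokens: number; completionTokens: number }

function summarizeUsage(
  calls: Usage[],
  promptPrice = 0.0000025,      // placeholder $/prompt token
  completionPrice = 0.00001,    // placeholder $/completion token
) {
  const promptTokens = calls.reduce((sum, c) => sum + c.promptTokens, 0);
  const completionTokens = calls.reduce((sum, c) => sum + c.completionTokens, 0);
  return {
    totalTokens: promptTokens + completionTokens,
    totalCost: promptTokens * promptPrice + completionTokens * completionPrice,
  };
}
```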

Alerts

Set up alerts for:

  • Max iterations exceeded
  • Stuck loops (same action repeated)
  • High token usage
  • Frequent failures
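A simple way to flag the stuck-loop condition is to alert when the same action repeats N times in a row. This detector is a sketch, not part of the observability SDK; in practice you would also compare arguments, not just action names.

```typescript
// Returns true if the last `threshold` actions are identical
function isStuckLoop(actions: string[], threshold = 3): boolean {
  if (actions.length < threshold) return false;
  const tail = actions.slice(-threshold);
  return tail.every(action => action === tail[0]);
}
```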

Best Practices

1. Hierarchical Traces

Trace: agent-run
├── Span: iteration-1
│   ├── Generation: think
│   └── Span: tool-search
├── Span: iteration-2
│   ├── Generation: think
│   └── Span: tool-write
└── ...

2. Rich Metadata

trace({
  name: 'agent-run',
  metadata: {
    agentVersion: '2.0',
    modelConfig: { temperature: 0.7 },
    toolsAvailable: toolNames,
    maxBudget: { iterations: 10, tokens: 50000 },
  },
});

3. Track All Decisions

Every LLM call should be a generation:

  • Planning decisions
  • Tool selection
  • Error analysis
  • Final synthesis
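A thin wrapper can enforce this rule so no decision goes untraced. The `Tracer` interface here is a stand-in for whatever your observability client exposes, not the actual `@transactional/observability` API.

```typescript
// Route every LLM call through a generation
interface Generation { end(data: { output: unknown }): Promise<void> | void }
interface Tracer { generation(opts: { name: string; input: unknown }): Generation }

async function tracedCall<T>(
  tracer: Tracer,
  name: string,
  input: unknown,
  call: () => Promise<T>,
): Promise<T> {
  const generation = tracer.generation({ name, input });
  const output = await call();          // the actual LLM call
  await generation.end({ output });     // always recorded, regardless of caller
  return output;
}
```

Planning, tool selection, error analysis, and final synthesis can all go through the same wrapper with different `name` values.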

4. Monitor Resource Usage

const resourceMonitor = {
  tokensUsed: 0,
  toolCalls: 0,
  iterations: 0,
  // Method shorthand (not an arrow function) so `this` binds to the monitor
  checkBudget() {
    if (this.tokensUsed > 50000) return 'token_limit';
    if (this.iterations > 10) return 'iteration_limit';
    return 'ok';
  },
};

Next Steps