Generations

Tracking LLM calls with generations in Observability.

What is a Generation?

A generation represents a single LLM call within your application. It captures the model, input, output, token usage, and cost - everything needed to understand and debug your AI interactions.

Generation Structure

Generation: openai-completion
├── modelName: gpt-4o
├── input: { messages: [...] }
├── output: { content: '...' }
├── promptTokens: 150
├── completionTokens: 50
├── totalTokens: 200
├── cost: $0.000875
├── duration: 1200ms
└── metadata: { temperature: 0.7 }

Creating Generations

const obs = getObservability();
 
// Create generation before LLM call
const generation = obs.generation({
  name: 'chat-completion',
  modelName: 'gpt-4o',
  input: {
    messages: [
      { role: 'system', content: 'You are helpful.' },
      { role: 'user', content: 'Hello!' },
    ],
  },
  metadata: {
    temperature: 0.7,
    maxTokens: 1000,
  },
});
 
// Make the LLM call
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  temperature: 0.7,
});
 
// End generation with output
await generation.end({
  output: response.choices[0].message,
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
});

Generation Properties

| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier |
| name | string | Human-readable name |
| modelName | string | Model identifier (e.g., gpt-4o) |
| input | object | Input to the LLM |
| output | object | LLM output |
| promptTokens | number | Input tokens |
| completionTokens | number | Output tokens |
| totalTokens | number | Total tokens used |
| cost | number | Calculated cost |
| duration | number | Latency in ms |
| startTime | Date | When call started |
| endTime | Date | When call completed |
| metadata | object | Additional context |
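These properties can be sketched as a TypeScript interface. This is an illustrative shape derived from the table above, not the SDK's actual type definitions:

```typescript
// Hypothetical shape of a generation record, mirroring the property table.
// The SDK's real types may name or nest these fields differently.
interface GenerationRecord {
  id: string;
  name: string;
  modelName: string;              // e.g. 'gpt-4o'
  input: object;
  output?: object;
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;           // promptTokens + completionTokens
  cost?: number;                  // calculated from model pricing
  duration?: number;              // latency in ms
  startTime: Date;
  endTime?: Date;
  metadata?: Record<string, unknown>;
}

const example: GenerationRecord = {
  id: 'gen_123',
  name: 'chat-completion',
  modelName: 'gpt-4o',
  input: { messages: [] },
  promptTokens: 150,
  completionTokens: 50,
  totalTokens: 200,
  startTime: new Date(),
  metadata: { temperature: 0.7 },
};
```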

Token Counting

Automatic from Provider

Pass token counts from the provider response:

await generation.end({
  output: response.choices[0].message,
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
});

Estimation

If token counts aren't available, the SDK estimates:

// SDK will estimate based on text length
await generation.end({
  output: { content: 'Response text here...' },
  // No token counts provided - will be estimated
});
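A common heuristic for English text is roughly four characters per token. The SDK's actual estimator may differ; this sketch only illustrates the idea:

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Illustrative heuristic only - not the SDK's actual implementation.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```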

Cost Calculation

Costs are calculated automatically based on model pricing:

| Model | Input ($/1M) | Output ($/1M) |
| --- | --- | --- |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-haiku | $0.25 | $1.25 |

// Cost = (promptTokens * inputPrice) + (completionTokens * outputPrice)
// For gpt-4o with 1000 prompt, 500 completion:
// Cost = (1000 * $2.50/1M) + (500 * $10.00/1M) = $0.0025 + $0.005 = $0.0075
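The formula can be expressed as a small helper. The pricing map mirrors the table above; calculateCost is an illustrative function, not an SDK export:

```typescript
// Per-million-token pricing, copied from the table above.
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-3-5-sonnet': { input: 3.0, output: 15.0 },
  'claude-3-haiku': { input: 0.25, output: 1.25 },
};

// Cost = (promptTokens * inputPrice) + (completionTokens * outputPrice)
function calculateCost(
  modelName: string,
  promptTokens: number,
  completionTokens: number,
): number {
  const price = PRICING[modelName];
  if (!price) return 0; // unknown model: no pricing data
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}

// For gpt-4o with 1000 prompt, 500 completion tokens: $0.0075
const cost = calculateCost('gpt-4o', 1000, 500);
```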

Generation with Streaming

Track streaming generations:

const generation = obs.generation({
  name: 'streaming-completion',
  modelName: 'gpt-4o',
  input: { messages: [...] },
});
 
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  stream: true,
});
 
let fullContent = '';
let usage = { prompt_tokens: 0, completion_tokens: 0 };
 
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  fullContent += content;
  process.stdout.write(content);
 
  // Some providers include usage in final chunk
  if (chunk.usage) {
    usage = chunk.usage;
  }
}
 
await generation.end({
  output: { content: fullContent },
  // Fall back to SDK estimation if the provider never sent usage
  promptTokens: usage.prompt_tokens || undefined,
  completionTokens: usage.completion_tokens || undefined,
});
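Note that with the OpenAI API, usage only appears in the stream if you request it. If your provider supports it, opting in makes the final chunk carry real token counts instead of relying on estimation (stream_options is an OpenAI chat-completions parameter; other providers expose usage differently):

```typescript
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  stream: true,
  stream_options: { include_usage: true }, // final chunk includes usage
});
```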

Generation with Function Calling

Track tool/function calls:

const generation = obs.generation({
  name: 'function-calling',
  modelName: 'gpt-4o',
  input: {
    messages: [...],
    tools: [{ type: 'function', function: { name: 'get_weather' } }],
  },
});
 
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  tools: [...],
});
 
// Capture tool calls in output
await generation.end({
  output: {
    content: response.choices[0].message.content,
    toolCalls: response.choices[0].message.tool_calls,
  },
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
});

Nested Generations

Create hierarchies for complex workflows:

// Parent trace
const trace = obs.trace({ name: 'rag-pipeline' });
 
// First generation: embedding
const embedGeneration = obs.generation({
  name: 'embed-query',
  modelName: 'text-embedding-3-small',
  input: { text: query },
});
 
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: query,
});
 
await embedGeneration.end({
  output: { dimensions: embedding.data[0].embedding.length },
  promptTokens: embedding.usage.prompt_tokens,
});
 
// Second generation: completion
const completionGeneration = obs.generation({
  name: 'generate-answer',
  modelName: 'gpt-4o',
  input: { messages: [...] },
});
 
const response = await openai.chat.completions.create({...});
 
await completionGeneration.end({
  output: response.choices[0].message,
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
});
 
await trace.end({ output: { answer: response.choices[0].message.content } });

Generation Metadata

Add context for debugging:

const generation = obs.generation({
  name: 'completion',
  modelName: 'gpt-4o',
  input: { messages: [...] },
  metadata: {
    // Model parameters
    temperature: 0.7,
    maxTokens: 1000,
    topP: 0.9,
 
    // Application context
    feature: 'chat-v2',
    promptVersion: '1.2.0',
 
    // User context
    userTier: 'premium',
  },
});

Viewing Generations

In Trace View

Generations appear as children of traces:

  1. Go to Traces
  2. Click a trace
  3. See generations in the timeline
  4. View input/output, tokens, cost

Generation List

Filter and search generations:

  1. Go to Generations
  2. Filter by:
    • Model
    • Date range
    • Cost threshold
    • Token count

Generation Detail

Click a generation to see:

  • Full input (messages, parameters)
  • Full output
  • Token breakdown
  • Latency breakdown
  • Cost calculation

Best Practices

1. Always Track Token Usage

// Good - pass actual token counts
await generation.end({
  output: response.choices[0].message,
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
});
 
// Less accurate - tokens estimated
await generation.end({
  output: response.choices[0].message,
});

2. Use Descriptive Names

// Good
generation({ name: 'summarize-document' });
generation({ name: 'extract-entities' });
generation({ name: 'generate-code' });
 
// Bad
generation({ name: 'llm' });
generation({ name: 'call' });

3. Include Model Parameters

generation({
  name: 'completion',
  modelName: 'gpt-4o',
  input: { messages: [...] },
  metadata: {
    temperature: 0.7,
    maxTokens: 1000,
    // Helps debug output quality
  },
});

4. Handle Errors

const generation = obs.generation({...});
 
try {
  const response = await openai.chat.completions.create({...});
  await generation.end({ output: response, ... });
} catch (error) {
  await generation.error(error);
  throw error;
}
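This pattern can be factored into a small wrapper so every call site ends or errors its generation consistently. withGeneration is a hypothetical helper (not an SDK export), and the Generation shape is inferred from the examples on this page:

```typescript
// Minimal generation shape, inferred from the examples above.
interface Generation {
  end(data: { output: unknown }): Promise<void>;
  error(err: unknown): Promise<void>;
}

// Hypothetical helper: runs fn, ends the generation on success,
// records the error and rethrows on failure.
async function withGeneration<T>(
  generation: Generation,
  fn: () => Promise<T>,
): Promise<T> {
  try {
    const output = await fn();
    await generation.end({ output });
    return output;
  } catch (err) {
    await generation.error(err);
    throw err;
  }
}
```

Usage would look like `const response = await withGeneration(generation, () => openai.chat.completions.create({...}))`.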

Next Steps