Generations
Tracking LLM calls with generations in Observability.
What is a Generation?
A generation represents a single LLM call within your application. It captures the model, input, output, token usage, and cost: everything you need to understand and debug your AI interactions.
Generation Structure
Generation: openai-completion
├── modelName: gpt-4o
├── input: { messages: [...] }
├── output: { content: '...' }
├── promptTokens: 150
├── completionTokens: 50
├── totalTokens: 200
├── cost: $0.0035
├── duration: 1200ms
└── metadata: { temperature: 0.7 }
Creating Generations
const obs = getObservability();
// Create generation before LLM call
const generation = obs.generation({
name: 'chat-completion',
modelName: 'gpt-4o',
input: {
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
},
metadata: {
temperature: 0.7,
maxTokens: 1000,
},
});
// Make the LLM call
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
temperature: 0.7,
});
// End generation with output
await generation.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
Generation Properties
| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| name | string | Human-readable name |
| modelName | string | Model identifier (e.g., gpt-4o) |
| input | object | Input to the LLM |
| output | object | LLM output |
| promptTokens | number | Input tokens |
| completionTokens | number | Output tokens |
| totalTokens | number | Total tokens used |
| cost | number | Calculated cost (USD) |
| duration | number | Latency in ms |
| startTime | Date | When the call started |
| endTime | Date | When the call completed |
| metadata | object | Additional context |
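In TypeScript, the table above corresponds roughly to the shape below. The interface name, field optionality, and the sample values are illustrative assumptions, not the SDK's actual exported type:

```typescript
// Hypothetical shape of a generation record, mirroring the properties table.
interface GenerationRecord {
  id: string;
  name: string;
  modelName: string;
  input: object;
  output?: object;
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
  cost?: number; // USD
  duration?: number; // milliseconds
  startTime: Date;
  endTime?: Date;
  metadata?: Record<string, unknown>;
}

// Example record matching the structure shown earlier
const example: GenerationRecord = {
  id: 'gen_123',
  name: 'openai-completion',
  modelName: 'gpt-4o',
  input: { messages: [] },
  output: { content: '...' },
  promptTokens: 150,
  completionTokens: 50,
  totalTokens: 200,
  duration: 1200,
  startTime: new Date(),
  metadata: { temperature: 0.7 },
};
```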
Token Counting
Automatic from Provider
Pass token counts from the provider response:
await generation.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
Estimation
If token counts aren't available, the SDK estimates:
// SDK will estimate based on text length
await generation.end({
output: { content: 'Response text here...' },
// No token counts provided - will be estimated
});
Cost Calculation
Costs are calculated automatically based on model pricing:
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-haiku | $0.25 | $1.25 |
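The per-token math can be sketched as a small helper. The pricing map below simply hard-codes the table above; the SDK's own pricing lookup is internal and may differ:

```typescript
// Price per 1M tokens, in USD, copied from the table above
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-3-5-sonnet': { input: 3.0, output: 15.0 },
  'claude-3-haiku': { input: 0.25, output: 1.25 },
};

function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number,
): number {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing for model: ${model}`);
  // Both rates are per 1M tokens, so divide once at the end
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}

// 1000 prompt + 500 completion tokens on gpt-4o
console.log(estimateCost('gpt-4o', 1000, 500)); // 0.0075
```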
// Cost = (promptTokens * inputPrice) + (completionTokens * outputPrice)
// For gpt-4o with 1000 prompt, 500 completion:
// Cost = (1000 * $2.50/1M) + (500 * $10.00/1M) = $0.0025 + $0.005 = $0.0075
Generation with Streaming
Track streaming generations:
const generation = obs.generation({
name: 'streaming-completion',
modelName: 'gpt-4o',
input: { messages: [...] },
});
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
stream: true,
});
let fullContent = '';
let usage = { prompt_tokens: 0, completion_tokens: 0 };
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
fullContent += content;
process.stdout.write(content);
// Some providers include usage in the final chunk; with the OpenAI SDK,
// pass stream_options: { include_usage: true } in the create call to get it
if (chunk.usage) {
usage = chunk.usage;
}
}
await generation.end({
output: { content: fullContent },
promptTokens: usage.prompt_tokens,
completionTokens: usage.completion_tokens,
});
Generation with Function Calling
Track tool/function calls:
const generation = obs.generation({
name: 'function-calling',
modelName: 'gpt-4o',
input: {
messages: [...],
tools: [{ type: 'function', function: { name: 'get_weather' } }],
},
});
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
tools: [...],
});
// Capture tool calls in output
await generation.end({
output: {
content: response.choices[0].message.content,
toolCalls: response.choices[0].message.tool_calls,
},
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
Nested Generations
Create hierarchies for complex workflows:
// Parent trace
const trace = obs.trace({ name: 'rag-pipeline' });
// First generation: embedding
const embedGeneration = obs.generation({
name: 'embed-query',
modelName: 'text-embedding-3-small',
input: { text: query },
});
const embedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
await embedGeneration.end({
output: { dimensions: embedding.data[0].embedding.length },
promptTokens: embedding.usage.prompt_tokens,
});
// Second generation: completion
const completionGeneration = obs.generation({
name: 'generate-answer',
modelName: 'gpt-4o',
input: { messages: [...] },
});
const response = await openai.chat.completions.create({...});
await completionGeneration.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
await trace.end({ output: { answer: response.choices[0].message.content } });
Generation Metadata
Add context for debugging:
const generation = obs.generation({
name: 'completion',
modelName: 'gpt-4o',
input: { messages: [...] },
metadata: {
// Model parameters
temperature: 0.7,
maxTokens: 1000,
topP: 0.9,
// Application context
feature: 'chat-v2',
promptVersion: '1.2.0',
// User context
userTier: 'premium',
},
});
Viewing Generations
In Trace View
Generations appear as children of traces:
- Go to Traces
- Click a trace
- See generations in the timeline
- View input/output, tokens, cost
Generation List
Filter and search generations:
- Go to Generations
- Filter by:
- Model
- Date range
- Cost threshold
- Token count
Generation Detail
Click a generation to see:
- Full input (messages, parameters)
- Full output
- Token breakdown
- Latency breakdown
- Cost calculation
Best Practices
1. Always Track Token Usage
// Good - pass actual token counts
await generation.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
// Less accurate - tokens estimated
await generation.end({
output: response.choices[0].message,
});
2. Use Descriptive Names
// Good
generation({ name: 'summarize-document' });
generation({ name: 'extract-entities' });
generation({ name: 'generate-code' });
// Bad
generation({ name: 'llm' });
generation({ name: 'call' });
3. Include Model Parameters
generation({
name: 'completion',
modelName: 'gpt-4o',
input: { messages: [...] },
metadata: {
temperature: 0.7,
maxTokens: 1000,
// Helps debug output quality
},
});
4. Handle Errors
const generation = obs.generation({...});
try {
const response = await openai.chat.completions.create({...});
await generation.end({ output: response, ... });
} catch (error) {
await generation.error(error);
throw error;
}
Next Steps
- Spans - Custom spans for non-LLM work
- Traces - Parent trace context
- Cost Analysis - Analyze LLM costs