Generations
Tracking LLM calls with generations in Observability.
What is a Generation?
A generation represents a single LLM call within your application. It captures the model, input, output, token usage, and cost: everything you need to understand and debug your AI interactions.
Generation Structure
Generation: openai-completion
├── modelName: gpt-4o
├── input: { messages: [...] }
├── output: { content: '...' }
├── promptTokens: 150
├── completionTokens: 50
├── totalTokens: 200
├── cost: $0.0035
├── duration: 1200ms
└── metadata: { temperature: 0.7 }
Creating Generations
const obs = getObservability();
// Create generation before LLM call
const generation = obs.generation({
name: 'chat-completion',
modelName: 'gpt-4o',
input: {
messages: [
{ role: 'system', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
},
metadata: {
temperature: 0.7,
maxTokens: 1000,
},
});
// Make the LLM call
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
temperature: 0.7,
});
// End generation with output
await generation.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
Generation Properties
| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier |
| name | string | Human-readable name |
| modelName | string | Model identifier (e.g., gpt-4o) |
| input | object | Input to the LLM |
| output | object | LLM output |
| promptTokens | number | Input tokens |
| completionTokens | number | Output tokens |
| totalTokens | number | Total tokens used |
| cost | number | Calculated cost (USD) |
| duration | number | Latency in ms |
| startTime | Date | When the call started |
| endTime | Date | When the call completed |
| metadata | object | Additional context |
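In TypeScript, the table above corresponds roughly to the shape below. The interface name, field optionality, and the sample values are illustrative assumptions, not the SDK's actual exported type:

```typescript
// Hypothetical shape of a generation record, mirroring the properties table.
interface GenerationRecord {
  id: string;
  name: string;
  modelName: string;
  input: object;
  output?: object;
  promptTokens?: number;
  completionTokens?: number;
  totalTokens?: number;
  cost?: number; // USD
  duration?: number; // milliseconds
  startTime: Date;
  endTime?: Date;
  metadata?: Record<string, unknown>;
}

// Example record matching the structure shown earlier
const example: GenerationRecord = {
  id: 'gen_123',
  name: 'openai-completion',
  modelName: 'gpt-4o',
  input: { messages: [] },
  output: { content: '...' },
  promptTokens: 150,
  completionTokens: 50,
  totalTokens: 200,
  duration: 1200,
  startTime: new Date(),
  metadata: { temperature: 0.7 },
};
```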
Token Counting
Automatic from Provider
Pass token counts from the provider response:
await generation.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
Estimation
If token counts aren't available, the SDK estimates:
// SDK will estimate based on text length
await generation.end({
output: { content: 'Response text here...' },
// No token counts provided - will be estimated
});
Cost Calculation
Costs are calculated automatically based on model pricing:
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-haiku | $0.25 | $1.25 |
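The per-token math can be sketched as a small helper. The pricing map below simply hard-codes the table above; the SDK's own pricing lookup is internal and may differ:

```typescript
// Price per 1M tokens, in USD, copied from the table above
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'claude-3-5-sonnet': { input: 3.0, output: 15.0 },
  'claude-3-haiku': { input: 0.25, output: 1.25 },
};

function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number,
): number {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing for model: ${model}`);
  // Both rates are per 1M tokens, so divide once at the end
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}

// 1000 prompt + 500 completion tokens on gpt-4o
console.log(estimateCost('gpt-4o', 1000, 500)); // 0.0075
```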
// Cost = (promptTokens * inputPrice) + (completionTokens * outputPrice)
// For gpt-4o with 1000 prompt, 500 completion:
// Cost = (1000 * $2.50/1M) + (500 * $10.00/1M) = $0.0025 + $0.005 = $0.0075
Generation with Streaming
Track streaming generations:
const generation = obs.generation({
name: 'streaming-completion',
modelName: 'gpt-4o',
input: { messages: [...] },
});
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
stream: true,
});
let fullContent = '';
let usage = { prompt_tokens: 0, completion_tokens: 0 };
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
fullContent += content;
process.stdout.write(content);
// Some providers include usage in the final chunk; with the OpenAI SDK,
// pass stream_options: { include_usage: true } in the create call to get it
if (chunk.usage) {
usage = chunk.usage;
}
}
await generation.end({
output: { content: fullContent },
promptTokens: usage.prompt_tokens,
completionTokens: usage.completion_tokens,
});
Generation with Function Calling
Track tool/function calls:
const generation = obs.generation({
name: 'function-calling',
modelName: 'gpt-4o',
input: {
messages: [...],
tools: [{ type: 'function', function: { name: 'get_weather' } }],
},
});
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [...],
tools: [...],
});
// Capture tool calls in output
await generation.end({
output: {
content: response.choices[0].message.content,
toolCalls: response.choices[0].message.tool_calls,
},
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
Nested Generations
Create hierarchies for complex workflows:
// Parent trace
const trace = obs.trace({ name: 'rag-pipeline' });
// First generation: embedding
const embedGeneration = obs.generation({
name: 'embed-query',
modelName: 'text-embedding-3-small',
input: { text: query },
});
const embedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
await embedGeneration.end({
output: { dimensions: embedding.data[0].embedding.length },
promptTokens: embedding.usage.prompt_tokens,
});
// Second generation: completion
const completionGeneration = obs.generation({
name: 'generate-answer',
modelName: 'gpt-4o',
input: { messages: [...] },
});
const response = await openai.chat.completions.create({...});
await completionGeneration.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
await trace.end({ output: { answer: response.choices[0].message.content } });
Generation Metadata
Add context for debugging:
const generation = obs.generation({
name: 'completion',
modelName: 'gpt-4o',
input: { messages: [...] },
metadata: {
// Model parameters
temperature: 0.7,
maxTokens: 1000,
topP: 0.9,
// Application context
feature: 'chat-v2',
promptVersion: '1.2.0',
// User context
userTier: 'premium',
},
});
Viewing Generations
In Trace View
Generations appear as children of traces:
- Go to Traces
- Click a trace
- See generations in the timeline
- View input/output, tokens, cost
Generation List
Filter and search generations:
- Go to Generations
- Filter by:
- Model
- Date range
- Cost threshold
- Token count
Generation Detail
Click a generation to see:
- Full input (messages, parameters)
- Full output
- Token breakdown
- Latency breakdown
- Cost calculation
Best Practices
1. Always Track Token Usage
// Good - pass actual token counts
await generation.end({
output: response.choices[0].message,
promptTokens: response.usage?.prompt_tokens,
completionTokens: response.usage?.completion_tokens,
});
// Less accurate - tokens estimated
await generation.end({
output: response.choices[0].message,
});
2. Use Descriptive Names
// Good
generation({ name: 'summarize-document' });
generation({ name: 'extract-entities' });
generation({ name: 'generate-code' });
// Bad
generation({ name: 'llm' });
generation({ name: 'call' });
3. Include Model Parameters
generation({
name: 'completion',
modelName: 'gpt-4o',
input: { messages: [...] },
metadata: {
temperature: 0.7,
maxTokens: 1000,
// Helps debug output quality
},
});
4. Handle Errors
const generation = obs.generation({...});
try {
const response = await openai.chat.completions.create({...});
await generation.end({ output: response, ... });
} catch (error) {
await generation.error(error);
throw error;
}
Next Steps
- Spans - Custom spans for non-LLM work
- Traces - Parent trace context
- Cost Analysis - Analyze LLM costs