Build an AI Chatbot with Persistent Memory in 30 Minutes
Step-by-step tutorial for building an AI chatbot that remembers previous conversations using vector storage, embeddings, and context retrieval. TypeScript code throughout.
Transactional Team
Feb 16, 2026
Most AI chatbots have the memory of a goldfish. Every conversation starts from zero. The user explained their project three days ago, and today the bot asks "What are you working on?" again.
Early chatbot deployments are often technically impressive yet practically limited, because they cannot remember that a customer already described their problem in a previous conversation. Adding persistent memory can dramatically improve resolution rates: in typical deployments, from around 34% to 71%.
Here is how to build a chatbot with persistent memory from scratch, in about 30 minutes.
What You Will Build
A chatbot that:
Remembers previous conversations with each user
Retrieves relevant past context automatically
Uses vector embeddings for semantic search over memory
Stores memories in PostgreSQL with pgvector
Memory Chatbot Architecture
Vector storage: pgvector
Embedding size: 1536 dimensions
Resolution rate improvement: 34% to 71%
Optimal memories per prompt: 5-7
Prerequisites
Node.js 20+
PostgreSQL with the pgvector extension
An OpenAI API key (for embeddings and completions)
Step 1: Set Up the Database
First, enable pgvector and create the tables:
```sql
-- Enable the vector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Conversations table
CREATE TABLE conversations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id VARCHAR(255) NOT NULL,
  started_at TIMESTAMP DEFAULT NOW(),
  ended_at TIMESTAMP
);

-- Messages table
CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  conversation_id UUID REFERENCES conversations(id),
  role VARCHAR(20) NOT NULL, -- 'user' or 'assistant'
  content TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Memory embeddings table
CREATE TABLE memories (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id VARCHAR(255) NOT NULL,
  conversation_id UUID REFERENCES conversations(id),
  content TEXT NOT NULL,
  summary TEXT, -- condensed version for context
  embedding vector(1536), -- OpenAI text-embedding-3-small dimension
  created_at TIMESTAMP DEFAULT NOW()
);

-- Index for fast similarity search
CREATE INDEX ON memories
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```
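The lists = 100 setting is a starting point, not a constant. The pgvector documentation suggests roughly rows / 1000 lists for tables of up to 1M rows and sqrt(rows) beyond that, with the query-time ivfflat.probes setting starting around sqrt(lists). It also recommends creating an ivfflat index once the table already holds some data. The sketch below encodes that rule of thumb; suggestedLists and suggestedProbes are hypothetical helper names, not part of pgvector.

```typescript
// Rule-of-thumb sizing for an ivfflat index, following the starting points
// suggested in the pgvector documentation. Both helpers are hypothetical.

/** Suggested `lists` value for a table holding `rowCount` vectors. */
function suggestedLists(rowCount: number): number {
  if (!Number.isFinite(rowCount) || rowCount < 0) {
    throw new RangeError('rowCount must be a non-negative number');
  }
  const lists =
    rowCount <= 1_000_000 ? rowCount / 1000 : Math.sqrt(rowCount);
  // An index needs at least one list, so clamp small tables to 1.
  return Math.max(1, Math.round(lists));
}

/** Suggested `ivfflat.probes` value for a given `lists` setting. */
function suggestedProbes(lists: number): number {
  return Math.max(1, Math.round(Math.sqrt(lists)));
}
```

By this rule, lists = 100 corresponds to a table of about 100,000 memories, and a matching probes value would be 10 (set per session with SET ivfflat.probes = 10). Higher probes improves recall at the cost of speed.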
Step 2: Create the Embedding Service
Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts produce similar vectors, which enables semantic search over memories.
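A minimal sketch of such a service, assuming OpenAI's text-embedding-3-small model (the one named in the schema comment in Step 1) and the createEmbedding / createEmbeddings names that the rest of this tutorial refers to. The EmbeddingsClient interface and the createEmbeddingService factory are illustrative assumptions, introduced so the logic can be exercised with a stub instead of a network call.

```typescript
// embedding.ts (sketch). The exported function names match what later steps
// import; the body is an assumption, not a definitive implementation.

// Minimal shape of the client this module relies on. The client from the
// `openai` package is expected to satisfy it structurally.
interface EmbeddingsClient {
  embeddings: {
    create(params: { model: string; input: string[] }): Promise<{
      data: { index: number; embedding: number[] }[];
    }>;
  };
}

// 1536 dimensions, matching the vector(1536) column from Step 1.
const EMBEDDING_MODEL = 'text-embedding-3-small';

function createEmbeddingService(client: EmbeddingsClient) {
  /** Embed several texts in one API call; results follow input order. */
  async function createEmbeddings(texts: string[]): Promise<number[][]> {
    if (texts.length === 0) return [];
    const response = await client.embeddings.create({
      model: EMBEDDING_MODEL,
      input: texts,
    });
    // Sort by index so the output order matches the input order.
    return [...response.data]
      .sort((a, b) => a.index - b.index)
      .map((item) => item.embedding);
  }

  /** Embed a single text. */
  async function createEmbedding(text: string): Promise<number[]> {
    const [embedding] = await createEmbeddings([text]);
    return embedding;
  }

  return { createEmbedding, createEmbeddings };
}

// Example wiring (requires the `openai` package and OPENAI_API_KEY):
//   import OpenAI from 'openai';
//   export const { createEmbedding, createEmbeddings } =
//     createEmbeddingService(new OpenAI());
```

Routing the single-text function through the batch function keeps one code path for the API call, and makes batching available wherever several memories are stored at once.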
Step 3: Build the Memory Store
The memory store handles saving and retrieving memories. The key operation is similarity search: given a new message, find the most relevant past memories.
```typescript
// memory-store.ts
import { Pool } from 'pg';
import { createEmbedding } from './embedding';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

interface Memory {
  id: string;
  userId: string;
  content: string;
  summary: string;
  similarity: number;
  createdAt: Date;
}

export async function storeMemory(
  userId: string,
  conversationId: string,
  content: string,
  summary: string
): Promise<void> {
  const embedding = await createEmbedding(content);

  // memories.conversation_id references conversations(id), so make sure
  // the conversation row exists before inserting the memory.
  await pool.query(
    `INSERT INTO conversations (id, user_id, ended_at)
     VALUES ($1, $2, NOW())
     ON CONFLICT (id) DO NOTHING`,
    [conversationId, userId]
  );

  // pgvector accepts the '[0.1,0.2,...]' text form, which is what
  // JSON.stringify produces for a number array.
  await pool.query(
    `INSERT INTO memories (user_id, conversation_id, content, summary, embedding)
     VALUES ($1, $2, $3, $4, $5)`,
    [userId, conversationId, content, summary, JSON.stringify(embedding)]
  );
}

export async function retrieveRelevantMemories(
  userId: string,
  query: string,
  limit: number = 5
): Promise<Memory[]> {
  const queryEmbedding = await createEmbedding(query);

  // Nearest neighbours by cosine distance, restricted to this user.
  const result = await pool.query(
    `SELECT id, user_id, content, summary,
            1 - (embedding <=> $1::vector) AS similarity,
            created_at
     FROM memories
     WHERE user_id = $2
     ORDER BY embedding <=> $1::vector
     LIMIT $3`,
    [JSON.stringify(queryEmbedding), userId, limit]
  );

  return result.rows.map((row) => ({
    id: row.id,
    userId: row.user_id,
    content: row.content,
    summary: row.summary,
    similarity: Number(row.similarity),
    createdAt: row.created_at,
  }));
}
```
The <=> operator computes cosine distance. Lower distance means higher similarity. We convert to similarity by subtracting from 1.
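To make the distance/similarity relationship concrete, here is a small sketch that computes both in plain TypeScript. It only illustrates the arithmetic; in the tutorial the database does this work, and the function names here are hypothetical.

```typescript
/** Cosine similarity of two equal-length vectors, in the range [-1, 1]. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Cosine distance as used for ordering: 1 - similarity, in the range [0, 2]. */
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

Vectors pointing the same way have distance 0 (similarity 1), orthogonal vectors have distance 1 (similarity 0), and opposite vectors have distance 2 (similarity -1). Ordering by distance ascending is therefore the same as ordering by similarity descending.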
Step 4: Summarize Conversations for Memory
Raw conversation transcripts are too verbose for memory storage. Summarize each conversation into key facts:
```typescript
// summarizer.ts
import OpenAI from 'openai';

const openai = new OpenAI();

interface Message {
  role: 'user' | 'assistant';
  content: string;
}

export async function summarizeConversation(
  messages: Message[]
): Promise<string> {
  const transcript = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Summarize this conversation into key facts about the user and the outcome. Focus on:
- User preferences and requirements mentioned
- Problems discussed and solutions provided
- Decisions made
- Any follow-up items

Be concise. Use bullet points. Include only information that would be useful in future conversations.`,
      },
      {
        role: 'user',
        content: transcript,
      },
    ],
    max_tokens: 300,
  });

  return response.choices[0].message.content ?? '';
}
```
Step 5: Build the Chatbot
Now tie everything together. The chatbot retrieves relevant memories before responding:
```typescript
// chatbot.ts
import OpenAI from 'openai';
import { retrieveRelevantMemories, storeMemory } from './memory-store';
import { summarizeConversation } from './summarizer';

const openai = new OpenAI();

interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

export class MemoryChatbot {
  private userId: string;
  private conversationId: string;
  private messages: ChatMessage[] = [];

  constructor(userId: string, conversationId: string) {
    this.userId = userId;
    this.conversationId = conversationId;
  }

  async chat(userMessage: string): Promise<string> {
    // 1. Retrieve relevant memories
    const memories = await retrieveRelevantMemories(
      this.userId,
      userMessage,
      5
    );

    // 2. Filter by similarity threshold
    const relevantMemories = memories.filter((m) => m.similarity > 0.3);

    // 3. Build the system prompt with memory context
    const systemPrompt = this.buildSystemPrompt(relevantMemories);

    // 4. Add user message to history
    this.messages.push({ role: 'user', content: userMessage });

    // 5. Call the LLM
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: systemPrompt },
        ...this.messages,
      ],
      max_tokens: 1000,
    });

    const assistantMessage = response.choices[0].message.content ?? '';

    // 6. Add assistant response to history
    this.messages.push({ role: 'assistant', content: assistantMessage });

    return assistantMessage;
  }

  private buildSystemPrompt(
    memories: { summary: string; createdAt: Date }[]
  ): string {
    let prompt = `You are a helpful assistant. Be concise and direct.`;

    if (memories.length > 0) {
      prompt += `\n\nYou have the following memories from previous conversations with this user:\n`;
      for (const memory of memories) {
        const date = memory.createdAt.toISOString().split('T')[0];
        prompt += `\n[${date}] ${memory.summary}`;
      }
      prompt += `\n\nUse these memories to provide personalized, context-aware responses.
Reference past conversations naturally when relevant.`;
    }

    return prompt;
  }

  async endConversation(): Promise<void> {
    if (this.messages.length < 2) return;

    // Summarize and store the conversation
    const summary = await summarizeConversation(this.messages);
    const fullContent = this.messages
      .map((m) => `${m.role}: ${m.content}`)
      .join('\n');

    await storeMemory(
      this.userId,
      this.conversationId,
      fullContent,
      summary
    );
  }
}
```
Step 6: Add the API Layer
Expose the chatbot through a simple HTTP endpoint:
```typescript
// server.ts
import express from 'express';
import { randomUUID } from 'crypto';
import { MemoryChatbot } from './chatbot';

const app = express();
app.use(express.json());

// Store active chatbot sessions
const sessions = new Map<string, MemoryChatbot>();

app.post('/chat', async (req, res) => {
  // Express 4 does not forward rejected promises from async handlers,
  // so failures from OpenAI or the database are handled here.
  try {
    const { userId, message, sessionId } = req.body;

    // Get or create session
    let bot = sessions.get(sessionId);
    if (!bot) {
      const conversationId = randomUUID();
      bot = new MemoryChatbot(userId, conversationId);
      sessions.set(sessionId, bot);
    }

    const response = await bot.chat(message);
    res.json({ response });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Chat request failed' });
  }
});

app.post('/chat/end', async (req, res) => {
  try {
    const { sessionId } = req.body;
    const bot = sessions.get(sessionId);
    if (bot) {
      await bot.endConversation();
      sessions.delete(sessionId);
    }
    res.json({ ok: true });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Failed to end conversation' });
  }
});

app.listen(3000, () => {
  console.log('Chatbot server running on port 3000');
});
```
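The /chat handler reads userId, message, and sessionId straight from the request body, so a malformed request would reach the chatbot unchecked. Below is a minimal validation sketch; parseChatRequest and its result type are hypothetical names, and the rules (three non-empty strings) are an assumption about what the endpoint should accept.

```typescript
// Hypothetical helper: validates the JSON body sent to POST /chat.
interface ChatRequest {
  userId: string;
  sessionId: string;
  message: string;
}

type ParseResult =
  | { ok: true; value: ChatRequest }
  | { ok: false; error: string };

function parseChatRequest(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object' };
  }
  const record = body as Record<string, unknown>;

  // Every field must be present and a non-empty string.
  for (const field of ['userId', 'sessionId', 'message'] as const) {
    const value = record[field];
    if (typeof value !== 'string' || value.trim() === '') {
      return { ok: false, error: `"${field}" must be a non-empty string` };
    }
  }

  return {
    ok: true,
    value: {
      userId: record.userId as string,
      sessionId: record.sessionId as string,
      message: record.message as string,
    },
  };
}
```

In the handler this would run before the session lookup, responding with status 400 and the error text when ok is false. In a real deployment the userId would normally come from authentication rather than from the request body.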
Step 7: Test Memory Retrieval
Let us verify the memory system works with a quick test:
```typescript
// test-memory.ts
import { randomUUID } from 'crypto';
import { MemoryChatbot } from './chatbot';

async function testMemory() {
  // Conversation IDs must be valid UUIDs, because
  // memories.conversation_id is a UUID column.

  // First conversation
  const bot1 = new MemoryChatbot('user-123', randomUUID());
  await bot1.chat('I am building a React app with Next.js 15');
  await bot1.chat('I prefer Tailwind CSS over styled-components');
  await bot1.chat(
    'My main challenge is server component data fetching'
  );
  await bot1.endConversation();
  console.log('First conversation stored.\n');

  // Second conversation - should remember the first
  const bot2 = new MemoryChatbot('user-123', randomUUID());
  const response = await bot2.chat(
    'Can you help me with my project?'
  );
  console.log('Bot response:', response);
  // Should reference Next.js, Tailwind, and data fetching
  // without the user having to repeat any of it
}

testMemory().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```
When the user says "Can you help me with my project?" in the second conversation, the bot retrieves the stored memories from the first conversation. It knows the user is building a React/Next.js app with Tailwind, and that they were struggling with server component data fetching. No repetition needed.
Performance Considerations
A few important lessons from production deployments:
Batch your embeddings. If you are storing multiple memories, use createEmbeddings (plural) instead of calling createEmbedding in a loop. One API call with 10 texts is cheaper and faster than 10 separate calls.
Set a similarity threshold. Not every memory is relevant. A threshold of 0.3 is a reasonable starting point for cosine similarity, but tune it against your own data. Below the threshold, the memories are likely noise.
Limit memory context size. Do not dump 50 memories into the system prompt. Five to seven relevant memories is usually the sweet spot. Beyond that, you are paying for tokens that dilute the signal.
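Use summaries, not raw transcripts. Storing and retrieving full conversation transcripts wastes tokens and reduces retrieval quality. Summaries are cheaper to embed, faster to retrieve, and produce better results. Note that storeMemory as written embeds the full transcript passed as content and keeps the summary only for prompt injection; to embed the summary instead, pass it to createEmbedding there.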
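The threshold and context-size advice can be combined into one selection step that runs between retrieval and prompt building. The sketch below is one way to do it. The 0.3 threshold and the cap of 5 come from the code above, while the selectMemories name and the 2000-character budget (a rough stand-in for a token budget) are assumptions.

```typescript
interface ScoredMemory {
  summary: string;
  similarity: number;
}

interface SelectionOptions {
  minSimilarity?: number; // drop anything at or below this score
  maxCount?: number; // hard cap on memories injected into the prompt
  maxTotalChars?: number; // assumed budget, a rough proxy for tokens
}

function selectMemories<T extends ScoredMemory>(
  memories: T[],
  options: SelectionOptions = {}
): T[] {
  const { minSimilarity = 0.3, maxCount = 5, maxTotalChars = 2000 } = options;

  // filter() returns a new array, so sorting does not mutate the input.
  const ranked = memories
    .filter((m) => m.similarity > minSimilarity)
    .sort((a, b) => b.similarity - a.similarity);

  const selected: T[] = [];
  let usedChars = 0;
  for (const memory of ranked) {
    if (selected.length >= maxCount) break;
    // Skip a memory that would overflow the budget, but keep checking
    // lower-ranked ones that might still fit.
    if (usedChars + memory.summary.length > maxTotalChars) continue;
    selected.push(memory);
    usedChars += memory.summary.length;
  }
  return selected;
}
```

Because the result is ordered by similarity, the most relevant memories appear first in the prompt, and the character budget keeps one unusually long summary from crowding out the rest.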
What You Built
In about 30 minutes, you have built a chatbot that genuinely remembers users across conversations. The core pattern is simple: embed, store, retrieve, inject. The same pattern works whether you are building a customer support bot, a personal assistant, or an AI tutor.
If you want managed memory infrastructure instead of running your own pgvector setup, Transactional's Memory module provides the vector storage, embedding pipeline, and retrieval API out of the box. But the patterns above will serve you well regardless of how you host it.