12 min read

Build an AI Chatbot with Persistent Memory in 30 Minutes

Step-by-step tutorial for building an AI chatbot that remembers previous conversations using vector storage, embeddings, and context retrieval. TypeScript code throughout.

Transactional Team

Feb 16, 2026

Most AI chatbots have the memory of a goldfish. Every conversation starts from zero. The user explained their project three days ago, and today the bot asks "What are you working on?" again.

Early chatbot deployments are often technically impressive yet practically limited, because they cannot remember that a customer already described their problem in a previous conversation. Adding persistent memory can substantially improve resolution rates, in typical deployments from around 34% to 71%.

Here is how to build a chatbot with persistent memory from scratch, in about 30 minutes.

What You Will Build

A chatbot that:

Remembers previous conversations with each user
Retrieves relevant past context automatically
Uses vector embeddings for semantic search over memory
Stores memories in PostgreSQL with pgvector

Memory Chatbot Architecture

Vector storage: pgvector
Embedding size: 1536 dimensions
Resolution rate improvement: 34% to 71%
Optimal memories per prompt: 5-7

Prerequisites

Node.js 20+
PostgreSQL with the pgvector extension
An OpenAI API key (for embeddings and completions)

Step 1: Set Up the Database

First, enable pgvector and create the tables:

-- Enable the vector extension
CREATE EXTENSION IF NOT EXISTS vector;
 
-- Conversations table
CREATE TABLE conversations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id VARCHAR(255) NOT NULL,
  started_at TIMESTAMP DEFAULT NOW(),
  ended_at TIMESTAMP
);
 
-- Messages table
CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  conversation_id UUID REFERENCES conversations(id),
  role VARCHAR(20) NOT NULL, -- 'user' or 'assistant'
  content TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);
 
-- Memory embeddings table
CREATE TABLE memories (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id VARCHAR(255) NOT NULL,
  conversation_id UUID REFERENCES conversations(id),
  content TEXT NOT NULL,
  summary TEXT, -- condensed version for context
  embedding vector(1536), -- OpenAI text-embedding-3-small dimension
  created_at TIMESTAMP DEFAULT NOW()
);
 
-- Index for fast similarity search
CREATE INDEX ON memories USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

Step 2: Create the Embedding Service

Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts produce similar vectors, which enables semantic search over memories.

// embedding.ts
import OpenAI from 'openai';
 
const openai = new OpenAI();
 
export async function createEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}
 
export async function createEmbeddings(
  texts: string[]
): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  });
  return response.data.map((d) => d.embedding);
}
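
When there are many texts to embed, createEmbeddings can be called once per batch rather than once per text. The sketch below splits an array into fixed-size batches. The chunk helper and the default batch size of 100 are illustrative assumptions; the API's actual per-request limits should be checked in the OpenAI documentation.

```typescript
// batching.ts
// Hypothetical helper for splitting texts into batches before calling
// createEmbeddings. The default size of 100 is an arbitrary assumption.

export function chunk<T>(items: readonly T[], size: number = 100): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage with the createEmbeddings function defined above:
//
//   const vectors: number[][] = [];
//   for (const batch of chunk(texts)) {
//     vectors.push(...(await createEmbeddings(batch)));
//   }
```

Because the batches are processed in order, the resulting vectors line up with the original texts by index.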

Step 3: Build the Memory Store

The memory store handles saving and retrieving memories. The key operation is similarity search: given a new message, find the most relevant past memories.

// memory-store.ts
import { Pool } from 'pg';
import { createEmbedding } from './embedding';
 
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});
 
interface Memory {
  id: string;
  userId: string;
  content: string;
  summary: string;
  similarity: number;
  createdAt: Date;
}
 
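export async function storeMemory(
  userId: string,
  conversationId: string,
  content: string,
  summary: string
): Promise<void> {
  const embedding = await createEmbedding(content);
 
  // memories.conversation_id references conversations(id), so make sure
  // the parent row exists before inserting the memory.
  await pool.query(
    `INSERT INTO conversations (id, user_id)
     VALUES ($1, $2)
     ON CONFLICT (id) DO NOTHING`,
    [conversationId, userId]
  );
 
  // pgvector accepts the '[1,2,3]' text form, which JSON.stringify produces.
  await pool.query(
    `INSERT INTO memories (user_id, conversation_id, content, summary, embedding)
     VALUES ($1, $2, $3, $4, $5)`,
    [userId, conversationId, content, summary, JSON.stringify(embedding)]
  );
}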
 
export async function retrieveRelevantMemories(
  userId: string,
  query: string,
  limit: number = 5
): Promise<Memory[]> {
  const queryEmbedding = await createEmbedding(query);
 
  const result = await pool.query(
    `SELECT
       id,
       user_id,
       content,
       summary,
       1 - (embedding <=> $1::vector) AS similarity,
       created_at
     FROM memories
     WHERE user_id = $2
     ORDER BY embedding <=> $1::vector
     LIMIT $3`,
    [JSON.stringify(queryEmbedding), userId, limit]
  );
 
  return result.rows.map((row) => ({
    id: row.id,
    userId: row.user_id,
    content: row.content,
    summary: row.summary,
    similarity: row.similarity,
    createdAt: row.created_at,
  }));
}

The <=> operator is pgvector's cosine distance operator. Lower distance means higher similarity, so the query converts distance to similarity by computing 1 - distance.
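
To make the relationship between distance and similarity concrete, here is a plain TypeScript sketch of the same arithmetic. In the tutorial this computation happens inside PostgreSQL; the function names below are illustrative only.

```typescript
// cosine.ts
// Illustrative re-implementation of cosine similarity and distance.
// pgvector does this in the database; this version is only for intuition.

export function cosineSimilarity(
  a: readonly number[],
  b: readonly number[]
): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new RangeError('vectors must be non-empty and the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new RangeError('cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export function cosineDistance(
  a: readonly number[],
  b: readonly number[]
): number {
  // Same conversion as the SQL above, in the other direction.
  return 1 - cosineSimilarity(a, b);
}
```

Similarity ranges from -1 to 1, so cosine distance ranges from 0 to 2. A similarity threshold of 0.3 is therefore the same as requiring a distance below 0.7.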

Step 4: Summarize Conversations for Memory

Raw conversation transcripts are too verbose for memory storage. Summarize each conversation into key facts:

// summarizer.ts
import OpenAI from 'openai';
 
const openai = new OpenAI();
 
interface Message {
  role: 'user' | 'assistant';
  content: string;
}
 
export async function summarizeConversation(
  messages: Message[]
): Promise<string> {
  const transcript = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
 
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Summarize this conversation into key facts about the user and the outcome. Focus on:
- User preferences and requirements mentioned
- Problems discussed and solutions provided
- Decisions made
- Any follow-up items
 
Be concise. Use bullet points. Include only information that would be useful in future conversations.`,
      },
      {
        role: 'user',
        content: transcript,
      },
    ],
    max_tokens: 300,
  });
 
  return response.choices[0].message.content ?? '';
}
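
Long conversations can grow beyond what is practical to send to the summarizer in a single request. One option, sketched below under the assumption that the most recent messages matter most, is to keep only the newest messages that fit a character budget. trimToBudget and the 12,000-character default are hypothetical; a token-based budget would be more precise.

```typescript
// trim-transcript.ts
// Hypothetical helper: keeps the newest messages that fit a character
// budget. Characters are a rough stand-in for tokens here.

interface Message {
  role: 'user' | 'assistant';
  content: string;
}

export function trimToBudget(
  messages: readonly Message[],
  maxChars: number = 12000
): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk backwards so the newest messages are considered first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    const cost = message.content.length;
    if (used + cost > maxChars) break;
    kept.push(message);
    used += cost;
  }
  // Restore chronological order for the transcript.
  return kept.reverse();
}
```

Calling summarizeConversation(trimToBudget(messages)) would then bound the size of the transcript, at the cost of dropping the oldest turns.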

Step 5: Build the Chatbot

Now tie everything together. The chatbot retrieves relevant memories before responding:

// chatbot.ts
import OpenAI from 'openai';
import { retrieveRelevantMemories, storeMemory } from './memory-store';
import { summarizeConversation } from './summarizer';
 
const openai = new OpenAI();
 
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}
 
export class MemoryChatbot {
  private userId: string;
  private conversationId: string;
  private messages: ChatMessage[] = [];
 
  constructor(userId: string, conversationId: string) {
    this.userId = userId;
    this.conversationId = conversationId;
  }
 
  async chat(userMessage: string): Promise<string> {
    // 1. Retrieve relevant memories
    const memories = await retrieveRelevantMemories(
      this.userId,
      userMessage,
      5
    );
 
    // 2. Filter by similarity threshold
    const relevantMemories = memories.filter((m) => m.similarity > 0.3);
 
    // 3. Build the system prompt with memory context
    const systemPrompt = this.buildSystemPrompt(relevantMemories);
 
    // 4. Add user message to history
    this.messages.push({ role: 'user', content: userMessage });
 
    // 5. Call the LLM
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: systemPrompt },
        ...this.messages,
      ],
      max_tokens: 1000,
    });
 
    const assistantMessage =
      response.choices[0].message.content ?? '';
 
    // 6. Add assistant response to history
    this.messages.push({ role: 'assistant', content: assistantMessage });
 
    return assistantMessage;
  }
 
  private buildSystemPrompt(
    memories: { summary: string; createdAt: Date }[]
  ): string {
    let prompt = `You are a helpful assistant. Be concise and direct.`;
 
    if (memories.length > 0) {
      prompt += `\n\nYou have the following memories from previous conversations with this user:\n`;
      for (const memory of memories) {
        const date = memory.createdAt.toISOString().split('T')[0];
        prompt += `\n[${date}] ${memory.summary}`;
      }
      prompt += `\n\nUse these memories to provide personalized, context-aware responses. Reference past conversations naturally when relevant.`;
    }
 
    return prompt;
  }
 
  async endConversation(): Promise<void> {
    if (this.messages.length < 2) return;
 
    // Summarize and store the conversation
    const summary = await summarizeConversation(this.messages);
 
    const fullContent = this.messages
      .map((m) => `${m.role}: ${m.content}`)
      .join('\n');
 
    await storeMemory(
      this.userId,
      this.conversationId,
      fullContent,
      summary
    );
  }
}

Step 6: Add the API Layer

Expose the chatbot through a simple HTTP endpoint:

// server.ts
import express from 'express';
import { randomUUID } from 'crypto';
import { MemoryChatbot } from './chatbot';
 
const app = express();
app.use(express.json());
 
// Store active chatbot sessions
const sessions = new Map<string, MemoryChatbot>();
 
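app.post('/chat', async (req, res) => {
  const { userId, message, sessionId } = req.body ?? {};
 
  // Reject malformed requests before touching the session map
  if (
    typeof userId !== 'string' ||
    typeof message !== 'string' ||
    typeof sessionId !== 'string'
  ) {
    res.status(400).json({
      error: 'userId, message, and sessionId must be strings',
    });
    return;
  }
 
  try {
    // Get or create session
    let bot = sessions.get(sessionId);
    if (!bot) {
      const conversationId = randomUUID();
      bot = new MemoryChatbot(userId, conversationId);
      sessions.set(sessionId, bot);
    }
 
    const response = await bot.chat(message);
    res.json({ response });
  } catch (err) {
    // Express 4 does not catch rejected promises from async handlers
    console.error('chat failed:', err);
    res.status(500).json({ error: 'Internal error' });
  }
});
 
app.post('/chat/end', async (req, res) => {
  const { sessionId } = req.body ?? {};
  const bot = sessions.get(sessionId);
 
  try {
    if (bot) {
      await bot.endConversation();
      sessions.delete(sessionId);
    }
    res.json({ ok: true });
  } catch (err) {
    console.error('endConversation failed:', err);
    res.status(500).json({ error: 'Internal error' });
  }
});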
 
app.listen(3000, () => {
  console.log('Chatbot server running on port 3000');
});
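
The in-memory sessions map only shrinks when a client calls /chat/end, so abandoned sessions would accumulate and their conversations would never be summarized. One possible mitigation is an idle timeout. The sketch below is a standalone illustration: SessionRegistry, the 30-minute default, and the injectable clock are all assumptions, not part of the server above.

```typescript
// session-registry.ts
// Hypothetical idle-timeout wrapper around a Map. The clock is injectable
// so the eviction logic can be exercised without waiting in real time.

export class SessionRegistry<T> {
  private entries = new Map<string, { value: T; lastSeen: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 30 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(id: string): T | undefined {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    entry.lastSeen = this.now(); // any access counts as activity
    return entry.value;
  }

  set(id: string, value: T): void {
    this.entries.set(id, { value, lastSeen: this.now() });
  }

  // Removes and returns sessions idle for longer than the TTL so the
  // caller can finalize them (for example, by calling endConversation()).
  evictExpired(): T[] {
    const cutoff = this.now() - this.ttlMs;
    const expired: T[] = [];
    this.entries.forEach((entry, id) => {
      if (entry.lastSeen < cutoff) {
        expired.push(entry.value);
        this.entries.delete(id);
      }
    });
    return expired;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A periodic timer could call evictExpired() and await endConversation() on each returned bot, so idle conversations still end up in memory. Note that this state remains local to one process; running several server instances would need a shared session store.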

Step 7: Test Memory Retrieval

Let us verify the memory system works with a quick test:

// test-memory.ts
import { randomUUID } from 'crypto';
import { MemoryChatbot } from './chatbot';
 
async function testMemory() {
  // First conversation (conversation IDs must be valid UUIDs for the schema)
  const bot1 = new MemoryChatbot('user-123', randomUUID());
  await bot1.chat('I am building a React app with Next.js 15');
  await bot1.chat('I prefer Tailwind CSS over styled-components');
  await bot1.chat(
    'My main challenge is server component data fetching'
  );
  await bot1.endConversation();
 
  console.log('First conversation stored.\n');
 
  // Second conversation - should remember the first
  const bot2 = new MemoryChatbot('user-123', randomUUID());
  const response = await bot2.chat(
    'Can you help me with my project?'
  );
  console.log('Bot response:', response);
  // Should reference Next.js, Tailwind, and data fetching
  // without the user having to repeat any of it
}
 
// Surface failures instead of leaving an unhandled promise rejection
testMemory().catch((err) => {
  console.error(err);
  process.exit(1);
});

When the user says "Can you help me with my project?" in the second conversation, the bot retrieves the stored memories from the first conversation. It knows the user is building a React/Next.js app with Tailwind, and that they were struggling with server component data fetching. No repetition needed.

Performance Considerations

A few important lessons from production deployments:

Batch your embeddings. If you are storing multiple memories, use createEmbeddings (plural) instead of calling createEmbedding in a loop. One API call with 10 texts is cheaper and faster than 10 separate calls.

Set a similarity threshold. Not every memory is relevant. A threshold of 0.3 works well for cosine similarity. Below that, the memories are likely noise.

Limit memory context size. Do not dump 50 memories into the system prompt. Five to seven relevant memories is usually the sweet spot. Beyond that, you are paying for tokens that dilute the signal.

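Use summaries, not raw transcripts. Injecting full conversation transcripts into the prompt wastes tokens and dilutes the useful context, which is why buildSystemPrompt uses only the summary field. Summaries are also cheaper to embed and tend to retrieve better. Note that storeMemory as written embeds the full transcript, so switching it to embed the summary is a natural refinement.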
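
The threshold and context-size guidance above can be combined into a single selection step. This is a minimal sketch, with selectMemories as a hypothetical helper; the defaults of 0.3 and 5 mirror the values used in this tutorial and should be tuned against real data.

```typescript
// select-memories.ts
// Hypothetical helper combining the similarity threshold with a cap on
// how many memories reach the system prompt.

interface ScoredMemory {
  summary: string;
  similarity: number;
}

export function selectMemories<T extends ScoredMemory>(
  candidates: readonly T[],
  minSimilarity: number = 0.3,
  maxCount: number = 5
): T[] {
  return candidates
    // filter() returns a new array, so the sort below leaves the input untouched.
    .filter((m) => m.similarity > minSimilarity)
    // Highest similarity first, so the cap drops the weakest matches.
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, maxCount);
}
```

In the chatbot, a helper like this could replace the inline filter in chat(). Requesting a few more candidates than maxCount from retrieveRelevantMemories leaves room for the threshold to discard weak matches.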

What You Built

In about 30 minutes, you have built a chatbot that genuinely remembers users across conversations. The core pattern is simple: embed, store, retrieve, inject. The same pattern works whether you are building a customer support bot, a personal assistant, or an AI tutor.

If you want managed memory infrastructure instead of running your own pgvector setup, Transactional's Memory module provides the vector storage, embedding pipeline, and retrieval API out of the box. But the patterns above will serve you well regardless of how you host it.
