A voice agent that forgets you between calls starts every conversation from zero. It can't greet you by name, recall what you asked last time, or pick up where you left off. Personalization, retrieval-augmented generation over a knowledge base, and memory that carries across sessions all need somewhere durable to live. MongoDB Atlas gives you one home for all three. Flexible schemas hold profiles and session reports, an aggregation pipeline does the heavy lifting, and $vectorSearch runs next to the rest of your data.
This guide walks through five integration patterns for wiring Atlas into a LiveKit voice agent. Every snippet is taken from a working starter kit you can clone.
Why persistent state matters in voice
Voice runs on a tighter latency budget than chat. A two-second pause that reads as "thinking" in a chat window reads as "broken" out loud, so anything you ask the agent to remember or look up has to fit inside one user-stops-speaking to agent-starts-speaking round trip. A bloated system prompt loses to a focused one every time.
Three jobs in particular benefit from moving out of the prompt and into a database.
- Personalization. The agent should know who is on the call before it says hello.
- Knowledge. The agent should answer questions about your product or domain without memorizing everything.
- Memory. The agent should remember what was said last time, last week, last quarter.
Atlas fits all of this because the document model maps cleanly to profiles and transcripts, and vector search lives right beside that data.
The five integration points
LiveKit Agents exposes a small set of hooks and lifecycle callbacks that map one-to-one onto database operations. Mix and match.
| Pattern | LiveKit hook | MongoDB feature |
|---|---|---|
| 1. RAG as a function tool | @function_tool | $vectorSearch aggregation |
| 2. Agentic memory | @function_tool | $vectorSearch with filter fields |
| 3. Identify + pre-load | Agent dispatch metadata + entrypoint | find_one_and_update upsert on users |
| 4. Function-tool CRUD | @function_tool | Any PyMongo async op |
| 5. Session persistence | on_session_end callback | insert_one on sessions |
The starter puts all five in a single MongoAgent class.
Setting up the project
Both halves are bootstrapped from official LiveKit templates.
```bash
lk agent init agent --template agent-starter-python
lk agent init frontend --template agent-starter-react
```
The agent template adds pymongo>=4.13 for the async client and voyageai for embeddings. No provider plugins for STT, LLM, or TTS are needed because LiveKit Inference handles those through your LiveKit credentials.
The starter reads five environment variables: three LiveKit credentials, the MongoDB connection string, and a Voyage API key.
```bash
LIVEKIT_URL=
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=
MONGODB_URI=
VOYAGE_API_KEY=
```
The voice pipeline
Before we get to MongoDB, here is the agent's pipeline. Everything is configured through inference.STT, inference.LLM, and inference.TTS, so you swap providers by changing a string.
```python
session = AgentSession(
    stt=inference.STT(model="deepgram/nova-3", language="multi"),
    llm=inference.LLM(model="openai/gpt-5.3-chat-latest"),
    tts=inference.TTS(
        model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
    ),
    vad=ctx.proc.userdata["vad"],
    turn_handling=TurnHandlingOptions(
        turn_detection=MultilingualModel(),
        preemptive_generation={"enabled": True},
    ),
)
```
None of that is MongoDB-specific, but it sets the timing budget the database work has to live within. The agent class is registered with the AgentServer under a name the frontend dispatches to, with an on_session_end callback that Pattern 5 covers.
Pattern 1: RAG as a function tool
The LLM knows when it needs facts. Small talk, confirmations, and greetings don't need a knowledge-base lookup, and running a vector search on every turn burns embedding calls you didn't need. A function tool lets the model decide.
The tradeoff is latency. A tool call adds a hop on top of the vector search, and in voice that hop is dead air. To hide it, borrow a tiny pattern from LiveKit's user feedback guide. Schedule a verbal status update on a short delay, then cancel it if the search finishes first.
The vector search itself lives in a small helper.
```python
async def _vector_search_knowledge(
    db: AsyncDatabase, query: str, limit: int = 3
) -> list[dict]:
    """Run the knowledge vector search and return {title, content} docs."""
    query_embedding = await embed_text(query, input_type="query")
    pipeline = [
        {
            "$vectorSearch": {
                "index": "knowledge_embedding_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 100,
                "limit": limit,
            }
        },
        {"$project": {"title": 1, "content": 1, "_id": 0}},
    ]
    cursor = await db.knowledge.aggregate(pipeline)
    return await cursor.to_list(length=limit)
```
$vectorSearch takes a pre-computed query vector, so we embed the query through Voyage first. In the PyMongo async API, aggregate() is a coroutine that returns an AsyncCommandCursor, so the helper awaits both db.knowledge.aggregate(pipeline) and cursor.to_list(length=limit).
The tool itself is dominated by the status-update pattern, not the database call.
```python
@function_tool()
async def search_knowledge(
    self, context: RunContext, query: str
) -> str:
    """Search the shared knowledge base for facts the user asks about."""

    async def _speak_status_update(delay: float = 0.5) -> None:
        await asyncio.sleep(delay)
        await context.session.generate_reply(
            instructions=(
                f"You are searching the knowledge base for '{query}' "
                "but it is taking a moment. Give the user a brief, "
                "one-sentence update that you are looking it up."
            )
        )

    status_task = asyncio.create_task(_speak_status_update(0.5))
    try:
        db = await get_db()
        results = await _vector_search_knowledge(db, query, limit=3)
    finally:
        status_task.cancel()
    return json.dumps({"results": results})
```
The timer fires only if the search takes longer than 500ms. On a fast path the task is cancelled in the finally block before asyncio.sleep resolves, so the user never hears a filler phrase. On a slow path the model says something like "just a moment, looking that up" and then answers normally once search_knowledge returns.
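The timing behaviour is easy to check in isolation. The following is a minimal sketch using only asyncio, with no LiveKit or MongoDB involved. The names run_with_status, demo, and the spoken list are hypothetical stand-ins for the tool, context.session.generate_reply, and the vector search, and the sleep durations are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_status(
    work: Callable[[], Awaitable[str]],
    spoken: list[str],
    delay: float = 0.5,
) -> str:
    """Await work(); record a filler phrase only if it outlasts `delay`."""

    async def _status() -> None:
        await asyncio.sleep(delay)
        # Stand-in for context.session.generate_reply(...)
        spoken.append("just a moment, looking that up")

    status_task = asyncio.create_task(_status())
    try:
        return await work()
    finally:
        # Stops the timer if it is still sleeping; a no-op if it already ran.
        status_task.cancel()


def demo(search_seconds: float, delay: float) -> tuple[str, list[str]]:
    """Run one fake search and report (result, phrases spoken)."""
    spoken: list[str] = []

    async def fake_search() -> str:
        await asyncio.sleep(search_seconds)  # hypothetical search latency
        return "results"

    result = asyncio.run(run_with_status(fake_search, spoken, delay))
    return result, spoken


fast = demo(search_seconds=0.01, delay=0.2)  # search wins: no filler phrase
slow = demo(search_seconds=0.2, delay=0.01)  # timer wins: filler phrase spoken
```

In the fast case the spoken list stays empty, which is the property that keeps filler phrases out of quick lookups.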
The vector index is created once on Atlas with SearchIndexModel.
```python
SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": EMBEDDING_DIMENSIONS,
                "similarity": "cosine",
            },
            {"type": "filter", "path": "user_id"},
            {"type": "filter", "path": "tenant_id"},
        ]
    },
    name=name,
    type="vectorSearch",
)
```
Filter fields matter. $vectorSearch lets you pre-filter candidates with a filter clause, but only on fields declared in the index. Pattern 2 uses those filters to keep one user's memories out of another user's recall.
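As an illustration of what a pre-filtered query can look like, here is a small sketch that only builds the pipeline as plain Python data. The function name scoped_vector_search and its defaults are assumptions for this example, not part of the starter. It spells the filter out with explicit $and and $eq operators, and it checks that numCandidates is not smaller than limit, which $vectorSearch requires.

```python
from typing import Any


def scoped_vector_search(
    query_vector: list[float],
    user_id: str,
    tenant_id: str,
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build a $vectorSearch pipeline restricted to one user's documents."""
    if num_candidates < limit:
        raise ValueError("numCandidates must be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": "memories_embedding_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
                # Both fields must be declared as "filter" fields in the index.
                "filter": {
                    "$and": [
                        {"user_id": {"$eq": user_id}},
                        {"tenant_id": {"$eq": tenant_id}},
                    ]
                },
            }
        },
        {"$project": {"_id": 0, "memory_type": 1, "content": 1}},
    ]


pipeline = scoped_vector_search([0.1, 0.2, 0.3], "user-1", "default", limit=3)
```

Because the filter is applied inside the $vectorSearch stage, candidates from other users are excluded before ranking rather than trimmed afterwards.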
Pattern 2: Agentic memory as tools
RAG handles knowledge that exists ahead of time. Memory handles knowledge the agent picks up during conversation. The pattern that works best for voice is to expose memory as tools and let the LLM decide what to persist. Five tools cover most cases. remember_detail(memory_type, content) stores or replaces a slot, recall_detail(memory_type) returns it, forget_detail(memory_type) deletes it, search_memories(query) runs hybrid vector and text search, and list_user_memories() returns every slot for this user.
Identity-like fields (name, email, preferred language, timezone) belong on the user's profile document in users, not in free-form memory slots, so Pattern 3 can load them at session start without iterating memories. A sixth tool, update_profile(field, value), writes an allow-listed set of profile fields directly to users.
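The allow-list is what stops the model from writing arbitrary keys onto the profile. A minimal sketch of that check follows. The helper name profile_update and the exact field names in ALLOWED_PROFILE_FIELDS are assumptions based on the fields listed above, and the starter's real list may differ.

```python
# Assumed allow-list; adjust to the fields your users collection really stores.
ALLOWED_PROFILE_FIELDS = {"name", "email", "preferred_language", "timezone"}


def profile_update(field: str, value: str) -> dict[str, dict[str, str]]:
    """Return a $set update document for one allow-listed profile field."""
    key = field.strip().lower()
    if key not in ALLOWED_PROFILE_FIELDS:
        raise ValueError(f"Profile field {field!r} is not writable.")
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("Profile value must not be empty.")
    return {"$set": {key: cleaned}}


update = profile_update("Name", "  Jesse ")
```

The returned document can be passed as the update argument of update_one on users. Inside a tool, the ValueError would typically be turned into a message the model can relay to the caller.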
Memory is modeled as slots. Each (user_id, tenant_id, memory_type) triple holds at most one value, so writing the same label twice replaces the previous entry. That matches how voice agents actually use memory (the user's current favorite color, not a log of every color).
```python
async def remember(
    db: AsyncDatabase,
    user_id: str,
    tenant_id: str,
    memory_type: str,
    content: str,
) -> str:
    embedding = await embed_text(
        f"{memory_type}: {content}", input_type="document"
    )
    now = _now()
    await db.memories.update_one(
        {**_scope(user_id, tenant_id), "memory_type": memory_type},
        {
            "$set": {
                "content": content,
                "embedding": embedding,
                "updated_at": now,
            },
            "$setOnInsert": {"created_at": now},
        },
        upsert=True,
    )
    return f"Remembered ({memory_type}): {content}"
```
The embedding covers "{memory_type}: {content}" rather than content alone, so a short slot like {memory_type: "name", content: "Jesse"} still encodes that "Jesse" is a name. A unique compound index on (user_id, tenant_id, memory_type) enforces the one-value-per-slot rule under concurrent writes.
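To make the slot semantics concrete, here is an in-memory model of the same upsert. It is a sketch for reasoning about behaviour, not a replacement for the collection: SlotStore is a hypothetical class, a dict keyed by the triple plays the role of the unique compound index, and embeddings are left out.

```python
from datetime import datetime, timezone
from typing import Optional

Slot = tuple[str, str, str]  # (user_id, tenant_id, memory_type)


class SlotStore:
    """In-memory stand-in for the upsert behaviour of the memories collection."""

    def __init__(self) -> None:
        self._docs: dict[Slot, dict] = {}

    def remember(
        self, user_id: str, tenant_id: str, memory_type: str, content: str
    ) -> None:
        key = (user_id, tenant_id, memory_type)
        now = datetime.now(timezone.utc)
        # setdefault mirrors $setOnInsert: created_at is written only once.
        doc = self._docs.setdefault(key, {"created_at": now})
        # update mirrors $set: content and updated_at are always overwritten.
        doc.update({"content": content, "updated_at": now})

    def recall(self, user_id: str, tenant_id: str, memory_type: str) -> Optional[str]:
        doc = self._docs.get((user_id, tenant_id, memory_type))
        return doc["content"] if doc else None

    def __len__(self) -> int:
        return len(self._docs)


store = SlotStore()
store.remember("u1", "default", "favorite_color", "blue")
store.remember("u1", "default", "favorite_color", "green")  # replaces "blue"
store.remember("u2", "default", "favorite_color", "red")    # a separate slot
```

Writing the same label twice leaves one document for u1, while u2 keeps an independent slot under the same label.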
Hybrid retrieval with $rankFusion
Exact-label recall only works when the LLM knows which label it used. Ask "what's my favorite color?" and the fact might be stored as color_preference, favorite_color, or user_color. The fix is hybrid retrieval. $rankFusion (MongoDB 8.0+) runs $vectorSearch and $search text pipelines in parallel, then merges them with Reciprocal Rank Fusion.
```python
pipeline = [
    {
        "$rankFusion": {
            "input": {
                "pipelines": {
                    "vectorSearch": [
                        {
                            "$vectorSearch": {
                                "index": "memories_embedding_index",
                                "path": "embedding",
                                "queryVector": query_embedding,
                                "numCandidates": 100,
                                "limit": 30,
                                "filter": scope,
                            }
                        }
                    ],
                    "textSearch": [
                        {
                            "$search": {
                                "index": "memories_text_index",
                                "compound": {
                                    "should": [
                                        {"text": {"query": query, "path": "memory_type", "fuzzy": {}}},
                                        {"text": {"query": query, "path": "content", "fuzzy": {}}},
                                    ]
                                },
                            }
                        },
                        {"$match": scope},
                        {"$limit": 30},
                    ],
                }
            },
            "combination": {"weights": {"vectorSearch": 0.7, "textSearch": 0.3}},
        }
    },
    {"$limit": limit},
    {"$project": {"_id": 0, "memory_type": 1, "content": 1}},
]
```
The 0.7 / 0.3 weighting biases toward semantic match while keeping lexical precision for direct hits. Results come back as {memory_type, content} pairs so the LLM can follow up with recall_detail or forget_detail. Both indexes are declared in db/indexes.py. $rankFusion is an 8.0 stage, so the starter needs MongoDB 8.0+ (M10+ runs 8.0 by default).
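To see how the weights shape the merged order, it helps to compute weighted Reciprocal Rank Fusion by hand. The sketch below is a plain-Python illustration of the general technique, where each document scores the sum of weight / (k + rank) over the pipelines that returned it. The function name weighted_rrf is hypothetical, k = 60 is the constant conventionally used with RRF, and the server-side stage may differ in detail.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked id lists: score(d) = sum_i w_i / (k + rank_i(d))."""
    scores: dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):  # ranks are 1-based
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = weighted_rrf(
    {
        "vectorSearch": ["favorite_color", "user_color", "hometown"],
        "textSearch": ["hometown", "favorite_color"],
    },
    weights={"vectorSearch": 0.7, "textSearch": 0.3},
)
order = [doc_id for doc_id, _ in fused]
```

Here hometown ranks last in the vector list but first in the text list, and that lexical hit is enough to lift it above user_color, which only the vector pipeline returned.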
Pattern 3: Identify the user, then pre-load their context
Before we can load a profile we need a stable id, and it has to reach the agent before it speaks.
LiveKit gives you three places for that data. Job metadata is the right one for session-start identity because ctx.job.metadata is available before ctx.connect(), and the external data docs say to do any network calls in the entrypoint before ctx.connect() so the frontend doesn't render an agent participant that isn't listening yet. Participant attributes don't resolve until after connect, so reach for them when identity changes mid-call.
Server: one httpOnly cookie
The token route owns identity. On every token request it reads lk_mongo_user_cookie, mints a UUID if the cookie is missing, stamps the id onto the agent dispatch entry, and sets the cookie on the response when the id was newly minted.
```typescript
// app/api/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { AccessToken, type AccessTokenOptions, type VideoGrant } from 'livekit-server-sdk';
import { RoomAgentDispatch, RoomConfiguration } from '@livekit/protocol';

const COOKIE_NAME = 'lk_mongo_user_cookie';
const COOKIE_MAX_AGE = 60 * 60 * 24 * 365; // one year, in seconds
const AGENT_NAME = process.env.AGENT_NAME;

// LIVEKIT_URL and createParticipantToken are assumed to be defined elsewhere
// in this route file (not shown in this excerpt).

export async function POST(req: NextRequest) {
  // Reuse the id from the cookie, or mint one for a first-time visitor.
  let userId = req.cookies.get(COOKIE_NAME)?.value;
  const isNewCookie = !userId;
  if (!userId) userId = crypto.randomUUID();

  // The id travels to the agent as dispatch metadata, not as client input.
  const metadata = JSON.stringify({ user_id: userId, tenant_id: 'default' });
  const roomConfig = AGENT_NAME
    ? new RoomConfiguration({
        agents: [new RoomAgentDispatch({ agentName: AGENT_NAME, metadata })],
      })
    : new RoomConfiguration();

  // Name the room once so the token and the response agree on it.
  const roomName = `voice_assistant_room_${Math.floor(Math.random() * 10_000)}`;
  const participantToken = await createParticipantToken(
    { identity: `voice_assistant_user_${Math.floor(Math.random() * 10_000)}`, name: 'user' },
    roomName,
    roomConfig,
  );

  const res = NextResponse.json({ serverUrl: LIVEKIT_URL, roomName, participantName: 'user', participantToken });
  if (isNewCookie) {
    res.cookies.set({
      name: COOKIE_NAME,
      value: userId,
      httpOnly: true,
      sameSite: 'lax',
      secure: process.env.NODE_ENV === 'production',
      path: '/',
      maxAge: COOKIE_MAX_AGE,
    });
  }
  return res;
}
```
The cookie is httpOnly, so JavaScript on the page can't read or forge it. Same-origin fetch attaches cookies by default, so TokenSource.endpoint('/api/token') ships the cookie on every token request without extra config. Any room_config the client sends in the body is ignored. The server builds its own RoomConfiguration and stamps the verified id onto agents[0].metadata via RoomAgentDispatch, matching the custom-auth Node.js example.
Read metadata on the agent
Parse ctx.job.metadata before ctx.connect() so preload_user finishes before the agent joins the room.
```python
@server.rtc_session(agent_name="my-agent", on_session_end=on_session_end)
async def my_agent(ctx: JobContext) -> None:
    meta: dict[str, str] = {}
    if ctx.job.metadata:
        try:
            meta = json.loads(ctx.job.metadata)
        except json.JSONDecodeError:
            logger.warning("ctx.job.metadata was not valid JSON; using defaults")

    user_id = meta.get("user_id", DEFAULT_USER_ID)
    tenant_id = meta.get("tenant_id", DEFAULT_TENANT_ID)
    ctx.proc.userdata["user_id"] = user_id
    ctx.proc.userdata["tenant_id"] = tenant_id

    initial_ctx = await preload_user(user_id, tenant_id)
    # ... build session, start it, connect
```
DEFAULT_USER_ID is the fallback for console mode (uv run src/agent.py console), where there is no frontend. Stashing the id on ctx.proc.userdata gives on_session_end a place to find it on hangup without threading it through as a parameter.
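The metadata parsing is worth pulling into a pure function so it can be tested without a running job. The sketch below is one possible shape, not the starter's code. parse_identity is a hypothetical helper, the two default values are placeholders, and it additionally treats valid JSON that is not an object (for example a bare list) as missing metadata.

```python
import json
from typing import Optional

DEFAULT_USER_ID = "console-user"  # placeholder fallback values for this sketch
DEFAULT_TENANT_ID = "default"


def parse_identity(raw: Optional[str]) -> tuple[str, str]:
    """Return (user_id, tenant_id) from job metadata, falling back to defaults."""
    meta: dict = {}
    if raw:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        # Valid JSON that is not an object (e.g. "[]") is treated as missing.
        if isinstance(parsed, dict):
            meta = parsed
    user_id = meta.get("user_id") or DEFAULT_USER_ID
    tenant_id = meta.get("tenant_id") or DEFAULT_TENANT_ID
    return str(user_id), str(tenant_id)


from_frontend = parse_identity('{"user_id": "abc-123", "tenant_id": "acme"}')
from_console = parse_identity("")
```

Keeping the fallback logic in one place means console mode, malformed metadata, and partial metadata all resolve to the same defaults.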
Pre-load the profile
preload_user does two things. It upserts the users row so every visitor has a stable profile document, and it reads back the document plus all memory slots for this (user_id, tenant_id) scope. Both land in the ChatContext as assistant messages before the LLM speaks.
```python
async def preload_user(user_id: str, tenant_id: str) -> ChatContext:
    """Pattern 3: upsert the user row, then seed the chat context."""
    db = await get_db()
    now = _now()
    user = await db.users.find_one_and_update(
        {"user_id": user_id},
        {
            "$set": {"last_seen_at": now},
            "$setOnInsert": {"user_id": user_id, "created_at": now},
        },
        upsert=True,
        return_document=ReturnDocument.AFTER,
    )

    chat_ctx = ChatContext()
    name = user.get("name")
    email = user.get("email")
    prefs = user.get("preferences", {})
    if name or email or prefs:
        chat_ctx.add_message(
            role="assistant",
            content=(
                f"User profile: name={name or 'unknown'}, "
                f"email={email or 'unknown'}, preferences={prefs}."
            ),
        )
    else:
        chat_ctx.add_message(
            role="assistant",
            content=(
                f"No stored profile fields yet for user_id {user_id}. "
                "Greet them as a new user."
            ),
        )

    memories = await list_memories(db, user_id, tenant_id)
    if memories:
        lines = "\n".join(
            f"- {m['memory_type']}: {m['content']}" for m in memories
        )
        chat_ctx.add_message(
            role="assistant",
            content=f"Remembered facts from prior sessions:\n{lines}",
        )
    return chat_ctx
```
find_one_and_update with upsert=True creates the document if missing, stamps last_seen_at, and returns the post-write state in one round trip. The memory pass closes the loop with Pattern 2. A slot the agent wrote last Tuesday is in context on Wednesday without any extra tool calls.
The agent's on_enter calls self.session.generate_reply with an instruction to greet by name if the profile or remembered facts contain one. All of this has to finish before the agent speaks, with on the order of a few hundred milliseconds of headroom on top of TTS warmup.
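One way to keep that budget honest is to put a deadline on the pre-load and fall back to an empty context when the database is slow. This is a sketch of that idea rather than something the starter is known to do. with_deadline and the sleep-based fake loader are hypothetical names, and the timeouts are arbitrary example values.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_deadline(
    load: Callable[[], Awaitable[T]], fallback: T, timeout: float = 0.3
) -> T:
    """Return load()'s result, or `fallback` if it misses the deadline."""
    try:
        return await asyncio.wait_for(load(), timeout=timeout)
    except asyncio.TimeoutError:
        # The slow load is cancelled by wait_for; greet without stored context.
        return fallback


async def _fake_preload(seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for the MongoDB round trips
    return "profile"


on_time = asyncio.run(
    with_deadline(lambda: _fake_preload(0.01), fallback="empty", timeout=0.2)
)
too_slow = asyncio.run(
    with_deadline(lambda: _fake_preload(0.5), fallback="empty", timeout=0.05)
)
```

The tradeoff is that a slow database produces a generic greeting instead of a late one, which is usually the better failure in voice.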
This is not authentication
The server owns the id, but it's still anonymous. Some things to keep in mind.
- Clearing the cookie resets identity. Different browser, private window, or manual delete produces a fresh id and a fresh profile.
- Production swap is one file. Replace the cookie read in /api/token/route.ts with your session lookup (Better-Auth, Clerk, Supabase) and fall through to the cookie only for guests. The starter ships with a NODE_ENV !== 'development' throw at the top of the route as a tripwire to delete on the same edit.
- Migrating guests to logins. One updateMany({ user_id: cookieId }, { $set: { user_id: authedId } }) across users, memories, and sessions merges the history onto the real account.
- The agent doesn't care. ctx.job.metadata reads the same either way.
Pattern 4: Function-tool CRUD
For data the agent reads or writes on demand, @function_tool is the right surface. The example here looks up an order by ID.
```python
@function_tool()
async def lookup_order(self, context: RunContext, order_id: str) -> str:
    """Look up an order by its ID. Returns items, total, and status."""
    db = await get_db()
    order = await db.orders.find_one({"order_id": order_id})
    if not order:
        raise ToolError(f"Order {order_id} not found.")
    return json.dumps(
        {
            "order_id": order["order_id"],
            "items": order["items"],
            "total": order["total"],
            "status": order["status"],
        }
    )
```
ToolError signals a recoverable failure to the LLM. The message is fed back into the model so it can apologize and ask for a different ID instead of crashing the call. For tools that mutate data, call context.disallow_interruptions() and return a confirmation string the model can read back.
Pattern 5: Session persistence with on_session_end
When a call ends, on_session_end hands you a JobContext, so you can call ctx.make_session_report() and write the result somewhere durable. Here it lands in a sessions collection.
```python
async def on_session_end(ctx: JobContext) -> None:
    """Pattern 5: persist a session report to MongoDB on hangup."""
    try:
        report = ctx.make_session_report()
        db = await get_db()
        user_id = ctx.proc.userdata.get("user_id", DEFAULT_USER_ID)
        tenant_id = ctx.proc.userdata.get("tenant_id", DEFAULT_TENANT_ID)
        await db.sessions.insert_one(
            {
                "session_id": ctx.room.name,
                "user_id": user_id,
                "tenant_id": tenant_id,
                "room_name": ctx.room.name,
                "report": report.to_dict(),
            }
        )
        logger.info("Persisted session report for %s", ctx.room.name)
    except Exception:
        logger.exception("Failed to persist session report")
    finally:
        await aclose()
```
user_id and tenant_id come from ctx.proc.userdata, where Pattern 3 stashed them. Hangup-time code reads the same id that preload set.
report.to_dict() returns a JSON-friendly snapshot you can drop into MongoDB without custom serialization. A few hundred of those give you a corpus you can aggregate right in place.
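As a small example of aggregating in place, the sketch below builds a pipeline that counts sessions per user within one tenant. It relies only on the user_id and tenant_id fields written by on_session_end and makes no assumption about the shape of the nested report. The function name sessions_per_user is hypothetical.

```python
from typing import Any


def sessions_per_user(tenant_id: str, top_n: int = 10) -> list[dict[str, Any]]:
    """Build a pipeline listing the most active users for one tenant."""
    return [
        {"$match": {"tenant_id": tenant_id}},
        {"$group": {"_id": "$user_id", "sessions": {"$sum": 1}}},
        {"$sort": {"sessions": -1}},
        {"$limit": top_n},
    ]


pipeline = sessions_per_user("default", top_n=5)
```

With the async client, this pipeline would be passed to db.sessions.aggregate(...) and read with to_list(), the same way the knowledge search reads its cursor.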
Putting the database in front of the agent
The starter's db/client.py is short on purpose.
```python
import os

from pymongo import AsyncMongoClient
from pymongo.asynchronous.database import AsyncDatabase

# DEFAULT_DB_NAME is assumed to be defined elsewhere in the starter (not shown).
_client: AsyncMongoClient | None = None


async def get_mongo_client() -> AsyncMongoClient:
    """Create the process-wide client on first use, then reuse it."""
    global _client
    if _client is None:
        uri = os.getenv("MONGODB_URI")
        if not uri:
            raise RuntimeError("MONGODB_URI environment variable is not set.")
        _client = AsyncMongoClient(uri)
    return _client


async def get_db(db_name: str | None = None) -> AsyncDatabase:
    client = await get_mongo_client()
    return client[db_name or os.getenv("MONGODB_DB", DEFAULT_DB_NAME)]
```
A single AsyncMongoClient per process. PyMongo handles pooling under the hood. For explicit lifecycle ownership, construct the client in prewarm on server.setup_fnc instead. LiveKit forks a process per job, so size your Atlas connection ceiling against replicas × concurrent jobs × maxPoolSize.
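The sizing rule is simple multiplication, but writing it down makes the effect of maxPoolSize obvious. The sketch below is a worked example with made-up deployment numbers. connection_ceiling is a hypothetical helper, it assumes one client per job process as described above, and it uses PyMongo's default maxPoolSize of 100 when none is given.

```python
def connection_ceiling(
    replicas: int, jobs_per_replica: int, max_pool_size: int = 100
) -> int:
    """Upper bound on connections: replicas x concurrent jobs x maxPoolSize."""
    if min(replicas, jobs_per_replica, max_pool_size) < 1:
        raise ValueError("all inputs must be positive")
    return replicas * jobs_per_replica * max_pool_size


# Example deployment: 3 agent replicas, up to 20 concurrent calls on each.
default_pools = connection_ceiling(replicas=3, jobs_per_replica=20)
small_pools = connection_ceiling(replicas=3, jobs_per_replica=20, max_pool_size=5)
```

With the default pool size the worst case is 6000 connections, while capping maxPoolSize at 5 brings it down to 300. Pools grow on demand, so this is a ceiling rather than a steady-state figure.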
Running the starter kit
The agent side needs deps synced, an env file, the one-time model download, and the two MongoDB init scripts.
```bash
cd agent
uv sync
cp .env.example .env.local
# fill in MONGODB_URI, VOYAGE_API_KEY, LIVEKIT_* in .env.local
uv run src/agent.py download-files   # one-time: VAD + turn detector models
uv run -m db.indexes                 # collections and vector indexes
uv run -m db.seed                    # sample users, orders, knowledge
uv run src/agent.py console
```
Vector indexes need a minute or two to become queryable on Atlas after creation, so retry if a search returns nothing immediately.
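For scripts or smoke tests that run right after index creation, a small retry wrapper saves manual re-runs. The following is a generic sketch with exponential backoff. search_with_retry is a hypothetical helper, the fake search stands in for a real $vectorSearch call, and the delays are shortened so the example finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable


async def search_with_retry(
    search: Callable[[], Awaitable[list]],
    attempts: int = 5,
    base_delay: float = 1.0,
) -> list:
    """Call search() until it returns results, backing off between tries."""
    results: list = []
    for attempt in range(attempts):
        results = await search()
        if results:
            break
        if attempt < attempts - 1:
            await asyncio.sleep(base_delay * 2**attempt)  # 1x, 2x, 4x, ...
    return results


call_log: list[int] = []


async def _fake_search() -> list:
    """Pretend the index becomes queryable on the third call."""
    call_log.append(1)
    return [] if len(call_log) < 3 else [{"title": "Returns policy"}]


found = asyncio.run(search_with_retry(_fake_search, base_delay=0.01))
```

An empty result can also mean there is genuinely nothing to match, so a wrapper like this fits setup checks better than the live call path.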
What's in the frontend
The frontend is a sibling Next.js App Router project from agent-starter-react. It mints tokens server-side (Pattern 3) and runs the LiveKit client that joins the room and plays audio.
```
frontend/
├── app/
│   ├── api/token/route.ts               # mints tokens, owns the user cookie
│   ├── layout.tsx
│   └── page.tsx                         # renders <App appConfig={APP_CONFIG_DEFAULTS} />
├── components/
│   ├── app/
│   │   ├── app.tsx                      # TokenSource + useSession + provider
│   │   └── view-controller.tsx          # welcome <-> active session
│   ├── agents-ui/
│   │   ├── agent-session-provider.tsx   # SessionProvider + RoomAudioRenderer
│   │   ├── start-audio-button.tsx       # browser autoplay gate
│   │   └── blocks/agent-session-view-01/  # tiles, transcript, controls
│   └── ui/                              # shadcn primitives
├── lib/utils.ts                         # shared client helpers
├── hooks/                               # useDebug, useAgentErrors
└── app-config.ts                        # feature toggles + agentName dispatch
```
Connection wiring lives in components/app/app.tsx. TokenSource.endpoint('/api/token') and useSession(tokenSource, { agentName }) from @livekit/components-react own the room lifecycle. The agentName in app-config.ts has to match the agent_name on the Python side for dispatch to reach the agent. AgentSessionProvider wraps SessionProvider with RoomAudioRenderer, which plays the agent's TTS through the page.
Browser autoplay policies block playback until a user gesture, so the kit ships a StartAudioButton calling useStartAudio() on first click.
```bash
cd ../frontend
pnpm install
cp .env.example .env.local   # fill in LIVEKIT_* (same values as the agent)
pnpm dev
```
Both apps share LiveKit credentials. A root package.json exposes pnpm setup, pnpm db:init, pnpm db:seed, and pnpm dev for a single-command runbook.
Where to go from here
We kept the starter intentionally minimal. A few directions worth exploring.
- Swap users, orders, and knowledge for collections that match your domain, and rewrite the seed script.
- Add or remove @function_tool methods on MongoAgent to expose your own database operations to the LLM.
- Try voyage-3-large for higher-quality embeddings or voyage-multilingual-2 for non-English content. For latency-sensitive flows, voyage-3.5-lite at 512 dimensions is faster than 1024.
- Look at the LiveKit personal_shopper example for a multi-agent shape with handoffs.
Clone the starter, point it at an Atlas cluster, and you should be talking to a memory-equipped agent in under ten minutes.