
Build a healthcare intake assistant with an Anam avatar

This tutorial walks you through building a voice AI healthcare intake assistant that uses a talking avatar powered by Anam. A patient opens the app, clicks Start, and speaks with Liv — a lip-synced Anam avatar who guides them through a medical intake form one field at a time. As the patient answers each question, the agent calls function tools that push updates to the on-screen form over RPC (remote procedure call), which lets one participant invoke a method on another in real time. This keeps the avatar and the form in sync as the conversation progresses.

For the full source code, see the anam example.

What you'll build

By the end of this tutorial, you'll have a working app that:

  • Renders a talking Anam avatar in the browser, lip-synced to the agent's voice.
  • Guides a patient through a full intake form using voice, one field at a time.
  • Updates each form field in real time as the patient speaks, using LiveKit RPC.
  • Handles a final confirmation loop and lets the patient submit the form by voice.
  • Works with either a Python or TypeScript backend — both share the same frontend.

Prerequisites

Before you start, make sure you have:

  • A LiveKit Cloud project and its credentials (LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET)
  • An Anam account and API key
  • pnpm installed (used to fetch the example and run the frontend)
  • uv installed for the Python backend, or Node.js for the TypeScript backend

The agent uses LiveKit Inference for STT, LLM, and TTS, so you don't need separate API keys from Deepgram, OpenAI, or ElevenLabs. You do need an Anam API key for the avatar.

Step 1: Get the code

Use pnpm dlx degit to download just the anam folder without cloning the entire repository:

pnpm dlx degit livekit-examples/python-agents-examples/complex-agents/avatars/anam anam
cd anam

The directory has three parts:

  • agent-py/ — the Python LiveKit agent
  • agent-ts/ — the TypeScript LiveKit agent (same functionality, different language)
  • frontend/ — a Next.js frontend that works with either backend

You only need to run one backend. Pick the one you're comfortable with.

Step 2: Set up the agent

Python backend

Install dependencies from the existing pyproject.toml:

cd agent-py
uv sync

Copy .env.example to .env.local and fill in your values:

cp .env.example .env.local

LIVEKIT_URL=wss://<project-subdomain>.livekit.cloud
LIVEKIT_API_KEY=<your_api_key>
LIVEKIT_API_SECRET=<your_api_secret>
ANAM_API_KEY=<your_anam_api_key>

Before the first run, download the VAD and turn detector models:

uv run python src/agent.py download-files

TypeScript backend

Install dependencies:

cd agent-ts
pnpm install

Copy .env.example to .env.local and fill in the same values:

cp .env.example .env.local

Before the first run, download the VAD and turn detector models:

pnpm run download-files

Get your LiveKit credentials from the LiveKit Cloud dashboard under Settings > API Keys. Get your Anam API key from the Anam Lab.

Step 3: Understand the agent

This section walks through the key parts of agent-py/src/agent.py. The TypeScript version in agent-ts/src/agent.ts follows the same design.

Voice pipeline

The agent uses LiveKit Inference to wire up Deepgram for STT, OpenAI for the LLM, and ElevenLabs for TTS:

session = AgentSession(
    stt=inference.STT(model="deepgram/nova-3", language="multi"),
    llm=inference.LLM(model="openai/gpt-4o-mini"),
    tts=inference.TTS(
        model="elevenlabs/eleven_turbo_v2_5",
        voice="cgSgspJ2msm6clMCkdW9",  # Jessica (ElevenLabs)
        sample_rate=16000,  # Required for Anam avatar compatibility
    ),
    turn_detection=MultilingualModel(),
    vad=ctx.proc.userdata["vad"],
    preemptive_generation=True,
)

The sample_rate=16000 on the TTS is a hard requirement for Anam. Anam's lip-sync engine expects audio at 16kHz. If you change the TTS model or provider, make sure it outputs at that sample rate.
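
If you do swap providers, keep the sample rate pinned. A minimal sketch, assuming you swap in another LiveKit Inference TTS model (the Cartesia model and voice ID below are illustrative placeholders, not part of this example):

tts = inference.TTS(
    model="cartesia/sonic-2",   # illustrative: any Inference TTS model works...
    voice="<your_voice_id>",    # placeholder voice ID
    sample_rate=16000,          # ...as long as it outputs 16kHz for Anam's lip-sync
)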

Function tools as an RPC bridge

The agent has three function tools that act as a bridge to the frontend form. When the patient answers a question, the LLM calls the appropriate tool, which performs an RPC call to the frontend to update the form in real time.

Python (agent-py/src/agent.py):

@function_tool
async def update_field(
    self,
    context: RunContext,
    field_name: Annotated[str, Field(description="The field ID to update: fullName, dob, address, phone, emergencyName, emergencyRelationship, emergencyPhone, medications, allergies, reasonForVisit")],
    value: Annotated[str, Field(description="The value to set for the field.")],
):
    """Update a form field on the patient intake form."""
    if field_name not in VALID_FIELD_NAMES:
        raise llm.LLMToolException(f"Invalid field name: {field_name}")
    payload = json.dumps({"fieldName": field_name, "value": value})
    response = await perform_rpc_to_frontend(self._ctx, "updateField", payload)
    return response

@function_tool
async def get_form_state(self, context: RunContext):
    """Get the current state of all form fields."""
    response = await perform_rpc_to_frontend(self._ctx, "getFormState", "{}")
    return response

@function_tool
async def submit_form(self, context: RunContext):
    """Submit the completed intake form."""
    response = await perform_rpc_to_frontend(self._ctx, "submitForm", "{}")
    context.session.say("Your form has been submitted. You will be contacted soon. Thank you.")
    return response
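
The VALID_FIELD_NAMES check keeps the LLM from inventing field IDs. The constant isn't shown above; a plausible sketch, derived from the field list in the update_field description (check the source for the exact definition):

# Sketch derived from the field IDs listed in the tool description.
VALID_FIELD_NAMES = {
    "fullName", "dob", "address", "phone",
    "emergencyName", "emergencyRelationship", "emergencyPhone",
    "medications", "allergies", "reasonForVisit",
}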

Each tool calls perform_rpc_to_frontend, which uses local_participant.perform_rpc() to call a named method on the frontend participant:

async def perform_rpc_to_frontend(ctx: JobContext, method: str, payload: str) -> str:
    destination_identity = get_remote_participant_identity(ctx)
    response = await ctx.room.local_participant.perform_rpc(
        destination_identity=destination_identity,
        method=method,
        payload=payload,
        response_timeout=5.0,
    )
    return response

The get_remote_participant_identity function skips any participant whose identity starts with anam-, because Anam joins the room as a separate participant and you only want to target the actual user:

def get_remote_participant_identity(ctx: JobContext) -> str:
    for participant in ctx.room.remote_participants.values():
        if not participant.identity.startswith("anam-"):
            return participant.identity
    raise llm.LLMToolException("No remote participant found")

TypeScript (agent-ts/src/agent.ts) follows the same pattern using llm.tool():

const updateField = llm.tool({
  description: 'Update a form field on the patient intake form.',
  parameters: z.object({
    fieldName: z.string().describe('The field ID to update: fullName, dob, ...'),
    value: z.string().describe('The value to set for the field'),
  }),
  execute: async ({ fieldName, value }) => {
    const response = await performRpcToFrontend(
      'updateField',
      JSON.stringify({ fieldName, value })
    );
    return response;
  },
});

Both backends use the exact same RPC method names (updateField, getFormState, submitForm) and the same JSON payload format, so the single Next.js frontend works with either.
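
For reference, a sketch of that shared payload contract (the values are placeholders; both agents pass raw JSON strings over RPC):

import json

# updateField payload, as built by either backend:
update_field_payload = json.dumps({"fieldName": "fullName", "value": "Jane Doe"})

# getFormState and submitForm both send an empty JSON object:
empty_payload = "{}"

# The frontend replies with JSON strings, e.g.:
# '{"success": true, "fieldName": "fullName", "value": "Jane Doe"}'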

Attaching the Anam avatar

After the session starts, create an AvatarSession and start it:

# Start the session first
await session.start(
    agent=IntakeAssistant(ctx),
    room=ctx.room,
    room_options=room_io.RoomOptions(...),
)

# Create Anam avatar — must happen AFTER session.start()
avatar = anam.AvatarSession(
    persona_config=anam.PersonaConfig(
        name="Liv",
        avatarId="071b0286-4cce-4808-bee2-e642f1062de3",
    ),
)

await avatar.start(session, room=ctx.room)

Note that session.start() comes first here. Anam needs to connect to an active audio stream for lip-sync, so the session must be running before you attach the avatar. This is different from some other avatar plugins where the avatar starts first. Once avatar.start() resolves, Anam has joined the room as a separate participant and is publishing a video track.
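
With the avatar attached, the agent kicks off the conversation with session.generate_reply(), as referenced in the session flow later in this tutorial. A minimal sketch; the instruction text here is illustrative, not the example's exact wording:

# Trigger Liv's opening message once the avatar is publishing.
await session.generate_reply(
    instructions="Greet the patient, introduce yourself as Liv, and ask for their full name."
)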

You can use any stock avatar from the Anam Avatar Gallery or create a custom one in Anam Lab. The avatarId in PersonaConfig is the ID you copy from the gallery or lab.

Agent name

Both backends register the same agent name with LiveKit so the frontend can dispatch to either one.

Python (agent-py/src/agent.py) — set on the entrypoint decorator:

@server.rtc_session(agent_name="Anam-Demo")
async def intake_agent(ctx: JobContext):
    ...

TypeScript (agent-ts/src/main.ts) — set in ServerOptions:

cli.runApp(
  new ServerOptions({
    agent: fileURLToPath(import.meta.url),
    agentName: 'Anam-Demo',
  }),
);

If you rename the agent, update both the backend and the AGENT_NAME variable in the frontend .env.local.

Instruction design

The system prompt is structured to enforce strict one-field-at-a-time behavior. Each turn asks for exactly one piece of information, confirms it, then moves on. This keeps the conversation predictable and ensures update_field is called with clean, confirmed values rather than unverified user input. The confirmation step also handles corrections cleanly — if the patient says "actually, it's spelled differently," the agent updates the same field before moving forward.
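
The full prompt lives in the agent source; a condensed, illustrative sketch of the pattern (the wording is paraphrased, and the field order follows the update_field description above):

# Condensed, illustrative excerpt — see agent-py/src/agent.py for the real prompt.
INSTRUCTIONS = """
You are Liv, a healthcare intake assistant.
Ask for exactly ONE piece of information per turn, in this order:
full name, date of birth, address, phone, emergency contact name,
emergency contact relationship, emergency contact phone,
medications, allergies, reason for visit.
After each answer, repeat it back and ask the patient to confirm.
Only call update_field once the patient confirms the value.
If the patient corrects a value, update the same field before moving on.
Before submitting, call get_form_state and read the full form back.
Call submit_form only after the patient explicitly confirms.
"""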

Step 4: Set up the frontend

Install dependencies and configure the environment:

cd frontend
cp .env.example .env.local
pnpm install

Fill in .env.local:

LIVEKIT_API_KEY=<your_api_key>
LIVEKIT_API_SECRET=<your_api_secret>
LIVEKIT_URL=wss://<project-subdomain>.livekit.cloud
AGENT_NAME=Anam-Demo

AGENT_NAME must match the agent name registered in the backend — both backends use Anam-Demo as shown in Step 3.

How RPC handlers work

The frontend registers three RPC methods using the useRpcHandlers hook (frontend/hooks/useRpcHandlers.ts). These are the methods the agent calls:

room.registerRpcMethod('updateField', async (data: RpcInvocationData) => {
  const { fieldName, value } = JSON.parse(data.payload) as {
    fieldName: string;
    value: string;
  };
  if (!isValidFieldName(fieldName)) {
    throw new RpcError(1500, 'Invalid field name', JSON.stringify({ fieldName }));
  }
  setFormData((prev) => ({ ...prev, [fieldName]: value }));
  return JSON.stringify({ success: true, fieldName, value });
});

room.registerRpcMethod('getFormState', async () => {
  return JSON.stringify(formDataRef.current);
});

room.registerRpcMethod('submitForm', async () => {
  setIsSubmitted(true);
  return JSON.stringify({ success: true });
});

When updateField is called, it updates the local React state, which immediately re-renders the corresponding input on screen. getFormState reads from a ref rather than state directly to avoid stale closure issues. submitForm flips the isSubmitted flag, which replaces the form with a confirmation message.

The hook registers on connect and cleans up on disconnect:

return () => {
  room.unregisterRpcMethod('updateField');
  room.unregisterRpcMethod('getFormState');
  room.unregisterRpcMethod('submitForm');
};

Rendering the avatar

The AvatarPanel component (frontend/components/app/avatar-panel.tsx) locates the Anam video track and renders it with <VideoTrack>:

const remoteParticipants = useRemoteParticipants();
const worker = remoteParticipants.find(
  (p) =>
    p.kind === ParticipantKind.AGENT &&
    p.attributes['lk.publish_on_behalf'] === agent?.identity
);

const workerTracks = useParticipantTracks(
  [Track.Source.Camera, Track.Source.ScreenShare],
  worker?.identity
);

const trackRef =
  workerTracks.find((t) => t.source === Track.Source.Camera) ?? ...;

Anam joins the room as a worker participant publishing video on behalf of the agent. The component finds this worker by checking the lk.publish_on_behalf attribute, then gets its camera track. Once trackRef resolves, the avatar video renders inside a <VideoTrack>.

Step 5: Run the app

Start the agent in one terminal (choose one backend):

# Python
cd agent-py
uv run python src/agent.py dev

# TypeScript
cd agent-ts
pnpm run dev

Start the frontend in another terminal:

cd frontend
pnpm dev

Open http://localhost:3000 and click Start intake. Liv will introduce herself and begin walking you through the form.

How it works

Here's the full session flow:

  1. The user opens the app and clicks Start intake. The frontend calls /api/connection-details, which mints a LiveKit JWT and dispatches the Anam-Demo agent (see the sketch after this list).
  2. The agent starts an AgentSession with Deepgram STT, GPT-4o-mini LLM, and ElevenLabs TTS at 16kHz.
  3. session.start() runs first, activating the voice pipeline. Then avatar.start() connects Anam to the room as a separate participant and begins publishing a video track. The frontend's AvatarPanel detects the new track and renders the avatar video.
  4. session.generate_reply() triggers the agent's opening message. Liv greets the user and asks for their full name.
  5. The user speaks. LiveKit routes the audio through Deepgram Nova 3 (STT), GPT-4o-mini (LLM), and ElevenLabs Jessica (TTS). Anam receives the TTS audio and lip-syncs the avatar.
  6. When the user answers a question, the LLM calls update_field. The agent performs an RPC call to updateField on the frontend, which updates the form in React state and re-renders the input.
  7. This continues for all ten fields. Before submission, the agent calls get_form_state to verify what's been captured, then reads it back to the user for confirmation.
  8. When the user confirms, the LLM calls submit_form. The agent performs an RPC call to submitForm, which marks the form as submitted in the frontend. The avatar delivers a closing message, and the session ends.
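
Step 1's token minting and agent dispatch happen in the frontend's TypeScript route, but the idea is easier to see in a compact sketch. A rough Python equivalent using the livekit-api package (the identity and room name are placeholders; this is not the example's actual route):

from livekit import api  # reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment

token = (
    api.AccessToken()
    .with_identity("patient-123")  # placeholder user identity
    .with_grants(api.VideoGrants(room_join=True, room="intake-room"))
    .with_room_config(
        api.RoomConfiguration(
            # Dispatch the agent registered under this name (see Step 3).
            agents=[api.RoomAgentDispatch(agent_name="Anam-Demo")],
        )
    )
    .to_jwt()
)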

Summary

The key techniques in this app:

  • Starting session.start() before avatar.start() so Anam can connect to the live audio stream for lip-sync.
  • Setting sample_rate=16000 on the ElevenLabs TTS, which is required for Anam avatar compatibility.
  • Using function tools that call local_participant.perform_rpc() to push form updates from the agent to the frontend in real time.
  • Registering RPC handlers on the frontend with room.registerRpcMethod() to receive those updates and update React state.
  • Filtering out anam- prefixed participants when targeting the user with RPC calls, since Anam joins as a separate room participant.
