
How to Use the Supervisor Pattern for Multi-Agent Voice AI Systems

The supervisor pattern is the most intuitive multi-agent architecture for voice AI: one "boss" agent delegates to specialists and synthesizes their results into a single conversation.


The coordination problem it solves is more common than it sounds.

You've got a billing agent, a booking agent, and a tech support agent. A caller asks a question that touches two of those domains in one sentence. Without coordination, you're stuck building separate phone trees. The supervisor pattern is how you solve that.

What Is the Supervisor Pattern?

The supervisor pattern is a hierarchical multi-agent architecture. One agent sits at the top. It receives the user's request, breaks it into subtasks, routes each subtask to the right specialist agent, monitors progress, and combines everything into a single response.

Think of it as a project manager who never does the work directly but knows exactly who to assign each piece to.

This pattern goes by several names, including coordinator, orchestrator, manager, and hub-and-spoke. The concept is the same. One agent controls the flow. Specialist agents do the work.

For voice agents specifically, the supervisor is what lets you build a single phone number or voice interface that handles multiple departments. Instead of "press 1 for billing, press 2 for support," the supervisor listens to what the caller needs and routes them to the right specialist agent automatically.


How It Works

The supervisor follows a five-step loop:

| Step | What Happens | Voice Agent Example |
| --- | --- | --- |
| 1. Receive | The supervisor gets the user's request and full conversation context | The caller says "I need to reschedule my appointment and check my balance" |
| 2. Decompose | Breaks the request into subtasks and identifies which specialists are needed | Two subtasks are identified: scheduling goes to the booking agent, account lookup goes to the billing agent |
| 3. Delegate | Routes each subtask to the right worker agent, handing off conversation context for continuity | Passes customer ID and date preferences to the booking agent |
| 4. Monitor | Tracks progress, handles failures, re-plans if needed | If the booking agent can't find availability, the supervisor asks the caller for alternate dates |
| 5. Synthesize | Combines all agent outputs into one coherent response | "Your appointment is moved to Thursday at 2 PM, and your current balance is $142.50" |
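The five-step loop can be sketched in plain Python, independent of any framework. Everything here is a toy stand-in: `decompose` is a naive keyword splitter and the specialist registry is a pair of lambdas, not part of any SDK.

```python
# Framework-agnostic sketch of the supervisor loop. All names are illustrative.
SPECIALISTS = {
    "billing": lambda task: f"balance for {task}",
    "booking": lambda task: f"rescheduled: {task}",
}

def decompose(request: str) -> list[tuple[str, str]]:
    # 2. Decompose: naive keyword matching into (domain, subtask) pairs.
    subtasks = []
    if "balance" in request:
        subtasks.append(("billing", "account balance"))
    if "reschedule" in request or "appointment" in request:
        subtasks.append(("booking", "appointment change"))
    return subtasks

def supervise(request: str) -> str:
    # 1. Receive the user's request.
    results = []
    for domain, subtask in decompose(request):
        # 3. Delegate to the right specialist; 4. Monitor for failure.
        try:
            results.append(SPECIALISTS[domain](subtask))
        except KeyError:
            results.append(f"no specialist for {domain}")
    # 5. Synthesize one coherent response.
    return "; ".join(results)

print(supervise("I need to reschedule my appointment and check my balance"))
```

A production supervisor replaces the keyword matching with a classifier or LLM call, but the control flow stays the same.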

You control what context each specialist receives at handoff time. The example below passes prior conversation history through so the caller does not have to repeat themselves. For tighter scoping in production, you can pass a truncated copy instead. See Context Overload in the pitfalls section for that pattern.


Static vs. Dynamic Supervisors

There are two main variants, and the choice between them shapes your entire architecture.

Static Supervisor

The routing logic is predefined. The supervisor follows a fixed graph of agent relationships. If the user mentions billing, it always goes to the billing agent. If the user mentions booking, it always goes to the booking agent.

Pros: Predictable, easy to test, fast routing, lower token cost.

Cons: Cannot handle ambiguous or novel requests that fall outside the predefined graph.

Dynamic Supervisor

The supervisor uses LLM reasoning to decide which agents to call, in what order, and whether to re-plan. It reads the user's request, thinks about which specialist is best, and routes accordingly.

Pros: Handles edge cases, adapts to unexpected requests, more flexible.

Cons: Harder to debug, slower (the LLM reasoning step adds latency), higher token cost.

The Hybrid Approach (Recommended for Voice)

For real-time voice, the best approach is a hybrid. Use a lightweight intent classifier for common, well-defined intents (billing, booking, support) and fall back to full LLM reasoning only for ambiguous or multi-domain requests. This gives you the speed of static routing for 80% of calls and the flexibility of dynamic routing for the rest.
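A hybrid router might look like the sketch below. `classify` and `llm_route` are hypothetical stand-ins for your intent classifier and LLM reasoning call, and the confidence threshold is something you would tune against real traffic:

```python
# Hybrid routing sketch: a cheap classifier handles confident, well-defined
# intents; ambiguous requests fall back to full LLM reasoning.

def classify(utterance: str) -> tuple[str, float]:
    # Stand-in classifier returning (intent, confidence).
    if "balance" in utterance:
        return ("billing", 0.95)
    return ("unknown", 0.30)

def llm_route(utterance: str) -> str:
    # Stand-in for the slower, dynamic LLM routing step.
    return "supervisor_reasoning"

CONFIDENCE_THRESHOLD = 0.8  # tune against real traffic

def hybrid_route(utterance: str) -> str:
    intent, confidence = classify(utterance)
    if confidence >= CONFIDENCE_THRESHOLD:
        return intent            # fast path: static-style routing
    return llm_route(utterance)  # slow path: full LLM reasoning
```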


When to Use the Supervisor Pattern

The supervisor pattern is the right choice when:

  • Your agent handles multiple domains. A voice agent that does billing, booking, and tech support needs a way to route between them. The supervisor is the natural fit.
  • You need centralized oversight. Enterprise deployments need visible reasoning, traceable outputs, and audit trails. The supervisor provides a single point where all decisions are logged.
  • Quality matters more than raw speed. The supervisor can validate specialist outputs before responding to the user. If the billing agent returns suspicious data, the supervisor can catch it.
  • You want one conversational interface over multiple backends. Instead of building separate agents for each department, build one supervisor that presents a unified experience.

When to Avoid It

The supervisor is not always the right call:

  • You only have one specialist. If your agent does one thing (like appointment booking), skip the supervisor layer entirely. It adds complexity and latency for no benefit.
  • Latency is your primary constraint. The supervisor adds a reasoning step before every delegation. For simple, single-domain voice queries where every millisecond counts, direct routing is faster.
  • You are working with strict token budgets. Supervisor reasoning loops can consume 2-3x more tokens than direct patterns. The LLM processes the full context at the supervisor level before passing a subset to the specialist.
  • Your scale could overload the orchestrator. At very high concurrency, the single supervisor becomes a bottleneck. Consider distributing supervisors by domain at that point.

The Latency Problem (and How to Solve It)

Latency is the number one concern with the supervisor pattern in voice. The supervisor adds a reasoning step before every delegation, which can add hundreds of milliseconds. In a text chat, that is fine. In a real-time voice conversation, it can make the experience feel broken.

Here are the proven techniques for keeping latency low:

1. Use a Small, Fast Model for the Supervisor

The supervisor's job is routing, not deep reasoning. Use a lightweight model (or a fine-tuned classifier) for the supervisor layer. Save the larger, more capable models for the specialist agents that need them.

2. Fan Out in Parallel

If the supervisor identifies two independent subtasks (check balance AND retrieve appointment slots), run them simultaneously. Total latency drops from Task A + Task B to max(Task A, Task B).
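With Python's `asyncio`, the fan-out is one `gather` call. The two coroutines below simulate specialist backends with `sleep`; the total wall time ends up near the slower of the two, not their sum:

```python
import asyncio

# Parallel fan-out sketch: two independent subtasks run concurrently,
# so total latency is max(A, B) rather than A + B.

async def check_balance() -> str:
    await asyncio.sleep(0.2)  # simulated billing backend latency
    return "$142.50"

async def get_slots() -> list[str]:
    await asyncio.sleep(0.3)  # simulated calendar backend latency
    return ["Thu 2 PM", "Fri 10 AM"]

async def fan_out():
    # Both coroutines start immediately; gather awaits them together.
    return await asyncio.gather(check_balance(), get_slots())

balance, slots = asyncio.run(fan_out())
print(balance, slots)
```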

3. Pre-Classify Common Intents

For the most frequent intents in your voice app (billing, support, booking), a lightweight classifier can route directly to the right agent without the supervisor's full reasoning loop. Only fall back to the supervisor for ambiguous or multi-domain requests.

4. Stream Everything

Do not wait for the full supervisor response before starting downstream work. Stream partial results through the pipeline so the user hears a response as quickly as possible.

5. Scope Context Aggressively

Each specialist agent should only receive the data it needs. Dumping the full conversation history to every worker wastes tokens and slows down inference. Pass customer ID, extracted intent, and relevant entities only.
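In plain terms, scoping is projecting the full conversation state down to the fields the specialist actually uses. The field names in this sketch are illustrative:

```python
# Context scoping sketch: the supervisor holds everything,
# each specialist receives only a focused slice.
full_context = {
    "customer_id": "C-1042",
    "intent": "reschedule_appointment",
    "entities": {"preferred_date": "Thursday"},
    "history": ["...dozens of prior turns..."],  # heavy; rarely needed downstream
    "internal_notes": "supervisor routing trace",
}

def scope_for_booking(ctx: dict) -> dict:
    """Pass only the keys the booking specialist actually uses."""
    return {k: ctx[k] for k in ("customer_id", "intent", "entities")}

scoped = scope_for_booking(full_context)
```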


Building a Supervisor Voice Agent with LiveKit

LiveKit's Agents Framework maps directly to the supervisor pattern. Here is how the pieces fit together.

Core Building Blocks

AgentSession is the orchestrator. It manages the voice pipeline, collects user input, invokes the LLM, and emits events. Each session can compose one or more Agent instances.

Agent classes define distinct personas. The supervisor is one Agent with its own instructions and tools. Each specialist (billing, booking, tech support) is another Agent with its own instructions and tools.

@function_tool handoffs are how the supervisor delegates. A tool decorated with @function_tool() can return a different Agent instance, triggering an automatic handoff. The LLM decides when to delegate based on the tool descriptions you provide. For more on structuring agent handoffs, see the workflows documentation.

chat_ctx.copy() controls what conversation history carries into each handoff. Pass it to the next agent so the caller does not have to repeat themselves. Use exclude_instructions=True to strip the current agent's system prompt from the copy, and chain .truncate() to limit how many turns are included.

userdata on the session stores shared state (customer ID, collected data, account info) accessible to all agents via session.userdata.

Example: Multi-Department Voice Agent

Here is a supervisor that routes between billing and booking specialists. First, the agent server and session setup shows where the LLM, STT, and TTS models are configured:

```python
from dataclasses import dataclass
from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, Agent, function_tool, RunContext, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

@dataclass
class SessionData:
    customer_id: str | None = None

server = AgentServer()

@server.rtc_session(agent_name="customer-service")
async def entrypoint(ctx: agents.JobContext):
    # LLM, STT, and TTS are configured on AgentSession, not on the Agent.
    # This default LLM is inherited by all agents unless they override it.
    session = AgentSession[SessionData](
        stt="deepgram/nova-3:multi",
        llm="openai/gpt-4.1",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        userdata=SessionData(),
    )

    await session.start(
        room=ctx.room,
        agent=SupervisorAgent(),  # The supervisor is the initial active agent
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

if __name__ == "__main__":
    agents.cli.run_app(server)
```

Next, the agent definitions show how each agent defines its own instructions, tools, and handoff behavior. The billing_api and calendar_api calls are stand-ins for your own backend integrations and are not part of the LiveKit SDK:

```python
from livekit.agents import Agent, function_tool, RunContext

# billing_api and calendar_api are placeholders for your own backend integrations.

class BillingAgent(Agent):
    def __init__(self, chat_ctx=None):
        # Accept chat_ctx so the supervisor can pass conversation history at handoff.
        super().__init__(
            instructions="""You are a billing specialist. Help customers
            with account balances, payment history, and invoice questions.
            Be concise and confirm amounts clearly.""",
            chat_ctx=chat_ctx,
        )

    async def on_enter(self) -> None:
        await self.session.generate_reply(
            instructions="Introduce yourself as a billing specialist and ask how you can help."
        )

    @function_tool()
    async def check_balance(self, context: RunContext[SessionData], customer_id: str):
        """Look up the current account balance for a customer."""
        balance = await billing_api.get_balance(customer_id)
        context.userdata.customer_id = customer_id
        return {"balance": balance}

    @function_tool()
    async def get_payment_history(self, context: RunContext, customer_id: str):
        """Retrieve recent payment history for a customer."""
        history = await billing_api.get_payments(customer_id)
        return {"payments": history}

class BookingAgent(Agent):
    def __init__(self, chat_ctx=None):
        super().__init__(
            instructions="""You are a scheduling specialist. Help customers
            book, reschedule, or cancel appointments. Always confirm the
            date and time before finalizing.""",
            chat_ctx=chat_ctx,
        )

    async def on_enter(self) -> None:
        await self.session.generate_reply(
            instructions="Introduce yourself as a scheduling specialist and ask how you can help."
        )

    @function_tool()
    async def check_availability(self, context: RunContext, date: str):
        """Check available appointment slots for a given date."""
        slots = await calendar_api.get_slots(date)
        return {"available_slots": slots}

    @function_tool()
    async def book_appointment(self, context: RunContext, date: str, time: str):
        """Book an appointment at the specified date and time."""
        confirmation = await calendar_api.book(date, time)
        return {"confirmation_id": confirmation.id}

class SupervisorAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a friendly customer service supervisor.
            Listen to what the caller needs and route them to the right
            specialist. If they need billing help, transfer to the billing
            agent. If they need to schedule something, transfer to the
            booking agent. For simple greetings or general questions,
            handle them yourself.""",
            llm="openai/gpt-4.1-mini",  # Override: fast model for routing decisions
        )

    @function_tool()
    async def transfer_to_billing(self, context: RunContext):
        """Transfer the caller to the billing specialist for account
        balances, payments, and invoice questions."""
        # Passes conversation history to the specialist but strips the supervisor's
        # instructions, so the specialist starts with its own persona.
        # Chain .truncate(max_items=6) for tighter scoping.
        return BillingAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to billing"

    @function_tool()
    async def transfer_to_booking(self, context: RunContext):
        """Transfer the caller to the scheduling specialist for
        booking, rescheduling, or canceling appointments."""
        return BookingAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to scheduling"
```

A few things to notice:

  • AgentSession configures the pipeline defaults. The LLM (gpt-4.1), STT, TTS, and VAD are all set on AgentSession. Every agent inherits these defaults unless it explicitly overrides them.
  • Agents can override plugins. The supervisor overrides with llm="openai/gpt-4.1-mini" because routing decisions are simple and speed matters. The specialist agents inherit the session's default gpt-4.1 for more complex domain reasoning (billing calculations, scheduling logic).
  • Handoffs return a tuple. The @function_tool methods return (Agent, "message"): the Agent instance to hand off to, and a string the LLM uses to inform the caller about the transfer.
  • chat_ctx.copy(exclude_instructions=True) passes conversation history without the supervisor's persona. The specialist receives all prior turns so the caller does not have to repeat themselves, but it starts with its own instructions rather than the supervisor's. Chain .truncate(max_items=6) to carry only the last few turns for even tighter scoping.
  • on_enter triggers a greeting. When a specialist takes over, on_enter fires automatically, prompting the agent to introduce itself.
  • userdata stores shared session state. The example passes a SessionData instance so any agent can read or write it via context.userdata. Use this for things like customer ID, collected form data, or flags set earlier in the call.
  • Model strings like deepgram/nova-3:multi and openai/gpt-4.1 route through LiveKit Inference, giving you access to 50+ STT, LLM, and TTS models with no separate API keys required.

Optimizing the Supervisor for Low Latency

LiveKit's framework gives you several tools to keep the supervisor fast:

Per-agent plugin overrides. Each specialist can use different LLM, STT, or TTS providers. Use a fast, small model for the supervisor (routing decisions) and a larger model for specialists that need deeper reasoning.

Built-in streaming. LiveKit streams at every pipeline stage (VAD, STT, LLM, TTS). The supervisor's reasoning step overlaps with audio processing rather than blocking it.

Agent dispatch. For complex deployments, multiple agent processes can be dispatched to the same room via the server API. This enables a true multi-process supervisor architecture where the supervisor and specialists run as separate services.


Real-World Use Cases

The supervisor pattern shows up in production across several industries:

Customer Service

A single voice number that handles billing, tech support, returns, and general inquiries. The supervisor classifies the caller's intent and routes to the right department. No IVR menus. No "press 1 for..."

Healthcare

A medical office triage agent where the supervisor routes between appointment scheduling, prescription refills, nurse advice, and insurance verification. Each specialist has access to different backend systems and follows different compliance rules. LiveKit has a medical office triage example that demonstrates this architecture.

Drive-Through Ordering

A fast-food drive-through agent where the supervisor coordinates between an order-taking agent, a menu lookup agent, and a payment agent. The supervisor tracks the overall order state while each specialist handles its piece. See the drive-thru example for a working implementation.

Enterprise Copilots

BASF Coatings built a system called "Marketmind" using a multi-agent supervisor to serve over 1,000 sales reps. The supervisor receives natural language queries via Microsoft Teams and routes between a structured data agent (SQL queries over sales metrics) and an unstructured data agent (vector search over Salesforce visit reports). It synthesizes results into a unified answer.

Front Desk / Receptionist

A voice agent for a hotel, medical office, or professional services firm. The supervisor handles greetings and general questions directly, then routes to specialists for reservations, billing, or appointment management. The front-desk booking example shows this pattern in action.


Supervisor vs. Other Multi-Agent Patterns

The supervisor is one of several multi-agent patterns. Here is how it compares:

| Pattern | How It Differs from Supervisor | Choose It When... |
| --- | --- | --- |
| The Handoff Pattern for Voice Agents That Replaces IVR Menus | Direct agent-to-agent transfers without a central coordinator. Simpler, less overhead. | You have clear, non-overlapping domains and do not need centralized oversight |
| Sequential Pipeline Architecture for Voice Agents | Fixed linear chain where output flows forward. No dynamic routing. | Your process is always the same steps in the same order (like the VAD to STT to LLM to TTS voice pipeline) |
| The ReAct Pattern for Voice Agents and How AI Agents Think, Act, and Respond | Single agent that loops through think, act, observe cycles with tools. No delegation to other agents. | One agent with multiple tools is sufficient and you do not need separate specialist personas |
| The Human-in-the-Loop (HITL) Pattern for Voice Agents | Agent pauses for human approval at critical decision points. Often combined with supervisor. | High-stakes or regulated actions require human oversight before execution |

In practice, these patterns are often combined. A supervisor might use ReAct internally for its routing decisions. A specialist agent might use Human-in-the-Loop for high-value transactions. The sequential pipeline (VAD, STT, LLM, TTS) runs underneath all of them.


Common Pitfalls and How to Avoid Them

1. The Infinite Loop

Supervisor agents can enter loops where they keep delegating back and forth between specialists without reaching a resolution. Set a maximum iteration count (10 is a reasonable default) and implement state repetition detection. If the supervisor has seen the same state three times, force a fallback response.
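A guard for this can be sketched in a few lines. `next_state` here is a hypothetical stand-in for one delegation round, and the limits are the defaults suggested above:

```python
from collections import Counter

# Loop-guard sketch: cap supervisor iterations and detect repeated states.
MAX_ITERATIONS = 10
REPEAT_LIMIT = 3

def run_with_guard(next_state, initial: str) -> str:
    seen = Counter()
    state = initial
    for _ in range(MAX_ITERATIONS):
        seen[state] += 1
        if seen[state] >= REPEAT_LIMIT:
            # Same state seen three times: stop delegating, force a fallback.
            return "fallback: escalate to human"
        state = next_state(state)
        if state == "done":
            return "resolved"
    return "fallback: iteration cap reached"

# A pathological delegation that ping-pongs between two specialists:
ping_pong = {"billing": "booking", "booking": "billing"}.get
print(run_with_guard(ping_pong, "billing"))
```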

2. Context Overload

Passing unfiltered context to every specialist wastes tokens and can degrade accuracy. The example above uses chat_ctx.copy(exclude_instructions=True) so specialists get prior conversation turns without inheriting the supervisor's system prompt. For tighter control, chain .truncate(max_items=6) to carry only the last few turns, or build a summary and inject it as a system message when the specialist enters. The supervisor holds the full context. Specialists get a focused slice.

3. Too Many Specialists

When you have 10 or more specialist agents, the supervisor's LLM struggles to pick the right one. The tool descriptions blur together. Group related specialists under sub-supervisors (a "Supervisor of Supervisors" pattern) or use an intent classifier to narrow the field before the supervisor reasons.

4. Single Point of Failure

The supervisor is a bottleneck by design. If it goes down, everything stops. Build in health checks, timeouts, and fallback behavior. If the supervisor fails to respond within a threshold, route the caller to a default agent or a human.
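A timeout fallback can be a single `asyncio.wait_for` wrapper. The coroutines below are stand-ins, and the threshold is deliberately tiny so the demo trips the fallback:

```python
import asyncio

SUPERVISOR_TIMEOUT = 0.1  # seconds; tune for your latency budget

async def supervisor_decide() -> str:
    await asyncio.sleep(1.0)  # simulate a hung supervisor
    return "billing"

async def route_with_fallback() -> str:
    try:
        return await asyncio.wait_for(supervisor_decide(), timeout=SUPERVISOR_TIMEOUT)
    except asyncio.TimeoutError:
        # Supervisor missed its deadline: hand the caller to a default agent
        # (or a human) rather than leaving them in silence.
        return "default_agent"

print(asyncio.run(route_with_fallback()))
```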

5. Ignoring the Handoff Experience

In voice, the transition between supervisor and specialist must be invisible to the caller. No awkward pauses. No repeated context. No "please hold while I transfer you." The handoff should feel like the same agent just got smarter about the topic.


Key Takeaways

  • The supervisor gives you a single conversational interface over multiple specialist agents. No IVR menus, no rigid routing
  • Hybrid routing works best for voice. Use a fast intent classifier for common requests and save full LLM reasoning for the rest
  • Latency is the main tradeoff. Solve it with a lightweight supervisor model, parallel delegation, and aggressive context scoping
  • LiveKit's Agent class, @function_tool handoffs, and chat_ctx.copy() give you precise control over what conversation history carries into each specialist
  • In practice, the supervisor combines with other patterns. ReAct handles routing decisions, Human-in-the-Loop covers high-stakes actions

Getting Started

If you want to build a supervisor-based voice agent with LiveKit, here's the path:

  1. Start with the Agents quickstart. Get a single-agent voice app running first. The quickstart guide walks you through the basics in minutes.
  2. Add a second agent. Create a specialist Agent class with its own instructions and tools. Add a @function_tool to your main agent that returns the specialist.
  3. Test in the Agent Playground. Use the Agent Playground or prototype the flow without code using Agent Builder. When you're ready for production, deploy to LiveKit Cloud with one click.
  4. Add more specialists. As your use case grows, add more specialist agents. Keep each one focused on a single domain.
  5. Optimize for latency. Profile your supervisor's routing time with Agent Observability. If it is too slow, switch to a smaller model for the supervisor or add a pre-classifier for common intents.

The supervisor pattern is the most common multi-agent architecture in production for a reason. It is intuitive, debuggable, and maps directly to how organizations actually work. For voice agents, it turns a single-purpose bot into a full-service assistant.

Give it a try and let us know what you're building.