
How to Use the Supervisor Pattern for Multi-Agent Voice AI Systems

The supervisor pattern is the most intuitive multi-agent architecture for voice AI: one "boss" agent delegates to specialists and synthesizes their results into a single conversation.


The coordination problem it solves is more common than it sounds.

You've got a billing agent, a booking agent, and a tech support agent. A caller asks a question that touches two of those domains in one sentence. Without coordination, you're stuck building separate phone trees. The supervisor pattern is how you solve that.

What Is the Supervisor Pattern?

The supervisor pattern is a hierarchical multi-agent architecture. One agent sits at the top. It receives the user's request, breaks it into subtasks, routes each subtask to the right specialist agent, monitors progress, and combines everything into a single response.

Think of it as a project manager who never does the work directly but knows exactly who to assign each piece to.

This pattern goes by several names, including coordinator, orchestrator, manager, and hub-and-spoke. The concept is the same. One agent controls the flow. Specialist agents do the work.

For voice agents specifically, the supervisor is what lets you build a single phone number or voice interface that handles multiple departments. Instead of "press 1 for billing, press 2 for support," the supervisor listens to what the caller needs and routes them to the right specialist agent automatically.


How It Works

The supervisor follows a five-step loop:

| Step | What Happens | Voice Agent Example |
| --- | --- | --- |
| 1. Receive | The supervisor gets the user's request and full conversation context | The caller says "I need to reschedule my appointment and check my balance" |
| 2. Decompose | Breaks the request into subtasks and identifies which specialists are needed | Two subtasks are identified: scheduling goes to the booking agent, account lookup goes to the billing agent |
| 3. Delegate | Routes each subtask to the right worker agent, handing off conversation context for continuity | Passes customer ID and date preferences to the booking agent |
| 4. Monitor | Tracks progress, handles failures, re-plans if needed | If the booking agent can't find availability, the supervisor asks the caller for alternate dates |
| 5. Synthesize | Combines all agent outputs into one coherent response | "Your appointment is moved to Thursday at 2 PM, and your current balance is $142.50" |
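The five-step loop can be sketched in plain Python, independent of any framework. Everything here is a toy stand-in: `decompose` is a naive keyword splitter and the specialist registry is a pair of lambdas, not part of any SDK.

```python
# Framework-agnostic sketch of the supervisor loop. All names are illustrative.
SPECIALISTS = {
    "billing": lambda task: f"balance for {task}",
    "booking": lambda task: f"rescheduled: {task}",
}

def decompose(request: str) -> list[tuple[str, str]]:
    # 2. Decompose: naive keyword matching into (domain, subtask) pairs.
    subtasks = []
    if "balance" in request:
        subtasks.append(("billing", "account balance"))
    if "reschedule" in request or "appointment" in request:
        subtasks.append(("booking", "appointment change"))
    return subtasks

def supervise(request: str) -> str:
    # 1. Receive the user's request.
    results = []
    for domain, subtask in decompose(request):
        # 3. Delegate to the right specialist; 4. Monitor for failure.
        try:
            results.append(SPECIALISTS[domain](subtask))
        except KeyError:
            results.append(f"no specialist for {domain}")
    # 5. Synthesize one coherent response.
    return "; ".join(results)

print(supervise("I need to reschedule my appointment and check my balance"))
```

A production supervisor replaces the keyword matching with a classifier or LLM call, but the control flow stays the same.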

You control what context each specialist receives at handoff time. The example below passes prior conversation history through so the caller does not have to repeat themselves. For tighter scoping in production, you can pass a truncated copy instead. See Context Overload in the pitfalls section for that pattern.


Static vs. Dynamic Supervisors

There are two main variants, and the choice between them shapes your entire architecture.

Static Supervisor

The routing logic is predefined. The supervisor follows a fixed graph of agent relationships. If the user mentions billing, it always goes to the billing agent. If the user mentions booking, it always goes to the booking agent.

Pros: Predictable, easy to test, fast routing, lower token cost.

Cons: Cannot handle ambiguous or novel requests that fall outside the predefined graph.

Dynamic Supervisor

The supervisor uses LLM reasoning to decide which agents to call, in what order, and whether to re-plan. It reads the user's request, thinks about which specialist is best, and routes accordingly.

Pros: Handles edge cases, adapts to unexpected requests, more flexible.

Cons: Harder to debug, slower (the LLM reasoning step adds latency), higher token cost.

The Hybrid Approach (Recommended for Voice)

For real-time voice, the best approach is a hybrid. Use a lightweight intent classifier for common, well-defined intents (billing, booking, support) and fall back to full LLM reasoning only for ambiguous or multi-domain requests. This gives you the speed of static routing for 80% of calls and the flexibility of dynamic routing for the rest.
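A hybrid router might look like the sketch below. `classify` and `llm_route` are hypothetical stand-ins for your intent classifier and LLM reasoning call, and the confidence threshold is something you would tune against real traffic:

```python
# Hybrid routing sketch: a cheap classifier handles confident, well-defined
# intents; ambiguous requests fall back to full LLM reasoning.

def classify(utterance: str) -> tuple[str, float]:
    # Stand-in classifier returning (intent, confidence).
    if "balance" in utterance:
        return ("billing", 0.95)
    return ("unknown", 0.30)

def llm_route(utterance: str) -> str:
    # Stand-in for the slower, dynamic LLM routing step.
    return "supervisor_reasoning"

CONFIDENCE_THRESHOLD = 0.8  # tune against real traffic

def hybrid_route(utterance: str) -> str:
    intent, confidence = classify(utterance)
    if confidence >= CONFIDENCE_THRESHOLD:
        return intent            # fast path: static-style routing
    return llm_route(utterance)  # slow path: full LLM reasoning
```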


When to Use the Supervisor Pattern

The supervisor pattern is the right choice when:

  • Your agent handles multiple domains. A voice agent that does billing, booking, and tech support needs a way to route between them. The supervisor is the natural fit.
  • You need centralized oversight. Enterprise deployments need visible reasoning, traceable outputs, and audit trails. The supervisor provides a single point where all decisions are logged.
  • Quality matters more than raw speed. The supervisor can validate specialist outputs before responding to the user. If the billing agent returns suspicious data, the supervisor can catch it.
  • You want one conversational interface over multiple backends. Instead of building separate agents for each department, build one supervisor that presents a unified experience.

When to Avoid It

The supervisor is not always the right call:

  • You only have one specialist. If your agent does one thing (like appointment booking), skip the supervisor layer entirely. It adds complexity and latency for no benefit.
  • Latency is your primary constraint. The supervisor adds a reasoning step before every delegation. For simple, single-domain voice queries where every millisecond counts, direct routing is faster.
  • You are working with strict token budgets. Supervisor reasoning loops can consume 2-3x more tokens than direct patterns. The LLM processes the full context at the supervisor level before passing a subset to the specialist.
  • Your scale could overload the orchestrator. At very high concurrency, the single supervisor becomes a bottleneck. Consider distributing supervisors by domain at that point.

The Latency Problem (and How to Solve It)

Latency is the number one concern with the supervisor pattern in voice. The supervisor adds a reasoning step before every delegation, which can add hundreds of milliseconds. In a text chat, that is fine. In a real-time voice conversation, it can make the experience feel broken.

Here are the proven techniques for keeping latency low:

1. Use a Small, Fast Model for the Supervisor

The supervisor's job is routing, not deep reasoning. Use a lightweight model (or a fine-tuned classifier) for the supervisor layer. Save the larger, more capable models for the specialist agents that need them.

2. Fan Out in Parallel

If the supervisor identifies two independent subtasks (check balance AND retrieve appointment slots), run them simultaneously. Total latency drops from Task A + Task B to max(Task A, Task B).
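With Python's `asyncio`, the fan-out is one `gather` call. The two coroutines below simulate specialist backends with `sleep`; the total wall time ends up near the slower of the two, not their sum:

```python
import asyncio

# Parallel fan-out sketch: two independent subtasks run concurrently,
# so total latency is max(A, B) rather than A + B.

async def check_balance() -> str:
    await asyncio.sleep(0.2)  # simulated billing backend latency
    return "$142.50"

async def get_slots() -> list[str]:
    await asyncio.sleep(0.3)  # simulated calendar backend latency
    return ["Thu 2 PM", "Fri 10 AM"]

async def fan_out():
    # Both coroutines start immediately; gather awaits them together.
    return await asyncio.gather(check_balance(), get_slots())

balance, slots = asyncio.run(fan_out())
print(balance, slots)
```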

3. Pre-Classify Common Intents

For the most frequent intents in your voice app (billing, support, booking), a lightweight classifier can route directly to the right agent without the supervisor's full reasoning loop. Only fall back to the supervisor for ambiguous or multi-domain requests.

4. Stream Everything

Do not wait for the full supervisor response before starting downstream work. Stream partial results through the pipeline so the user hears a response as quickly as possible.

5. Scope Context Aggressively

Each specialist agent should only receive the data it needs. Dumping the full conversation history to every worker wastes tokens and slows down inference. Pass customer ID, extracted intent, and relevant entities only.
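In plain terms, scoping is projecting the full conversation state down to the fields the specialist actually uses. The field names in this sketch are illustrative:

```python
# Context scoping sketch: the supervisor holds everything,
# each specialist receives only a focused slice.
full_context = {
    "customer_id": "C-1042",
    "intent": "reschedule_appointment",
    "entities": {"preferred_date": "Thursday"},
    "history": ["...dozens of prior turns..."],  # heavy; rarely needed downstream
    "internal_notes": "supervisor routing trace",
}

def scope_for_booking(ctx: dict) -> dict:
    """Pass only the keys the booking specialist actually uses."""
    return {k: ctx[k] for k in ("customer_id", "intent", "entities")}

scoped = scope_for_booking(full_context)
```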


Building a Supervisor Voice Agent with LiveKit

LiveKit's Agents Framework maps directly to the supervisor pattern. Here is how the pieces fit together.

Core Building Blocks

AgentSession is the orchestrator. It manages the voice pipeline, collects user input, invokes the LLM, and emits events. Each session can compose one or more Agent instances.

Agent classes define distinct personas. The supervisor is one Agent with its own instructions and tools. Each specialist (billing, booking, tech support) is another Agent with its own instructions and tools.

@function_tool handoffs are how the supervisor delegates. A tool decorated with @function_tool() can return a different Agent instance, triggering an automatic handoff. The LLM decides when to delegate based on the tool descriptions you provide. For more on structuring agent handoffs, see the workflows documentation.

chat_ctx.copy() controls what conversation history carries into each handoff. Pass it to the next agent so the caller does not have to repeat themselves. Use exclude_instructions=True to strip the current agent's system prompt from the copy, and chain .truncate() to limit how many turns are included.

userdata on the session stores shared state (customer ID, collected data, account info) accessible to all agents via session.userdata.

Example: Multi-Department Voice Agent

Here is a supervisor that routes between billing and booking specialists. First, the agent server and session setup shows where the LLM, STT, and TTS models are configured:

```python
from dataclasses import dataclass
from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, Agent, function_tool, RunContext, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

@dataclass
class SessionData:
    customer_id: str | None = None

server = AgentServer()

@server.rtc_session(agent_name="customer-service")
async def entrypoint(ctx: agents.JobContext):
    # LLM, STT, and TTS are configured on AgentSession, not on the Agent.
    # This default LLM is inherited by all agents unless they override it.
    session = AgentSession[SessionData](
        stt="deepgram/nova-3:multi",
        llm="openai/gpt-4.1",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        userdata=SessionData(),
    )

    await session.start(
        room=ctx.room,
        agent=SupervisorAgent(),  # The supervisor is the initial active agent
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony()
                if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )

if __name__ == "__main__":
    agents.cli.run_app(server)
```

Next, the agent definitions show how each agent defines its own instructions, tools, and handoff behavior. The billing_api and calendar_api calls are stand-ins for your own backend integrations and are not part of the LiveKit SDK:

```python
from livekit.agents import Agent, function_tool, RunContext

# billing_api and calendar_api are placeholders for your own backend integrations.

class BillingAgent(Agent):
    def __init__(self, chat_ctx=None):
        # Accept chat_ctx so the supervisor can pass conversation history at handoff.
        super().__init__(
            instructions="""You are a billing specialist. Help customers
            with account balances, payment history, and invoice questions.
            Be concise and confirm amounts clearly.""",
            chat_ctx=chat_ctx,
        )

    async def on_enter(self) -> None:
        await self.session.generate_reply(
            instructions="Introduce yourself as a billing specialist and ask how you can help."
        )

    @function_tool()
    async def check_balance(self, context: RunContext[SessionData], customer_id: str):
        """Look up the current account balance for a customer."""
        balance = await billing_api.get_balance(customer_id)
        context.userdata.customer_id = customer_id
        return {"balance": balance}

    @function_tool()
    async def get_payment_history(self, context: RunContext, customer_id: str):
        """Retrieve recent payment history for a customer."""
        history = await billing_api.get_payments(customer_id)
        return {"payments": history}

class BookingAgent(Agent):
    def __init__(self, chat_ctx=None):
        super().__init__(
            instructions="""You are a scheduling specialist. Help customers
            book, reschedule, or cancel appointments. Always confirm the
            date and time before finalizing.""",
            chat_ctx=chat_ctx,
        )

    async def on_enter(self) -> None:
        await self.session.generate_reply(
            instructions="Introduce yourself as a scheduling specialist and ask how you can help."
        )

    @function_tool()
    async def check_availability(self, context: RunContext, date: str):
        """Check available appointment slots for a given date."""
        slots = await calendar_api.get_slots(date)
        return {"available_slots": slots}

    @function_tool()
    async def book_appointment(self, context: RunContext, date: str, time: str):
        """Book an appointment at the specified date and time."""
        confirmation = await calendar_api.book(date, time)
        return {"confirmation_id": confirmation.id}

class SupervisorAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a friendly customer service supervisor.
            Listen to what the caller needs and route them to the right
            specialist. If they need billing help, transfer to the billing
            agent. If they need to schedule something, transfer to the
            booking agent. For simple greetings or general questions,
            handle them yourself.""",
            llm="openai/gpt-4.1-mini",  # Override: fast model for routing decisions
        )

    @function_tool()
    async def transfer_to_billing(self, context: RunContext):
        """Transfer the caller to the billing specialist for account
        balances, payments, and invoice questions."""
        # Passes conversation history to the specialist but strips the supervisor's
        # instructions, so the specialist starts with its own persona.
        # Chain .truncate(max_items=6) for tighter scoping.
        return BillingAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to billing"

    @function_tool()
    async def transfer_to_booking(self, context: RunContext):
        """Transfer the caller to the scheduling specialist for
        booking, rescheduling, or canceling appointments."""
        return BookingAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to scheduling"
```

A few things to notice:

  • AgentSession configures the pipeline defaults. The LLM (gpt-4.1), STT, TTS, and VAD are all set on AgentSession. Every agent inherits these defaults unless it explicitly overrides them.
  • Agents can override plugins. The supervisor overrides with llm="openai/gpt-4.1-mini" because routing decisions are simple and speed matters. The specialist agents inherit the session's default gpt-4.1 for more complex domain reasoning (billing calculations, scheduling logic).
  • Handoffs return a tuple. The @function_tool methods return (Agent, "message"): the Agent instance to hand off to, and a string the LLM uses to inform the caller about the transfer.
  • chat_ctx.copy(exclude_instructions=True) passes conversation history without the supervisor's persona. The specialist receives all prior turns so the caller does not have to repeat themselves, but it starts with its own instructions rather than the supervisor's. Chain .truncate(max_items=6) to carry only the last few turns for even tighter scoping.
  • on_enter triggers a greeting. When a specialist takes over, on_enter fires automatically, prompting the agent to introduce itself.
  • userdata stores shared session state. The example passes a SessionData instance so any agent can read or write it via context.userdata. Use this for things like customer ID, collected form data, or flags set earlier in the call.
  • Model strings like deepgram/nova-3:multi and openai/gpt-4.1 route through LiveKit Inference, giving you access to 50+ STT, LLM, and TTS models with no separate API keys required.

Optimizing the Supervisor for Low Latency

LiveKit's framework gives you several tools to keep the supervisor fast:

Per-agent plugin overrides. Each specialist can use different LLM, STT, or TTS providers. Use a fast, small model for the supervisor (routing decisions) and a larger model for specialists that need deeper reasoning.

Built-in streaming. LiveKit streams at every pipeline stage (VAD, STT, LLM, TTS). The supervisor's reasoning step overlaps with audio processing rather than blocking it.

Agent dispatch. For complex deployments, multiple agent processes can be dispatched to the same room via the server API. This enables a true multi-process supervisor architecture where the supervisor and specialists run as separate services.


Real-World Use Cases

The supervisor pattern shows up in production across several industries:

Customer Service

A single voice number that handles billing, tech support, returns, and general inquiries. The supervisor classifies the caller's intent and routes to the right department. No IVR menus. No "press 1 for..."

Healthcare

A medical office triage agent where the supervisor routes between appointment scheduling, prescription refills, nurse advice, and insurance verification. Each specialist has access to different backend systems and follows different compliance rules. LiveKit has a medical office triage example that demonstrates this architecture.

Drive-Through Ordering

A fast-food drive-through agent where the supervisor coordinates between an order-taking agent, a menu lookup agent, and a payment agent. The supervisor tracks the overall order state while each specialist handles its piece. See the drive-thru example for a working implementation.

Enterprise Copilots

BASF Coatings built a system called "Marketmind" using a multi-agent supervisor to serve over 1,000 sales reps. The supervisor receives natural language queries via Microsoft Teams and routes between a structured data agent (SQL queries over sales metrics) and an unstructured data agent (vector search over Salesforce visit reports). It synthesizes results into a unified answer.

Front Desk / Receptionist

A voice agent for a hotel, medical office, or professional services firm. The supervisor handles greetings and general questions directly, then routes to specialists for reservations, billing, or appointment management. The front-desk booking example shows this pattern in action.


Supervisor vs. Other Multi-Agent Patterns

The supervisor is one of several multi-agent patterns. Here is how it compares:

| Pattern | How It Differs from Supervisor | Choose It When... |
| --- | --- | --- |
| The Handoff Pattern for Voice Agents That Replaces IVR Menus | Direct agent-to-agent transfers without a central coordinator. Simpler, less overhead. | You have clear, non-overlapping domains and do not need centralized oversight |
| Sequential Pipeline Architecture for Voice Agents | Fixed linear chain where output flows forward. No dynamic routing. | Your process is always the same steps in the same order (like the VAD to STT to LLM to TTS voice pipeline) |
| The ReAct Pattern for Voice Agents and How AI Agents Think, Act, and Respond | Single agent that loops through think, act, observe cycles with tools. No delegation to other agents. | One agent with multiple tools is sufficient and you do not need separate specialist personas |
| The Human-in-the-Loop (HITL) Pattern for Voice Agents | Agent pauses for human approval at critical decision points. Often combined with supervisor. | High-stakes or regulated actions require human oversight before execution |

In practice, these patterns are often combined. A supervisor might use ReAct internally for its routing decisions. A specialist agent might use Human-in-the-Loop for high-value transactions. The sequential pipeline (VAD, STT, LLM, TTS) runs underneath all of them.


Common Pitfalls and How to Avoid Them

1. The Infinite Loop

Supervisor agents can enter loops where they keep delegating back and forth between specialists without reaching a resolution. Set a maximum iteration count (10 is a reasonable default) and implement state repetition detection. If the supervisor has seen the same state three times, force a fallback response.
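A guard for this can be sketched in a few lines. `next_state` here is a hypothetical stand-in for one delegation round, and the limits are the defaults suggested above:

```python
from collections import Counter

# Loop-guard sketch: cap supervisor iterations and detect repeated states.
MAX_ITERATIONS = 10
REPEAT_LIMIT = 3

def run_with_guard(next_state, initial: str) -> str:
    seen = Counter()
    state = initial
    for _ in range(MAX_ITERATIONS):
        seen[state] += 1
        if seen[state] >= REPEAT_LIMIT:
            # Same state seen three times: stop delegating, force a fallback.
            return "fallback: escalate to human"
        state = next_state(state)
        if state == "done":
            return "resolved"
    return "fallback: iteration cap reached"

# A pathological delegation that ping-pongs between two specialists:
ping_pong = {"billing": "booking", "booking": "billing"}.get
print(run_with_guard(ping_pong, "billing"))
```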

2. Context Overload

Passing unfiltered context to every specialist wastes tokens and can degrade accuracy. The example above uses chat_ctx.copy(exclude_instructions=True) so specialists get prior conversation turns without inheriting the supervisor's system prompt. For tighter control, chain .truncate(max_items=6) to carry only the last few turns, or build a summary and inject it as a system message when the specialist enters. The supervisor holds the full context. Specialists get a focused slice.

3. Too Many Specialists

When you have 10 or more specialist agents, the supervisor's LLM struggles to pick the right one. The tool descriptions blur together. Group related specialists under sub-supervisors (a "Supervisor of Supervisors" pattern) or use an intent classifier to narrow the field before the supervisor reasons.

4. Single Point of Failure

The supervisor is a bottleneck by design. If it goes down, everything stops. Build in health checks, timeouts, and fallback behavior. If the supervisor fails to respond within a threshold, route the caller to a default agent or a human.
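A timeout fallback can be a single `asyncio.wait_for` wrapper. The coroutines below are stand-ins, and the threshold is deliberately tiny so the demo trips the fallback:

```python
import asyncio

SUPERVISOR_TIMEOUT = 0.1  # seconds; tune for your latency budget

async def supervisor_decide() -> str:
    await asyncio.sleep(1.0)  # simulate a hung supervisor
    return "billing"

async def route_with_fallback() -> str:
    try:
        return await asyncio.wait_for(supervisor_decide(), timeout=SUPERVISOR_TIMEOUT)
    except asyncio.TimeoutError:
        # Supervisor missed its deadline: hand the caller to a default agent
        # (or a human) rather than leaving them in silence.
        return "default_agent"

print(asyncio.run(route_with_fallback()))
```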

5. Ignoring the Handoff Experience

In voice, the transition between supervisor and specialist must be invisible to the caller. No awkward pauses. No repeated context. No "please hold while I transfer you." The handoff should feel like the same agent just got smarter about the topic.


Key Takeaways

  • The supervisor gives you a single conversational interface over multiple specialist agents. No IVR menus, no rigid routing
  • Hybrid routing works best for voice. Use a fast intent classifier for common requests and save full LLM reasoning for the rest
  • Latency is the main tradeoff. Solve it with a lightweight supervisor model, parallel delegation, and aggressive context scoping
  • LiveKit's Agent class, @function_tool handoffs, and chat_ctx.copy() give you precise control over what conversation history carries into each specialist
  • In practice, the supervisor combines with other patterns. ReAct handles routing decisions, Human-in-the-Loop covers high-stakes actions

Getting Started

If you want to build a supervisor-based voice agent with LiveKit, here's the path:

  1. Start with the Agents quickstart. Get a single-agent voice app running first. The quickstart guide walks you through the basics in minutes.
  2. Add a second agent. Create a specialist Agent class with its own instructions and tools. Add a @function_tool to your main agent that returns the specialist.
  3. Test in the Agent Playground. Use the Agent Playground or prototype the flow without code using Agent Builder. When you're ready for production, deploy to LiveKit Cloud with one click.
  4. Add more specialists. As your use case grows, add more specialist agents. Keep each one focused on a single domain.
  5. Optimize for latency. Profile your supervisor's routing time with Agent Observability. If it is too slow, switch to a smaller model for the supervisor or add a pre-classifier for common intents.

The supervisor pattern is the most common multi-agent architecture in production for a reason. It is intuitive, debuggable, and maps directly to how organizations actually work. For voice agents, it turns a single-purpose bot into a full-service assistant.

Give it a try and let us know what you're building.