The Handoff Pattern for Voice Agents That Replaces IVR Menus

The handoff pattern is the most practical multi-agent architecture for voice AI. It replaces rigid IVR trees with intelligent, LLM-powered routing, and it's how modern voice agents decide who handles what, mid-conversation.

Every developer building a voice agent eventually hits the same question: what happens when the conversation needs to go somewhere else?

Maybe the caller asked about billing, but your agent only handles tech support. Maybe the AI reached the limits of what it can resolve and a human needs to step in. Maybe the caller just said, "Let me talk to a person."

Answering that question is the job of the handoff pattern, arguably the single most important multi-agent design pattern for production voice AI. It's the modern replacement for IVR menu trees ("press 1 for sales, press 2 for support"), driven by LLM intent detection instead of button presses.

If you've ever built or used a customer service voice system, you've encountered this pattern. The difference now is that AI makes it actually work well.

What Is the Handoff Pattern?#

The handoff pattern is a multi-agent architecture where one agent dynamically transfers control of a conversation to another agent (or a human) mid-interaction, based on real-time intent detection.

Instead of forcing callers through a decision tree, a triage agent listens to what the caller actually says, classifies intent using an LLM, and routes to the right specialist, whether that's another AI agent or a live person.

There are two primary flavors:

Type	How It Works	Best For
Agent → Agent (Routing)	A triage agent classifies intent and dispatches to a specialist AI agent (billing, booking, tech support, etc.)	Multi-domain voice apps, customer support, any scenario with distinct verticals
Agent → Human (Escalation)	AI recognizes it can't resolve the issue and transfers to a live person with full context	Compliance-sensitive workflows, emotionally charged callers, high-stakes decisions

In practice, most production systems use both. The triage agent routes to specialist AI agents for routine tasks, and any of those agents can escalate to a human when needed.

Why IVR Is Broken (and Why Handoffs Fix It)#

IVR systems were revolutionary in 1990. In 2026, they're a liability. According to Metrigy's CX Optimization 2025-26 study, 37.6% of companies plan to fully replace IVRs with AI triage agents. Among their Research Success Group (companies seeing the highest measurable improvements from AI), that number jumps to 62.5%.

The problem with traditional IVR is structural:

Rigid menu trees force callers into predefined paths that rarely match their actual intent
"Press 1" fatigue drives callers to pound the zero button or yell "agent" repeatedly
No context passing. When a caller finally reaches a human, they start from scratch.
Maintenance overhead. Every new product, department, or workflow requires rebuilding the menu tree.

The handoff pattern solves all of these by replacing the static tree with a conversational triage agent that understands natural language and routes dynamically.

The key insight is that a well-designed handoff system makes routing invisible to the caller. They describe their problem in natural language, and the right specialist, whether AI or human, picks up seamlessly, with full context.

How the Handoff Pattern Works#

At its core, the handoff follows a straightforward flow:

1. Triage and Intent Detection#

The caller connects and speaks to a triage agent, a lightweight AI whose only job is to understand what the caller needs and route them to the right place.

Unlike IVR, the triage agent uses LLM-powered intent classification. It doesn't need the caller to pick from a menu. It listens, asks clarifying questions if needed, and makes a routing decision.

2. Context Packaging#

Before transferring, the triage agent packages everything it knows:

Detected intent and confidence score
Extracted entities (account number, order ID, dates)
Conversation transcript so far
Caller sentiment and emotional state
Any data already retrieved (account lookup, order status)

This context package is what separates a good handoff from a frustrating one. The receiving agent, whether AI or human, should never ask the caller to repeat themselves.
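As a rough illustration, the context package listed above could be modeled as a small dataclass. This is a framework-independent sketch: the class name `HandoffContext`, its field names, and the JSON transport are assumptions made for the example, not part of any particular SDK.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class HandoffContext:
    """Hypothetical container for what the receiving agent needs to know."""
    intent: str                      # detected intent label
    confidence: float                # classifier confidence in [0, 1]
    entities: dict[str, str] = field(default_factory=dict)        # account number, order ID, dates
    transcript: list[str] = field(default_factory=list)           # conversation so far
    sentiment: str = "neutral"       # coarse sentiment label
    retrieved_data: dict[str, str] = field(default_factory=dict)  # lookups already performed

    def to_json(self) -> str:
        """Serialize the package so it can travel with the transfer."""
        return json.dumps(asdict(self), sort_keys=True)


ctx = HandoffContext(
    intent="billing.refund",
    confidence=0.92,
    entities={"order_id": "A-1001"},
    transcript=["Caller: I was charged twice for order A-1001."],
    sentiment="frustrated",
    retrieved_data={"order_status": "delivered"},
)
payload = ctx.to_json()
```

Keeping the package as structured fields rather than free text means an AI specialist can read the order ID directly and a human dashboard can render the same data without parsing a transcript.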

3. Transfer Execution#

The transfer itself can happen in two ways:

Cold transfer. The caller is immediately connected to the new agent. Fast, but the receiving agent only has the context package to work with.
Warm transfer. The triage agent privately briefs the receiving agent (or human) before connecting the caller. Slower, but creates a much better experience for complex issues.

4. Specialist Handling#

The specialist agent (or human) picks up with full context and handles the request. If the conversation drifts into another domain, the specialist can trigger another handoff; routing is not limited to the first triage step.

When to Trigger a Handoff#

Modern handoff systems go well beyond rigid keyword matching. The best implementations use multiple trigger signals:

Intent-Based Triggers#

The most common trigger is when the caller's request falls outside the current agent's domain. A tech support agent hearing "I want to cancel my subscription" should route to retention or billing.

Sentiment and Emotional Triggers#

Voice carries emotional signals that text doesn't. Frustration, urgency, and escalating language should trigger escalation to a human, ideally detected from tone and prosody, not just words.

Confidence Thresholds#

When the AI's confidence in its intent classification drops below a threshold, it's better to route than guess. A confidently wrong agent is worse than a brief transfer.
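A confidence gate can be as small as one comparison. The sketch below assumes the classifier returns an intent label plus a score in [0, 1]; the threshold value and the `handle:` / `route:` return strings are placeholders chosen for the example.

```python
CONFIDENCE_THRESHOLD = 0.75  # assumed value; tune it against real call data


def decide_next_step(intent: str, confidence: float,
                     threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Act on the intent only when the classifier is confident enough.

    Below the threshold the call is routed to a fallback (a human or a
    general queue) instead of acting on a guess.
    """
    if confidence >= threshold:
        return f"handle:{intent}"
    return "route:fallback"
```

For example, `decide_next_step("billing", 0.4)` returns `"route:fallback"`, while a score of 0.9 lets the current agent proceed.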

Explicit User Requests#

The caller says "Let me talk to a person" or "Transfer me to billing." This should always be honored immediately, with no friction and no "let me try to help first."

Complexity Boundaries#

Some requests require multi-step reasoning, access to systems the current agent doesn't have, or judgment calls that exceed AI capabilities. Recognize these early.

Regulatory and Compliance Mandates#

In healthcare (HIPAA), finance (SOX, PCI), and government (FedRAMP), regulation or internal policy can require human oversight for certain actions. A handoff rule that always routes those actions to a person enforces this regardless of how confident the AI is.
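One way to make such a mandate hard to bypass is to check it in plain code rather than leaving the decision to the LLM. The following is a minimal sketch; the action names and the wire-transfer limit are invented for illustration, and real rules would come from your compliance team.

```python
from dataclasses import dataclass

# Hypothetical policy: these actions always need a person.
HUMAN_REQUIRED_ACTIONS = {"dispute_charge", "change_beneficiary"}
WIRE_TRANSFER_LIMIT = 10_000  # assumed threshold, in account currency


@dataclass(frozen=True)
class RequestedAction:
    name: str
    amount: float = 0.0


def requires_human(action: RequestedAction) -> bool:
    """Return True when policy says a person must approve or perform the action."""
    if action.name in HUMAN_REQUIRED_ACTIONS:
        return True
    if action.name == "wire_transfer" and action.amount > WIRE_TRANSFER_LIMIT:
        return True
    return False
```

Because the check is deterministic, it can run before any tool call executes and trigger the escalation path whenever it returns `True`.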

The "Don't Make Them Repeat Themselves" Problem#

Having to repeat information is one of the most common frustrations in agent transfers. Whether the transfer is AI-to-AI or AI-to-human, context loss destroys the experience.

According to Metrigy's CX Optimization 2025-26 Consumer Views study, 84.7% of consumers still prefer interacting with a human over an AI agent, but 46% will use AI agents in select circumstances, especially when it means getting directed to the right person faster. The key is ensuring a human option exists and the transition won't be painful.

Solving this requires:

Structured context passing. Don't just forward a transcript. Pass a structured summary of detected intent, extracted entities, sentiment score, actions already taken, and the caller's stated goal.
Pre-handoff confirmation. Before transferring, tell the caller what's happening: "I'm going to connect you with a billing specialist. I'll share everything we've discussed so you won't need to repeat yourself."
Human-facing context display. When routing to a live agent, show them the AI's summary, confidence scores, and full transcript in a dashboard. The human should be ready to help in 10-15 seconds.
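To make the human-facing display concrete, here is a small sketch that turns a structured summary into a few scannable lines a live agent could read in seconds. The dictionary keys and the output layout are assumptions for the example, not a prescribed format.

```python
def render_briefing(summary: dict) -> str:
    """Format a structured handoff summary as short lines for a live agent."""
    entities = ", ".join(f"{k}={v}" for k, v in sorted(summary["entities"].items()))
    actions = "; ".join(summary["actions_taken"]) or "none"
    return "\n".join([
        f"Goal: {summary['goal']}",
        f"Intent: {summary['intent']} (confidence {summary['confidence']:.0%})",
        f"Sentiment: {summary['sentiment']}",
        f"Entities: {entities or 'none'}",
        f"Already done: {actions}",
    ])


briefing = render_briefing({
    "goal": "Refund the duplicate charge",
    "intent": "billing.refund",
    "confidence": 0.92,
    "sentiment": "frustrated",
    "entities": {"order_id": "A-1001"},
    "actions_taken": ["looked up order status"],
})
```

The full transcript can still be attached underneath, but leading with the summary is what lets the human start helping quickly.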

Building Handoffs with LiveKit#

LiveKit's agent framework supports the handoff pattern natively through its Agent class and @function_tool decorator, making it straightforward to build multi-agent voice systems with intelligent routing.

Agent-to-Agent Routing#

In LiveKit, each specialist is defined as a separate Agent with its own instructions, tools, and personality. The triage agent uses @function_tool methods that return a different Agent instance, triggering an automatic handoff:

from livekit.agents import Agent, function_tool, RunContext


class BillingAgent(Agent):
    def __init__(self, chat_ctx=None):
        super().__init__(
            instructions="You are a billing specialist for Acme Corp. "
                         "Help callers with invoices, payments, and subscription changes.",
            # Conversation history handed over by the previous agent, if any.
            chat_ctx=chat_ctx,
        )


class SupportAgent(Agent):
    def __init__(self, chat_ctx=None):
        super().__init__(
            instructions="You are a technical support specialist for Acme Corp. "
                         "Help callers troubleshoot bugs, outages, and product issues.",
            chat_ctx=chat_ctx,
        )


class SalesAgent(Agent):
    def __init__(self, chat_ctx=None):
        super().__init__(
            instructions="You are a sales specialist for Acme Corp. "
                         "Help callers with pricing, plans, and demo requests.",
            chat_ctx=chat_ctx,
        )


class TriageAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a receptionist for Acme Corp.
            Listen to what the caller needs and route them to the
            right department. Do NOT try to handle requests yourself."""
        )

    # Each tool returns (next_agent, message); returning an Agent triggers the handoff.
    @function_tool()
    async def transfer_to_billing(self, context: RunContext):
        """Transfer the caller to the billing department.
        Use when the caller asks about invoices, payments,
        charges, refunds, or subscription changes."""
        return BillingAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to billing"

    @function_tool()
    async def transfer_to_support(self, context: RunContext):
        """Transfer to technical support.
        Use when the caller reports bugs, outages,
        or needs help using the product."""
        return SupportAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to technical support"

    @function_tool()
    async def transfer_to_sales(self, context: RunContext):
        """Transfer to the sales team.
        Use when the caller asks about pricing, plans,
        new features, or wants a demo."""
        return SalesAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to sales"

Each tool passes self.chat_ctx.copy(exclude_instructions=True) to the next agent. .copy() hands the specialist the full conversation history, and exclude_instructions=True strips out the triage agent's persona so the specialist starts fresh with its own instructions rather than inheriting the receptionist's. Without that flag, the previous agent's system prompt would bleed into the new one. For more on context preservation and handoff options, see the workflows documentation.

Agent-to-Human Warm Transfer#

For escalation to a live person, LiveKit provides a complete warm transfer workflow that includes a private consultation room where an AI agent can brief the human before connecting the caller. The sip_call_to parameter takes the supervisor's phone number, and chat_ctx passes the full conversation history so the briefing agent can summarize everything:

from livekit.agents import Agent, function_tool, RunContext
from livekit.agents.beta.workflows import WarmTransferTask

SUPERVISOR_PHONE = "+15551234567"
SIP_TRUNK_ID = "your-sip-trunk-id"

class SupportAgent(Agent):
    @function_tool()
    async def escalate_to_human(self, context: RunContext):
        """Transfer to a human supervisor when:
        - The customer explicitly asks for a person
        - The issue involves a billing dispute over $500
        - Sentiment indicates high frustration
        - The problem requires access to internal tools"""
        await self.session.say(
            "I'm connecting you with a specialist now. "
            "I'll share our conversation so you won't "
            "need to repeat anything.",
            allow_interruptions=False,
        )
        await context.wait_for_playout()
        result = await WarmTransferTask(
            sip_call_to=SUPERVISOR_PHONE,
            sip_trunk_id=SIP_TRUNK_ID,
            chat_ctx=self.chat_ctx,
        )
        return result

allow_interruptions=False on session.say() prevents the caller from talking over the hold message. context.wait_for_playout() is required here because you can't directly await a speech handle inside a function tool. It's the correct way to pause until the spoken line finishes before the transfer starts.

What happens under the hood:

The caller is placed on hold (audio I/O disabled, optional hold music)
A private consultation room is created for the AI to brief the human
The human supervisor is dialed in via SIP and receives a full context summary
The supervisor is then moved into the caller's room, and the caller and human are connected
Both AI agents (the caller-facing agent and the briefing agent) disconnect, leaving a clean human-to-human call

This is the warm transfer, where the human is fully briefed before they ever speak to the caller. No "can you tell me what's going on?" No starting from zero.

Bidirectional Handoffs#

Handoffs don't just go in one direction. LiveKit supports:

AI → Human. Standard escalation when AI can't resolve.
Human → AI. After resolving a complex issue, the human can hand back to the AI for remaining routine steps (e.g., scheduling a follow-up, sending a confirmation). In LiveKit, this works via the SIP integration: the human agent triggers a transfer back into a LiveKit room where a new AI agent is initialized with the accumulated conversation context, so it picks up with full awareness of what was already resolved.
AI → AI → Human. Chained routing through multiple specialist agents before escalation.

Latency Optimization#

The handoff itself must feel seamless in real-time voice. Dead air during routing signals to the caller that something went wrong. LiveKit handles this with:

Filler speech during transfer, such as "One moment while I connect you..."
Lightweight triage models (e.g., GPT-4o-mini) for fast intent classification
Per-agent plugin overrides, where each specialist can use different LLM, STT, or TTS providers optimized for their domain
Streaming at every pipeline stage to minimize perceived latency

Architecture: Handoff vs. Other Multi-Agent Patterns#

The handoff pattern is one of several ways to coordinate multiple agents. Here's how it compares:

Pattern	How It Routes	Control Flow	Best For
Handoff / Routing	Triage agent classifies intent, transfers control entirely to specialist	One agent active at a time	Multi-domain voice apps, IVR replacement
Supervisor	Central supervisor agent delegates subtasks to specialists and coordinates their results	Supervisor stays in control	Complex multi-step workflows requiring oversight
Sequential Pipeline	Fixed chain: each stage's output flows forward to the next	Predetermined linear flow	The voice stack itself (VAD → STT → LLM → TTS)
ReAct	Single agent loops through think, act, observe cycles with tools	Dynamic, self-directed	Tool-calling agents that query APIs and databases

When to choose Handoff over Supervisor. If your agents handle independent domains (billing, support, sales) and don't need to collaborate on a single request, Handoff is simpler and faster. If a single request requires coordinating multiple specialists simultaneously, use Supervisor.

Best Practices for Production Handoffs#

Design Your Tool Descriptions Carefully#

The LLM decides when to hand off based on the @function_tool description. Vague descriptions lead to misrouting. Be explicit about what triggers each handoff, including specific keywords, scenarios, and boundary conditions.

Don't Over-Route#

If there's only one specialist agent, skip the triage layer entirely. A router with one destination is just unnecessary latency. The handoff pattern shines when there are 3+ distinct domains.

Handle Mid-Conversation Drift#

Callers naturally change topics. A billing question might evolve into a cancellation request. Each specialist agent should be able to trigger its own handoffs when the conversation drifts outside its domain.

Test the Boundaries, Not Just the Happy Path#

The handoff boundary is where most friction and failure live. Test:

Ambiguous intents that could go to multiple specialists
Rapid topic switching within a single call
Edge cases where no specialist fits
What happens when the target agent or human is unavailable
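Several of these boundaries can be pinned down as table-driven checks against the routing logic before any audio is involved. The sketch below uses a stand-in `route` function and an assumed tie margin; in a real system the function under test would wrap your triage agent's decision, and rapid topic switching would need multi-turn cases that this example does not cover.

```python
AMBIGUITY_MARGIN = 0.10  # assumed: scores closer than this count as a tie


def route(candidates: list[tuple[str, float]], available: set[str]) -> str:
    """Pick a destination from scored intent candidates.

    Returns "clarify" for near-ties, and "human" when nothing fits or the
    chosen specialist is unavailable.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return "human"
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < AMBIGUITY_MARGIN:
        return "clarify"
    best = ranked[0][0]
    return best if best in available else "human"


SPECIALISTS = {"billing", "support"}
BOUNDARY_CASES = [
    # (description, scored candidates, expected destination)
    ("ambiguous intent", [("billing", 0.48), ("support", 0.45)], "clarify"),
    ("no specialist fits", [], "human"),
    ("target unavailable", [("sales", 0.90)], "human"),
    ("clear winner", [("billing", 0.90), ("support", 0.05)], "billing"),
]

for description, candidates, expected in BOUNDARY_CASES:
    assert route(candidates, SPECIALISTS) == expected, description
```

Adding a row to the table each time a misroute is found in production keeps the boundary behavior from regressing.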

Pre-Classify Common Intents#

For high-volume voice systems, a lightweight intent classifier can shortcut the full LLM reasoning loop for common requests. Save the LLM-powered routing for ambiguous or complex intents.
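A minimal version of this fast path might look like the following. The regular expressions are illustrative only, and `llm_classify` is a hypothetical callable standing in for whatever LLM-backed classifier you already have.

```python
import re
from typing import Callable

# Hypothetical fast-path patterns for high-volume requests.
FAST_PATHS = [
    (re.compile(r"\b(invoice|refund|charge[ds]?|payment)\b", re.I), "billing"),
    (re.compile(r"\b(outage|bug|error|not working)\b", re.I), "support"),
    (re.compile(r"\b(pricing|demo|upgrade)\b", re.I), "sales"),
]


def classify(utterance: str, llm_classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (intent, source).

    The fast path is used only when exactly one pattern family matches;
    anything ambiguous or unmatched falls through to the LLM.
    """
    hits = {intent for pattern, intent in FAST_PATHS if pattern.search(utterance)}
    if len(hits) == 1:
        return hits.pop(), "fast_path"
    return llm_classify(utterance), "llm"
```

Requiring exactly one match is a deliberately conservative choice: an utterance such as "my payment page shows an error" touches both billing and support, so it is left to the LLM rather than guessed.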

Real-World Use Cases#

Healthcare Front Desk#

A triage agent answers calls to a medical office. Simple requests (appointment scheduling, prescription refills, hours) are handled by specialist AI agents. Complex clinical questions or emergencies are routed to a nurse or on-call physician with full context.

E-Commerce Customer Service#

A triage agent detects whether the caller needs order tracking, returns, billing, or product questions. Each vertical has its own specialist agent with access to the relevant APIs. Frustrated callers or high-value accounts are warm-transferred to a human.

Financial Services#

A voice agent handles routine balance inquiries and transaction lookups. Requests involving wire transfers above a threshold, disputes, or compliance-sensitive actions trigger an automatic handoff to a human advisor, along with a full evidence pack.

Drive-Through Ordering#

A triage agent takes orders and routes special requests (dietary accommodations, large catering orders) to a specialist agent or a human manager. LiveKit's drive-thru example demonstrates this pattern.

Getting Started#

Ready to build? If you prefer to start without code, Agent Builder lets you prototype a basic voice routing agent in your browser before converting to Python.

Otherwise, here's the path from zero to a working handoff system:

Define your domains. List the distinct categories of requests your voice agent needs to handle. These become your specialist agents.
Build the triage agent. Create an Agent with @function_tool methods for each specialist. Write clear, specific tool descriptions.
Implement specialists. Each specialist is its own Agent class with domain-specific instructions, tools, and optionally different LLM/TTS providers.
Add human escalation. Give every specialist a @function_tool for escalating to a human via WarmTransferTask.
Test in the LiveKit Agent Console before deploying to production telephony. When you're ready to go live, deploy to LiveKit Cloud with one click.

LiveKit provides working reference implementations for each of these steps:

Front desk booking agent: calendar booking with tasks, tools, and evaluations
Warm transfer: full AI-to-human escalation with consultation rooms
Medical office triage: healthcare-specific multi-agent coordination

Key Takeaways#

The handoff pattern replaces rigid IVR menus with intelligent, LLM-powered routing that understands natural language
It supports both agent-to-agent (routing between AI specialists) and agent-to-human (escalation) transfers
Context preservation is the single most important implementation detail; callers should never repeat themselves
LiveKit's agent framework supports handoffs natively via @function_tool returns, WarmTransferTask, and SIP-based telephony
Start simple: define your domains, build a triage agent, and expand to more specialists as needed

The LiveKit Agents quickstart is the fastest way to get a multi-agent handoff system running. Give it a try and let us know what you're building.
