The Handoff Pattern for Voice Agents That Replaces IVR Menus

The handoff pattern is the most practical multi-agent architecture for voice AI. It replaces rigid IVR trees with intelligent, LLM-powered routing, and it's how modern voice agents decide who handles what, mid-conversation.


Every developer building a voice agent eventually hits the same question: what happens when the conversation needs to go somewhere else?

Maybe the caller asked about billing, but your agent only handles tech support. Maybe the AI reached the limits of what it can resolve and a human needs to step in. Maybe the caller just said, "Let me talk to a person."

This is the handoff pattern for voice agents, and it's arguably the single most important multi-agent design pattern for production voice AI. It's the modern replacement for IVR menu trees ("press 1 for sales, press 2 for support"), powered by LLM intent detection instead of rigid button presses.

If you've ever built or used a customer service voice system, you've encountered this pattern. The difference now is that AI makes it actually work well.


What Is the Handoff Pattern?

The handoff pattern is a multi-agent architecture where one agent dynamically transfers control of a conversation to another agent (or a human) mid-interaction, based on real-time intent detection.

Instead of forcing callers through a decision tree, a triage agent listens to what the caller actually says, classifies intent using an LLM, and routes to the right specialist, whether that's another AI agent or a live person.

There are two primary flavors:

Type | How It Works | Best For
Agent → Agent (Routing) | A triage agent classifies intent and dispatches to a specialist AI agent (billing, booking, tech support, etc.) | Multi-domain voice apps, customer support, any scenario with distinct verticals
Agent → Human (Escalation) | AI recognizes it can't resolve the issue and transfers to a live person with full context | Compliance-sensitive workflows, emotionally charged callers, high-stakes decisions

In practice, most production systems use both. The triage agent routes to specialist AI agents for routine tasks, and any of those agents can escalate to a human when needed.


Why IVR Is Broken (and Why Handoffs Fix It)

IVR systems were revolutionary in 1990. In 2026, they're a liability. According to Metrigy's CX Optimization 2025-26 study, 37.6% of companies plan to fully replace IVRs with AI triage agents. Among their Research Success Group (companies seeing the highest measurable improvements from AI), that number jumps to 62.5%.

The problem with traditional IVR is structural:

  • Rigid menu trees force callers into predefined paths that rarely match their actual intent
  • "Press 1" fatigue drives callers to pound the zero button or yell "agent" repeatedly
  • No context passing. When a caller finally reaches a human, they start from scratch.
  • Maintenance overhead. Every new product, department, or workflow requires rebuilding the menu tree.

The handoff pattern solves all of these by replacing the static tree with a conversational triage agent that understands natural language and routes dynamically.

The key insight is that a well-designed handoff system makes routing invisible to the caller. They describe their problem in natural language, and the right specialist, whether AI or human, picks up seamlessly, with full context.


How the Handoff Pattern Works

At its core, the handoff follows a straightforward flow:

1. Triage and Intent Detection

The caller connects and speaks to a triage agent, a lightweight AI whose only job is to understand what the caller needs and route them to the right place.

Unlike IVR, the triage agent uses LLM-powered intent classification. It doesn't need the caller to pick from a menu. It listens, asks clarifying questions if needed, and makes a routing decision.

2. Context Packaging

Before transferring, the triage agent packages everything it knows:

  • Detected intent and confidence score
  • Extracted entities (account number, order ID, dates)
  • Conversation transcript so far
  • Caller sentiment and emotional state
  • Any data already retrieved (account lookup, order status)

This context package is what separates a good handoff from a frustrating one. The receiving agent, whether AI or human, should never ask the caller to repeat themselves.
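
One way to represent the context package is as a small typed record that serializes cleanly. The sketch below is illustrative only: the HandoffContext name and its field names are assumptions made for this example, not part of any framework.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class HandoffContext:
    """Hypothetical context package passed from triage to the receiving agent."""
    intent: str                                               # detected intent label, e.g. "billing"
    confidence: float                                         # classifier confidence in [0, 1]
    entities: dict[str, str] = field(default_factory=dict)    # account number, order ID, dates
    transcript: list[str] = field(default_factory=list)       # conversation so far
    sentiment: str = "neutral"                                # coarse emotional state
    retrieved: dict[str, str] = field(default_factory=dict)   # data already looked up

    def to_json(self) -> str:
        """Serialize for transport to another agent or a human-facing dashboard."""
        return json.dumps(asdict(self), sort_keys=True)


# Example package built by a triage agent (all values are made up).
package = HandoffContext(
    intent="billing",
    confidence=0.92,
    entities={"order_id": "A-1001"},
    transcript=["Caller: I was charged twice for order A-1001."],
    sentiment="frustrated",
    retrieved={"order_status": "delivered"},
)
payload = package.to_json()
```

Keeping the package structured, rather than forwarding only a raw transcript, lets the receiving side read individual fields without re-parsing the conversation.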

3. Transfer Execution

The transfer itself can happen in two ways:

  • Cold transfer. The caller is immediately connected to the new agent. Fast, but the receiving agent only has the context package to work with.
  • Warm transfer. The triage agent privately briefs the receiving agent (or human) before connecting the caller. Slower, but creates a much better experience for complex issues.

4. Specialist Handling

The specialist agent (or human) picks up with full context and handles the request. If the conversation drifts into another domain, the specialist can trigger another handoff; routing is not limited to the first triage step.


When to Trigger a Handoff

Modern handoff systems go well beyond rigid keyword matching. The best implementations use multiple trigger signals:

Intent-Based Triggers

The most common trigger is when the caller's request falls outside the current agent's domain. A tech support agent hearing "I want to cancel my subscription" should route to retention or billing.

Sentiment and Emotional Triggers

Voice carries emotional signals that text doesn't. Frustration, urgency, and escalating language should trigger escalation to a human, ideally detected from tone and prosody, not just words.

Confidence Thresholds

When the AI's confidence in its intent classification drops below a threshold, it's better to route than guess. A confidently wrong agent is worse than a brief transfer.
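
A minimal sketch of a confidence gate, assuming the classifier returns an intent label with a score between 0 and 1. The Classification type, the choose_route function, and the 0.75 threshold are all hypothetical; a real threshold would need tuning against recorded calls.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.75  # assumed value; tune against real call data


@dataclass
class Classification:
    intent: str        # e.g. "billing", "support"
    confidence: float  # score in [0, 1] reported by the classifier


def choose_route(result: Classification, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Route to the detected specialist only when confident; otherwise hand off to a human."""
    if result.confidence >= threshold:
        return result.intent
    return "human"


# A confident classification goes to the specialist, a shaky one to a person.
confident = choose_route(Classification("billing", 0.91))
uncertain = choose_route(Classification("billing", 0.42))
```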

Explicit User Requests

The caller says "Let me talk to a person" or "Transfer me to billing." This should always be honored immediately, with no friction and no "let me try to help first."
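
Because explicit requests should never be second-guessed, one option is to check for them deterministically before any other routing logic runs. The pattern list below is an assumption for illustration and is far from exhaustive; in practice the LLM would likely catch phrasings a regular expression misses.

```python
import re

# Assumed phrasings only; extend or replace with LLM detection in a real system.
_HUMAN_REQUEST = re.compile(
    r"\b(talk|speak)\s+(to|with)\s+(an?\s+)?(person|human|agent|representative|someone)\b"
    r"|\b(real|live)\s+(person|human|agent)\b",
    re.IGNORECASE,
)


def wants_human(utterance: str) -> bool:
    """Return True when the caller explicitly asks for a person."""
    return _HUMAN_REQUEST.search(utterance) is not None
```

A check like this can run on every transcribed utterance and short-circuit straight to escalation when it fires.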

Complexity Boundaries

Some requests require multi-step reasoning, access to systems the current agent doesn't have, or judgment calls that exceed AI capabilities. Recognize these early.

Regulatory and Compliance Mandates

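In regulated settings such as healthcare (HIPAA), finance (SOX, PCI), and government (FedRAMP), certain actions may require human oversight. The handoff pattern gives you a place to enforce this: encode those actions as escalation triggers so the transfer happens consistently rather than at the model's discretion.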
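
Compliance triggers are a natural fit for plain code rather than model judgment, since the rule should fire the same way every time. The following is a sketch under assumed rules: the action names, the amount limit, and the requires_human helper are hypothetical and do not reflect any specific regulation.

```python
# Hypothetical policy: actions that always need a human, regardless of amount.
HUMAN_REQUIRED_ACTIONS = frozenset({"dispute_charge", "release_medical_record"})

# Hypothetical policy: actions that need a human only above a monetary limit.
AMOUNT_LIMITS = {"wire_transfer": 10_000.0}


def requires_human(action: str, amount: float = 0.0) -> bool:
    """Return True when the policy tables say a person must handle this action."""
    if action in HUMAN_REQUIRED_ACTIONS:
        return True
    limit = AMOUNT_LIMITS.get(action)
    return limit is not None and amount > limit
```

An agent would call a check like this before executing an action and trigger its escalation tool whenever it returns True.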


The "Don't Make Them Repeat Themselves" Problem

One of the most commonly cited frustrations in agent transfers is having to repeat information. Whether the transfer is AI-to-AI or AI-to-human, context loss destroys the experience.

According to Metrigy's CX Optimization 2025-26 Consumer Views study, 84.7% of consumers still prefer interacting with a human over an AI agent, but 46% will use AI agents in select circumstances, especially when it means getting directed to the right person faster. The key is ensuring a human option exists and the transition won't be painful.

Solving this requires:

  • Structured context passing. Don't just forward a transcript. Pass a structured summary of detected intent, extracted entities, sentiment score, actions already taken, and the caller's stated goal.
  • Pre-handoff confirmation. Before transferring, tell the caller what's happening: "I'm going to connect you with a billing specialist. I'll share everything we've discussed so you won't need to repeat yourself."
  • Human-facing context display. When routing to a live agent, show them the AI's summary, confidence scores, and full transcript in a dashboard. The human should be ready to help in 10-15 seconds.
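
The three practices above can be sketched as two small helpers: one produces the line spoken to the caller, the other renders the structured summary a human agent would see. Both function names and the summary layout are assumptions for illustration.

```python
def caller_announcement(department: str) -> str:
    """Pre-handoff confirmation spoken to the caller."""
    return (
        f"I'm going to connect you with a {department} specialist. "
        "I'll share everything we've discussed so you won't need to repeat yourself."
    )


def agent_briefing(
    intent: str,
    confidence: float,
    entities: dict[str, str],
    actions_taken: list[str],
    goal: str,
) -> str:
    """Structured summary for the receiving agent, short enough to scan in seconds."""
    entity_text = ", ".join(f"{k}={v}" for k, v in sorted(entities.items())) or "none"
    action_text = "; ".join(actions_taken) or "none"
    return "\n".join([
        f"Intent: {intent} (confidence {confidence:.0%})",
        f"Caller goal: {goal}",
        f"Entities: {entity_text}",
        f"Actions taken: {action_text}",
    ])


briefing = agent_briefing(
    intent="billing",
    confidence=0.92,
    entities={"order_id": "A-1001"},
    actions_taken=["Looked up order status"],
    goal="Refund a duplicate charge",
)
```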

Building Handoffs with LiveKit

LiveKit's agent framework supports the handoff pattern natively through its Agent class and @function_tool decorator, making it straightforward to build multi-agent voice systems with intelligent routing.

Agent-to-Agent Routing

In LiveKit, each specialist is defined as a separate Agent with its own instructions, tools, and personality. The triage agent uses @function_tool methods that return a different Agent instance, triggering an automatic handoff:

from typing import Optional

from livekit.agents import Agent, ChatContext, RunContext, function_tool


class BillingAgent(Agent):
    def __init__(self, chat_ctx: Optional[ChatContext] = None):
        super().__init__(
            instructions="You are a billing specialist for Acme Corp. "
            "Help callers with invoices, payments, and subscription changes.",
            chat_ctx=chat_ctx,  # conversation history carried over from the previous agent
        )


class SupportAgent(Agent):
    def __init__(self, chat_ctx: Optional[ChatContext] = None):
        super().__init__(
            instructions="You are a technical support specialist for Acme Corp. "
            "Help callers troubleshoot bugs, outages, and product issues.",
            chat_ctx=chat_ctx,
        )


class SalesAgent(Agent):
    def __init__(self, chat_ctx: Optional[ChatContext] = None):
        super().__init__(
            instructions="You are a sales specialist for Acme Corp. "
            "Help callers with pricing, plans, and demo requests.",
            chat_ctx=chat_ctx,
        )


class TriageAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a receptionist for Acme Corp.
            Listen to what the caller needs and route them to the
            right department. Do NOT try to handle requests yourself."""
        )

    @function_tool()
    async def transfer_to_billing(self, context: RunContext):
        """Transfer the caller to the billing department.
        Use when the caller asks about invoices, payments,
        charges, refunds, or subscription changes."""
        # Returning a new Agent from a tool triggers the handoff.
        return BillingAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to billing"

    @function_tool()
    async def transfer_to_support(self, context: RunContext):
        """Transfer to technical support.
        Use when the caller reports bugs, outages,
        or needs help using the product."""
        return SupportAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to technical support"

    @function_tool()
    async def transfer_to_sales(self, context: RunContext):
        """Transfer to the sales team.
        Use when the caller asks about pricing, plans,
        new features, or wants a demo."""
        return SalesAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to sales"

Each tool passes self.chat_ctx.copy(exclude_instructions=True) to the next agent. .copy() hands the specialist the full conversation history, and exclude_instructions=True strips out the triage agent's persona so the specialist starts fresh with its own instructions rather than inheriting the receptionist's. Without that flag, the previous agent's system prompt would bleed into the new one. For more on context preservation and handoff options, see the workflows documentation.

Agent-to-Human Warm Transfer

For escalation to a live person, LiveKit provides a complete warm transfer workflow that includes a private consultation room where an AI agent can brief the human before connecting the caller. The sip_call_to parameter takes the supervisor's phone number, and chat_ctx passes the full conversation history so the briefing agent can summarize everything:

from livekit.agents import Agent, function_tool, RunContext
from livekit.agents.beta.workflows import WarmTransferTask

SUPERVISOR_PHONE = "+15551234567"
SIP_TRUNK_ID = "your-sip-trunk-id"


class SupportAgent(Agent):
    @function_tool()
    async def escalate_to_human(self, context: RunContext):
        """Transfer to a human supervisor when:
        - The customer explicitly asks for a person
        - The issue involves a billing dispute over $500
        - Sentiment indicates high frustration
        - The problem requires access to internal tools"""
        await self.session.say(
            "I'm connecting you with a specialist now. "
            "I'll share our conversation so you won't "
            "need to repeat anything.",
            allow_interruptions=False,
        )
        await context.wait_for_playout()
        result = await WarmTransferTask(
            sip_call_to=SUPERVISOR_PHONE,
            sip_trunk_id=SIP_TRUNK_ID,
            chat_ctx=self.chat_ctx,
        )
        return result

allow_interruptions=False on session.say() prevents the caller from talking over the hold message. context.wait_for_playout() is required here because you can't directly await a speech handle inside a function tool. It's the correct way to pause until the spoken line finishes before the transfer starts.

What happens under the hood:

  1. The caller is placed on hold (audio I/O disabled, optional hold music)
  2. A private consultation room is created for the AI to brief the human
  3. The human supervisor is dialed in via SIP and receives a full context summary
  4. The supervisor is then moved into the caller's room, and the caller and human are connected
  5. Both AI agents disconnect, leaving a clean human-to-human call

This is the warm transfer, where the human is fully briefed before they ever speak to the caller. No "can you tell me what's going on?" No starting from zero.

Bidirectional Handoffs

Handoffs don't just go one direction. LiveKit supports:

  • AI → Human. Standard escalation when AI can't resolve.
  • Human → AI. After resolving a complex issue, the human can hand back to the AI for remaining routine steps (e.g., scheduling a follow-up, sending a confirmation). In LiveKit, this works via the SIP integration: the human agent triggers a transfer back into a LiveKit room where a new AI agent is initialized with the accumulated conversation context, so it picks up with full awareness of what was already resolved.
  • AI → AI → Human. Chained routing through multiple specialist agents before escalation.

Latency Optimization

The handoff itself must feel seamless in real-time voice. Dead air during routing signals to the caller that something went wrong. LiveKit handles this with:

  • Filler speech during transfer, such as "One moment while I connect you..."
  • Lightweight triage models (e.g., GPT-4o-mini) for fast intent classification
  • Per-agent plugin overrides, where each specialist can use different LLM, STT, or TTS providers optimized for their domain
  • Streaming at every pipeline stage to minimize perceived latency
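
The idea behind filler speech is to overlap the spoken line with the transfer setup instead of running them back to back. The sketch below shows that overlap with plain asyncio. The speak and connect_specialist coroutines are hypothetical stand-ins that only simulate delays; they are not LiveKit APIs.

```python
import asyncio


async def speak(text: str, log: list[str]) -> None:
    """Stand-in for TTS playback (hypothetical): records the line, then simulates its duration."""
    log.append(f"say:{text}")
    await asyncio.sleep(0.05)


async def connect_specialist(name: str, log: list[str]) -> str:
    """Stand-in for transfer setup (hypothetical), e.g. initializing the next agent."""
    await asyncio.sleep(0.05)
    log.append(f"connected:{name}")
    return name


async def handoff_with_filler(name: str, log: list[str]) -> str:
    # Start filler speech and transfer setup together so the caller hears no dead air.
    _, specialist = await asyncio.gather(
        speak("One moment while I connect you...", log),
        connect_specialist(name, log),
    )
    return specialist


events: list[str] = []
result = asyncio.run(handoff_with_filler("billing", events))
```

Run sequentially, the two steps would take the sum of both delays; run concurrently, the total is roughly the longer of the two.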

Architecture: Handoff vs. Other Multi-Agent Patterns

The handoff pattern is one of several ways to coordinate multiple agents. Here's how it compares:

Pattern | How It Routes | Control Flow | Best For
Handoff / Routing | Triage agent classifies intent, transfers control entirely to specialist | One agent active at a time | Multi-domain voice apps, IVR replacement
Supervisor | A central supervisor delegates subtasks to specialist agents and combines their results | Supervisor stays in control | Complex multi-step workflows requiring oversight
Sequential Pipeline | Fixed chain: each stage's output flows forward to the next | Predetermined linear flow | The voice stack itself (VAD → STT → LLM → TTS)
ReAct | Single agent loops through think, act, observe cycles with tools | Dynamic, self-directed | Tool-calling agents that query APIs and databases

When to choose Handoff over Supervisor. If your agents handle independent domains (billing, support, sales) and don't need to collaborate on a single request, Handoff is simpler and faster. If a single request requires coordinating multiple specialists simultaneously, use Supervisor.


Best Practices for Production Handoffs

Design Your Tool Descriptions Carefully

The LLM decides when to hand off based on the @function_tool description. Vague descriptions lead to misrouting. Be explicit about what triggers each handoff, including specific keywords, scenarios, and boundary conditions.

Don't Over-Route

If there's only one specialist agent, skip the triage layer entirely. A router with one destination is just unnecessary latency. The handoff pattern shines when there are 3+ distinct domains.

Handle Mid-Conversation Drift

Callers naturally change topics. A billing question might evolve into a cancellation request. Each specialist agent should be able to trigger its own handoffs when the conversation drifts outside its domain.
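
A simple way to reason about drift is as a graph of allowed handoffs per agent. The table and helper names below are hypothetical and framework-independent; they only illustrate that each specialist carries its own outgoing routes and falls back to triage when it has none for the new topic.

```python
# Hypothetical routing table: which domains each agent may hand off to directly.
HANDOFFS = {
    "triage": {"billing", "support", "sales"},
    "billing": {"support", "retention", "human"},
    "support": {"billing", "human"},
    "sales": {"billing", "human"},
    "retention": {"billing", "human"},
}


def next_agent(current: str, detected_domain: str) -> str:
    """Stay if the topic is in-domain, hand off if allowed, otherwise return to triage."""
    if detected_domain == current:
        return current
    if detected_domain in HANDOFFS.get(current, set()):
        return detected_domain
    return "triage"


def follow_call(domains: list[str], start: str = "triage") -> list[str]:
    """Trace which agents handle a call as the detected domain changes turn by turn."""
    path = [start]
    for domain in domains:
        agent = next_agent(path[-1], domain)
        if agent != path[-1]:
            path.append(agent)
    return path


# A billing question that evolves into a cancellation request.
drift_path = follow_call(["billing", "billing", "retention"])
```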

Test the Boundaries, Not Just the Happy Path

The handoff boundary is where most friction and failure lives. Test:

  • Ambiguous intents that could go to multiple specialists
  • Rapid topic switching within a single call
  • Edge cases where no specialist fits
  • What happens when the target agent or human is unavailable
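
Boundary cases like these lend themselves to a table-driven test. The sketch below uses a toy router with assumed thresholds (min_score and min_margin are made-up values) purely to show the shape of such a test; a real suite would exercise your actual routing logic.

```python
def route(
    scores: dict[str, float],
    available: set[str],
    min_score: float = 0.6,
    min_margin: float = 0.15,
) -> str:
    """Toy router (hypothetical): returns a specialist name, "clarify", or "human"."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_score:
        return "human"      # no specialist fits
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return "clarify"    # ambiguous between two specialists
    best = ranked[0][0]
    return best if best in available else "human"  # target unavailable


# (case name, intent scores, available specialists, expected decision)
BOUNDARY_CASES = [
    ("clear intent", {"billing": 0.9, "support": 0.2}, {"billing", "support"}, "billing"),
    ("ambiguous intent", {"billing": 0.7, "support": 0.65}, {"billing", "support"}, "clarify"),
    ("no specialist fits", {"billing": 0.3, "support": 0.2}, {"billing", "support"}, "human"),
    ("target unavailable", {"billing": 0.9, "support": 0.1}, {"support"}, "human"),
]


def failing_cases() -> list[str]:
    """Return the names of boundary cases whose routing decision is unexpected."""
    return [
        name
        for name, scores, available, expected in BOUNDARY_CASES
        if route(scores, available) != expected
    ]
```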

Pre-Classify Common Intents

For high-volume voice systems, a lightweight intent classifier can shortcut the full LLM reasoning loop for common requests. Save the LLM-powered routing for ambiguous or complex intents.
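
A minimal sketch of this fast path, assuming a handful of keyword patterns for the most common requests. The patterns, intent labels, and the llm_classify callback are hypothetical; the fast path only answers when exactly one pattern matches and defers everything else to the LLM.

```python
import re
from typing import Callable, Optional

# Assumed keyword patterns for common, unambiguous requests.
FAST_PATHS = [
    (re.compile(r"\b(invoice|refund|charge[sd]?|payment)\b", re.IGNORECASE), "billing"),
    (re.compile(r"\b(outage|bug|error|not working)\b", re.IGNORECASE), "support"),
    (re.compile(r"\b(pricing|demo|upgrade)\b", re.IGNORECASE), "sales"),
]


def fast_classify(utterance: str) -> Optional[str]:
    """Return an intent only when exactly one fast-path pattern matches."""
    hits = {intent for pattern, intent in FAST_PATHS if pattern.search(utterance)}
    return hits.pop() if len(hits) == 1 else None


def classify(utterance: str, llm_classify: Callable[[str], str]) -> str:
    """Try the cheap classifier first; fall back to the LLM for anything unclear."""
    return fast_classify(utterance) or llm_classify(utterance)
```

Requiring a single unambiguous match keeps the shortcut conservative: an utterance that mentions both a bug and a refund still goes through the full LLM routing step.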


Real-World Use Cases

Healthcare Front Desk

A triage agent answers calls to a medical office. Simple requests (appointment scheduling, prescription refills, hours) are handled by specialist AI agents. Complex clinical questions or emergencies are routed to a nurse or on-call physician with full context.

E-Commerce Customer Service

A triage agent detects whether the caller needs order tracking, returns, billing, or product questions. Each vertical has its own specialist agent with access to the relevant APIs. Frustrated callers or high-value accounts are warm-transferred to a human.

Financial Services

A voice agent handles routine balance inquiries and transaction lookups. Requests involving wire transfers above a threshold, disputes, or compliance-sensitive actions trigger an automatic handoff to a human advisor, along with a full evidence pack.

Drive-Through Ordering

A triage agent takes orders and routes special requests (dietary accommodations, large catering orders) to a specialist agent or a human manager. LiveKit's drive-thru example demonstrates this pattern.


Getting Started

Ready to build? If you prefer to start without code, Agent Builder lets you prototype a basic voice routing agent in your browser before converting to Python.

Otherwise, here's the path from zero to a working handoff system:

  1. Define your domains. List the distinct categories of requests your voice agent needs to handle. These become your specialist agents.
  2. Build the triage agent. Create an Agent with @function_tool methods for each specialist. Write clear, specific tool descriptions.
  3. Implement specialists. Each specialist is its own Agent class with domain-specific instructions, tools, and optionally different LLM/TTS providers.
  4. Add human escalation. Give every specialist a @function_tool for escalating to a human via WarmTransferTask.
  5. Test in the LiveKit Agent Playground before deploying to production telephony. When you're ready to go live, deploy to LiveKit Cloud with one click.

LiveKit provides working reference implementations for each of these steps.


Key Takeaways

  • The handoff pattern replaces rigid IVR menus with intelligent, LLM-powered routing that understands natural language
  • It supports both agent-to-agent (routing between AI specialists) and agent-to-human (escalation) transfers
  • Context preservation is the single most important implementation detail; callers should never repeat themselves
  • LiveKit's agent framework supports handoffs natively via @function_tool returns, WarmTransferTask, and SIP-based telephony
  • Start simple: define your domains, build a triage agent, and expand to more specialists as needed

The LiveKit Agents quickstart is the fastest way to get a multi-agent handoff system running. Give it a try and let us know what you're building.