The handoff pattern is the most practical multi-agent architecture for voice AI. It replaces rigid IVR trees with intelligent, LLM-powered routing, and it's how modern voice agents decide who handles what, mid-conversation.
Every developer building a voice agent eventually hits the same question: what happens when the conversation needs to go somewhere else?
Maybe the caller asked about billing, but your agent only handles tech support. Maybe the AI reached the limits of what it can resolve and a human needs to step in. Maybe the caller just said, "Let me talk to a person."
This is the handoff pattern for voice agents, and it's arguably the single most important multi-agent design pattern for production voice AI. It's the modern replacement for IVR menu trees ("press 1 for sales, press 2 for support"), powered by LLM intent detection instead of rigid button presses.
If you've ever built or used a customer service voice system, you've encountered this pattern. The difference now is that AI makes it actually work well.
What Is the Handoff Pattern?
The handoff pattern is a multi-agent architecture where one agent dynamically transfers control of a conversation to another agent (or a human) mid-interaction, based on real-time intent detection.
Instead of forcing callers through a decision tree, a triage agent listens to what the caller actually says, classifies intent using an LLM, and routes to the right specialist, whether that's another AI agent or a live person.
There are two primary flavors:
| Type | How It Works | Best For |
|---|---|---|
| Agent → Agent (Routing) | A triage agent classifies intent and dispatches to a specialist AI agent (billing, booking, tech support, etc.) | Multi-domain voice apps, customer support, any scenario with distinct verticals |
| Agent → Human (Escalation) | AI recognizes it can't resolve the issue and transfers to a live person with full context | Compliance-sensitive workflows, emotionally charged callers, high-stakes decisions |
In practice, most production systems use both. The triage agent routes to specialist AI agents for routine tasks, and any of those agents can escalate to a human when needed.
Why IVR Is Broken (and Why Handoffs Fix It)
IVR systems were revolutionary in 1990. In 2026, they're a liability. According to Metrigy's CX Optimization 2025-26 study, 37.6% of companies plan to fully replace IVRs with AI triage agents. Among their Research Success Group (companies seeing the highest measurable improvements from AI), that number jumps to 62.5%.
The problem with traditional IVR is structural:
- Rigid menu trees force callers into predefined paths that rarely match their actual intent
- "Press 1" fatigue drives callers to pound the zero button or yell "agent" repeatedly
- No context passing. When a caller finally reaches a human, they start from scratch.
- Maintenance overhead. Every new product, department, or workflow requires rebuilding the menu tree.
The handoff pattern solves all of these by replacing the static tree with a conversational triage agent that understands natural language and routes dynamically.
The key insight is that a well-designed handoff system makes routing invisible to the caller. They describe their problem in natural language, and the right specialist, whether AI or human, picks up seamlessly, with full context.
How the Handoff Pattern Works
At its core, the handoff follows a straightforward flow:
1. Triage and Intent Detection
The caller connects and speaks to a triage agent, a lightweight AI whose only job is to understand what the caller needs and route them to the right place.
Unlike IVR, the triage agent uses LLM-powered intent classification. It doesn't need the caller to pick from a menu. It listens, asks clarifying questions if needed, and makes a routing decision.
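A triage step like this is often implemented by asking the LLM for structured output and validating it before acting on it. The sketch below assumes a hypothetical JSON shape with `intent`, `confidence`, and `clarifying_question` keys and a hypothetical set of department names. It is plain Python, not a LiveKit API. Anything unparseable or outside the known departments falls back to a clarifying question instead of a guess.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical department names; replace with your own domains.
KNOWN_INTENTS = {"billing", "support", "sales"}


@dataclass
class RoutingDecision:
    intent: Optional[str]  # None means "not sure yet, ask the caller"
    confidence: float  # clamped to 0.0-1.0
    clarifying_question: Optional[str] = None


def parse_routing_decision(raw: str) -> RoutingDecision:
    """Validate an LLM's JSON routing output, falling back to a clarifying question."""
    fallback = RoutingDecision(
        intent=None,
        confidence=0.0,
        clarifying_question="Could you tell me a bit more about what you need help with?",
    )
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    intent = data.get("intent")
    # Reject intents the system cannot route to, instead of guessing.
    if not isinstance(intent, str) or intent not in KNOWN_INTENTS:
        return fallback
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        return fallback
    return RoutingDecision(
        intent=intent,
        confidence=max(0.0, min(1.0, confidence)),
        clarifying_question=data.get("clarifying_question"),
    )
```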
2. Context Packaging
Before transferring, the triage agent packages everything it knows:
- Detected intent and confidence score
- Extracted entities (account number, order ID, dates)
- Conversation transcript so far
- Caller sentiment and emotional state
- Any data already retrieved (account lookup, order status)
This context package is what separates a good handoff from a frustrating one. The receiving agent, whether AI or human, should never ask the caller to repeat themselves.
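One way to make the context package concrete is a typed structure that mirrors the list above. This is a minimal sketch with hypothetical field names. In LiveKit the conversation history itself can travel via `chat_ctx`, so a structure like this could carry the extracted summary alongside it, for example to a human agent's dashboard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class HandoffContext:
    """Structured package passed from one agent to the next (hypothetical schema)."""

    intent: str
    confidence: float
    entities: dict[str, str] = field(default_factory=dict)  # account number, order ID, dates
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    sentiment: str = "neutral"
    retrieved_data: dict[str, Any] = field(default_factory=dict)  # lookups already done

    def to_json(self) -> str:
        # Serialize so the package can cross a process boundary or feed a dashboard.
        return json.dumps(asdict(self))


ctx = HandoffContext(
    intent="billing",
    confidence=0.91,
    entities={"account_number": "A-1042"},
    transcript=[("caller", "I was charged twice this month.")],
    sentiment="frustrated",
    retrieved_data={"last_invoice_status": "paid"},
)
```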
3. Transfer Execution
The transfer itself can happen in two ways:
- Cold transfer. The caller is immediately connected to the new agent. Fast, but the receiving agent only has the context package to work with.
- Warm transfer. The triage agent privately briefs the receiving agent (or human) before connecting the caller. Slower, but creates a much better experience for complex issues.
4. Specialist Handling
The specialist agent (or human) picks up with full context and handles the request. If the conversation drifts into another domain, the specialist can trigger another handoff; routing isn't limited to the initial triage step.
When to Trigger a Handoff
Modern handoff systems go well beyond rigid keyword matching. The best implementations use multiple trigger signals:
Intent-Based Triggers
The most common trigger is when the caller's request falls outside the current agent's domain. A tech support agent hearing "I want to cancel my subscription" should route to retention or billing.
Sentiment and Emotional Triggers
Voice carries emotional signals that text doesn't. Frustration, urgency, and escalating language should trigger escalation to a human, ideally detected from tone and prosody, not just words.
Confidence Thresholds
When the AI's confidence in its intent classification drops below a threshold, it's better to route than guess. A confidently wrong agent is worse than a brief transfer.
Explicit User Requests
The caller says "Let me talk to a person" or "Transfer me to billing." This should always be honored immediately, with no friction and no "let me try to help first."
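Because explicit requests should be honored without friction, some systems check each caller turn against a short phrase list before the LLM even weighs in. A minimal sketch; the patterns are hypothetical and would need tuning against real transcripts, while the LLM still catches paraphrases the patterns miss.

```python
import re

# Hypothetical high-precision patterns for "give me a human".
HUMAN_REQUEST_PATTERNS = [
    r"\b(talk|speak)\s+(to|with)\s+(a|an|some)?\s*(person|human|agent|representative|someone)\b",
    r"\breal\s+person\b",
]
_HUMAN_RE = re.compile("|".join(HUMAN_REQUEST_PATTERNS), re.IGNORECASE)


def is_explicit_human_request(utterance: str) -> bool:
    """Return True if the caller explicitly asked for a human."""
    return bool(_HUMAN_RE.search(utterance))
```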
Complexity Boundaries
Some requests require multi-step reasoning, access to systems the current agent doesn't have, or judgment calls that exceed AI capabilities. Recognize these early.
Regulatory and Compliance Mandates
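In regulated sectors such as healthcare (HIPAA), finance (SOX, PCI), and government (FedRAMP), regulations or internal policies can require human oversight for certain actions. Tying those actions to a mandatory handoff enforces this automatically instead of leaving it to the model's judgment.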
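The trigger signals above can be combined into a single per-turn check. The sketch below is one possible ordering, not a LiveKit feature: the `TurnSignals` fields, action names, and thresholds are all hypothetical. An explicit request or compliance rule always wins, so no other signal can talk the agent out of transferring.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HandoffReason(Enum):
    EXPLICIT_REQUEST = "explicit_request"
    COMPLIANCE = "compliance"
    SENTIMENT = "sentiment"
    LOW_CONFIDENCE = "low_confidence"
    OUT_OF_DOMAIN = "out_of_domain"


@dataclass
class TurnSignals:
    """Per-turn signals an agent might compute (hypothetical fields)."""

    explicit_human_request: bool
    requested_action: Optional[str]
    frustration_score: float  # 0.0 (calm) to 1.0 (very frustrated)
    detected_intent: str
    intent_confidence: float  # 0.0 to 1.0


# Hypothetical policy values; tune per deployment and per regulation.
ACTIONS_REQUIRING_HUMAN = {"wire_transfer_over_limit", "release_medical_records"}
FRUSTRATION_THRESHOLD = 0.7
CONFIDENCE_THRESHOLD = 0.6


def handoff_reason(signals: TurnSignals, my_domain: set[str]) -> Optional[HandoffReason]:
    """Return why this turn should trigger a handoff, or None to keep handling it."""
    if signals.explicit_human_request:
        return HandoffReason.EXPLICIT_REQUEST
    if signals.requested_action in ACTIONS_REQUIRING_HUMAN:
        return HandoffReason.COMPLIANCE
    if signals.frustration_score >= FRUSTRATION_THRESHOLD:
        return HandoffReason.SENTIMENT
    # Route rather than guess when the classifier is unsure.
    if signals.intent_confidence < CONFIDENCE_THRESHOLD:
        return HandoffReason.LOW_CONFIDENCE
    if signals.detected_intent not in my_domain:
        return HandoffReason.OUT_OF_DOMAIN
    return None
```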
The "Don't Make Them Repeat Themselves" Problem
Research consistently shows that the #1 source of frustration in agent transfers is having to repeat information. Whether the transfer is AI-to-AI or AI-to-human, context loss destroys the experience.
According to Metrigy's CX Optimization 2025-26 Consumer Views study, 84.7% of consumers still prefer interacting with a human over an AI agent, but 46% will use AI agents in select circumstances, especially when it means getting directed to the right person faster. The key is ensuring a human option exists and the transition won't be painful.
Solving this requires:
- Structured context passing. Don't just forward a transcript. Pass a structured summary of detected intent, extracted entities, sentiment score, actions already taken, and the caller's stated goal.
- Pre-handoff confirmation. Before transferring, tell the caller what's happening: "I'm going to connect you with a billing specialist. I'll share everything we've discussed so you won't need to repeat yourself."
- Human-facing context display. When routing to a live agent, show them the AI's summary, confidence scores, and full transcript in a dashboard. The human should be ready to help in 10-15 seconds.
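The confirmation line and the human-facing summary can both be rendered from the same structured data. A minimal sketch; `CallSummary` and its fields are hypothetical, and the briefing is kept to a few scannable lines so a human can take over quickly.

```python
from dataclasses import dataclass, field


@dataclass
class CallSummary:
    """Hypothetical structured summary handed to the receiving agent or human."""

    caller_goal: str
    intent: str
    confidence: float
    entities: dict[str, str] = field(default_factory=dict)
    actions_taken: list[str] = field(default_factory=list)
    sentiment: str = "neutral"


def confirmation_line(department: str) -> str:
    """What the caller hears right before the transfer."""
    return (
        f"I'm going to connect you with a {department} specialist. "
        "I'll share everything we've discussed so you won't need to repeat yourself."
    )


def briefing_text(s: CallSummary) -> str:
    """Short, scannable briefing for a human agent's dashboard."""
    lines = [
        f"Goal: {s.caller_goal}",
        f"Intent: {s.intent} (confidence {s.confidence:.0%})",
        f"Sentiment: {s.sentiment}",
    ]
    if s.entities:
        lines.append("Details: " + ", ".join(f"{k}={v}" for k, v in s.entities.items()))
    if s.actions_taken:
        lines.append("Already done: " + "; ".join(s.actions_taken))
    return "\n".join(lines)
```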
Building Handoffs with LiveKit
LiveKit's agent framework supports the handoff pattern natively through its Agent class and @function_tool decorator, making it straightforward to build multi-agent voice systems with intelligent routing.
Agent-to-Agent Routing
In LiveKit, each specialist is defined as a separate Agent with its own instructions, tools, and personality. The triage agent uses @function_tool methods that return a different Agent instance, triggering an automatic handoff:
```python
from typing import Optional

from livekit.agents import Agent, RunContext, function_tool
from livekit.agents.llm import ChatContext


class BillingAgent(Agent):
    def __init__(self, chat_ctx: Optional[ChatContext] = None):
        super().__init__(
            instructions="You are a billing specialist for Acme Corp. "
            "Help callers with invoices, payments, and subscription changes.",
            chat_ctx=chat_ctx,  # conversation history handed over by the triage agent
        )


class SupportAgent(Agent):
    def __init__(self, chat_ctx: Optional[ChatContext] = None):
        super().__init__(
            instructions="You are a technical support specialist for Acme Corp. "
            "Help callers troubleshoot bugs, outages, and product issues.",
            chat_ctx=chat_ctx,
        )


class SalesAgent(Agent):
    def __init__(self, chat_ctx: Optional[ChatContext] = None):
        super().__init__(
            instructions="You are a sales specialist for Acme Corp. "
            "Help callers with pricing, plans, and demo requests.",
            chat_ctx=chat_ctx,
        )


class TriageAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You are a receptionist for Acme Corp.
Listen to what the caller needs and route them to the
right department. Do NOT try to handle requests yourself."""
        )

    # Returning an Agent (paired with a short status message) triggers the handoff.
    @function_tool()
    async def transfer_to_billing(self, context: RunContext):
        """Transfer the caller to the billing department.
        Use when the caller asks about invoices, payments,
        charges, refunds, or subscription changes."""
        return BillingAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to billing"

    @function_tool()
    async def transfer_to_support(self, context: RunContext):
        """Transfer to technical support.
        Use when the caller reports bugs, outages,
        or needs help using the product."""
        return SupportAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to technical support"

    @function_tool()
    async def transfer_to_sales(self, context: RunContext):
        """Transfer to the sales team.
        Use when the caller asks about pricing, plans,
        new features, or wants a demo."""
        return SalesAgent(chat_ctx=self.chat_ctx.copy(exclude_instructions=True)), "Transferring to sales"
```
Each tool passes self.chat_ctx.copy(exclude_instructions=True) to the next agent. The .copy() call hands the specialist the full conversation history, and exclude_instructions=True strips out the triage agent's persona so the specialist starts fresh with its own instructions rather than inheriting the receptionist's. Without that flag, the previous agent's system prompt would bleed into the new one. For more on context preservation and handoff options, see the workflows documentation.
Agent-to-Human Warm Transfer
For escalation to a live person, LiveKit provides a complete warm transfer workflow that includes a private consultation room where an AI agent can brief the human before connecting the caller. The sip_call_to parameter takes the supervisor's phone number, and chat_ctx passes the full conversation history so the briefing agent can summarize everything:
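```python
from typing import Optional

from livekit.agents import Agent, RunContext, function_tool
from livekit.agents.beta.workflows import WarmTransferTask
from livekit.agents.llm import ChatContext

SUPERVISOR_PHONE = "+15551234567"  # placeholder: the human supervisor's number
SIP_TRUNK_ID = "your-sip-trunk-id"  # placeholder: your outbound SIP trunk ID


class SupportAgent(Agent):
    def __init__(self, chat_ctx: Optional[ChatContext] = None):
        super().__init__(
            instructions="You are a technical support specialist for Acme Corp. "
            "Help callers troubleshoot bugs, outages, and product issues.",
            chat_ctx=chat_ctx,
        )

    @function_tool()
    async def escalate_to_human(self, context: RunContext):
        """Transfer to a human supervisor when:
        - The customer explicitly asks for a person
        - The issue involves a billing dispute over $500
        - Sentiment indicates high frustration
        - The problem requires access to internal tools"""
        # Tell the caller what is about to happen; they can't talk over it.
        await self.session.say(
            "I'm connecting you with a specialist now. "
            "I'll share our conversation so you won't "
            "need to repeat anything.",
            allow_interruptions=False,
        )
        # Pause until playout finishes before the transfer starts.
        await context.wait_for_playout()
        # Hold the caller, brief the supervisor privately, then connect them.
        result = await WarmTransferTask(
            sip_call_to=SUPERVISOR_PHONE,
            sip_trunk_id=SIP_TRUNK_ID,
            chat_ctx=self.chat_ctx,
        )
        return result
```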
allow_interruptions=False on session.say() prevents the caller from talking over the hold message. context.wait_for_playout() is required here because you can't directly await a speech handle inside a function tool. It's the correct way to pause until the spoken line finishes before the transfer starts.
What happens under the hood:
- The caller is placed on hold (audio I/O disabled, optional hold music)
- A private consultation room is created for the AI to brief the human
- The human supervisor is dialed in via SIP and receives a full context summary
- The supervisor is then moved into the caller's room, and the caller and human are connected
- Both AI agents disconnect, leaving a clean human-to-human call
This is the warm transfer, where the human is fully briefed before they ever speak to the caller. No "can you tell me what's going on?" No starting from zero.
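The five steps above can be read as a small state machine, which is handy for reasoning about ordering and for writing tests around it. This is a simplified, framework-agnostic model: the state names are our own rather than LiveKit's, WarmTransferTask performs the real steps, and the "human didn't answer" fallback is a hypothetical policy.

```python
from enum import Enum, auto


class TransferState(Enum):
    IN_CALL_WITH_AI = auto()
    CALLER_ON_HOLD = auto()  # caller audio disabled, optional hold music
    BRIEFING_HUMAN = auto()  # consultation room, supervisor dialed via SIP
    HUMAN_JOINING_CALLER = auto()  # supervisor moved into the caller's room
    HUMAN_TO_HUMAN = auto()  # AI agents have disconnected


# Each state advances to exactly one next state.
NEXT = {
    TransferState.IN_CALL_WITH_AI: TransferState.CALLER_ON_HOLD,
    TransferState.CALLER_ON_HOLD: TransferState.BRIEFING_HUMAN,
    TransferState.BRIEFING_HUMAN: TransferState.HUMAN_JOINING_CALLER,
    TransferState.HUMAN_JOINING_CALLER: TransferState.HUMAN_TO_HUMAN,
}


def advance(state: TransferState, human_answered: bool = True) -> TransferState:
    """Move one step through a warm transfer."""
    if state is TransferState.BRIEFING_HUMAN and not human_answered:
        # Hypothetical fallback: take the caller off hold and resume with the AI.
        return TransferState.IN_CALL_WITH_AI
    if state not in NEXT:
        raise ValueError(f"{state.name} is a terminal state")
    return NEXT[state]
```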
Bidirectional Handoffs
Handoffs don't just go one direction. LiveKit supports:
- AI → Human. Standard escalation when AI can't resolve.
- Human → AI. After resolving a complex issue, the human can hand back to the AI for remaining routine steps (e.g., scheduling a follow-up, sending a confirmation). In LiveKit, this works via the SIP integration: the human agent triggers a transfer back into a LiveKit room where a new AI agent is initialized with the accumulated conversation context, so it picks up with full awareness of what was already resolved.
- AI → AI → Human. Chained routing through multiple specialist agents before escalation.
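Chained routing raises a practical question: how long can a call bounce between AI specialists before a human steps in? One option is a cap on consecutive AI agents since a human last touched the call. The cap value, the `human_queue` target, and the path representation below are hypothetical.

```python
from typing import Literal

Kind = Literal["ai", "human"]

# Hypothetical policy: at most this many AI agents in a row before escalating.
MAX_CONSECUTIVE_AI = 4
HUMAN_QUEUE: tuple[str, Kind] = ("human_queue", "human")


def next_target(path: list[tuple[str, Kind]], proposed: tuple[str, Kind]) -> tuple[str, Kind]:
    """Accept a proposed handoff, or redirect to a human after too many AI hops.

    `path` lists (agent_name, kind) for everyone who has handled the call, in order.
    """
    if proposed[1] == "human":
        return proposed  # AI -> Human is always allowed
    # Count AI agents since the last human (a Human -> AI handback resets the count).
    consecutive_ai = 0
    for _, kind in reversed(path):
        if kind == "human":
            break
        consecutive_ai += 1
    if consecutive_ai + 1 > MAX_CONSECUTIVE_AI:
        return HUMAN_QUEUE
    return proposed
```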
Latency Optimization
The handoff itself must feel seamless in real-time voice. Dead air during routing signals to the caller that something went wrong. LiveKit handles this with:
- Filler speech during transfer, such as "One moment while I connect you..."
- Lightweight triage models (e.g., GPT-4o-mini) for fast intent classification
- Per-agent plugin overrides, where each specialist can use different LLM, STT, or TTS providers optimized for their domain
- Streaming at every pipeline stage to minimize perceived latency
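The filler-speech idea amounts to overlapping the spoken filler with whatever setup the next agent needs, rather than running them back to back. A framework-agnostic asyncio sketch: `speak` and `prepare_specialist` are hypothetical stand-ins, simulated with sleeps, for TTS playout and the specialist's warm-up (for example, an account lookup).

```python
import asyncio
import time


async def speak(text: str) -> None:
    """Stand-in for TTS playout of a filler line (simulated as ~0.3 s)."""
    await asyncio.sleep(0.3)


async def prepare_specialist() -> str:
    """Stand-in for loading the next agent and its data (simulated as ~0.4 s)."""
    await asyncio.sleep(0.4)
    return "billing_agent_ready"


async def handoff_with_filler() -> tuple[str, float]:
    start = time.perf_counter()
    # Run filler speech and specialist setup concurrently to avoid dead air.
    _, specialist = await asyncio.gather(
        speak("One moment while I connect you..."),
        prepare_specialist(),
    )
    return specialist, time.perf_counter() - start


if __name__ == "__main__":
    ready, elapsed = asyncio.run(handoff_with_filler())
    # The total is close to the longer of the two tasks, not their sum.
    print(ready, f"{elapsed:.2f}s")
```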
Architecture: Handoff vs. Other Multi-Agent Patterns
The handoff pattern is one of several ways to coordinate multiple agents. Here's how it compares:
| Pattern | How It Routes | Control Flow | Best For |
|---|---|---|---|
| Handoff / Routing | Triage agent classifies intent, transfers control entirely to specialist | One agent active at a time | Multi-domain voice apps, IVR replacement |
| Supervisor | A central supervisor agent delegates subtasks to specialist agents and combines their results | Supervisor stays in control | Complex multi-step workflows requiring oversight |
| Sequential Pipeline | Fixed chain: each stage's output flows forward to the next | Predetermined linear flow | The voice stack itself (VAD → STT → LLM → TTS) |
| ReAct | A single agent loops through think, act, observe cycles with tools | Dynamic, self-directed | Tool-calling agents that query APIs and databases |
When to choose Handoff over Supervisor. If your agents handle independent domains (billing, support, sales) and don't need to collaborate on a single request, Handoff is simpler and faster. If a single request requires coordinating multiple specialists simultaneously, use Supervisor.
Best Practices for Production Handoffs
Design Your Tool Descriptions Carefully
The LLM decides when to hand off based on the @function_tool description. Vague descriptions lead to misrouting. Be explicit about what triggers each handoff, including specific keywords, scenarios, and boundary conditions.
Don't Over-Route
If there's only one specialist agent, skip the triage layer entirely. A router with one destination is just unnecessary latency. The handoff pattern shines when there are 3+ distinct domains.
Handle Mid-Conversation Drift
Callers naturally change topics. A billing question might evolve into a cancellation request. Each specialist agent should be able to trigger its own handoffs when the conversation drifts outside its domain.
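For example, when a billing question turns into a cancellation request, the billing agent should notice that "cancel" belongs to someone else and hand off. A minimal sketch with a hypothetical intent-to-specialist registry; anything no specialist owns goes back to triage.

```python
from typing import Optional

# Hypothetical registry: which intents each specialist owns.
DOMAINS: dict[str, set[str]] = {
    "billing": {"invoice", "payment", "refund"},
    "retention": {"cancel", "downgrade"},
    "support": {"bug", "outage", "how_to"},
}


def owner_of(intent: str) -> Optional[str]:
    """Find the specialist that owns an intent, if any."""
    for agent, intents in DOMAINS.items():
        if intent in intents:
            return agent
    return None


def handle_turn(current_agent: str, intent: str) -> str:
    """Return the agent that should handle this turn.

    Stay put when the intent is in the current agent's domain, hand off when
    another specialist owns it, and fall back to triage when nobody does.
    """
    if intent in DOMAINS.get(current_agent, set()):
        return current_agent
    return owner_of(intent) or "triage"
```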
Test the Boundaries, Not Just the Happy Path
The handoff boundary is where most friction and failure lives. Test:
- Ambiguous intents that could go to multiple specialists
- Rapid topic switching within a single call
- Edge cases where no specialist fits
- What happens when the target agent or human is unavailable
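Boundary cases lend themselves to table-driven tests. In the sketch below, a deliberately simple keyword router stands in for your real classifier (the keywords, the `clarify` result, and the `human_queue` fallback are hypothetical). The point is the shape of the test cases: ambiguous input, no fit, an unavailable target, and one happy path.

```python
# Hypothetical keyword map standing in for the real classifier.
KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "support": {"error", "crash", "outage"},
    "sales": {"pricing", "demo", "plan"},
}


def route(utterance: str, available: set[str]) -> str:
    words = set(utterance.lower().replace("?", "").replace(".", "").split())
    matches = [dept for dept, kws in KEYWORDS.items() if words & kws]
    if len(matches) != 1:
        return "clarify"  # ambiguous or no fit: ask, don't guess
    dept = matches[0]
    return dept if dept in available else "human_queue"  # target is offline


BOUNDARY_CASES = [
    ("I got a refund error", {"billing", "support"}, "clarify"),  # two domains match
    ("What's the weather like", {"billing", "support"}, "clarify"),  # no specialist fits
    ("Can I see pricing", {"billing", "support"}, "human_queue"),  # sales unavailable
    ("Why is there an extra charge", {"billing"}, "billing"),  # happy path
]


def test_boundaries() -> None:
    for utterance, available, expected in BOUNDARY_CASES:
        assert route(utterance, available) == expected, utterance
```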
Pre-Classify Common Intents
For high-volume voice systems, a lightweight intent classifier can shortcut the full LLM reasoning loop for common requests. Save the LLM-powered routing for ambiguous or complex intents.
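A minimal version of this shortcut tries a few high-precision patterns first and only calls the LLM when none match. The patterns and intent names are hypothetical, and `llm_classify` is any callable you supply that wraps your LLM-based routing.

```python
import re
from typing import Callable

# Hypothetical high-precision patterns for the most common requests.
FAST_PATHS: list[tuple[re.Pattern, str]] = [
    (re.compile(r"\b(opening|business)\s+hours\b|\bwhat time do you (open|close)\b", re.I), "hours"),
    (re.compile(r"\b(track|where is)\b.*\border\b", re.I), "order_status"),
    (re.compile(r"\breset\b.*\bpassword\b", re.I), "password_reset"),
]


def classify(utterance: str, llm_classify: Callable[[str], str]) -> tuple[str, str]:
    """Return (intent, source): cheap regexes first, then fall back to the LLM."""
    for pattern, intent in FAST_PATHS:
        if pattern.search(utterance):
            return intent, "fast_path"
    return llm_classify(utterance), "llm"
```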
Real-World Use Cases
Healthcare Front Desk
A triage agent answers calls to a medical office. Simple requests (appointment scheduling, prescription refills, hours) are handled by specialist AI agents. Complex clinical questions or emergencies are routed to a nurse or on-call physician with full context.
E-Commerce Customer Service
A triage agent detects whether the caller needs order tracking, returns, billing, or product questions. Each vertical has its own specialist agent with access to the relevant APIs. Frustrated callers or high-value accounts are warm-transferred to a human.
Financial Services
A voice agent handles routine balance inquiries and transaction lookups. Requests involving wire transfers above a threshold, disputes, or compliance-sensitive actions trigger an automatic handoff to a human advisor, along with a full evidence pack.
Drive-Through Ordering
A triage agent takes orders and routes special requests (dietary accommodations, large catering orders) to a specialist agent or a human manager. LiveKit's drive-thru example demonstrates this pattern.
Getting Started
Ready to build? Here's the path from zero to a working handoff system:
If you prefer to start without code, Agent Builder lets you prototype a basic voice routing agent in your browser before converting to Python.
- Define your domains. List the distinct categories of requests your voice agent needs to handle. These become your specialist agents.
- Build the triage agent. Create an Agent with @function_tool methods for each specialist. Write clear, specific tool descriptions.
- Implement specialists. Each specialist is its own Agent class with domain-specific instructions, tools, and optionally different LLM/TTS providers.
- Add human escalation. Give every specialist a @function_tool for escalating to a human via WarmTransferTask.
- Test in the LiveKit Agent Playground before deploying to production telephony. When you're ready to go live, deploy to LiveKit Cloud with one click.
LiveKit provides working reference implementations for each of these steps:
- Front desk booking agent: calendar booking with tasks, tools, and evaluations
- Warm transfer: full AI-to-human escalation with consultation rooms
- Medical office triage: healthcare-specific multi-agent coordination
Key Takeaways
- The handoff pattern replaces rigid IVR menus with intelligent, LLM-powered routing that understands natural language
- It supports both agent-to-agent (routing between AI specialists) and agent-to-human (escalation) transfers
- Context preservation is the single most important implementation detail; callers should never repeat themselves
- LiveKit's agent framework supports handoffs natively via @function_tool returns, WarmTransferTask, and SIP-based telephony
- Start simple: define your domains, build a triage agent, and expand to more specialists as needed
The LiveKit Agents quickstart is the fastest way to get a multi-agent handoff system running. Give it a try and let us know what you're building.