
Detect voicemail and IVR with outbound phone agents

Outbound phone calls don't always reach a human. Voicemail boxes and IVR menus come up far more often than most teams expect, and some lines never stop ringing. An agent has only the first few seconds of a call to work out what it's talking to and decide how to respond.

Today we're bringing Answering Machine Detection (AMD) to LiveKit Agents — available in the core framework with no plugins required. AMD reliably distinguishes humans from machines and classifies every outbound call as human, voicemail, IVR, or unavailable. From there, you decide what happens next: leave a message, navigate a menu, or hang up.

See it in action

Why AMD matters

Outbound agents need a reliable verdict on what they're talking to, and they need it within the first second of the call. AMD makes that signal explicit so the rest of your stack can act on it:

  • Unambiguous outcomes: Did the call actually reach the person? Without AMD, voicemail pickups are indistinguishable from real conversations in your post-call data, leading to duplicate outreach to people you already reached, or missed follow-ups to people you never did.
  • Timing the next step: Detecting a machine isn't the same as knowing when to act on it. AMD surfaces the right moment to leave a voicemail or start navigating an IVR, after the beep or once the prompt finishes, instead of talking over it.
  • Preserve context: Your agent prompt shouldn't be stuffed with detection logic. Handling this outside the main loop lets your agents and developers focus on the conversation itself.
  • Reduce cost: When an agent doesn't recognize an IVR, it burns tokens and minutes instead of progressing through the menu. AMD lets agents use DTMF to select a branch, leave a pre-recorded voicemail, or hang up early.

Inside Agents AMD

AMD runs detection outside the agent's main loop: a short-circuiting rule sits in front of an LLM classification step, so easy cases stay fast and only the ambiguous ones pay the cost of a model call.

AMD workflow diagram

  • Transcript available: Whenever your STT pipeline produces a transcript, AMD routes it through the LLM. The transcript is the strongest signal available, and discarding it would hurt accuracy.
  • Short utterance, no transcript: For a one-word "hello?", a beep, or silence the STT couldn't resolve, AMD errs on the side of treating the call as human and hands control back to the agent immediately. Misclassifying a human as a machine is the worst-case failure mode, and this keeps time-to-first-response fast: when the caller says "hello?", the agent replies right away.

AMD also exposes a few hooks so it slots cleanly into your existing call flow:

  • Auto-IVR navigation lets outbound agents start navigating phone trees the moment an IVR is detected, so the agent doesn't sit idle while menus play. Available in Python today, with JS framework support coming soon.
  • Interruption protection ensures voice agents don't continue speaking after a machine is detected, and prevents them from talking over a voicemail beep.

Every setting is tunable, so AMD can be as conservative or as aggressive as your use case requires, from high-volume cold outreach to careful, customer-facing callbacks.

Benchmarks

We pressure-tested AMD against an internal dataset of voicemails, IVR prompts, and live human pick-ups spanning a range of carriers, languages, and accents. Every LLM and STT combination was scored on both accuracy (F1) and time-to-decision.

The best-performing pairing is google/gemini-3.1-flash-lite with cartesia/ink-whisper, which are our default settings for AMD when LiveKit Cloud is used:

Class                 F1
Human                 95.7%
IVR                   98.2%
Voicemail             97.3%
Macro F1              97.0%
Micro F1 (accuracy)   94.7%

Macro F1 averages each class equally; micro F1 weights by frequency, so it doubles as overall accuracy.
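To make the distinction concrete, here is the computation on a small made-up confusion set (illustrative numbers, not the benchmark data): a mistake on a rare class drags macro F1 down while barely moving micro F1.

```python
def f1_scores(y_true, y_pred):
    """Per-class, macro, and micro F1 for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(t == p == c for t, p in zip(y_true, y_pred))
        fp = sum(p == c and t != c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        per_class[c] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    macro = sum(per_class.values()) / len(per_class)
    # micro F1 equals plain accuracy for single-label classification
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return per_class, macro, micro

# 10 calls: 8 human, 1 IVR, 1 voicemail; one human misread as IVR
y_true = ["human"] * 8 + ["ivr"] + ["voicemail"]
y_pred = ["human"] * 7 + ["ivr"] + ["ivr"] + ["voicemail"]
per_class, macro, micro = f1_scores(y_true, y_pred)
print({c: round(f, 3) for c, f in per_class.items()})
print(round(macro, 3), round(micro, 3))  # macro 0.867, micro 0.9
```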

Median (P50) Time to Detection is 840 milliseconds, measured from the start of the session to the moment a verdict is ready. With preemptive generation enabled, the agent's first reply is generated in parallel, so it can speak the moment AMD confirms a human.

Other combinations, such as openai/gpt-4.1 with assemblyai/universal-streaming-multilingual, also perform well. If you prefer a different LLM or STT, the documentation lists every evaluated model and how to swap it in with LiveKit Inference, which offers easy access to all of them without additional setup. You can also use plugins to connect to a wide range of providers.

Try it today

Answering Machine Detection is available in livekit-agents 1.5.9 for Python and 1.4.2 for Node.js. Upgrade your agent to the latest version to get started:

Python

from livekit import api
from livekit.agents import AMD

await session.start(...)

async with AMD(session, participant_identity=participant_identity) as detector:
    # create and wait for the SIP participant
    await ctx.api.sip.create_sip_participant(...)
    participant = await ctx.wait_for_participant(identity=participant_identity)

    result = await detector.execute()

    # custom logic for each category
    if result.category in ("human", "uncertain"):
        # let the agent handle the conversation as usual
        pass
    elif result.category == "machine-ivr":
        # IVR navigation runs automatically when `ivr_detection=True` (default)
        pass
    elif result.category == "machine-vm":
        # use say() or generate_reply() to leave a message
        speech_handle = session.generate_reply(
            instructions="You've reached voicemail. Leave a brief message.",
        )
        await speech_handle.wait_for_playout()
    elif result.category == "machine-unavailable":
        # use ctx.shutdown() to end the call
        ...

Node.js

import { voice } from '@livekit/agents';

await session.start({...});

const detector = new voice.AMD(session, { participantIdentity });

try {
  // create and wait for the SIP participant
  await sip.createSipParticipant(...);
  const participant = await ctx.waitForParticipant(participantIdentity);

  const result = await detector.execute();

  // custom logic for each category
  if (
    result.category === voice.AMDCategory.HUMAN ||
    result.category === voice.AMDCategory.UNCERTAIN ||
    result.category === voice.AMDCategory.MACHINE_IVR
  ) {
    // let the agent handle the conversation (auto-IVR navigation isn't available
    // in Node.js yet, so MACHINE_IVR falls back to the regular agent loop)
  } else if (result.category === voice.AMDCategory.MACHINE_VM) {
    // use say() or generateReply() to leave a message
    const speechHandle = session.generateReply({
      instructions: "You've reached voicemail. Leave a brief message.",
    });
    await speechHandle.waitForPlayout();
  } else if (result.category === voice.AMDCategory.MACHINE_UNAVAILABLE) {
    // use session.shutdown() to end the call
  }
} finally {
  await detector.aclose();
}

Read the documentation for the full reference, including how to configure the classifier, pick models, and hook into each AMD outcome.

Each prediction also surfaces in Agent Console as a new ANSWERING MACHINE DETECTION event, so you can debug AMD alongside the rest of your session. Try it on your outbound agents and let us know how they do in our community.