
The ReAct Pattern for Voice Agents: How AI Agents Think, Act, and Respond

What Is the ReAct Pattern?

Every time a voice agent checks your calendar, looks up an order, or books an appointment, it's running a loop that most developers never see: Think → Act → Observe → Repeat.


This is the ReAct pattern (short for Reasoning and Acting), and it's the foundational design pattern behind modern AI agents. First introduced by Yao et al. (2022; published at ICLR 2023), ReAct describes how a large language model (LLM) alternates between explicit reasoning steps and tool-using actions in an iterative loop until it arrives at a final answer.

Instead of generating a response in one shot, a ReAct agent:

  1. Thinks about what it knows and what's missing
  2. Acts by calling an external tool (an API, database, or service)
  3. Observes the result
  4. Thinks again with the new information
  5. Repeats until it has enough context to respond
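The loop above can be sketched in plain Python. This is a minimal illustration of the control flow, not any framework's implementation; `llm_step` and `run_tool` are hypothetical stand-ins for the model call and the tool executor:

```python
def react_loop(user_message, llm_step, run_tool, max_iterations=5):
    """Minimal ReAct loop: think, act, observe, repeat until a final answer."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        step = llm_step(history)  # Think: the model proposes an action or an answer
        if step["type"] == "final_answer":
            return step["content"]  # Enough context gathered: respond
        result = run_tool(step["tool"], step["args"])  # Act: execute the proposed call
        history.append({"role": "tool", "content": result})  # Observe: feed it back
    # Budget exhausted: escalate gracefully instead of looping forever
    return "I couldn't resolve that; let me connect you with someone who can."
```

Note the iteration budget: capping the loop and escalating when it's exhausted matters especially in voice, where every extra cycle is dead air.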

For voice agents, this matters because it's the pattern that determines how your agent decides to call tools, look up data, and respond, all in real time, while a human is waiting on the other end of the line.


Why Developers Should Care About ReAct

If you've ever built a voice agent that calls a function (checking availability, looking up a customer record, placing an order), you've already implemented ReAct, whether you knew it or not.

Understanding the pattern explicitly helps you:

  • Design better tool schemas that minimize unnecessary reasoning loops
  • Debug failures by tracing exactly where the agent's logic broke down
  • Optimize latency by reducing the number of think→act cycles
  • Scale capabilities by adding tools without breaking reliability

ReAct is also the reasoning engine inside every other agent pattern. The Supervisor pattern uses ReAct for routing decisions. The Handoff pattern uses ReAct to determine when to transfer. Understanding ReAct is a prerequisite to understanding the rest.


How the ReAct Loop Works in a Voice Agent

Let's walk through a concrete example. A user calls in and says:

"What time is my dentist appointment tomorrow?"

Here's the ReAct loop in action:

  1. 🧠 Think: "The user wants appointment info. I need to check their calendar. I'll call the calendar lookup tool."
  2. Act: Calls lookup_calendar(user_id="123", date="2026-03-10")
  3. 👁️ Observe: Tool returns {appointment: "Dr. Smith, 2:30 PM"}
  4. 🧠 Think: "I have the info. I can respond now."
  5. 💬 Respond: "Your dentist appointment with Dr. Smith is tomorrow at 2:30 PM."

For simple lookups, the loop runs once. For complex, multi-step requests (like booking an appointment that requires checking availability, confirming a time, and sending a confirmation), the loop may iterate several times before the agent has enough context to give a final answer.

The key insight is that the LLM never executes tools directly. It proposes tool calls, and the surrounding system executes them and feeds results back. This separation is what makes ReAct safe, loggable, and controllable.
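To make that separation concrete, here's a hypothetical dispatcher sketch. The registry, `register_tool`, and the stubbed `lookup_calendar` are assumptions for illustration, not LiveKit internals:

```python
import json

TOOL_REGISTRY = {}  # name -> callable; the system, not the LLM, owns this table

def register_tool(name):
    def wrap(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

def execute_proposal(proposal_json):
    """The LLM only emits a JSON proposal; this function validates and runs it."""
    proposal = json.loads(proposal_json)
    tool = TOOL_REGISTRY.get(proposal["tool"])
    if tool is None:
        # An unknown tool becomes an observable, loggable result, not a crash
        return {"error": f"unknown tool: {proposal['tool']}"}
    return tool(**proposal["args"])

@register_tool("lookup_calendar")
def lookup_calendar(user_id, date):
    # Stubbed backend call for illustration
    return {"appointment": "Dr. Smith, 2:30 PM"}
```

Because every action passes through one choke point, you can log, rate-limit, or veto any tool call before it touches a real system.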


ReAct vs. Chain of Thought vs. Function Calling

These three concepts are frequently confused. Here's how they differ:

  • Chain of Thought (CoT): The LLM reasons step-by-step but never takes actions; the reasoning is purely internal. Strength: improves accuracy on reasoning-heavy tasks. Limitations: it can't access live data and is prone to hallucination when facts are needed.
  • Function Calling: The LLM outputs structured tool calls (JSON) without explicit reasoning text. Strengths: reliable, fast, structured output. Limitation: the reasoning is implicit, which makes failures harder to debug and long sequences easy to lose track of.
  • ReAct: Combines CoT reasoning with tool actions in an interleaved loop. Strengths: visible reasoning, grounded in real data, auditable. Limitations: consumes more tokens and adds latency from explicit reasoning steps.

In practice, modern voice agents blend these approaches. Most frameworks use function calling mechanics (structured JSON tool calls) with ReAct-style reasoning traces for observability. You get the reliability of function calling with the debuggability of explicit reasoning.


When to Use the ReAct Pattern

ReAct is the right choice when:

  • The solution path isn't known in advance. The agent needs to figure out what to do, not just do a fixed sequence.
  • Your agent calls tools. Appointment booking, order lookups, CRM queries, knowledge base searches. Any time an agent needs external data to answer.
  • You need an audit trail. Every decision the agent makes is visible. You can trace what it thought, what it tried, and what it observed. When things go wrong, you can trace exactly where logic broke down.
  • You're prototyping. ReAct is the simplest pattern to get a tool-using agent working.

When to consider alternatives

  • Simple, deterministic flows where steps are always the same → use the Sequential Pipeline instead
  • Multi-domain routing where the main job is directing to specialists → use the Handoff pattern
  • Too many tools (20+) → the LLM may pick the wrong tool or get lost. Consider a Supervisor to route to specialized sub-agents, each with a focused toolset.
  • High-volume, simple queries like FAQ lookups or status checks → these don't need iterative reasoning

The Latency Challenge: ReAct in Real-Time Voice

Here's the hard part. Every think→act→observe cycle adds latency. In text chat, a user can wait. In voice, every extra cycle is dead air.

A single ReAct loop iteration can add 500ms or more if the LLM generates verbose reasoning before acting. Multiply that by 2-3 iterations for a complex request, and you're looking at seconds of silence.

How to minimize latency in voice ReAct agents

1. Design tools that return complete, ready-to-speak answers

Don't return raw database rows that require further LLM processing. Return pre-formatted results like "3 slots available at 1 PM, 2:30 PM, and 4 PM" instead of [{time: "13:00"}, {time: "14:30"}, {time: "16:00"}].
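A tool can do this formatting itself before returning. A small sketch, assuming slots arrive as dicts with a `time` string:

```python
def format_slots_for_speech(slots):
    """Turn raw slot rows into a sentence the agent can speak verbatim."""
    times = [s["time"] for s in slots]
    if not times:
        return "No slots are available that day."
    if len(times) == 1:
        return f"1 slot available at {times[0]}"
    # Join with an Oxford-style "and" so TTS reads it naturally
    listed = ", ".join(times[:-1]) + f", and {times[-1]}"
    return f"{len(times)} slots available at {listed}"
```

The LLM can pass this string straight to the response with no extra reasoning step, which saves a cycle in the loop.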

2. Pre-classify intent to skip unnecessary reasoning

For common use cases (billing, support, booking), a lightweight classifier can bypass the full reasoning loop. The agent goes straight to the right tool instead of "thinking" about which tool to use.
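The cheapest version of such a classifier is keyword matching before the LLM is ever invoked. A hypothetical sketch; a small ML classifier slots into the same shape:

```python
INTENT_KEYWORDS = {
    "billing": ["invoice", "charge", "refund", "bill"],
    "booking": ["appointment", "book", "schedule", "reschedule"],
    "support": ["broken", "not working", "issue", "help"],
}

def classify_intent(utterance):
    """Route common requests straight to a tool, skipping the reasoning loop."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None  # no match: fall back to the full ReAct loop
```

Anything the classifier can't place still goes through the full loop, so you trade nothing in coverage for a fast path on common requests.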

3. Use filler speech during tool calls

While the agent is waiting for an API response, have it say "Let me check that for you" or "One moment while I look that up." This eliminates dead air and makes the interaction feel natural.

4. Stream partial responses

Start TTS on the first sentence of the response while the LLM is still generating the rest. Don't wait for the full answer.
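Sentence-level chunking is the usual mechanism: hand each complete sentence to TTS as soon as it arrives. A minimal sketch, independent of any particular TTS API:

```python
import re

def sentences_from_stream(token_stream):
    """Yield complete sentences as soon as they appear, so TTS can start early."""
    buffer = ""
    for token in token_stream:
        buffer += token
        while True:
            # A sentence is complete once punctuation is followed by whitespace
            match = re.search(r"(.+?[.!?])\s+", buffer)
            if not match:
                break
            yield match.group(1)           # hand this sentence to TTS immediately
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()               # flush whatever remains at end of stream
```

The first sentence starts playing while the model is still generating the rest, which hides most of the generation latency from the caller.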

5. Limit tool set size

ReAct agents struggle with large tool sets. Keep each agent's toolbox focused (max 5-10 tools). If you need more capabilities, use a Supervisor pattern to route to specialized sub-agents.


Building a ReAct Voice Agent with LiveKit

LiveKit's Agents framework implements the ReAct pattern natively through the @function_tool decorator. When you give an agent tools, LiveKit handles the full think→act→observe loop automatically. The LLM decides whether to call a tool, LiveKit executes it, and the return value is fed back to the LLM for continued reasoning.

Here's a practical example, an appointment booking agent that demonstrates multi-step ReAct reasoning:

```python
from livekit.agents import Agent, function_tool, RunContext


class AppointmentAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="""You help users book appointments.
            Use the available tools to check availability and book slots.
            Always confirm details with the user before booking."""
        )

    @function_tool()
    async def check_availability(self, context: RunContext, date: str):
        """Check available appointment slots for a given date.

        Args:
            date: The date to check in YYYY-MM-DD format.
        """
        raw_slots = await calendar_api.get_slots(date)
        # Pre-format as speech-ready strings so the LLM can read them directly
        formatted = [s["time"].strftime("%I:%M %p") for s in raw_slots]
        return {"available_slots": formatted}

    @function_tool()
    async def book_appointment(self, context: RunContext, date: str, time: str):
        """Book an appointment at the specified date and time.

        Args:
            date: The date in YYYY-MM-DD format.
            time: The time in HH:MM format.
        """
        confirmation = await calendar_api.book(date, time)
        return {"confirmation_id": confirmation.id}
```

What happens at runtime

  1. User says: "Can I get an appointment tomorrow afternoon?"
  2. 🧠 Think: LLM reasons it needs to check availability
  3. Act: Calls check_availability(date="2026-03-10")
  4. 👁️ Observe: Returns {"available_slots": ["1:00 PM", "2:30 PM", "4:00 PM"]}
  5. 💬 Speak: "I have three slots available: 1 PM, 2:30 PM, and 4 PM. Which works best?"
  6. User says: "2:30 works"
  7. 🧠 Think: LLM reasons it should book the confirmed time
  8. Act: Calls book_appointment(date="2026-03-10", time="14:30")
  9. 👁️ Observe: Returns confirmation
  10. 💬 Speak: "You're all set! Appointment confirmed for tomorrow at 2:30 PM."

This is two full ReAct loops (one to gather data, one to take action), orchestrated automatically by LiveKit's agent framework.


LiveKit Features That Supercharge ReAct

LiveKit's framework provides several capabilities specifically designed to make ReAct work well in real-time voice:

Eliminate dead air with session.say()

Tools can generate speech during execution. While waiting for an async API call, the agent can say "Let me check that for you", eliminating the silence that makes ReAct feel slow:

```python
@function_tool()
async def lookup_order(self, context: RunContext, order_id: str):
    """Look up an order by ID."""
    context.session.say("Let me pull up that order for you.")
    result = await orders_api.get(order_id)
    return result
```

Prevent interruptions during critical actions

In voice, users sometimes speak while the agent is mid-action. context.disallow_interruptions() prevents barge-in during critical tool execution, such as when writing to a database:

```python
@function_tool()
async def process_payment(self, context: RunContext, amount: float):
    """Process a payment."""
    context.disallow_interruptions()
    result = await payment_api.charge(amount)
    return result
```

Graceful error handling with ToolError

When a tool call fails, ToolError returns a structured error to the LLM. The agent then reasons about the failure and tries a different approach. The observe→think loop handles recovery gracefully instead of crashing:

```python
from livekit.agents.llm import ToolError

@function_tool()
async def check_balance(self, context: RunContext, account_id: str):
    """Check account balance."""
    try:
        return await accounts_api.balance(account_id)
    except AccountNotFound:
        raise ToolError("Account not found. Ask the user to verify their account number.")
```

Dynamic tool management

agent.update_tools() lets you add or remove tools at runtime. Start with a small toolset for fast routing, then expand capabilities as the conversation evolves, keeping the ReAct loop tight when it matters most.

MCP support for extensible toolboxes

LiveKit agents can load tools from external MCP (Model Context Protocol) servers, extending the ReAct toolbox without code changes. Connect to any MCP-compatible service, and the tools appear automatically in the agent's reasoning loop.


Common ReAct Failure Modes (and How to Fix Them)

ReAct agents fail in predictable ways. Knowing the patterns helps you build more reliable voice agents.

1. Wrong tool selection

Symptom. The agent calls the billing API when the user asks about shipping.

Fix. Write descriptive, unambiguous tool descriptions. The LLM chooses tools based on the docstring. Vague descriptions lead to wrong choices. Be specific about when each tool should be used.

2. Infinite reasoning loops

Symptom. The agent keeps calling tools without ever producing a final answer.

Fix. Set a maximum iteration limit (3-5 tool calls are typical for voice). If the agent can't resolve the query in that budget, have it gracefully escalate: "I wasn't able to find that information. Let me connect you with someone who can help."

3. Tool overload

Symptom. Agent accuracy drops as you add more tools (especially beyond 15-20).

Fix. Decompose into specialized sub-agents using the Supervisor pattern. Each sub-agent gets a focused toolset of 5-10 tools, and the supervisor routes to the right one. As one AI engineering practitioner noted: "ReAct shines when the action space is small and the feedback from tools is crisp. Once the tool list grows, the reasoning loop degrades fast."

4. Verbose reasoning causing latency

Symptom. The agent "thinks" for 2+ seconds before acting, even on simple requests.

Fix. Use a fast, lightweight model (like GPT-4o-mini) for initial triage and tool routing, then switch to a larger model only for complex reasoning. LiveKit's Agent class supports per-agent model overrides. All models are accessed through LiveKit Inference, a unified API for STT, LLM, and TTS with no separate account setup required.

5. Hallucinated tool arguments

Symptom. The agent invents parameter values instead of asking the user.

Fix. Make required parameters explicit in tool schemas and add validation. If a required field is missing, return a ToolError prompting the agent to ask the user.
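Validation can be as simple as rejecting malformed values before the backend is ever called. A hypothetical sketch using a plain exception; in LiveKit, ToolError plays this role:

```python
import re

class ToolValidationError(Exception):
    """Raised back to the LLM so it asks the user instead of guessing."""

def validate_booking_args(date, time):
    """Reject arguments the LLM may have invented, like date='tomorrow'."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        raise ToolValidationError("Invalid date. Ask the user for the date in YYYY-MM-DD format.")
    if not re.fullmatch(r"\d{2}:\d{2}", time):
        raise ToolValidationError("Invalid time. Ask the user for the time in HH:MM format.")
```

The error message doubles as an instruction: the observe→think step reads it and prompts the user for the missing value instead of retrying with another guess.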


ReAct as the Foundation for Advanced Patterns

ReAct is the reasoning engine inside every other pattern:

  • Supervisor / Coordinator, where the supervisor agent uses ReAct to reason about which specialist to delegate to
  • Handoff / Routing, where the triage agent uses ReAct to determine when and where to transfer the call
  • Human-in-the-Loop, where the agent uses ReAct to assess risk and decide when to escalate to a human
  • Sequential Pipeline, where each stage can use ReAct internally for its specialized processing

Understanding ReAct gives you a mental model for debugging any agent behavior, because at the lowest level, every agent decision follows the same think→act→observe loop.


Key Takeaways

  • ReAct (Think → Act → Observe → Repeat) is the reasoning engine inside every tool-calling agent. Understanding it helps you design better tools, debug failures, and optimize latency
  • The pattern separates agents that talk from agents that solve problems. It's how your agent decides to check a calendar, look up an order, or book an appointment
  • Latency is the main tradeoff in voice. Minimize it with complete tool responses, intent pre-classification, filler speech, and focused toolsets (5–10 tools max)
  • ReAct is the foundation for every other pattern. The Supervisor uses it for routing, the Handoff uses it to decide when to transfer, HITL uses it to assess risk
  • LiveKit's @function_tool decorator implements the full loop automatically, with built-in support for filler speech, error recovery, and dynamic tool management

Getting Started

Here's the fastest path to a ReAct-powered voice agent:

  1. Start with the LiveKit Agents quickstart to get a working voice agent in minutes
  2. Add your first tool by decorating a method with @function_tool() and watch the ReAct loop come alive
  3. Test in the Agent Playground to interact with your agent without any frontend setup, or prototype without code using Agent Builder
  4. Iterate on tool schemas. The #1 lever for ReAct quality is well-written tool descriptions
  5. Monitor the reasoning loop using Agent Observability to trace every think→act→observe cycle and identify optimization opportunities

The ReAct pattern is the foundation. Once you understand how your agent thinks, you can make it think faster, smarter, and more reliably. That's what separates a demo from a production voice agent.

Give it a try and let us know what you're building.