
Bring your LangChain Agents to LiveKit

Most voice integrations force you to redesign your agent around a new runtime. LiveKit takes the opposite approach: it adapts your existing LangChain graph into a real-time system without changing your logic. With the LangChain plugin, you can connect your graph-based agent to a real-time voice pipeline, complete with speech-to-text, text-to-speech, and the infrastructure to deploy it at scale. You don't need to rewrite your agent logic; you just need to wrap it.

This post walks through what's possible, what the integration looks like for different types of LangChain implementations, and where you'll need to be thoughtful about the transition from text to voice.

The Approach

The fastest path from an existing LangChain agent to a working voice agent is shorter than you might expect:

  1. Install livekit-agents with the LangChain plugin.
  2. Pass your compiled graph to langchain.LLMAdapter(graph=your_graph).
  3. Wire up an AgentSession with STT, TTS, and VAD providers.
  4. Deploy and connect.

The LangChain examples on the recipes page provide runnable implementations for each pattern covered in this post. For the full API reference and configuration options, see the LangChain integration guide.

Why LiveKit?

If you're a LangChain developer, you already have agent logic, tools, and workflows that work. What you probably don't have is a production-grade way to connect that agent to a live audio stream. That's the gap LiveKit fills.

LiveKit's Agents framework provides a complete voice AI pipeline: voice activity detection (VAD) to know when the user is speaking, speech-to-text (STT) to transcribe their words, your LLM to generate a response, and text-to-speech (TTS) to speak it back, all orchestrated in real-time over WebRTC. It handles the hard parts of real-time communication: low-latency audio transport, turn detection, interruption handling, and session management.

The LangChain plugin (livekit-plugins-langchain) sits at the LLM layer of this pipeline. It adapts your LangGraph workflow to LiveKit's LLM interface via the LLMAdapter, which means the rest of your agent (your tools, your graph structure, your prompts) stays exactly as it is. LiveKit handles everything around it: the audio in, the audio out, and the real-time transport layer connecting your agent to users on web, mobile, or telephony.


For teams that have already invested in LangChain, this is a practical path to voice AI without starting from scratch.

How the Integration Works

The LLMAdapter is the bridge between your LangChain agent and LiveKit. It implements LiveKit's llm.LLM interface by wrapping a LangGraph graph, converting LiveKit's chat context into LangChain messages and streaming the graph's token output back through the voice pipeline.

Under the hood, the adapter:

  • Converts LiveKit's chat context to LangChain's message format (SystemMessage, HumanMessage, AIMessage)
  • Passes the messages as state to your graph via {"messages": messages}
  • Streams the response using LangGraph's stream_mode="messages" for minimal time-to-first-token

This means your graph needs to satisfy two requirements: it must accept state with a messages key, and it must support streaming in "messages" mode. If you're using any of the standard LangChain agent patterns, you almost certainly already meet both.

Mapping Your Implementation to LiveKit

Not all LangChain implementations are the same. Here's how the plugin maps to each type.

LangGraph Workflows

If you've built a custom StateGraph with a messages field in your state, you're already in the ideal position. Compile your graph and pass it directly to LLMAdapter:

```python
from langgraph.graph import StateGraph
from livekit.agents import AgentSession
from livekit.plugins import langchain

workflow = StateGraph(AgentState)
# ... add your nodes and edges ...
graph = workflow.compile()

session = AgentSession(
    llm=langchain.LLMAdapter(graph=graph),
    # ... stt, tts, vad, etc.
)
```

Your graph logic (nodes, conditional edges, tool calls) runs exactly as before. LiveKit simply drives it with voice input instead of text. See the LangGraph workflow example for a complete, runnable version.

LangChain Agents

Agents built with create_agent return a CompiledStateGraph with message-based state. They work with the LLMAdapter directly, no modification needed:

```python
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
from livekit.agents import AgentSession
from livekit.plugins import langchain

agent_graph = create_agent(
    model=ChatOpenAI(model="gpt-4.1-mini"),
    tools=[get_weather, search_docs],
)

session = AgentSession(
    llm=langchain.LLMAdapter(graph=agent_graph),
    # ... stt, tts, vad, etc.
)
```

If your agent uses multiple tools, middleware like summarization, or complex system prompts, all of that carries over. The LLMAdapter sees the same compiled graph regardless of how many tools or features you've configured. See the LangChain agent example for a complete implementation.

LangChain Deep Agents

Deep agents, with their built-in planning, subagents, and long-term memory, also return a CompiledStateGraph. The pattern is identical:

```python
from deepagents import create_deep_agent
from langchain_openai import ChatOpenAI
from livekit.agents import AgentSession
from livekit.plugins import langchain

deep_graph = create_deep_agent(
    model=ChatOpenAI(model="gpt-4.1-mini"),
    tools=[get_weather],
    subagents=[{"name": "researcher", ...}],
)

session = AgentSession(
    llm=langchain.LLMAdapter(graph=deep_graph),
    # ... stt, tts, vad, etc.
)
```

Subagent delegation, todo planning, and other deep agent features all run inside the graph. LiveKit only sees the outer graph's message stream, so the complexity is fully encapsulated. See the deep agent example for a complete implementation.

Customizing LLM output with the LLM node

When a graph-based agent calls tools, the streaming output includes both AI messages and tool result messages. By default, the LLMAdapter passes all of these through, which means tool results could be spoken aloud. This applies to any agent that uses tools, whether it's a standard create_agent with tool calling or a deep agent that uses internal tools like write_todos or task for subagent delegation.

To control what gets spoken, you can override the llm_node on your Agent. The LLM node gives you fine-grained control over the LLM output before it reaches the TTS stage. For example, you can filter the stream to yield only AIMessageChunk content, preventing tool results from being read aloud. Both the LangChain agent example and the deep agent example demonstrate this pattern.

What Isn't Supported

The LLMAdapter requires a Pregel-compatible graph (the execution model used by LangGraph), specifically a compiled StateGraph or anything implementing the PregelProtocol. This covers all graph-based LangChain patterns, but it does leave some implementations out.

Plain LCEL chains like a prompt | llm or prompt | llm | output_parser pipeline are Runnable objects, not Pregel graphs. They don't implement astream(state, config, stream_mode="messages") or accept message-based state. The adapter can't use them directly.

Bare chat models like ChatOpenAI or similar without any graph wrapping face the same issue. They're Runnables, not graphs.

Graphs without messages in state. If you've built a custom StateGraph whose state schema uses keys like {"query": str, "result": str} but doesn't include messages, the adapter won't be able to drive it. It always sends {"messages": ...} as input.

For all three cases, the workaround is the same: wrap your existing logic in a minimal StateGraph with a messages key in the state and a single node that invokes your chain or model. It's a small amount of boilerplate, but it's required for the adapter to function.

The Latency Question

This is the most important consideration when moving from text to voice, and it deserves honest treatment.

The LLMAdapter uses LangGraph's streaming mode to minimize time-to-first-token. For straightforward agents with a few tools responding to direct questions, this works well. The user asks something, the agent streams a response, and TTS begins speaking as tokens arrive.

But many LangChain workflows weren't designed with voice latency in mind. A multi-step research agent that calls three APIs sequentially, a deep agent that plans with write_todos before responding, or a graph with several conditional branches that each invoke the LLM can all introduce seconds of silence before the user hears anything. In a text interface, a three-second wait is unremarkable. In a voice conversation, it feels broken.

There are several strategies to manage this:

Verbal status updates. If a tool call or operation takes more than a few hundred milliseconds, you can use LiveKit's generate_reply to speak a brief status update while the operation continues in the background. Something like "Let me look that up for you" buys time without feeling like dead air.
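Stripped of the LiveKit specifics, this strategy is ordinary task concurrency: start the status utterance, then do the slow work while it plays. In the sketch below a print() stands in for generate_reply, and the function names are illustrative:

```python
import asyncio

# Illustrative pattern only: print() stands in for LiveKit's generate_reply.
# The point is that the status line and the slow tool call run concurrently,
# so the user never sits in silence.
async def speak_status(text: str) -> None:
    print(f"[agent] {text}")

async def slow_weather_lookup() -> str:
    await asyncio.sleep(0.2)  # stand-in for a multi-second API call
    return "21 degrees and sunny"

async def answer_weather_question() -> str:
    status = asyncio.create_task(speak_status("Let me look that up for you."))
    result = await slow_weather_lookup()  # runs while the status plays
    await status
    return result

answer = asyncio.run(answer_weather_question())
```

The same shape applies in a real agent: kick off the spoken filler, await the tool call, then deliver the actual answer.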

Background audio. LiveKit supports "thinking" sounds, ambient audio that plays automatically while tool calls are in progress. This provides a natural auditory cue that the agent is working.

Frontend feedback. If your application has a UI, you can use LiveKit's RPC mechanism to push status updates to the frontend during long-running operations, even allowing the user to cancel if needed.

Design for voice from the start. The most effective approach is to review your agent's workflow with voice latency in mind. Can you reduce sequential tool calls? Can you pre-fetch data before the conversation starts? Can you use LiveKit's on_user_turn_completed node to perform RAG lookups in parallel with the LLM call rather than as a separate tool call round-trip?

The LiveKit user feedback documentation covers these patterns in detail.

Other Considerations

System prompts for voice. Your existing system prompts were likely written for text output. Voice responses need to be concise, free of markdown formatting, and natural when spoken aloud. Explicitly instruct the model to avoid bullet points, asterisks, and emoji in its responses; a reply that reads well on screen can sound awkward when spoken.