How to build an agent with speech-to-text input and text-only output
Configure a LiveKit Agent to accept audio input via speech-to-text while responding only with text, and learn how to receive the text responses on your frontend.
You can configure a LiveKit Agent to accept audio input (speech-to-text) while responding only with text, with no TTS audio output. This is useful for chat-style interfaces that take voice input but show text responses.
Configuration
To set this up, disable audio output when starting your agent session while keeping audio input enabled:
Python
    from livekit.agents.voice import room_io

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(
            audio_output=False,  # Disable TTS audio output
            # audio_input remains True by default
        ),
    )
Node.js
    await session.start({
      agent: new MyAgent(),
      room: ctx.room,
      outputOptions: {
        audioEnabled: false, // Disable TTS audio output
      },
      // inputOptions.audioEnabled remains true by default
    });
When audio output is disabled:
- The agent will not publish an audio track to the room
- Text responses are published to the lk.transcription text stream topic
- Responses are sent without the lk.transcribed_track_id attribute (since there's no audio track to associate with)
- Text is sent without speech synchronization
Receiving Agent Responses
To receive the agent's text responses, you need to listen to the lk.transcription text stream topic on your frontend client. The built-in playground UI uses legacy transcription events and won't display responses when audio track publishing is disabled.
JavaScript
    room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
      const message = await reader.readAll();

      // Check if this is a transcription (has track ID) or a text-only response
      const isTranscription = reader.info.attributes['lk.transcribed_track_id'] != null;

      if (isTranscription) {
        console.log(`Transcription from ${participantInfo.identity}: ${message}`);
      } else {
        // This is a text-only agent response (no audio)
        console.log(`Agent response from ${participantInfo.identity}: ${message}`);
      }
    });
Swift
    try await room.registerTextStreamHandler(for: "lk.transcription") { reader, participantIdentity in
        let message = try await reader.readAll()

        if let _ = reader.info.attributes["lk.transcribed_track_id"] {
            print("Transcription from \(participantIdentity): \(message)")
        } else {
            // Text-only agent response
            print("Agent response from \(participantIdentity): \(message)")
        }
    }
React
For React applications, use the useTranscriptions hook from @livekit/components-react:
    import { useTranscriptions } from '@livekit/components-react';

    function ChatDisplay() {
      const transcriptions = useTranscriptions();

      return (
        <div>
          {transcriptions.map((segment) => (
            <div key={segment.id}>
              <strong>{segment.participant?.identity}:</strong> {segment.text}
            </div>
          ))}
        </div>
      );
    }
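For a Python client, such as a test harness or a backend participant, the same track-ID check might look like the sketch below. It assumes the reader object exposes an async `read_all()` and an `info.attributes` mapping, mirroring the JavaScript and Swift readers above. `classify_text_stream`, `FakeReader`, and the shape of the returned dictionary are hypothetical names chosen for illustration, not part of any LiveKit SDK.

```python
import asyncio
from types import SimpleNamespace

TRACK_ID_ATTR = "lk.transcribed_track_id"


async def classify_text_stream(reader, participant_identity: str) -> dict:
    """Read one lk.transcription stream and label what kind of text it is.

    Assumption: `reader` offers an async `read_all()` and `info.attributes`,
    like the JavaScript and Swift readers shown above.
    """
    message = await reader.read_all()
    attributes = reader.info.attributes or {}
    # A track ID means the text transcribes a published audio track;
    # its absence means a text-only agent response.
    has_track_id = attributes.get(TRACK_ID_ATTR) is not None
    kind = "transcription" if has_track_id else "agent_response"
    return {"kind": kind, "from": participant_identity, "text": message}


class FakeReader:
    """Hypothetical stand-in so the sketch runs without a LiveKit connection."""

    def __init__(self, text: str, attributes: dict):
        self._text = text
        self.info = SimpleNamespace(attributes=attributes)

    async def read_all(self) -> str:
        return self._text


if __name__ == "__main__":
    reader = FakeReader("Hello!", {})  # no track ID: text-only response
    print(asyncio.run(classify_text_stream(reader, "agent")))
```

Returning a small record instead of printing keeps the classification separate from rendering, so the same function could feed a chat log or a test assertion.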
Important Notes
- Console playground limitation: If you're using the console playground and don't see agent responses to audio input, this is expected behavior. You must implement a custom receiver to listen to the lk.transcription text stream topic.
- Distinguishing response types: When audio output is disabled, agent responses won't have a lk.transcribed_track_id attribute. You can use this to differentiate between transcriptions of audio tracks and text-only responses.
- Hybrid mode: If you need to dynamically toggle audio on and off, use session.output.set_audio_enabled() instead of disabling it in RoomOptions. See the text and transcriptions guide for more details.
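As a rough sketch of hybrid mode, a small helper can wrap the toggle so the rest of the agent code switches between spoken and text-only replies in one place. `session.output.set_audio_enabled()` comes from the note above; passing it a boolean is an assumption here. `set_voice_replies` and `StubOutput` are hypothetical names used only to make the example runnable without a room.

```python
from types import SimpleNamespace


def set_voice_replies(session, enabled: bool) -> None:
    """Turn spoken replies on or off for a running session.

    Assumption: `session.output.set_audio_enabled` accepts a boolean.
    """
    session.output.set_audio_enabled(enabled)


class StubOutput:
    """Hypothetical stand-in for the session output; records the last setting."""

    def __init__(self) -> None:
        self.audio_enabled = True

    def set_audio_enabled(self, enabled: bool) -> None:
        self.audio_enabled = enabled


if __name__ == "__main__":
    session = SimpleNamespace(output=StubOutput())
    set_voice_replies(session, False)  # switch to text-only replies
    print(session.output.audio_enabled)
    set_voice_replies(session, True)   # switch back to spoken replies
    print(session.output.audio_enabled)
```

In a real agent, the helper would be called with the live session object, for example in response to a user preference sent from the frontend.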
Example Implementations
For complete working examples, check out:
- Transcriber agent: An agent that performs STT without TTS or LLM
- Text streams documentation: Full guide on receiving and sending text streams
Additional Resources
- Text and transcriptions guide - Complete documentation on text input/output in agents
- Voice agents examples repository - More agent examples
Read related documentation
- LiveKit Agents overview - Get started with voice AI agents
- Speech-to-text plugins - Configure STT providers
- Text streams - Send and receive text data
Find more Agents guides
- Troubleshooting STT not picking up utterances - Diagnose speech detection issues
- How to detect when an agent has finished speaking - Track playback completion events