LiveKit Inference
Build voice agents with the leading AI models on the market. With LiveKit Inference, you can iterate quickly and swap models with a single line of code.

Fast and reliable access to the best voice AI models
Deploy production-grade voice agents with the best-performing STT, LLM, and TTS models on the market.
Fast iterations
Swap models and voices with a single string update in your agent code. No installs or account setup required.
```python
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    llm=inference.LLM(
        model="openai/gpt-4.1-mini",
    ),
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ),
)
```
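As a sketch of the one-line swap, assuming the same `inference` API used above (the alternate model identifier below is a hypothetical placeholder, not a claim about which models are supported):

```python
from livekit.agents import inference

# Swapping the LLM is a single string change in the session config.
llm = inference.LLM(model="openai/gpt-4.1-mini")
# Becomes, for example (identifier is hypothetical):
llm = inference.LLM(model="provider/another-model")
```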
Performance at scale
Reduce end-to-end latency with global co-location of agents and models, dynamic routing, and provisioned LLM capacity.

Detailed observability
View turn-by-turn latency statistics and traces for inference requests in LiveKit Cloud to optimize agent performance.

Concurrency simplified
Manage your concurrency limits for all your voice AI models in one place.

Build, run, and observe agents with LiveKit Cloud
Our end-to-end platform powers enterprise-grade voice AI for customer support at global scale.
FAQs
How do I access LiveKit Inference?
What voice AI models are available on LiveKit Inference?
What are the concurrency limits on LiveKit Inference?
How much does it cost to access AI models with LiveKit Inference?
How do I decide between using LiveKit Inference and LiveKit Agents plugins?
Is LiveKit Inference only available for agents deployed to LiveKit Cloud?
Ready to build?
Start building a voice AI agent with a free account. Reach out to us if you're interested in custom pricing.
No credit card required • 1,000 free agent session minutes monthly
