LiveKit Inference
Build voice agents with the leading AI models on the market. With LiveKit Inference, you can iterate quickly and swap models with a single line of code.

Fast and reliable access to the best voice AI models
Deploy production-grade voice agents with top-performing STT, LLM, and TTS models on the market.
Performance at scale
Reduce end-to-end latency with global co-location of agents and models, dynamic routing, and provisioned LLM capacity.

No model lock-in
Swap in the latest voice AI models from any of the leading providers with one account, one bill, and no new vendor contracts.
from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    llm=inference.LLM(
        model="openai/gpt-5.3-chat-latest",
    ),
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ),
)
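Each model in the example above is referenced by a "provider/model" identifier string, which is what makes a swap a one-line change: only the string passed to the STT, LLM, or TTS constructor differs between providers. The helper below is purely illustrative (it is not part of the LiveKit SDK) and sketches how such an identifier breaks down.

```python
# Illustrative only: LiveKit Inference model identifiers in the example above
# follow a "provider/model" naming convention. This hypothetical helper splits
# an identifier into its provider and model parts.
def split_model_id(model_id: str) -> tuple[str, str]:
    provider, _, model = model_id.partition("/")
    return provider, model

# The identifiers below are the ones used in the session example above.
for model_id in (
    "deepgram/flux-general",
    "openai/gpt-5.3-chat-latest",
    "cartesia/sonic-3",
):
    provider, model = split_model_id(model_id)
    print(f"{provider} -> {model}")
```

Swapping the LLM, for instance, means changing only the identifier string in `inference.LLM(model=...)`; no new SDK, account, or vendor contract is involved.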
Concurrency simplified
Manage your concurrency limits for all your voice AI models in one place.

Detailed observability
View turn-by-turn latency statistics and traces for inference requests in LiveKit Cloud to optimize agent performance.

Build, run, and observe agents with LiveKit Cloud
Our end-to-end platform powers enterprise-grade voice AI for customer support at global scale.
FAQs
How do I access LiveKit Inference?
What voice AI models are available on LiveKit Inference?
What are the concurrency limits on LiveKit Inference?
How much does it cost to access AI models with LiveKit Inference?
How do I decide between using LiveKit Inference and LiveKit Agents plugins?
Is LiveKit Inference only available for agents deployed to LiveKit Cloud?
Ready to build?
Start building a voice AI agent with a free account. Reach out to us if you're interested in custom pricing.
No credit card required • 1,000 free agent session minutes monthly
