Product

LiveKit Inference

Build voice agents with the leading AI models on the market. With LiveKit Inference, you can iterate quickly and swap models with a single line of code.

300,000+ developers
Billions of calls annually
300+ AI model integrations

Fast and reliable access to the best voice AI models

Deploy production-grade voice agents with the best-performing STT, LLM, and TTS models on the market.

Fast iterations

Swap models and voices with a single string update in your agent code. No installs or account setup required.

from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    llm=inference.LLM(
        model="openai/gpt-4.1-mini",
    ),
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ),
)

Performance at scale

Reduce end-to-end latency with global co-location of agents and models, dynamic routing, and provisioned LLM capacity.

A graph showing the end-to-end latency of an agent session.

Detailed observability

View turn-by-turn latency statistics and traces for inference requests in LiveKit Cloud to optimize agent performance.

A screenshot of the LiveKit Cloud dashboard showing detailed observability logs.

Concurrency simplified

Manage your concurrency limits for all your voice AI models in one place.

Screenshot of the LiveKit Cloud inference usage UI.
The LiveKit Platform

Build, run, and observe agents with LiveKit Cloud

Our end-to-end platform powers enterprise-grade voice AI for customer support at global scale.

OpenAI
Salesforce
Deutsche Telekom
Zocdoc
Coursera
xAI
Headspace
Oracle
Assort Health
Spotify
Explore the full platform

FAQs

How do I access LiveKit Inference?
Any agent using LiveKit Cloud for transport has access to LiveKit Inference. Use the inference module in the Agents SDK to select which AI models your voice agent should use. To learn more, visit our docs.
What voice AI models are available on LiveKit Inference?
LiveKit Inference supports over 50 different voice AI models, including the latest models from OpenAI, Gemini, Cartesia, Deepgram, and ElevenLabs. For the complete list, see our pricing page.
What are the concurrency limits on LiveKit Inference?
Each LiveKit Cloud plan includes a monthly quota of LiveKit Inference credits (billed based on model prices) and a limit on the number of concurrent requests. To learn more, visit our pricing page and our docs.
How much does it cost to access AI models with LiveKit Inference?
LiveKit Inference credits are billed by model type: LLM by tokens, STT by duration, and TTS by characters. Discounted rates are available for most STT and TTS models on the Scale plan. Preferred rates for high-volume usage are available with an Enterprise contract. To learn more, visit our pricing page.
How do I decide between using LiveKit Inference and LiveKit Agents plugins?
For fast and reliable access to the most popular AI models, use LiveKit Inference. If you want to bring your own model or use a different provider, use the LiveKit Agents plugins. To learn more, visit our docs.
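For comparison with the LiveKit Inference example above, here is a minimal sketch of the plugin-based approach, where each provider is a separately installed package configured with your own API key. The package and class names below follow the LiveKit Agents plugin convention (e.g. livekit-plugins-openai), but check the plugin docs for the exact identifiers and supported model strings.

```python
# Plugin-based setup: you install and configure each provider yourself,
# rather than routing through LiveKit Inference.
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai, cartesia

session = AgentSession(
    stt=deepgram.STT(),                    # reads DEEPGRAM_API_KEY from the environment
    llm=openai.LLM(model="gpt-4.1-mini"),  # reads OPENAI_API_KEY
    tts=cartesia.TTS(),                    # reads CARTESIA_API_KEY
)
```

Moving from plugins to LiveKit Inference amounts to replacing these constructors with their inference.STT / inference.LLM / inference.TTS counterparts and dropping the per-provider keys.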
Is LiveKit Inference only available for agents deployed to LiveKit Cloud?
LiveKit Inference is available to any agent using LiveKit Cloud for transport, including self-hosted agents and agents deployed to LiveKit Cloud.

Ready to build?

Start building a voice AI agent with a free account. Reach out to us if you're interested in custom pricing.

No credit card required • 1,000 free agent session minutes monthly