Product

LiveKit Inference

Build voice agents with the leading AI models on the market. With LiveKit Inference, you can iterate quickly and swap models with a single line of code.

300,000+ developers
Billions of calls annually
300+ AI model integrations

Fast and reliable access to the best voice AI models

Deploy production-grade voice agents with the best-performing STT, LLM, and TTS models on the market.

Fast iterations

Swap models and voices with a single string update in your agent code. No installs or account setup required.

from livekit.agents import AgentSession, inference

session = AgentSession(
    stt=inference.STT(
        model="deepgram/flux-general",
        language="en",
    ),
    llm=inference.LLM(
        model="openai/gpt-4.1-mini",
    ),
    tts=inference.TTS(
        model="cartesia/sonic-3",
        voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
    ),
)

Performance at scale

Reduce end-to-end latency with global co-location of agents and models, dynamic routing, and provisioned LLM capacity.

A graph showing the end-to-end latency of an agent session.

Detailed observability

View turn-by-turn latency statistics and traces for inference requests in LiveKit Cloud to optimize agent performance.

A screenshot of the LiveKit Cloud dashboard showing detailed observability logs.

Concurrency simplified

Manage your concurrency limits for all your voice AI models in one place.

Screenshot of the LiveKit Cloud inference usage UI.
The LiveKit Platform

Build, run, and observe agents with LiveKit Cloud

Our end-to-end platform powers enterprise-grade voice AI for customer support at global scale.

OpenAI
Salesforce
Deutsche Telekom
Zocdoc
Coursera
xAI
Headspace
Oracle
Assort Health
Spotify
Explore the full platform

FAQs

How do I access LiveKit Inference?
Any agent using LiveKit Cloud for transport has access to LiveKit Inference. Use the inference module in the Agents SDK to select which AI models your voice agent should use. To learn more, visit our docs.
What voice AI models are available on LiveKit Inference?
LiveKit Inference supports over 50 different voice AI models, including the latest models from OpenAI, Gemini, Cartesia, Deepgram, and ElevenLabs. For the complete list, see our pricing page.
What are the concurrency limits on LiveKit Inference?
Each LiveKit Cloud plan includes a monthly quota of LiveKit Inference credits (billed based on model prices) and a limit on the number of concurrent requests. To learn more, visit our pricing page and our docs.
How much does it cost to access AI models with LiveKit Inference?
LiveKit Inference credits are billed by model type: LLM by tokens, STT by duration, and TTS by characters. Discounted rates are available for most STT and TTS models on the Scale plan. Preferred rates for high-volume usage are available with an Enterprise contract. To learn more, visit our pricing page.
How do I decide between using LiveKit Inference and LiveKit Agents plugins?
For fast and reliable access to the most popular AI models, use LiveKit Inference. If you want to bring your own model or use a different provider, use the LiveKit Agents plugins. To learn more, visit our docs.
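For comparison with the LiveKit Inference example above, here is a minimal sketch of the plugin-based approach, where each provider is a separately installed package configured with your own API key. The package and class names below follow the LiveKit Agents plugin convention (e.g. livekit-plugins-openai), but check the plugin docs for the exact identifiers and supported model strings.

```python
# Plugin-based setup: you install and configure each provider yourself,
# rather than routing through LiveKit Inference.
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai, cartesia

session = AgentSession(
    stt=deepgram.STT(),                    # reads DEEPGRAM_API_KEY from the environment
    llm=openai.LLM(model="gpt-4.1-mini"),  # reads OPENAI_API_KEY
    tts=cartesia.TTS(),                    # reads CARTESIA_API_KEY
)
```

Moving from plugins to LiveKit Inference amounts to replacing these constructors with their inference.STT / inference.LLM / inference.TTS counterparts and dropping the per-provider keys.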
Is LiveKit Inference only available for agents deployed to LiveKit Cloud?
LiveKit Inference is available to any agent using LiveKit Cloud for transport, including self-hosted agents and agents deployed to LiveKit Cloud.

Ready to build?

Start building a voice AI agent with a free account. Reach out to us if you're interested in custom pricing.

No credit card required • 1,000 free agent session minutes monthly