Gemma 4 31B on LiveKit Inference
Same answer quality at 5.2x lower latency and 6x lower cost.

83% cheaper
814 ms faster
88% task completion
A faster, cheaper default for voice agents
Gemma 4 31B is smart enough, fast enough, and affordable enough for real-time production traffic. It runs voice-agent tasks at a fraction of the latency and cost of proprietary defaults.
Faster, cheaper… and smarter, too
Measured on a reference-agent evaluation with task-based judging, Gemma 4 31B clears the production bar on every voice-agent task we test.
Gemma runs better on LiveKit Inference
Other inference platforms optimize for throughput and accept higher latency. We do the opposite: we optimize for low latency and accept lower throughput. Voice can't wait.
How we measured every number
Latency metric
Time to first token: how quickly the model starts generating each response, measured across every turn of every scenario.
Latency harness
The same reference-agent conversations that produce the capability scores; every turn of every scenario contributes a latency sample.
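To make the latency metric concrete, here is a minimal sketch of how time to first token could be sampled per turn and then summarized. It is not the harness behind the numbers on this page: the function names `time_to_first_token` and `summarize`, the injectable `clock` parameter, and the choice of the median as the summary statistic are all assumptions made for illustration.

```python
import time
from statistics import median
from typing import Callable, Dict, Iterable, List


def time_to_first_token(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the seconds elapsed until the stream yields its first token.

    Hypothetical helper: `stream` stands in for any streaming LLM response.
    In a real measurement the clock should start when the request is sent.
    """
    start = clock()
    for _token in stream:
        # Stop at the first token; later tokens do not affect this metric.
        return clock() - start
    raise ValueError("stream produced no tokens")


def summarize(samples: List[float]) -> Dict[str, float]:
    """Collapse per-turn samples into a count and a median (assumed statistic)."""
    if not samples:
        raise ValueError("no latency samples collected")
    return {"turns": float(len(samples)), "median_s": median(samples)}
```

Each turn of each scenario would contribute one call to `time_to_first_token`, and the resulting list would be passed to `summarize`. Passing a custom `clock` keeps the helper deterministic under test.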
Capability harness
A reference-agent evaluation with task-based judging across instruction following, tool calls, and multi-turn coherence.
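As a rough illustration of how task-based judgments might roll up into a completion rate, the sketch below aggregates pass/fail verdicts per category and overall. The function name `completion_rates`, the category labels, and the `(category, passed)` input shape are hypothetical. The actual judging procedure and scoring rules are not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def completion_rates(judgments: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Turn (category, passed) verdicts into per-category and overall pass rates.

    Hypothetical aggregation: each verdict is one judged task from one scenario.
    """
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, ok in judgments:
        total[category] += 1
        passed[category] += int(ok)
    if not total:
        return {}  # nothing was judged, so there is no rate to report
    rates = {category: passed[category] / total[category] for category in total}
    rates["overall"] = sum(passed.values()) / sum(total.values())
    return rates
```

A run would feed every judged task into `completion_rates`. The `"overall"` entry weights each task equally, which is one reasonable choice among several.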
Providers tested
Gemma 4 31B on LiveKit Inference and OpenRouter; GPT-4.1, GPT-4.1 mini, and Gemini 2.5 Flash for the model-choice comparison.
See the difference for yourself
One line in your LiveKit Agents session points your agent at Gemma 4 31B.
# Gemma 4 31B, served on LiveKit's GPUs
from livekit.agents import AgentSession

session = AgentSession(
    llm="google/gemma-4-31b-it",
)