
Open-source wake word training in a single command

Wake words are the short spoken phrases, like "Hey Siri" or "Alexa", that activate a voice-enabled device or agent. They're the first step in any hands-free voice interaction, and getting them right matters: too sensitive and they fire constantly, too strict and users have to repeat themselves.
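To make that tradeoff concrete, here's a toy sketch with hypothetical confidence scores (not from any real model): lowering the detection threshold catches more real activations but also fires on background audio.

```python
# Hypothetical per-frame confidence scores from a wake word model.
wake_word_scores = [0.91, 0.84, 0.77]   # frames where the phrase was spoken
background_scores = [0.10, 0.45, 0.62]  # frames of ordinary speech and noise

def evaluate(threshold):
    """Count true detections and false fires at a given threshold."""
    detected = sum(s >= threshold for s in wake_word_scores)
    false_fires = sum(s >= threshold for s in background_scores)
    return detected, false_fires

print(evaluate(0.5))  # (3, 1): catches every real activation, but fires once on noise
print(evaluate(0.9))  # (1, 0): never fires spuriously, but misses two real activations
```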

Today we're launching livekit-wakeword, an open-source wake word library built for simplicity and speed.

Why we built this

If you've tried training wake word models before, you know the pain:

  • Existing codebases are outdated, with broken dependencies everywhere.
  • Documentation is sparse or nonexistent, so training new models requires hours or even days of reverse-engineering.

And even if you manage to train a model, you still end up with one that false-triggers constantly because you used the vanilla settings the authors provided.

We built livekit-wakeword to fix all of this. Now you can train your own wake word model from scratch, locally, with a single command.

Use cases

Custom wake words unlock hands-free voice activation across a wide range of applications:

  • Voice agents: Give your AI agent a branded activation phrase ("Hey Jarvis," "OK Chef") instead of relying on a generic keyword.
  • Smart home assistant: Train a custom phrase for your home setup without depending on cloud services.
  • Robotics: Activate a robot with a spoken command in noisy warehouse or factory environments.
  • Kiosks & accessibility devices: Enable hands-free activation for retail, healthcare, or public-facing hardware.
  • In-car & embedded systems: Trigger voice control in vehicles or IoT devices running on constrained hardware.

Performance

Even though our library is simple and fast, we didn't sacrifice accuracy. Compared to openWakeWord, livekit-wakeword achieved dramatically better results across every metric:

  • 100× fewer false positives per hour
  • 60× lower detection error
  • 86% vs 69% recall
| Metric                          | livekit-wakeword | openWakeWord |
|---------------------------------|------------------|--------------|
| False positives per hour (FPPH) | 0.08             | 8.50         |
| Detection error tradeoff (AUT)  | 0.0012           | 0.0720       |
| Recall                          | 86%              | 69%          |

FPPH measures how often the model incorrectly fires when no wake word was spoken (lower is better). AUT (area under the DET curve) captures the overall tradeoff between false positives and missed detections; lower is also better.
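In concrete terms, the two headline metrics reduce to simple ratios. This is illustrative arithmetic, not library code:

```python
def fpph(false_positives: int, audio_hours: float) -> float:
    """False positives per hour over wake-word-free audio; lower is better."""
    return false_positives / audio_hours

def recall(caught: int, spoken: int) -> float:
    """Fraction of spoken wake words the model detected; higher is better."""
    return caught / spoken

# e.g. 2 spurious activations over 25 hours of background audio,
# and 86 of 100 spoken wake words caught:
print(fpph(2, 25.0))    # 0.08
print(recall(86, 100))  # 0.86
```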

See the full comparison for DET curves, test conditions, and detailed methodology.

How it works

Under the hood, livekit-wakeword generates thousands of synthetic training samples using text-to-speech, then applies realistic audio augmentations (background noise, reverb, gain variation) to simulate real-world conditions. A lightweight convolutional-attention classifier trains on top of pre-computed audio embeddings, producing a small, fast model that generalizes well beyond its training data.
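The augmentation step can be sketched in a few lines of NumPy. This is an illustrative approximation of noise mixing and gain variation; the library's actual pipeline, parameter ranges, and reverb handling may differ:

```python
import numpy as np

def augment(sample: np.ndarray, noise: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Mix background noise at a random SNR, then apply random gain (illustrative)."""
    # Scale the noise so the mix lands at a random 5-20 dB signal-to-noise ratio
    snr_db = rng.uniform(5.0, 20.0)
    sample_power = np.mean(sample**2) + 1e-12
    noise_power = np.mean(noise**2) + 1e-12
    noise_scale = np.sqrt(sample_power / (noise_power * 10 ** (snr_db / 10)))
    mixed = sample + noise_scale * noise[: len(sample)]

    # Random +/- 6 dB gain variation to simulate mic distance and volume differences
    gain_db = rng.uniform(-6.0, 6.0)
    return np.clip(mixed * 10 ** (gain_db / 20), -1.0, 1.0)

rng = np.random.default_rng(0)
clean = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
noise = rng.normal(0, 0.05, 16000).astype(np.float32)
noisy = augment(clean, noise, rng)
```

Running thousands of clean TTS samples through variations of this kind is what lets the classifier generalize to real rooms and real microphones.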

Since our exported models use the same ONNX format and inference pipeline as openWakeWord, they're fully compatible: existing Home Assistant setups and legacy openWakeWord projects work with zero changes.

Part of the LiveKit ecosystem

livekit-wakeword is designed to work seamlessly with the LiveKit platform. Use a wake word to trigger a LiveKit Agent session. The wake word model runs locally on-device with minimal latency, and once activated, LiveKit handles the realtime audio streaming to your agent.

Start building with livekit-wakeword

To train a new wake word model, install the library and run setup:

# install livekit-wakeword with training, evaluation, and export extras
pip install livekit-wakeword[train,eval,export]

# download required embedding models and datasets
livekit-wakeword setup

Then create a config file for your wake word:

model_name: hey_robot
target_phrases:
  - "hey robot"

n_samples: 10000 # synthetic training samples per class
model:
  model_type: conv_attention # our new conv-attention classifier
  model_size: small
steps: 50000 # training steps

Check out the README for the full list of config options. Once your config is ready, you can train your model with a single command:

# generates synthetic data, augments, trains, and exports to ONNX
# your model will be saved to ./output/hey_robot/hey_robot.onnx
livekit-wakeword run configs/hey_robot.yaml

That single command handles everything: synthetic data generation, augmentation, training, and ONNX export. You'll get a production-ready model file you can use right away. The exported model is a standard ONNX file, fully backward compatible with openWakeWord, so it drops into Home Assistant or any existing openWakeWord integration with zero changes.

To run detection, just load the model and feed it audio:

from livekit.wakeword import WakeWordModel

# load your exported ONNX model
model = WakeWordModel(models=["hey_robot.onnx"])

# feed 16kHz audio frames (int16 or float32)
scores = model.predict(audio_frame)
if scores["hey_robot"] > 0.5:
    print("Wake word detected!")
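A single frame above threshold can be noisy in practice. A common pattern, sketched here in plain Python (this is not a livekit-wakeword API, just an illustration), is to require several consecutive high-scoring frames before firing:

```python
class Debouncer:
    """Fire only after `patience` consecutive scores exceed the threshold."""

    def __init__(self, threshold: float = 0.5, patience: int = 3):
        self.threshold = threshold
        self.patience = patience
        self._streak = 0

    def update(self, score: float) -> bool:
        if score > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.patience:
            self._streak = 0  # reset so we don't re-fire on every following frame
            return True
        return False

# Feed per-frame scores in; only the third consecutive high score triggers.
d = Debouncer(threshold=0.5, patience=3)
print([d.update(s) for s in [0.9, 0.2, 0.8, 0.9, 0.95, 0.1]])
# [False, False, False, False, True, False]
```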

We also provide a WakeWordListener that handles all the audio capture for you, so you can listen from the microphone without writing any audio code yourself:

from livekit.wakeword import WakeWordModel, WakeWordListener

model = WakeWordModel(models=["hey_robot.onnx"])

# captures audio from the microphone and runs detection automatically
async with WakeWordListener(model, threshold=0.5) as listener:
    while True:
        detection = await listener.wait_for_detection()
        print(f"Detected {detection.name}!")

For a complete example that uses wake word detection to spawn a LiveKit agent, check out hello-wakeword.

Other runtimes

Beyond Python, we currently provide a Rust runtime for production deployments. More runtimes are on the roadmap.

Future directions

On the hardware side, the current architecture already runs comfortably on single-board computers, but we're taking it further. We're building an end-to-end model that removes the need for a separate embedding model, making it small enough to run directly on ESP32 and other embedded microcontrollers.

Want to get involved? Check out the repo and join our developer community to share what you're building.