Wake words are the short spoken phrases, like "Hey Siri" or "Alexa", that activate a voice-enabled device or agent. They're the first step in any hands-free voice interaction, and getting them right matters: too sensitive and they fire constantly, too strict and users have to repeat themselves.
Today we're launching livekit-wakeword, an open-source wake word library built for simplicity and speed.
## Why we built this
If you've tried training wake word models before, you know the pain:
- Existing codebases are outdated, with broken dependencies everywhere.
- Documentation is sparse or nonexistent, so training new models requires hours or even days of reverse-engineering.
And even if you do manage to train a model, the authors' default settings usually leave you with one that false-triggers constantly.
We built livekit-wakeword to fix all of this. Now you can train your own wake word model from scratch, locally, with a single command.
## Use cases
Custom wake words unlock hands-free voice activation across a wide range of applications:
- Voice agents: Give your AI agent a branded activation phrase ("Hey Jarvis," "OK Chef") instead of relying on a generic keyword.
- Smart home assistant: Train a custom phrase for your home setup without depending on cloud services.
- Robotics: Activate a robot with a spoken command in noisy warehouse or factory environments.
- Kiosks & accessibility devices: Enable hands-free activation for retail, healthcare, or public-facing hardware.
- In-car & embedded systems: Trigger voice control in vehicles or IoT devices running on constrained hardware.
## Performance
Even though our library is simple and fast, we didn't sacrifice accuracy. Compared to openWakeWord, livekit-wakeword achieved dramatically better results across every metric:
- 100× fewer false positives per hour
- 60× lower detection error
- 86% vs 69% recall
| Metric | livekit-wakeword | openWakeWord |
|---|---|---|
| False positives per hour (FPPH) | 0.08 | 8.50 |
| Detection error tradeoff (AUT) | 0.0012 | 0.0720 |
| Recall | 86% | 69% |
FPPH measures how often the model incorrectly fires when no wake word was spoken — lower is better. AUT (area under the DET curve) captures the overall tradeoff between false positives and missed detections.
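As a quick sanity check on the numbers above, FPPH is simply false activations divided by hours of wake-word-free test audio. The helper below is an illustrative calculation, not part of the library:

```python
# Illustrative FPPH calculation (not part of livekit-wakeword):
# FPPH = (number of false activations) / (hours of wake-word-free audio)

def false_positives_per_hour(num_false_activations: int, audio_seconds: float) -> float:
    """False activations divided by hours of audio containing no wake word."""
    hours = audio_seconds / 3600.0
    return num_false_activations / hours

# e.g. 2 false triggers over 25 hours of negative audio
fpph = false_positives_per_hour(2, 25 * 3600)
print(round(fpph, 2))  # 0.08
```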
See the full comparison for DET curves, test conditions, and detailed methodology.
## How it works
Under the hood, livekit-wakeword generates thousands of synthetic training samples using text-to-speech, then applies realistic audio augmentations (background noise, reverb, gain variation) to simulate real-world conditions. A lightweight convolutional-attention classifier trains on top of pre-computed audio embeddings, producing a small, fast model that generalizes well beyond its training data.
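To make the augmentation step concrete, here's a minimal sketch in plain NumPy of the kind of gain variation and noise mixing described above. The function and parameters are illustrative assumptions, not livekit-wakeword's actual pipeline (which also applies reverb and uses real noise recordings):

```python
import numpy as np

# Hypothetical augmentation sketch: apply a gain offset, then mix in
# background noise scaled to a target signal-to-noise ratio (SNR).
def augment(sample: np.ndarray, noise: np.ndarray,
            snr_db: float, gain_db: float) -> np.ndarray:
    # gain variation: convert dB offset to a linear multiplier
    sample = sample * (10 ** (gain_db / 20))
    # scale noise so the mixed result hits the requested SNR
    sig_pow = np.mean(sample**2) + 1e-12
    noise_pow = np.mean(noise**2) + 1e-12
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return sample + scale * noise[: len(sample)]

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000).astype(np.float32)  # 1 s of 16 kHz audio
noise = rng.standard_normal(16000).astype(np.float32)
noisy = augment(clean, noise, snr_db=10.0, gain_db=-3.0)
print(noisy.shape)  # (16000,)
```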
Since our exported models use the same ONNX format and inference pipeline as openWakeWord, they're fully compatible. Your Home Assistant or legacy projects still work with zero changes.
## Part of the LiveKit ecosystem
livekit-wakeword is designed to work seamlessly with the LiveKit platform. Use a wake word to trigger a LiveKit Agent session. The wake word model runs locally on-device with minimal latency, and once activated, LiveKit handles the realtime audio streaming to your agent.
## Start building with livekit-wakeword
To train a new wake word model, install the library and run setup:
```bash
# install livekit-wakeword with training, evaluation, and export extras
# (quoted so the brackets survive shells like zsh)
pip install "livekit-wakeword[train,eval,export]"

# download required embedding models and datasets
livekit-wakeword setup
```
Then create a config file for your wake word:
```yaml
model_name: hey_robot
target_phrases:
  - "hey robot"

n_samples: 10000  # synthetic training samples per class
model:
  model_type: conv_attention  # our new conv-attention classifier
  model_size: small
steps: 50000  # training steps
```
Check out the README for the full list of config options. Once your config is ready, you can train your model with a single command:
```bash
# generates synthetic data, augments, trains, and exports to ONNX
# your model will be saved to ./output/hey_robot/hey_robot.onnx
livekit-wakeword run configs/hey_robot.yaml
```
That single command handles everything: synthetic data generation, augmentation, training, and ONNX export. You'll get a production-ready model file you can use right away.
To run detection, just load the model and feed it audio:
```python
from livekit.wakeword import WakeWordModel

# load your exported ONNX model
model = WakeWordModel(models=["hey_robot.onnx"])

# feed 16kHz audio frames (int16 or float32)
scores = model.predict(audio_frame)
if scores["hey_robot"] > 0.5:
    print("Wake word detected!")
```
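Wake word models like this consume fixed-size 16 kHz frames. As a hedged sketch, the helper below slices a raw PCM buffer into such frames before passing each one to detection; the 1280-sample (80 ms) frame size is an assumption borrowed from openWakeWord's convention and may differ here:

```python
import numpy as np

# Hypothetical framing helper; the 1280-sample (80 ms at 16 kHz) frame
# size follows openWakeWord's convention and is an assumption here.
FRAME_SAMPLES = 1280

def frames(pcm: np.ndarray, frame_samples: int = FRAME_SAMPLES):
    """Yield consecutive fixed-size frames, dropping a trailing partial frame."""
    for start in range(0, len(pcm) - frame_samples + 1, frame_samples):
        yield pcm[start : start + frame_samples]

pcm = np.zeros(16000, dtype=np.int16)  # 1 s of silence
chunks = list(frames(pcm))
print(len(chunks))  # 12
```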
We also provide a WakeWordListener that handles all the audio capture for you, so you can listen from the microphone without writing any audio code yourself:
```python
from livekit.wakeword import WakeWordModel, WakeWordListener

model = WakeWordModel(models=["hey_robot.onnx"])

# captures audio from the microphone and runs detection automatically
async with WakeWordListener(model, threshold=0.5) as listener:
    while True:
        detection = await listener.wait_for_detection()
        print(f"Detected {detection.name}!")
```
For a complete example that uses wake word detection to spawn a LiveKit agent, check out hello-wakeword.
## Other runtimes
For production deployments, we currently support Rust. More runtimes are on the roadmap.
## Future directions
On the hardware side, the current architecture already runs comfortably on single-board computers, but we're taking it further. We're building an end-to-end model that removes the need for a separate embedding model, making it small enough to run directly on ESP32 and other embedded microcontrollers.
Want to get involved? Check out the repo and join our developer community to share what you're building.