
Open-source wake word training in a single command

Wake words are the short spoken phrases, like "Hey Siri" or "Alexa", that activate a voice-enabled device or agent. They're the first step in any hands-free voice interaction, and getting them right matters: too sensitive and they fire constantly, too strict and users have to repeat themselves.
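To make that tradeoff concrete, here's a toy sketch with hypothetical confidence scores (not from any real model): lowering the detection threshold catches more real activations but also fires on background audio.

```python
# Hypothetical per-frame confidence scores from a wake word model.
wake_word_scores = [0.91, 0.84, 0.77]   # frames where the phrase was spoken
background_scores = [0.10, 0.45, 0.62]  # frames of ordinary speech and noise

def evaluate(threshold):
    """Count true detections and false fires at a given threshold."""
    detected = sum(s >= threshold for s in wake_word_scores)
    false_fires = sum(s >= threshold for s in background_scores)
    return detected, false_fires

print(evaluate(0.5))  # (3, 1): catches every real activation, but fires once on noise
print(evaluate(0.9))  # (1, 0): never fires spuriously, but misses two real activations
```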

Today we're launching livekit-wakeword, an open-source wake word library built for simplicity and speed.

Why we built this

If you've tried training wake word models before, you know the pain:

  • Existing codebases are outdated, with broken dependencies everywhere.
  • Documentation is sparse or nonexistent, so training new models requires hours or even days of reverse-engineering.

And even if you manage to train a model, you still end up with one that false-triggers constantly because you used the vanilla settings the authors provided.

We built livekit-wakeword to fix all of this. Now you can train your own wake word model from scratch, locally, with a single command.

Use cases

Custom wake words unlock hands-free voice activation across a wide range of applications:

  • Voice agents: Give your AI agent a branded activation phrase ("Hey Jarvis," "OK Chef") instead of relying on a generic keyword.
  • Smart home assistant: Train a custom phrase for your home setup without depending on cloud services.
  • Robotics: Activate a robot with a spoken command in noisy warehouse or factory environments.
  • Kiosks & accessibility devices: Enable hands-free activation for retail, healthcare, or public-facing hardware.
  • In-car & embedded systems: Trigger voice control in vehicles or IoT devices running on constrained hardware.

Performance

Even though our library is simple and fast, we didn't sacrifice accuracy. Compared to openWakeWord, livekit-wakeword achieved dramatically better results across every metric:

  • 100× fewer false positives per hour
  • 60× lower detection error
  • 86% vs 69% recall
| Metric                          | livekit-wakeword | openWakeWord |
|---------------------------------|------------------|--------------|
| False positives per hour (FPPH) | 0.08             | 8.50         |
| Detection error tradeoff (AUT)  | 0.0012           | 0.0720       |
| Recall                          | 86%              | 69%          |

FPPH measures how often the model incorrectly fires when no wake word was spoken (lower is better). AUT (area under the DET curve) captures the overall tradeoff between false positives and missed detections; lower is also better.
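In concrete terms, the two headline metrics reduce to simple ratios. This is illustrative arithmetic, not library code:

```python
def fpph(false_positives: int, audio_hours: float) -> float:
    """False positives per hour over wake-word-free audio; lower is better."""
    return false_positives / audio_hours

def recall(caught: int, spoken: int) -> float:
    """Fraction of spoken wake words the model detected; higher is better."""
    return caught / spoken

# e.g. 2 spurious activations over 25 hours of background audio,
# and 86 of 100 spoken wake words caught:
print(fpph(2, 25.0))    # 0.08
print(recall(86, 100))  # 0.86
```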

See the full comparison for DET curves, test conditions, and detailed methodology.

How it works

Under the hood, livekit-wakeword generates thousands of synthetic training samples using text-to-speech, then applies realistic audio augmentations (background noise, reverb, gain variation) to simulate real-world conditions. A lightweight convolutional-attention classifier trains on top of pre-computed audio embeddings, producing a small, fast model that generalizes well beyond its training data.
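The augmentation step can be sketched in a few lines of NumPy. This is an illustrative approximation of noise mixing and gain variation; the library's actual pipeline, parameter ranges, and reverb handling may differ:

```python
import numpy as np

def augment(sample: np.ndarray, noise: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Mix background noise at a random SNR, then apply random gain (illustrative)."""
    # Scale the noise so the mix lands at a random 5-20 dB signal-to-noise ratio
    snr_db = rng.uniform(5.0, 20.0)
    sample_power = np.mean(sample**2) + 1e-12
    noise_power = np.mean(noise**2) + 1e-12
    noise_scale = np.sqrt(sample_power / (noise_power * 10 ** (snr_db / 10)))
    mixed = sample + noise_scale * noise[: len(sample)]

    # Random +/- 6 dB gain variation to simulate mic distance and volume differences
    gain_db = rng.uniform(-6.0, 6.0)
    return np.clip(mixed * 10 ** (gain_db / 20), -1.0, 1.0)

rng = np.random.default_rng(0)
clean = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
noise = rng.normal(0, 0.05, 16000).astype(np.float32)
noisy = augment(clean, noise, rng)
```

Running thousands of clean TTS samples through variations of this kind is what lets the classifier generalize to real rooms and real microphones.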

Since our exported models use the same ONNX format and inference pipeline as openWakeWord, they're fully compatible: existing Home Assistant setups and legacy openWakeWord projects work with zero changes.

Part of the LiveKit ecosystem

livekit-wakeword is designed to work seamlessly with the LiveKit platform. Use a wake word to trigger a LiveKit Agent session. The wake word model runs locally on-device with minimal latency, and once activated, LiveKit handles the realtime audio streaming to your agent.

Start building with livekit-wakeword

To train a new wake word model, install the library and run setup:

# install livekit-wakeword with training, evaluation, and export extras
pip install livekit-wakeword[train,eval,export]

# download required embedding models and datasets
livekit-wakeword setup

Then create a config file for your wake word:

model_name: hey_robot
target_phrases:
  - "hey robot"

n_samples: 10000 # synthetic training samples per class
model:
  model_type: conv_attention # our new conv-attention classifier
  model_size: small
steps: 50000 # training steps

Check out the README for the full list of config options. Once your config is ready, you can train your model with a single command:

# generates synthetic data, augments, trains, and exports to ONNX
# your model will be saved to ./output/hey_robot/hey_robot.onnx
livekit-wakeword run configs/hey_robot.yaml

That single command handles everything: synthetic data generation, augmentation, training, and ONNX export. You'll get a production-ready model file you can use right away. The exported model is a standard ONNX file, fully backward compatible with openWakeWord, so it drops into Home Assistant or any existing openWakeWord integration with zero changes.

To run detection, just load the model and feed it audio:

from livekit.wakeword import WakeWordModel

# load your exported ONNX model
model = WakeWordModel(models=["hey_robot.onnx"])

# feed 16kHz audio frames (int16 or float32)
scores = model.predict(audio_frame)
if scores["hey_robot"] > 0.5:
    print("Wake word detected!")
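A single frame above threshold can be noisy in practice. A common pattern, sketched here in plain Python (this is not a livekit-wakeword API, just an illustration), is to require several consecutive high-scoring frames before firing:

```python
class Debouncer:
    """Fire only after `patience` consecutive scores exceed the threshold."""

    def __init__(self, threshold: float = 0.5, patience: int = 3):
        self.threshold = threshold
        self.patience = patience
        self._streak = 0

    def update(self, score: float) -> bool:
        if score > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.patience:
            self._streak = 0  # reset so we don't re-fire on every following frame
            return True
        return False

# Feed per-frame scores in; only the third consecutive high score triggers.
d = Debouncer(threshold=0.5, patience=3)
print([d.update(s) for s in [0.9, 0.2, 0.8, 0.9, 0.95, 0.1]])
# [False, False, False, False, True, False]
```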

We also provide a WakeWordListener that handles all the audio capture for you, so you can listen from the microphone without writing any audio code yourself:

from livekit.wakeword import WakeWordModel, WakeWordListener

model = WakeWordModel(models=["hey_robot.onnx"])

# captures audio from the microphone and runs detection automatically
async with WakeWordListener(model, threshold=0.5) as listener:
    while True:
        detection = await listener.wait_for_detection()
        print(f"Detected {detection.name}!")

For a complete example that uses wake word detection to spawn a LiveKit agent, check out hello-wakeword.

Other runtimes

Beyond Python, we currently provide a Rust runtime for production deployments. More runtimes are on the roadmap.

Future directions

On the hardware side, the current architecture already runs comfortably on single-board computers, but we're taking it further. We're building an end-to-end model that removes the need for a separate embedding model, making it small enough to run directly on ESP32 and other embedded microcontrollers.

Want to get involved? Check out the repo and join our developer community to share what you're building.