Real microphones don't capture clean speech. They pick up fans, music, a TV in the next room, a colleague talking three feet away, and the echo of your own voice bouncing back through your speakers. For a voice AI agent, that background noise and competing speech degrade transcription quality and interfere with turn detection, so the agent mishears words, talks over the user, or responds to speech that was never directed at it.
This post explains what noise cancellation (NC) means in LiveKit, where you can apply it (the agent, the client, and the SIP trunk), and how it differs from related concepts such as echo cancellation.
Noise cancellation best practices (TL;DR)
- For voice AI, apply noise cancellation within your agent only.
- For livestreaming, apply it within the client.
- For telephony without voice AI, apply it on the SIP trunk.
- For a single primary speaker, use a voice isolation model.
- For multiple speakers (or if a voice isolation model causes problems), use a model that suppresses all background noise.
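The checklist above can be read as two small lookups: one for where to apply noise cancellation and one for which kind of model to use. The sketch below is only an illustration of that reasoning. The function names and the use-case labels are hypothetical and are not part of any LiveKit SDK.

```python
# Hypothetical helpers that encode the TL;DR rules above.
# None of these names or labels come from a LiveKit SDK.

PLACEMENT_BY_USE_CASE = {
    "voice_ai": "agent",
    "livestreaming": "client",
    "telephony_without_voice_ai": "sip_trunk",
}


def placement_for(use_case: str) -> str:
    """Return the stage of the audio path where cancellation should run."""
    if use_case not in PLACEMENT_BY_USE_CASE:
        raise ValueError(f"unknown use case: {use_case!r}")
    return PLACEMENT_BY_USE_CASE[use_case]


def model_type_for(speakers: int) -> str:
    """Return the model category suggested for the number of speakers."""
    if speakers < 1:
        raise ValueError("speakers must be at least 1")
    if speakers == 1:
        return "voice_isolation"
    return "background_noise_suppression"


print(placement_for("voice_ai"), model_type_for(1))
```

Treat the two answers as independent: which model types are actually available depends on where you apply them, as the later sections on the client and the SIP trunk explain.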
Where to apply noise cancellation
Before choosing a specific model or setting, it helps to know where in the audio path the processing runs. LiveKit gives you three options, each applying to a different part of the pipeline.
- In the agent: Server-side processing of inbound audio, with access to LiveKit Cloud's enhanced models. This is the recommended default for most voice AI solutions.
- In the client (frontend): Web, mobile, or desktop SDKs process audio from the user's microphone before it's sent to the LiveKit room. LiveKit supports both Krisp models and WebRTC noiseSuppression on the frontend.
- At the SIP trunk: For telephony, Krisp cancellation is applied directly at the trunk for inbound or outbound calls.
Stacking models
LiveKit refers to both Krisp and ai-coustics as "enhanced" noise cancellation, and it's important not to stack enhanced models on the same audio. These models are trained on raw audio, so feeding them input that another model has already processed can produce unexpected results.
In practice: if you cancel noise in the agent, don't also enable Krisp on the frontend or your SIP trunk.
The exception is standard WebRTC noise suppression (not to be confused with background noise suppression) and the separate echo cancellation feature; both can be left enabled alongside an enhanced model.
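One way to catch accidental stacking is a pre-flight check over your own deployment settings. The following is a minimal sketch under stated assumptions: the stage names, the provider labels, and both functions are hypothetical, and the dictionary is a stand-in for however you record your configuration.

```python
# Hypothetical pre-flight check for the stacking rule above.
# Stage names and provider labels are illustrative, not LiveKit API values.

ENHANCED_PROVIDERS = {"krisp", "ai-coustics"}


def enhanced_stages(pipeline: dict) -> list:
    """Return the stages whose configured filter is an enhanced model."""
    return [
        stage
        for stage, provider in pipeline.items()
        if provider in ENHANCED_PROVIDERS
    ]


def is_stacked(pipeline: dict) -> bool:
    """True when more than one stage applies an enhanced model."""
    return len(enhanced_stages(pipeline)) > 1


# WebRTC noise suppression alongside a single enhanced model is fine.
ok = {"client": "webrtc", "sip_trunk": None, "agent": "krisp"}
# Krisp on the trunk plus ai-coustics in the agent stacks two enhanced models.
bad = {"client": "webrtc", "sip_trunk": "krisp", "agent": "ai-coustics"}

print(is_stacked(ok), is_stacked(bad))  # False True
```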
Background noise suppression vs. voice isolation
LiveKit's enhanced noise cancellation comes in two flavors that solve different problems and are priced differently.
Background noise suppression removes non-speech noise such as traffic, fans, or music while preserving all speech. Use it when the main challenge is environmental noise, or when there are multiple legitimate speakers you want to keep (for example, diarization).
Voice isolation goes further: it emphasizes the primary speaker and suppresses competing speech as well as background noise. Use it for single-speaker scenarios, such as call centers, where cross-talk from nearby people could confuse transcription or turn detection.
LiveKit supports two providers across both categories: Krisp and ai-coustics.
| | Background noise suppression | Voice isolation |
|---|---|---|
| What it removes | Non-speech noise; keeps all speech | Competing voices and noise; keeps only the primary speaker |
| Best for | Environmental noise; multi-speaker / diarization | Single speaker; cross-talk environments |
| Models | Krisp NC, ai-coustics QUAIL_L | Krisp BVC, Krisp BVCTelephony, ai-coustics QUAIL_VF_S / QUAIL_VF_L |
| Cost | Included with LiveKit Cloud, no surcharge | Billed separately (metered) |
For more on these models, including word error rates (WER), see the noise cancellation documentation.
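If you want to keep the comparison table close to your code, it fits in a small lookup. This sketch uses only the model names and categories from the table; the dictionary layout and the function names are hypothetical.

```python
# Lookup built from the comparison table above. The layout and function names
# are hypothetical; only the model names and categories come from the table.

MODELS = {
    "Krisp NC": "background_noise_suppression",
    "ai-coustics QUAIL_L": "background_noise_suppression",
    "Krisp BVC": "voice_isolation",
    "Krisp BVCTelephony": "voice_isolation",
    "ai-coustics QUAIL_VF_S": "voice_isolation",
    "ai-coustics QUAIL_VF_L": "voice_isolation",
}


def is_metered_in_agent(model: str) -> bool:
    """Voice isolation in the agent is billed separately; suppression is not."""
    return MODELS[model] == "voice_isolation"


def models_in(category: str) -> list:
    """List the model names that belong to the given category."""
    return sorted(name for name, cat in MODELS.items() if cat == category)


print(models_in("background_noise_suppression"))
print(is_metered_in_agent("Krisp BVC"))
```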
Noise cancellation vs. echo cancellation (and related concepts)
These terms are often used interchangeably in casual conversation, but in LiveKit they're distinct features.
Echo cancellation removes your own speaker output looping back into your microphone, and is the reason you don't hear yourself a half-second late during a voice call. It's a separate feature from noise cancellation and can be left enabled alongside it. In the LiveKit SDKs, it's the WebRTC echoCancellation setting at the client.
WebRTC noise suppression is the built-in WebRTC noiseSuppression setting, which provides lightweight, client-side removal of unwanted sound. This contrasts with the "enhanced" noise cancellation offered by the Krisp or ai-coustics models. Although WebRTC noise suppression can be left enabled alongside the enhanced models, it adds little benefit in practice, so you can leave it disabled when an enhanced model is in use.
VAD (voice activity detection) is a related but separate concept: it detects whether someone is speaking (to drive turn detection) rather than cleaning the audio. Because cancellation runs before VAD and STT in the pipeline, cleaner input produces better VAD and turn-detection signals. The ai-coustics plugin also ships a built-in VAD adapter, so you don't need to run a separate VAD such as Silero (see the docs for more information).
Noise cancellation within the agent
Agent-side cancellation is the recommended approach for most voice AI use cases, and you can choose between Krisp and ai-coustics.
Install the package for your chosen provider by following the instructions in the documentation.
Then add the filter to your room input options when starting the session, specifying the desired model.
```python
# Python
from livekit.agents import room_io
from livekit.plugins import ai_coustics  # or noise_cancellation

await session.start(
    room_options=room_io.RoomOptions(
        audio_input=room_io.AudioInputOptions(
            # Voice isolation (metered):
            noise_cancellation=ai_coustics.audio_enhancement(
                model=ai_coustics.EnhancerModel.QUAIL_VF_S
            ),
            # or noise_cancellation.BVC()
            # Background noise suppression (no surcharge):
            # ai_coustics.audio_enhancement(model=ai_coustics.EnhancerModel.QUAIL_L)
            # or noise_cancellation.NC()
        ),
    ),
)
```

```typescript
// Node.js
import * as aiCoustics from '@livekit/plugins-ai-coustics';
// or BackgroundVoiceCancellation from '@livekit/noise-cancellation-node'

await session.start({
  inputOptions: {
    // Voice isolation (metered):
    noiseCancellation: aiCoustics.audioEnhancement({ model: 'quailVfS' }),
    // or BackgroundVoiceCancellation() from '@livekit/noise-cancellation-node'
    // Background suppression (no surcharge):
    // aiCoustics.audioEnhancement({ model: 'quailL' })
    // or NoiseCancellation() from '@livekit/noise-cancellation-node'
  },
});
```
You can find additional configuration options, such as the ai-coustics enhancement level, in the docs.
Enhanced noise cancellation support for self-hosted agents
Provided you use LiveKit Cloud for your media transport, you can use any of the Krisp or ai-coustics models, whether you host your agents on LiveKit Cloud or self-host them. Billable models are charged at the same pricing tiers regardless of how you host your agents, as detailed in the pricing section later in this post.
Be aware that enhanced cancellation is CPU- and memory-intensive. If you self-host your agents, expect higher CPU and memory requirements than for agents that don't use it.
If you self-host LiveKit server, you are more limited in your model choice, though you can run ai-coustics by providing your own key, as detailed in the docs.
Noise cancellation in the client / frontend (web and mobile)
Client-side cancellation cleans up the microphone input on the user's own device before that audio is encoded and sent into the room. The docs refer to this as "outbound" audio, as in "outbound from the device into the LiveKit room".
When should you cancel in the client rather than the agent? Client-side makes sense for conferencing and livestreaming apps where the audio goes to other humans rather than an agent. For voice AI, apply noise cancellation within your agent, and avoid using two enhanced models on the same audio pathway, as explained earlier.
Two distinct types of noise cancellation can be applied in the frontend:
The Krisp frontend filter
The Krisp frontend filter applies enhanced cancellation directly in the SDK. Platform support varies: notably, the BVC model (which offers voice isolation) is only available in the JavaScript/web SDK, while the other client SDKs support the standard Krisp NC model. For the full breakdown of which SDKs support which models, see the table in the docs.
On the web, the React components package exposes a useKrispNoiseFilter hook, or you can use the KrispNoiseFilter class directly, as shown below:
```typescript
// Web (base JS SDK)
const { KrispNoiseFilter } = await import('@livekit/krisp-noise-filter');
const krispProcessor = KrispNoiseFilter();
await trackPublication.track.setProcessor(krispProcessor);
await krispProcessor.setEnabled(true);
```
Note that not all browsers support Krisp, so be sure to check with isKrispNoiseFilterSupported().
The mobile and desktop SDKs follow the same pattern: instantiate the Krisp processor and attach it as the capture post-processor. The examples below all use the standard Krisp NC model, which offers noise cancellation but not voice isolation:

```swift
// Swift
import LiveKit
import SwiftUI
import LiveKitKrispNoiseFilter

// Keep this as a global variable or somewhere that won't be deallocated
let krispProcessor = LiveKitKrispNoiseFilter()

struct ContentView: View {
    @StateObject private var room = Room()

    var body: some View {
        MyOtherView()
            .environmentObject(room)
            .onAppear {
                // Attach the processor
                AudioManager.shared.capturePostProcessingDelegate = krispProcessor
                // This must be done before calling `room.connect()`
                room.add(delegate: krispProcessor)

                // You are now ready to connect to the room from this view or any child view
            }
    }
}
```

```kotlin
// Android (Kotlin)
val krisp = KrispAudioProcessor.getInstance(getApplication())
coroutineScope.launch(Dispatchers.IO) { krisp.init() } // once, off the main thread

val room = LiveKit.create(
    getApplication(),
    overrides = LiveKitOverrides(
        audioOptions = AudioOptions(
            audioProcessorOptions = AudioProcessorOptions(capturePostProcessor = krisp),
        ),
    ),
)
// Or after creation: room.audioProcessingController.setCapturePostProcessing(krisp)
```
You can find examples for additional languages in the docs.
WebRTC noise and echo cancellation
WebRTC noise suppression and echo cancellation are the built-in WebRTC features, enabled by passing the corresponding properties in the AudioCaptureOptions object at connection time (see the JavaScript, Swift, and Flutter references). If you are not using an enhanced model at any stage of your audio pipeline, LiveKit strongly recommends enabling both.
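The guidance on the built-in WebRTC features reduces to one decision: whether an enhanced model runs anywhere in the pipeline. The sketch below illustrates that decision only. The function is hypothetical and not part of any SDK; its keys simply mirror the echoCancellation and noiseSuppression property names mentioned in this post.

```python
# Hypothetical helper that decides the WebRTC capture flags described above.
# The function is not part of any SDK; the keys mirror the property names
# used in this post.


def webrtc_capture_flags(uses_enhanced_model: bool) -> dict:
    return {
        # Echo cancellation is a separate feature and can stay on either way.
        "echoCancellation": True,
        # Built-in suppression adds little on top of an enhanced model,
        # so enable it only when no enhanced model is in the pipeline.
        "noiseSuppression": not uses_enhanced_model,
    }


print(webrtc_capture_flags(uses_enhanced_model=False))
```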
Noise cancellation for SIP (trunk-based telephony)
Phone calls have no browser frontend to run a filter in, so LiveKit lets you apply Krisp cancellation directly at the SIP trunk for both inbound and outbound calls. This uses the standard Krisp NC model at the trunk level; the other models are not available for SIP.
It helps to picture where the trunk sits. The trunk is the bridge between your SIP provider (Twilio, Telnyx, etc.) and the LiveKit room; trunk-level noise cancellation cleans the phone-network audio as it crosses that bridge, before it ever reaches the room or your agent.
For inbound calls, set krisp_enabled: true (or krispEnabled in JSON / most SDKs) in the inbound trunk configuration.
```json
{
  "trunk": {
    "name": "My trunk",
    "numbers": ["+15105550100"],
    "krispEnabled": true
  }
}
```
For outbound calls, set krisp_enabled: true in the CreateSIPParticipant request.
```python
# Python
request = CreateSIPParticipantRequest(
    sip_trunk_id="<trunk_id>",
    sip_call_to="<phone_number>",
    room_name="my-sip-room",
    participant_identity="sip-participant",
    krisp_enabled=True,
)
```

```typescript
// Node.js
const participant = await sipClient.createSipParticipant(
  trunkId,
  phoneNumber,
  'my-sip-room',
  {
    participantIdentity: 'sip-participant',
    krispEnabled: true
  },
);
```
Combining SIP participants with agents
When a SIP caller dials into a room that has an agent, you risk applying noise cancellation in two places: on the SIP trunk and within the agent (as described previously). Don't enable it in both places; LiveKit recommends enabling it only in your agent. You have a couple of options for the agent-side model:
- BVCTelephony is Krisp's voice-isolation model, tuned specifically for the narrow, compressed audio band of phone calls. You can use Selectors to set the noise cancellation model dynamically based on participant type.
- ai-coustics models can also process phone audio. Although not specifically tuned for telephony, they offer more configuration than BVCTelephony.
```python
# Python: telephony-tuned Krisp voice isolation
noise_cancellation=noise_cancellation.BVCTelephony()
# or an ai-coustics model, e.g.:
# noise_cancellation=ai_coustics.audio_enhancement(model=ai_coustics.EnhancerModel.QUAIL_VF_S)
```

```typescript
// Node.js
noiseCancellation: TelephonyBackgroundVoiceCancellation()
// or aiCoustics.audioEnhancement({ model: 'quailVfS' })
```
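Choosing a model per participant type comes down to a single branch on whether the participant is a phone caller. The sketch below shows that decision with stand-in types only: the ParticipantKind enum is a local illustration rather than the LiveKit SDK type, the returned strings stand in for the real model objects, and how a selector is wired into the session is covered in the docs, not here.

```python
# Stand-in sketch of a per-participant model choice. ParticipantKind is a
# local enum for illustration, not the LiveKit SDK type, and the returned
# strings stand in for the real noise cancellation objects.
from enum import Enum


class ParticipantKind(Enum):
    STANDARD = "standard"
    SIP = "sip"


def select_model(kind: ParticipantKind) -> str:
    """Use the telephony-tuned model for phone callers, BVC otherwise."""
    if kind is ParticipantKind.SIP:
        return "BVCTelephony"
    return "BVC"


print(select_model(ParticipantKind.SIP))
```

With a choice like this made in the agent, leave the trunk-level Krisp setting disabled so that the phone audio passes through only one enhanced model.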
Pricing
As explained earlier, background noise suppression and voice isolation are two separate concepts: background noise suppression removes all non-speech noise, while voice isolation isolates a primary speaker. LiveKit prices these two forms of noise cancellation separately on its pricing page:
| Noise cancellation | Description | Pricing page | Cost |
|---|---|---|---|
| Voice isolation at the agent level | Models that isolate the primary speaker: Krisp BVC, Krisp BVCTelephony, ai-coustics QUAIL_VF_S / QUAIL_VF_L | Voice isolation | Free allowance, then metered * |
| Noise suppression at the agent level | Models that remove all background noise: Krisp NC, ai-coustics QUAIL_L | Background noise suppression | Included with LiveKit Cloud † |
| Noise suppression at the frontend or SIP trunk | Krisp NC and BVC models, where supported (as described earlier) | Enhanced noise cancellation | Included with LiveKit Cloud † |
* All tiers, including the free tier, receive an allowance, but metered use over that allowance only applies to paid plans.
† Includes agents hosted in LiveKit Cloud and self-hosted agents. Does not include self-hosted LiveKit deployments. If you self-host your SFU, the ai-coustics plugin can authenticate directly against ai-coustics with your own license key. See the docs for more information.
Troubleshooting steps
| Issue | Resolution |
|---|---|
| Agent transcribes background chatter as words | Apply Background noise suppression in your agent. |
| Nearby people's speech confuses the agent | Make sure your agent's noise cancellation uses a voice isolation model. |
| User hears themselves echoed back | Apply Echo cancellation (WebRTC) in the client. |
| Turn detection misfires in noisy rooms | Apply either Voice isolation for a single speaker use case, or background suppression if you have multiple speakers. |
| SIP audio quality is bad | Make sure you have noise cancellation enabled. The docs also list several other troubleshooting steps not related to noise cancellation. |
| Audio works fine with a headset, but using a laptop microphone causes the voice to be very faint | Models designed for a primary speaker work best in close-microphone scenarios, like headsets. Do not use a voice isolation model in this scenario. |
| Call audio goes very quiet after an answering machine or IVR | Models designed for a primary speaker will not switch speaker during the conversation. What has likely happened is that the model detected the user's speech as secondary, having first detected the answering machine or IVR. Do not use a voice isolation model in this scenario. |
| Audio quality is poor when viewing the call in Agent Insights, but quality is fine in the Egress output | Noise cancellation applied at the agent level is reflected in the Agent Insights traces but not in the Egress output, so this difference points to a misconfiguration of your agent noise cancellation. |
| Agent is not transcribing audio | The agent noise cancellation may be misconfigured or too aggressive. Try switching to a different model, specifically one that does not support voice isolation. |
| STT transcriptions are inaccurate | Some speech-to-text (STT) models are trained on audio that includes background noise, performing better without noise cancellation applied. Try disabling noise cancellation in your agent to see if your overall experience improves. |
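When you compare STT accuracy with and without noise cancellation, a word error rate (WER) figure makes the comparison concrete. The following is a minimal sketch of the standard calculation (word-level edit distance divided by the number of reference words). The transcripts are made-up strings for illustration, not measured results.

```python
# Minimal WER sketch for comparing two transcripts of the same utterance,
# for example with and without noise cancellation.
# WER = word-level edit distance / number of reference words.


def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(substitution, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts only, not measured results.
reference = "please cancel my order"
nc_off = "please cancel my order"
nc_on = "please cancel order"

print(wer(reference, nc_off), wer(reference, nc_on))  # 0.0 0.25
```

Run the same recordings through both configurations and compare the averages across many utterances; a single sentence is too small a sample to draw conclusions from.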
Working with LiveKit support?
Remember that the support team does not have access to your Agent Insights data, including recordings, unless you share the specific sessions with them.