Most voice agent guardrails live inside the agent's system prompt. That works for simple rules, but it tends to break down when safety logic gets complex. A prompt telling your agent to "watch for threats and escalate immediately" competes with the agent's primary job of having a natural conversation. The same model that handles conversation also enforces compliance, and in practice, complex rules can dilute the agent's conversational ability while adding latency. When the conversation model runs under strict latency constraints, adding multi-step safety reasoning often means either the safety logic stays shallow or the response time suffers.
The observer pattern for AI agent orchestration solves this by separating detection from response. A background process monitors the conversation in parallel, evaluates it with a separate LLM, and injects corrections into the active agent's context. The user never knows a second model is involved. The agent finds new instructions in its context and adjusts.
This guide walks you through building a background observer using LiveKit's Agents framework. You'll use the conversation_item_added event for real-time transcript monitoring, a separate LLM for asynchronous policy evaluation, and update_chat_ctx to inject guardrails without interrupting the conversation.
The observer pattern for AI agent orchestration
The observer runs a three-phase loop alongside the main conversation:
- Listen. Register a listener on the AgentSession's conversation_item_added event. Every time the user speaks, the observer captures the transcribed text.
- Evaluate. Send the accumulated transcript to a separate LLM with a structured evaluation prompt. This model can be slower and more capable than the conversation model because it runs asynchronously.
- Inject. When the evaluator flags an issue, copy the active agent's chat context, append a system message with the guardrail instruction, and push the updated context back. The agent picks up the new instruction on its next turn.
The observer never blocks the conversation. The user keeps talking, the agent keeps responding, and the observer evaluates in the background. If it finds something, the agent's behavior changes on the next turn.
Setting up the agent session
Here's the session setup for a ride-share support agent with a background observer. The primary agent handles the conversation while the observer monitors for policy violations.
```python
server = AgentServer()


def prewarm(proc: JobProcess) -> None:
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session(agent_name="rideshare-agent")
async def rideshare_agent(ctx: JobContext) -> None:
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession[RideData](
        userdata=RideData(),
        stt=inference.STT(model="deepgram/nova-3", language="multi"),
        llm=inference.LLM(model="openai/gpt-4.1-mini"),
        tts=inference.TTS(
            model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
        ),
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
        preemptive_generation=True,
    )

    # The observer uses a more capable model for nuanced policy analysis.
    # It runs entirely in the background, never blocking the main conversation.
    observer_llm = inference.LLM(model="openai/gpt-4.1")
    start_observer(session, observer_llm)

    await session.start(
        agent=SupportAgent(),
        room=ctx.room,
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: (
                    noise_cancellation.BVCTelephony()
                    if params.participant.kind
                    == rtc.ParticipantKind.PARTICIPANT_KIND_SIP
                    else noise_cancellation.BVC()
                ),
            ),
        ),
    )

    await ctx.connect()
```
Two things to notice here. The session uses inference.LLM(model="openai/gpt-4.1-mini") for the conversation, a fast model optimized for low latency. The observer gets inference.LLM(model="openai/gpt-4.1"), a more capable model that can handle policy analysis. Both use the LiveKit Inference Gateway, so the only credentials you need are LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET. No separate provider API keys required.
The start_observer function is synchronous: it instantiates the PolicyObserver, whose constructor attaches the event listeners, and returns immediately, so it does not block session startup.
Building the observer
The observer is a plain Python class, not an Agent subclass. It never takes control of the session, never speaks to the user, and never appears in the agent lifecycle. It just listens and injects.
Here's the core structure:
```python
class PolicyObserver:
    """
    Monitors the ride-share support conversation in parallel for policy violations.

    Runs an LLM evaluation after every user turn. If an eval is already in
    flight when a new turn arrives, a follow-up eval is scheduled to run
    immediately after, so no content is ever skipped.

    When a violation is detected, injects a [POLICY: ...] system message into
    the active agent's chat_ctx via update_chat_ctx. The agent sees the hint
    on its next reply cycle without any interruption or handoff. Each violation
    type is injected at most once per call to avoid noise.
    """

    VIOLATION_KEYS: ClassVar[list[str]] = [
        "safety_emergency",       # Caller is in immediate physical danger
        "threatening_language",   # Threats toward driver, agent, or others
        "discrimination_report",  # Discriminatory driver behavior
        "fraud_attempt",          # Fabricating or manipulating a claim
        "harassment_report",      # Sexual or persistent verbal harassment
    ]

    def __init__(self, session: AgentSession, observer_llm) -> None:
        self.session = session
        self.observer_llm = observer_llm
        self.conversation_history: list[dict] = []
        self.injected_violations: set[str] = set()  # Each type injected at most once
        self._evaluating = False
        self._pending_eval = False  # New content arrived while eval was in flight
        self._bg_tasks: set[asyncio.Task] = set()  # Keep references to prevent GC

        self._setup_listeners()
        logger.info("[OBSERVER] Policy observer attached to session")
```
The constructor takes two arguments: the AgentSession to monitor and a separate LLM instance for evaluation. The injected_violations set tracks which violation types have already been injected, preventing duplicate guardrails. Each violation type is injected at most once per session. The _evaluating flag and _pending_eval flag work together to ensure only one evaluation runs at a time while guaranteeing no user turns are skipped.
Listening to conversation events
The observer hooks into the session's conversation_item_added event to capture every user turn:
```python
def _setup_listeners(self) -> None:
    @self.session.on("conversation_item_added")
    def on_item_added(event: ConversationItemAddedEvent) -> None:
        # Only monitor user speech, not agent replies
        if event.item.role != "user":
            return

        text = "".join(c for c in event.item.content if isinstance(c, str))
        if not text.strip():
            return

        self.conversation_history.append({"role": "user", "text": text})
        logger.debug(f"[OBSERVER] Buffered: {text[:80]}")

        # Fire an eval on every user turn. If one is already running,
        # _pending_eval ensures a follow-up runs when it finishes.
        task = asyncio.create_task(self._evaluate())
        self._bg_tasks.add(task)
        task.add_done_callback(self._bg_tasks.discard)
```
The conversation_item_added event fires whenever an item is committed to the chat history, for both user and agent messages. The event carries an item property with a role field, so the observer filters for user turns only. Agent responses don't need policy monitoring.
Each user turn triggers an evaluation via asyncio.create_task(). The handler registered with session.on() is a synchronous callback, so it can't run the evaluation inline; spawning a background task keeps the session's event processing unblocked. The task reference is stored in _bg_tasks to prevent garbage collection.
Running asynchronous evaluations
The evaluation method sends the last 10 user turns to the observer LLM and parses the structured response:
```python
async def _evaluate(self) -> None:
    """Run LLM-based policy evaluation on recent conversation history.

    Only one eval runs at a time. If a new user turn arrives while an eval
    is in flight, _pending_eval is set to True. When the current eval
    finishes, it schedules one follow-up run, so no content is ever
    skipped and concurrent LLM calls never pile up.
    """
    if self._evaluating:
        self._pending_eval = True  # Content arrived mid-eval; re-run when done
        return
    self._evaluating = True
    self._pending_eval = False
    try:
        recent = self.conversation_history[-10:]
        transcript = "\n".join(f"caller: {m['text']}" for m in recent)

        prompt = f"""You are a policy compliance monitor for a ride-sharing support line.
Analyze the caller's statements below and return ONLY a JSON object, no prose.

Transcript:
{transcript}

Return this exact JSON structure:
{{
  "safety_emergency": false,
  "threatening_language": false,
  "discrimination_report": false,
  "fraud_attempt": false,
  "harassment_report": false,
  "details": ""
}}
"""
        chat_ctx = ChatContext()
        chat_ctx.add_message(role="user", content=prompt)

        response_text = ""
        async with self.observer_llm.chat(chat_ctx=chat_ctx) as stream:
            async for chunk in stream:
                if chunk.delta and chunk.delta.content:
                    response_text += chunk.delta.content

        result = self._parse_response(response_text)
        if result:
            await self._process_violations(result)

    except Exception:
        logger.exception("[OBSERVER] Evaluation error")
    finally:
        self._evaluating = False
        if self._pending_eval:
            # New content accumulated while we were running, evaluate it now.
            logger.info(
                "[OBSERVER] Pending eval, re-running for accumulated turns"
            )
            task = asyncio.create_task(self._evaluate())
            self._bg_tasks.add(task)
            task.add_done_callback(self._bg_tasks.discard)
```
The _evaluating guard at the top prevents concurrent evaluations from stacking. Without this, rapid user turns could spawn multiple overlapping LLM calls, wasting compute and producing conflicting results.
When a new turn arrives while an evaluation is running, _pending_eval is set to True. After the current evaluation finishes, it checks this flag and schedules a follow-up run. This guarantees that no user content is ever skipped, even during long evaluations.
The observer LLM is called via the streaming chat interface, using async with self.observer_llm.chat(chat_ctx=chat_ctx) as stream and yielding chunks with chunk.delta.content. The evaluation prompt asks for structured JSON with boolean fields for each violation category plus a details string, making parsing mostly deterministic.
Parsing the evaluation response
The evaluation prompt asks for JSON only, but models sometimes wrap their output in code fences or prose anyway. The parser therefore includes a fallback that extracts JSON from within surrounding text:
```python
def _parse_response(self, text: str) -> dict | None:
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        pass
    start, end = text.find("{"), text.rfind("}") + 1
    if start >= 0 and end > start:
        try:
            return json.loads(text[start:end])
        except json.JSONDecodeError:
            pass
    logger.warning(f"[OBSERVER] Could not parse response: {text[:100]}")
    return None
```
The first attempt tries direct parsing. If that fails, it finds the first { and last } and parses the substring between them. This handles the common case where the model wraps its JSON in markdown code fences or adds explanatory text before the response.
Injecting guardrails into the active agent
When the evaluator detects a violation, the observer injects a system message into the active agent's chat context. Here's the injection flow:
```python
async def _inject_guardrail(self, violation: str, details: str) -> None:
    """Inject a policy hint into the active agent's context.

    Copies the current chat context, appends a system message with the
    guardrail instruction, then updates the agent's context. The agent
    sees this on its next reply cycle without any interruption.
    """
    current_agent = self.session.current_agent
    if not current_agent:
        logger.warning("[OBSERVER] No active agent to inject into")
        return

    hint = self.GUARDRAIL_HINTS.get(violation, "")
    if details:
        hint = f"{hint}\n\nObserver analysis: {details}"

    ctx_copy = current_agent.chat_ctx.copy()
    ctx_copy.add_message(role="system", content=hint)
    await current_agent.update_chat_ctx(ctx_copy)
    logger.info(f"[OBSERVER] Injected guardrail: {violation}")
```
The injection is a three-step process:
1. Get the active agent via session.current_agent. This property returns whichever agent is currently handling the conversation.
2. Copy the chat context with current_agent.chat_ctx.copy(). The agent's chat context is read-only. Trying to modify it directly raises an error: "trying to modify a read-only chat context, please use .copy() and agent.update_chat_ctx() to modify the chat context." You must work with a copy.
3. Push the update with await current_agent.update_chat_ctx(ctx_copy). The agent will use this modified context for its next LLM call.
The agent never knows the observer exists. It finds a new [POLICY: ...] system message in its context and acts on it. From the agent's perspective, the instruction was always there.
Deduplication and violation processing
Each violation type is injected at most once per session. Without deduplication, the observer would inject the same guardrail every evaluation cycle, filling the agent's context with repeated instructions.
```python
async def _process_violations(self, result: dict) -> None:
    for key in self.VIOLATION_KEYS:
        if result.get(key) and key not in self.injected_violations:
            details = result.get("details", "")
            logger.warning(f"[OBSERVER] Violation: {key} — {details}")
            await self._inject_guardrail(key, details)
            self.injected_violations.add(key)
```
The injected_violations set tracks which types have been sent. Once a violation is injected, it won't be injected again even if later evaluations flag the same category. This keeps the agent's context clean.
The guardrail hints
Each violation type maps to a specific, actionable instruction for the agent. The hints follow a bracketed tag format ([POLICY: TYPE]) with instructions that reference the agent's available tools:
```python
GUARDRAIL_HINTS: ClassVar[dict[str, str]] = {
    "safety_emergency": (
        "[POLICY: SAFETY EMERGENCY] The caller may be in immediate physical danger. "
        "Use escalate_to_safety_team immediately. Keep them calm and on the line. "
        "Do not ask unnecessary questions."
    ),
    "threatening_language": (
        "[POLICY: THREATENING LANGUAGE] Threatening language has been detected. "
        "Stay calm and do not escalate. Inform the caller that all calls are recorded "
        "and threats violate our terms of service. End the call safely if threats continue."
    ),
    "discrimination_report": (
        "[POLICY: DISCRIMINATION REPORT] The caller is describing discriminatory behavior "
        "by their driver. Acknowledge their experience with empathy. Assure them this will "
        "be reviewed under our zero-tolerance policy. Use file_driver_report to formally "
        "document the report."
    ),
    "fraud_attempt": (
        "[POLICY: FRAUD FLAG] This interaction shows signs of claim manipulation. "
        "Continue collecting information normally, do not accuse the caller. "
        "The account has been flagged internally for review."
    ),
    "harassment_report": (
        "[POLICY: HARASSMENT REPORT] The caller is describing harassment by their driver. "
        "Acknowledge their experience with empathy. Use file_driver_report to document "
        "the report. Tell them we take this seriously and will follow up within 24 hours."
    ),
}
```
Each hint tells the agent what to do, not just what happened. The hints reference specific tools (escalate_to_safety_team, file_driver_report) so the agent knows which actions to take. This is more reliable than vague instructions like "handle this appropriately."
The support agent
The primary agent handles the conversation and responds to guardrail hints when they appear in its context:
```python
class SupportAgent(Agent):
    """
    Conversational ride-share support agent.

    Handles fare disputes, driver complaints, and safety concerns through
    natural conversation. The PolicyObserver runs alongside this agent and
    injects [POLICY: ...] system messages into its context when violations
    are detected. This agent acts on those hints without being explicitly
    aware of the observer.
    """

    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are a ride-share customer support agent. Help callers with fare "
                "disputes, driver complaints, and safety concerns. Be calm, empathetic, "
                "and professional. If you see a [POLICY: ...] alert in your context, "
                "act on it before continuing normally."
            )
        )

    async def on_enter(self) -> None:
        await self.session.generate_reply(
            instructions=(
                "Greet the caller. Tell them they've reached ride-share support and "
                "ask them to briefly describe the issue they're calling about today."
            )
        )
```
The SupportAgent subclasses Agent and uses on_enter to greet the caller via session.generate_reply. Its instructions include a single line about acting on [POLICY: ...] alerts. That's all the agent needs to know. The observer handles all the detection logic.
The agent also has tools for acting on guardrails. The escalate_to_safety_team tool handles emergency escalations, and file_driver_report documents complaints:
```python
@function_tool
async def escalate_to_safety_team(self, context: RunContext_T) -> str:
    """
    Use this when the caller is in immediate danger or when the context
    contains a [POLICY: SAFETY EMERGENCY] alert.
    """
    logger.warning(
        f"[ESCALATION] ride={context.userdata.ride_id}, "
        f"rider={context.userdata.rider_name}"
    )
    context.userdata.safety_emergency = True
    return (
        "This call has been escalated to our 24/7 safety team. "
        "A specialist will call the rider back within two minutes. "
        "Stay on the line with the caller until they confirm they are safe."
    )
```
The tool descriptions reference the specific policy tags (for example, [POLICY: SAFETY EMERGENCY]) so the model can match guardrail hints to the correct tool.
Putting it all together
Here's the start_observer function that wires everything up:
```python
def start_observer(session: AgentSession, observer_llm) -> PolicyObserver:
    """Attach a PolicyObserver to the session. Returns immediately, non-blocking."""
    return PolicyObserver(session=session, observer_llm=observer_llm)
```
The observer attaches its event listener in the constructor and starts monitoring as soon as conversation_item_added events fire. No async setup required. The full data flow looks like this:
1. The user speaks. LiveKit's STT transcribes the audio.
2. The transcription is committed to the chat history, firing a conversation_item_added event.
3. The observer captures the text and starts an async evaluation.
4. Meanwhile, the primary agent responds to the user normally.
5. The observer's LLM evaluates the transcript and returns structured JSON.
6. If a violation is detected, the observer copies the agent's context, adds a policy hint, and pushes the update.
7. On the agent's next turn, it sees the [POLICY: ...] message and adjusts its behavior.
Steps 3 through 6 happen in the background. The user and the primary agent are unaware of the evaluation.
Production considerations
Evaluation frequency. This implementation evaluates on every user turn. For high-volume conversations, you might batch evaluations every N turns or use a token-count threshold instead. The Doheny Surf Desk example uses a threshold of every 3 user turns. The tradeoff is detection latency vs. LLM cost.
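The turn-count variant can be sketched as a small gate that the event handler consults before spawning an evaluation task. The ThresholdGate class and its default of 3 are illustrative, not part of the example above:

```python
class ThresholdGate:
    """Counts user turns and signals when an evaluation is due.

    Hypothetical helper: call should_evaluate() once per user turn and only
    spawn the LLM evaluation task when it returns True.
    """

    def __init__(self, every_n: int = 3) -> None:
        self.every_n = every_n
        self._turns_since_eval = 0

    def should_evaluate(self) -> bool:
        """Returns True on every Nth user turn, then resets the counter."""
        self._turns_since_eval += 1
        if self._turns_since_eval >= self.every_n:
            self._turns_since_eval = 0
            return True
        return False
```

A token-count threshold would follow the same shape, accumulating transcript length instead of turn count.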
Model selection. The observer model should be more capable than the conversation model for complex policy analysis, but it doesn't need to be the largest model available. Test your specific violation categories and pick the smallest model that reliably detects them.
Structured output reliability. The JSON parsing includes a fallback for imperfect responses, but some models are better at structured output than others. Consider using a model that supports structured output natively, or add validation with defaults for missing fields.
Agent handoffs. If your application uses agent handoffs, the active agent might change between when the observer starts an evaluation and when it tries to inject. The session.current_agent call in _inject_guardrail always returns the current agent at injection time, so the injection goes to whoever is active at that moment, not the agent that was active when evaluation started. Verify that this is the behavior you want for your use case.
Context growth. Each injected guardrail adds a system message to the agent's context. With five violation categories and one-time deduplication, the maximum added context per session is five system messages. If you have many more categories, consider consolidating or removing stale guardrails.
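One possible pruning strategy, shown here over plain message dicts rather than the real ChatContext type, drops earlier [POLICY: ...] system messages before appending a new one:

```python
POLICY_PREFIX = "[POLICY:"


def prune_stale_guardrails(messages: list[dict], new_hint: str) -> list[dict]:
    """Remove earlier [POLICY: ...] system messages, then append the new hint.

    Keeps at most one active guardrail in the context at a time. Uses simple
    {"role": ..., "content": ...} dicts for illustration; adapting this to
    the actual chat context API is left to the reader.
    """
    kept = [
        m for m in messages
        if not (m["role"] == "system" and m["content"].startswith(POLICY_PREFIX))
    ]
    kept.append({"role": "system", "content": new_hint})
    return kept
```

Note that pruning trades context cleanliness for memory: the agent loses the earlier instruction, so only do this for guardrails that genuinely supersede one another.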
What's next
The observer pattern gives you a clean approach to AI agent orchestration, separating conversation handling from safety monitoring. Your front-line agent stays fast and focused while the observer handles the complex evaluation in the background.
To adapt this pattern for your use case, start with the violation categories that matter most for your application. Swap the ride-share policy hints for your own domain-specific guardrails. The three-phase loop (listen, evaluate, inject) stays the same regardless of what you're monitoring.
The full working example is available in the LiveKit Python agent examples. Give it a try and let us know what you're building.