Redacting PII from agent logs and transcripts

Handling credit cards, SSNs, or health data? Here's how to redact PII from your agent's output.

Real-time redaction with `llm_node`

To redact PII from agent speech before it's spoken, override the llm_node method:

1import re
2
3class PIIRedactingAgent(Agent):
4    def redact_pii(self, text: str) -> str:
5        patterns = {
6            r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
7            r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
8            r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
9            r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
10        }
11        for pattern, replacement in patterns.items():
12            text = re.sub(pattern, replacement, text)
13        return text
14
15    async def llm_node(self, chat_ctx, tools, model_settings=None):
16        async def process_stream():
17            async with self.session.llm.chat(
18                chat_ctx=chat_ctx, tools=tools, tool_choice=None
19            ) as stream:
20                async for chunk in stream:
21                    content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else None
22                    if content:
23                        yield self.redact_pii(content)
24                    else:
25                        yield chunk
26        return process_stream()

This intercepts LLM output and scrubs patterns before they reach TTS or logs.

Redacting transcripts for export

To export redacted transcripts for analytics or record-keeping:

1def redact_pii(text: str) -> str:
2    patterns = {
3        r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
4        r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
5        r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
6        r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
7    }
8    for pattern, replacement in patterns.items():
9        text = re.sub(pattern, replacement, text)
10    return text
11
12async def on_session_end(ctx: JobContext) -> None:
13    report = ctx.make_session_report()
14    data = report.to_dict()
15
16    # Redact PII from conversation history
17    for item in data.get('history', []):
18        if 'content' in item:
19            item['content'] = redact_pii(item['content'])
20
21    # Save to your own storage
22    save_to_s3(data)  # or wherever you need it
23
24@server.rtc_session(agent_name="my-agent", on_session_end=on_session_end)
25async def entrypoint(ctx: JobContext):
26    session = AgentSession()
27    # ...

Which approach to use

Scenario	Approach
Prevent PII in agent speech	`llm_node` override with regex patterns
Redacted transcripts for analytics	Custom export with `on_session_end` callback
Both real-time and export redaction	Combine both approaches

For compliance requirements, regex patterns can miss edge cases. Consider using a dedicated PII detection service for production systems.

Redacting PII from agent logs and transcripts

Real-time redaction with `llm_node`

Redacting transcripts for export

Which approach to use

Agents overview

Quickstart guide

Agent models

Building multi-agent architectures with LiveKit agents

Can you increase agent deployment limits?

Real-time redaction with llm_node

Redacting transcripts for export

Which approach to use

Read related documentation

Agents overview

Quickstart guide

Agent models

Find more Agents guides

Building multi-agent architectures with LiveKit agents

Can you increase agent deployment limits?

Real-time redaction with `llm_node`