Redacting PII from agent logs and transcripts
Learn how to protect sensitive user data by redacting PII from agent logs, transcripts, and observability data in LiveKit.
Last Updated:
Handling credit cards, SSNs, or health data? Here's how to redact PII from your agent's output.
Real-time redaction with llm_node
To redact PII from agent speech before it's spoken, override the llm_node method:
1import re23class PIIRedactingAgent(Agent):4def redact_pii(self, text: str) -> str:5patterns = {6r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',7r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',8r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',9r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',10}11for pattern, replacement in patterns.items():12text = re.sub(pattern, replacement, text)13return text1415async def llm_node(self, chat_ctx, tools, model_settings=None):16async def process_stream():17async with self.session.llm.chat(18chat_ctx=chat_ctx, tools=tools, tool_choice=None19) as stream:20async for chunk in stream:21content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else None22if content:23yield self.redact_pii(content)24else:25yield chunk26return process_stream()
This intercepts LLM output and scrubs patterns before they reach TTS or logs.
Redacting transcripts for export
To export redacted transcripts for analytics or record-keeping:
1def redact_pii(text: str) -> str:2patterns = {3r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',4r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',5r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',6r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',7}8for pattern, replacement in patterns.items():9text = re.sub(pattern, replacement, text)10return text1112async def on_session_end(ctx: JobContext) -> None:13report = ctx.make_session_report()14data = report.to_dict()1516# Redact PII from conversation history17for item in data.get('history', []):18if 'content' in item:19item['content'] = redact_pii(item['content'])2021# Save to your own storage22save_to_s3(data) # or wherever you need it2324@server.rtc_session(agent_name="my-agent", on_session_end=on_session_end)25async def entrypoint(ctx: JobContext):26session = AgentSession()27# ...
Which approach to use
| Scenario | Approach |
|---|---|
| Prevent PII in agent speech | llm_node override with regex patterns |
| Redacted transcripts for analytics | Custom export with on_session_end callback |
| Both real-time and export redaction | Combine both approaches |
For compliance requirements, regex patterns can miss edge cases. Consider using a dedicated PII detection service for production systems.
Read related documentation
Find more Agents guides
Building multi-agent architectures with LiveKit agents
Learn best practices for building multi-agent architectures including session state management, chat context handling, TaskGroup patterns, and dynamic per-client routing.
Can you increase agent deployment limits?
Understand the hard cap on agent deployments and how to build a multi-tenant agent that scales without provisioning more slots.