Redacting PII from agent logs and transcripts

Learn how to protect sensitive user data by redacting PII from agent logs, transcripts, and observability data in LiveKit.

Handling credit cards, SSNs, or health data? Here's how to redact PII from your agent's output.

Real-time redaction with llm_node

To redact PII from agent speech before it's spoken, override the llm_node method:

    import re

    from livekit.agents import Agent


    class PIIRedactingAgent(Agent):
        def redact_pii(self, text: str) -> str:
            # Patterns are applied in insertion order, so the card pattern runs first
            patterns = {
                r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
                r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
                r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
                r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
            }
            for pattern, replacement in patterns.items():
                text = re.sub(pattern, replacement, text)
            return text

        async def llm_node(self, chat_ctx, tools, model_settings=None):
            async def process_stream():
                async with self.session.llm.chat(
                    chat_ctx=chat_ctx, tools=tools, tool_choice=None
                ) as stream:
                    async for chunk in stream:
                        # Redact text deltas; pass every other chunk through unchanged
                        content = getattr(chunk.delta, 'content', None) if hasattr(chunk, 'delta') else None
                        if content:
                            yield self.redact_pii(content)
                        else:
                            yield chunk

            return process_stream()

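This intercepts LLM output and scrubs matching patterns before the text reaches TTS or logs. Because redaction runs on each streamed chunk separately, a value that is split across two chunks (for example, a card number that arrives in two pieces) will not match any pattern.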
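One way to catch values that straddle chunk boundaries is to buffer streamed text and redact only complete sentences. The following is a minimal, framework-independent sketch that works on plain string chunks; `redact_stream` and `SENTENCE_END` are hypothetical names, not part of LiveKit, and the sentence-boundary heuristic is an assumption that may need tuning for your content.

```python
import re
from typing import Iterable, Iterator

# Same patterns as the agent above, precompiled once.
PATTERNS = [
    (re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'), '[CARD]'),
    (re.compile(r'\b\d{3}[-]?\d{2}[-]?\d{4}\b'), '[SSN]'),
    (re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'), '[PHONE]'),
    (re.compile(r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b'), '[EMAIL]'),
]

# Assumed boundary: sentence punctuation followed by whitespace, or a newline.
SENTENCE_END = re.compile(r'[.!?]\s+|\n')


def redact_pii(text: str) -> str:
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def redact_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: hold text back until a sentence is complete."""
    buffer = ''
    for chunk in chunks:
        buffer += chunk
        # Find where the last complete sentence in the buffer ends.
        last_end = 0
        for match in SENTENCE_END.finditer(buffer):
            last_end = match.end()
        if last_end:
            yield redact_pii(buffer[:last_end])
            buffer = buffer[last_end:]
    # Flush whatever remains when the stream closes.
    if buffer:
        yield redact_pii(buffer)


chunks = ['My card is 4111 1111', ' 1111 1111. Thanks']
print(''.join(redact_stream(chunks)))          # My card is [CARD]. Thanks
print(''.join(redact_pii(c) for c in chunks))  # card number is not redacted
```

The trade-off is latency: text is held until a sentence boundary arrives, so downstream consumers receive it slightly later. Inside an `llm_node` override, the same logic would need to be written as an async generator.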

Redacting transcripts for export

To export redacted transcripts for analytics or record-keeping:

    import re

    from livekit.agents import AgentSession, JobContext


    def redact_pii(text: str) -> str:
        patterns = {
            r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
            r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
            r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
            r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
        }
        for pattern, replacement in patterns.items():
            text = re.sub(pattern, replacement, text)
        return text


    async def on_session_end(ctx: JobContext) -> None:
        report = ctx.make_session_report()
        data = report.to_dict()

        # Redact PII from conversation history
        for item in data.get('history', []):
            content = item.get('content')
            # Content may be a plain string or a list of parts; redact only the strings
            if isinstance(content, str):
                item['content'] = redact_pii(content)
            elif isinstance(content, list):
                item['content'] = [redact_pii(c) if isinstance(c, str) else c for c in content]

        # Save to your own storage (save_to_s3 is a placeholder for your own function)
        save_to_s3(data)  # or wherever you need it


    # `server` is your existing agent server instance
    @server.rtc_session(agent_name="my-agent", on_session_end=on_session_end)
    async def entrypoint(ctx: JobContext):
        session = AgentSession()
        # ...

Which approach to use

| Scenario | Approach |
| --- | --- |
| Prevent PII in agent speech | llm_node override with regex patterns |
| Redacted transcripts for analytics | Custom export with on_session_end callback |
| Both real-time and export redaction | Combine both approaches |

Regex patterns can miss edge cases, so they may not be enough on their own to meet compliance requirements. For production systems, consider using a dedicated PII detection service.
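To make the edge cases concrete, here is a small sketch that runs the same four patterns from this guide against a few sample strings. The sample inputs are illustrative assumptions (the card numbers are common test values), not data from a real system.

```python
import re

PATTERNS = {
    r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b': '[CARD]',
    r'\b\d{3}[-]?\d{2}[-]?\d{4}\b': '[SSN]',
    r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b': '[PHONE]',
    r'\b[\w.+-]+@[\w.-]+\.\w{2,}\b': '[EMAIL]',
}


def redact_pii(text: str) -> str:
    for pattern, replacement in PATTERNS.items():
        text = re.sub(pattern, replacement, text)
    return text


samples = {
    'caught, 16-digit card': 'Card 4111 1111 1111 1111',
    'missed, 15-digit card grouped 4-6-5': 'Card 3782 822463 10005',
    'missed, phone with parentheses': 'Call (415) 555-0123',
    'missed, spelled-out digits': 'My SSN is one two three, four five, six seven eight nine',
    'false positive, 10-digit order number': 'Order 1234567890 shipped',
}

for label, text in samples.items():
    print(f'{label}: {redact_pii(text)!r}')
```

Only the first sample is redacted as intended. The next three pass through unchanged, and the order number is rewritten to `[PHONE]` even though it is not a phone number.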