Data, prompts, and logs
LLM applications create new data paths: prompts may contain PII, commercial secrets, or regulated content; logs and traces replicate that data across systems; retrieval corpora and fine-tuning sets amplify impact if mishandled. This page ties privacy and security engineering to the OWASP LLM items LLM02 (Sensitive information disclosure) and LLM04 (Data and model poisoning).
Data classes to track
| Class | Examples | Typical risk |
|---|---|---|
| End-user content | Chat messages, uploads | Accidental training use; over-retention; subpoena scope. |
| System and developer prompts | Hidden instructions | Leakage via model output or logs (LLM07). |
| Retrieval chunks | Wiki, tickets, code | Over-broad retrieval exposes wrong tenant data; poisoning. |
| Tool payloads | HTTP bodies, SQL | Logged in plaintext; replay; SSRF exfiltration. |
| Model outputs | Answers shown to users | Cached or echoed into logs; used downstream without validation (LLM05). |
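Classifying records at creation makes the table above enforceable downstream. A minimal sketch, assuming an illustrative per-class policy (the retention values and `log_raw` flags here are placeholders, not recommendations from this page):

```python
from enum import Enum

class DataClass(Enum):
    END_USER_CONTENT = "end_user_content"
    SYSTEM_PROMPT = "system_prompt"
    RETRIEVAL_CHUNK = "retrieval_chunk"
    TOOL_PAYLOAD = "tool_payload"
    MODEL_OUTPUT = "model_output"

# Illustrative handling policy per class: retention window in days and
# whether the raw text may be written to general-purpose logs.
POLICY = {
    DataClass.END_USER_CONTENT: {"retention_days": 30, "log_raw": False},
    DataClass.SYSTEM_PROMPT:    {"retention_days": 90, "log_raw": False},
    DataClass.RETRIEVAL_CHUNK:  {"retention_days": 90, "log_raw": False},
    DataClass.TOOL_PAYLOAD:     {"retention_days": 14, "log_raw": False},
    DataClass.MODEL_OUTPUT:     {"retention_days": 30, "log_raw": True},
}

def may_log_raw(dc: DataClass) -> bool:
    """Gate raw logging on the record's data class, not the caller's judgment."""
    return POLICY[dc]["log_raw"]
```

Tagging at creation means later stages (logging, retention sweeps, SIEM export) can look up policy instead of re-deciding sensitivity.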
Minimization and purpose limitation
- Collect only what the feature needs: avoid logging full prompts in production if a hash or truncated form suffices for debugging.
- Separate environments: staging prompts should not share a retention bucket with production data unless policy explicitly allows it.
- Tenant isolation: scope retrieval and tool credentials per tenant; never trust the model to pick the tenant ID.
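The first and third points above can be sketched together: log a fingerprint instead of the raw prompt, and apply the tenant filter server-side from the authenticated session. The `index.search` call is a hypothetical retrieval API standing in for whatever vector store you use:

```python
import hashlib

def prompt_fingerprint(prompt: str, max_chars: int = 80) -> dict:
    """Debug-friendly record: hash plus short prefix, not the full prompt."""
    return {
        "sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prefix": prompt[:max_chars],
        "length": len(prompt),
    }

def retrieve(query: str, tenant_id: str, index) -> list:
    """tenant_id must come from the authenticated session, never from
    model output; the filter is enforced server-side."""
    return index.search(query, filter={"tenant_id": tenant_id})
```

The hash lets you correlate repeated prompts across traces without retaining their content; the prefix gives operators a hint during triage.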
Retention and deletion
Define retention periods for prompts, completions, and traces aligned with legal and contractual requirements. Support user deletion and export where regulations apply. If logs are shipped to a SIEM, apply the same retention rules there or redact sensitive fields at ingest.
Redaction and safe logging
- Structured redaction: strip known secret patterns (API keys, credit card numbers) before writing to log stores where feasible.
- Sampling: logging full requests for 100% of traffic is rarely necessary for LLM APIs; consider sampled debug tiers instead.
- Access control: restrict who can query prompt logs; treat them as high-sensitivity data.
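Structured redaction is usually a small ordered list of pattern/replacement pairs applied before the write. A sketch with illustrative patterns; extend the list for your own key formats:

```python
import re

# Illustrative patterns only; real deployments need formats specific
# to the providers and secrets they actually handle.
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED_CARD]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
]

def redact(text: str) -> str:
    """Apply each redaction pattern in order before the log write."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Pattern-based redaction is best-effort: it catches known formats, not freeform PII, which is one more reason to prefer sampling and minimization upstream.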
Poisoning and integrity
For RAG and fine-tuning:
- Control who can add documents to corpora; version and scan uploads.
- For third-party datasets, record provenance and run integrity checks before training or indexing.
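Provenance recording and integrity checking can be as simple as hashing each dataset file into a manifest at ingest and re-verifying before training or indexing. A minimal sketch; the manifest field names are illustrative:

```python
import hashlib
import json
from pathlib import Path

def record_provenance(path: Path, source_url: str, manifest: Path) -> str:
    """Hash a dataset file and append a provenance entry to a JSONL manifest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    entry = {"file": path.name, "sha256": digest, "source": source_url}
    with manifest.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return digest

def verify(path: Path, expected_sha256: str) -> bool:
    """Re-check file integrity against the recorded digest before use."""
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected_sha256
```

A failed `verify` before an indexing or training run is the cheap tripwire for silent corpus tampering between ingest and use.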
Operational overlap with AppSec
Secrets in prompts are still secrets: they can leak via error messages, support tickets, and client-side logs. Pair this page with n8n security and MCP security for integration-heavy stacks.
References
- OWASP GenAI — Sensitive information disclosure (LLM02) and Data and model poisoning (LLM04) articles
- NIST AI RMF — Govern/Map/Measure/Manage for organizational alignment