
Data, prompts, and logs

LLM applications create new data paths: prompts may contain PII, commercial secrets, or regulated content; logs and traces replicate that data across systems; retrieval corpora and fine-tuning sets amplify impact if mishandled. This page ties privacy and security engineering to the OWASP LLM items LLM02 (Sensitive information disclosure) and LLM04 (Data and model poisoning).

Data classes to track

| Class | Examples | Typical risk |
|---|---|---|
| End-user content | Chat messages, uploads | Accidental training use; over-retention; subpoena scope |
| System and developer prompts | Hidden instructions | Leakage via model output or logs (LLM07) |
| Retrieval chunks | Wiki, tickets, code | Over-broad retrieval exposes wrong-tenant data; poisoning |
| Tool payloads | HTTP bodies, SQL | Logged in plaintext; replay; SSRF exfiltration |
| Model outputs | Answers shown to users | Cached or echoed into logs; used downstream without validation (LLM05) |

Minimization and purpose limitation

  • Collect only what the feature needs — Avoid logging full prompts in production if a hash or truncated form suffices for debugging.
  • Separate environments — Staging prompts should not sit in the same retention bucket as production without policy.
  • Tenant isolation — Retrieval and tool credentials must be scoped per tenant; never trust the model to pick the tenant ID.
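The tenant-isolation point can be sketched in code: the tenant ID comes from the verified auth context and the filter is applied server-side, so the model never gets to choose which tenant's data is retrieved. This is a minimal sketch with a toy in-memory corpus; names like `AuthContext` and `retrieve` are illustrative, not a real library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthContext:
    """Derived from the verified session/JWT -- never from model output."""
    user_id: str
    tenant_id: str


# Toy in-memory corpus; a real system would apply the same tenant
# filter inside the vector-store query, not in application code.
CORPUS = [
    {"tenant_id": "acme", "text": "Acme refund policy ..."},
    {"tenant_id": "globex", "text": "Globex API keys rotate ..."},
]


def retrieve(query: str, auth: AuthContext) -> list[str]:
    """Tenant filter is applied server-side from the auth context.

    The query (and anything the model generated) influences ranking
    only, never the tenant scope.
    """
    return [d["text"] for d in CORPUS if d["tenant_id"] == auth.tenant_id]
```

The key design choice: `tenant_id` is a field of the authenticated request context, so a prompt-injected "fetch tenant globex" string has no code path to widen the filter.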

Retention and deletion

Define retention periods for prompts, completions, and traces aligned with legal and contractual requirements. Support user deletion and export where regulations apply. If logs are shipped to a SIEM, map the same rules or redact fields at ingest.

Redaction and safe logging

  • Structured redaction — Strip known patterns (API keys, credit card numbers) before writing to log stores where feasible.
  • Sampling — Full request logging for 100% of traffic is often unnecessary for LLM APIs; consider sampled debug tiers.
  • Access control — Restrict who can query prompt logs; treat them as high-sensitivity data.
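Structured redaction plus hash-for-correlation can be combined in one logging helper. A minimal sketch; the regex patterns are illustrative examples (an OpenAI-style key prefix, a naive card-number match, emails) and would need extending for your own secret formats:

```python
import hashlib
import re

# Hypothetical patterns -- extend for the secret formats in your stack.
REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),   # OpenAI-style key
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED_CARD]"),   # naive card match
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Strip known secret patterns before the text reaches a log store."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def log_record(prompt: str) -> dict:
    """Log a stable hash for correlation plus a redacted, truncated preview.

    The hash lets you dedupe and trace a prompt across systems without
    retaining its full contents.
    """
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "preview": redact(prompt)[:200],
    }
```

Regex redaction is best-effort (it misses secrets it has no pattern for), which is why it pairs with sampling and access control rather than replacing them.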

Poisoning and integrity

For RAG and fine-tuning:

  • Control who can add documents to corpora; version and scan uploads.
  • For third-party datasets, record provenance and run integrity checks before training or indexing.
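The provenance-and-integrity step above can be as simple as recording a manifest (source, retrieval date, content hash) when a dataset enters the pipeline and verifying the hash before training or indexing. A minimal sketch; the manifest fields and the example URL are illustrative:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Content hash used as the integrity anchor in the manifest."""
    return hashlib.sha256(data).hexdigest()


def verify_manifest(data: bytes, manifest: dict) -> bool:
    """Check dataset bytes against the recorded provenance manifest
    before they are indexed or used for training."""
    return sha256_hex(data) == manifest["sha256"]


# Recorded once, at ingestion time (fields are illustrative).
manifest = {
    "source": "https://example.com/dataset.jsonl",  # hypothetical URL
    "retrieved_at": "2024-05-01",
    "sha256": sha256_hex(b"doc contents"),
}
```

Any later mismatch (a swapped file, a tampered mirror) then fails closed instead of silently poisoning the corpus.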

Operational overlap with AppSec

Secrets in prompts are still secrets: they can leak via error messages, support tickets, and client-side logs. Pair this page with n8n security and MCP security for integration-heavy stacks.

References