Threat modeling for LLM applications

Threat modeling for LLM-backed products combines classic application threat modeling (data flows, trust boundaries) with model-specific abuse: prompt injection, tool misuse, retrieval manipulation, and unbounded consumption. This page gives a repeatable outline you can use in design reviews—not a substitute for full methodologies (STRIDE, PASTA, etc.).

Prerequisites: Skim the Risk landscape page so the OWASP LLM categories are familiar.

1. Define the system under review

Document at least:

  • Channels — Web chat, API, voice, plugins, batch jobs.
  • Model boundary — Which calls go to a hosted LLM vs self-hosted vs small local model.
  • Retrieval — Whether RAG is used; where chunks are stored; who can influence documents.
  • Tools — HTTP clients, DB, ticketing, email, code execution, MCP servers (AI Automation).

Draw a data-flow diagram with trust boundaries: browser → API → orchestrator → model → tools → third parties.
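The flows and boundaries can also be captured as data so the threat model is diffable alongside code. A minimal sketch, using the flow names from the diagram above; the `Flow` structure and boundary flags are illustrative assumptions, not a prescribed schema:

```python
# Hypothetical sketch: the data-flow diagram as reviewable data.
# Which hops count as trust-boundary crossings is an assumption here;
# set the flags to match your own diagram.
from dataclasses import dataclass


@dataclass
class Flow:
    src: str
    dst: str
    crosses_trust_boundary: bool


FLOWS = [
    Flow("browser", "api", True),
    Flow("api", "orchestrator", False),
    Flow("orchestrator", "model", True),
    Flow("model", "tools", True),
    Flow("tools", "third_parties", True),
]


def boundary_crossings(flows):
    """Return the flows that cross a trust boundary; each one needs review."""
    return [f for f in flows if f.crosses_trust_boundary]
```

Keeping the list next to the code that implements the orchestrator makes it easy to spot when a new tool or data source adds a crossing the review never saw.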

2. List assets worth protecting

Examples:

  • User prompts and conversation history (may contain secrets).
  • API keys and service credentials held by the orchestrator or tools.
  • Integrity of decisions (fraud, safety, medical/financial advice).
  • Availability of the service and cost (token spend).

3. Identify attacker entry points

For each entry point, note the typical abuse:

  • Direct user message — Prompt injection, jailbreaks, extraction of the system prompt.
  • Retrieved documents / web fetch — Indirect injection; poisoned content steers answers or tool calls.
  • Tool/plugin parameters — The model supplies malicious URLs, IDs, or commands.
  • Multimodal inputs — Instructions hidden in images or attachments.
  • Admin or support content — If pasted into prompts without isolation, it is wrongly treated as trusted.
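Indirect injection from retrieved documents is easier to reason about when untrusted text is explicitly fenced before it reaches the prompt. A minimal sketch, assuming an orchestrator that interpolates chunks into the prompt; the delimiter format and function name are illustrative, not a standard:

```python
# Hypothetical sketch: mark retrieved chunks as untrusted data so the
# orchestrator (and any downstream filter) can tell data from instructions.
def wrap_untrusted(chunk: str, source: str) -> str:
    # Escape angle brackets so a poisoned chunk cannot forge the closing
    # delimiter and "break out" of the untrusted region.
    sanitized = chunk.replace("<", "&lt;").replace(">", "&gt;")
    return (
        f"<untrusted source={source!r}>\n"
        f"{sanitized}\n"
        f"</untrusted>"
    )
```

Delimiters alone do not stop injection — the model can still follow hostile instructions inside the fence — but they make the trust boundary visible in logs and give filters and system prompts something concrete to key on.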

4. STRIDE-style prompts (LLM-flavored)

Use STRIDE as a prompt list, not a rigid form:

  • Spoofing — Can someone act as another tenant or user via model-supplied IDs?
  • Tampering — Can retrieved text or tool results be altered before the model sees them?
  • Repudiation — Are tool calls and prompts logged with correlation IDs for incidents?
  • Information disclosure — Leaks via answers, logs, support exports, embeddings?
  • Denial of service — Large inputs, tool loops, or API spam (unbounded consumption, OWASP LLM10)?
  • Elevation of privilege — Can a user trigger admin-only tools via prompt injection?
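For the elevation-of-privilege prompt, the core control is that authorization happens server-side against the authenticated session, never against role claims the model emits. A minimal sketch; the tool names and role table are hypothetical:

```python
# Hypothetical sketch: server-side authorization for tool calls.
# The role comes from the authenticated session, never from model output,
# so prompt injection cannot escalate to admin-only tools.
TOOL_ROLES = {
    "search_kb": {"user", "admin"},
    "delete_ticket": {"admin"},
}


def authorize_tool_call(tool: str, session_role: str) -> bool:
    """Allow the call only if the session's role permits this tool."""
    return session_role in TOOL_ROLES.get(tool, set())
```

Unknown tools default to denied, which is the safe failure mode when the model hallucinates a tool name.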

5. Tie findings to tests and controls

For each finding, assign:

  • Risk rating (impact × likelihood) using your org’s scale.
  • Test — Example attack narrative or automated case.
  • Control — e.g. narrowly scoped tools, server-side authorization, sandboxing, rate limits, human approval (Human-in-the-loop).
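Recording all three fields together keeps findings from leaving the review unassigned. A minimal sketch of the impact × likelihood rating and a finding record; the 1–5 scale, finding ID, and field names are illustrative, substitute your org's scheme:

```python
# Hypothetical sketch: score a finding and tie it to a test and a control.
def risk_score(impact: int, likelihood: int) -> int:
    """Impact times likelihood, each on an assumed 1-5 scale."""
    if not (1 <= impact <= 5 and 1 <= likelihood <= 5):
        raise ValueError("impact and likelihood must be in 1..5")
    return impact * likelihood


finding = {
    "id": "TM-001",  # illustrative ID
    "summary": "Indirect injection via RAG corpus can trigger the email tool",
    "score": risk_score(impact=4, likelihood=3),
    "test": "Seed corpus with a hostile chunk; assert no email tool call fires",
    "control": "Human approval gate on the outbound email tool",
}
```

A finding with a score but no test or control is a review action item, not a closed item.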

6. Keep the model fresh

Threat models decay: new tools, new data sources, and model upgrades change behavior. Re-run when you change system prompts, tool schemas, retrieval corpora, or integrations.