Threat modeling for LLM applications
Threat modeling for LLM-backed products combines classic application threat modeling (data flows, trust boundaries) with model-specific abuse: prompt injection, tool misuse, retrieval manipulation, and unbounded consumption. This page gives a repeatable outline you can use in design reviews—not a substitute for full methodologies (STRIDE, PASTA, etc.).
Prerequisites: Skim Risk landscape so OWASP LLM categories are familiar.
1. Define the system under review
Document at least:
- Channels — Web chat, API, voice, plugins, batch jobs.
- Model boundary — Which calls go to a hosted LLM vs self-hosted vs small local model.
- Retrieval — Whether RAG is used; where chunks are stored; who can influence documents.
- Tools — HTTP clients, DB, ticketing, email, code execution, MCP servers (AI Automation).
Draw a data-flow diagram with trust boundaries: browser → API → orchestrator → model → tools → third parties.
2. List assets worth protecting
Examples:
- User prompts and conversation history (may contain secrets).
- API keys and service credentials held by the orchestrator or tools.
- Integrity of decisions (fraud, safety, medical/financial advice).
- Availability of the service and cost (token spend).
3. Identify attacker entry points
| Entry | Typical abuse |
|---|---|
| Direct user message | Prompt injection, jailbreaks, extraction of system prompt. |
| Retrieved documents / web fetch | Indirect injection; poisoned content steers answers or tools. |
| Tool/plugin parameters | Model supplies malicious URLs, IDs, or commands. |
| Multimodal inputs | Instructions hidden in images or attachments. |
| Admin or support content | If pasted into prompts without isolation, becomes “trusted” wrongly. |
4. STRIDE-style prompts (LLM-flavored)
Use STRIDE as a prompt list, not a rigid form:
- Spoofing — Can someone act as another tenant or user via model-supplied IDs?
- Tampering — Can retrieved text or tool results be altered before the model sees them?
- Repudiation — Are tool calls and prompts logged with correlation IDs for incidents?
- Information disclosure — Leaks via answers, logs, support exports, embeddings?
- Denial of service — Large inputs, tool loops, or API spam (LLM10)?
- Elevation of privilege — Can a user trigger admin-only tools via prompt injection?
5. Tie findings to tests and controls
For each finding, assign:
- Risk rating (impact × likelihood) using your org’s scale.
- Test — Example attack narrative or automated case.
- Control — E.g. narrow tools, server-side authorization, sandboxing, rate limits, human approval (Human-in-the-loop).
6. Keep the model fresh
Threat models decay: new tools, new data sources, and model upgrades change behavior. Re-run when you change system prompts, tool schemas, retrieval corpora, or integrations.