Threat modeling for LLM applications

Threat modeling for LLM-backed products combines classic application threat modeling (data flows, trust boundaries) with model-specific abuse: prompt injection, tool misuse, retrieval manipulation, and unbounded consumption. This page gives a repeatable outline you can use in design reviews, not a substitute for full methodologies (STRIDE, PASTA, etc.).

Prerequisites: Skim Risk landscape so OWASP LLM categories are familiar.

1. Define the system under review

Document at least:

Channels: Web chat, API, voice, plugins, batch jobs.
Model boundary: Which calls go to a hosted LLM vs self-hosted vs small local model.
Retrieval: Whether RAG is used; where chunks are stored; who can influence documents.
Tools: HTTP clients, DB, ticketing, email, code execution, MCP servers (AI Automation).

Draw a data-flow diagram with trust boundaries: browser → API → orchestrator → model → tools → third parties.

2. List assets worth protecting

Examples:

User prompts and conversation history (may contain secrets).
API keys and service credentials held by the orchestrator or tools.
Integrity of decisions (fraud, safety, medical/financial advice).
Availability of the service and cost (token spend).

3. Identify attacker entry points

Entry	Typical abuse
Direct user message	Prompt injection, jailbreaks, extraction of system prompt.
Retrieved documents / web fetch	Indirect injection; poisoned content steers answers or tools.
Tool/plugin parameters	Model supplies malicious URLs, IDs, or commands.
Multimodal inputs	Instructions hidden in images or attachments.
Admin or support content	If pasted into prompts without isolation, becomes “trusted” wrongly.

4. STRIDE-style prompts (LLM-flavored)

Use STRIDE as a prompt list, not a rigid form:

Spoofing: Can someone act as another tenant or user via model-supplied IDs?
Tampering: Can retrieved text or tool results be altered before the model sees them?
Repudiation: Are tool calls and prompts logged with correlation IDs for incidents?
Information disclosure: Leaks via answers, logs, support exports, embeddings?
Denial of service: Large inputs, tool loops, or API spam (LLM10)?
Elevation of privilege: Can a user trigger admin-only tools via prompt injection?

5. Tie findings to tests and controls

For each finding, assign:

Risk rating (impact × likelihood) using your org’s scale.
Test: Example attack narrative or automated case.
Control: E.g. narrow tools, server-side authorization, sandboxing, rate limits, human approval (Human-in-the-loop).

6. Keep the model fresh

Threat models decay: new tools, new data sources, and model upgrades change behavior. Re-run when you change system prompts, tool schemas, retrieval corpora, or integrations.

1. Define the system under review​

2. List assets worth protecting​

3. Identify attacker entry points​

4. STRIDE-style prompts (LLM-flavored)​

5. Tie findings to tests and controls​

6. Keep the model fresh​

Related​