Agents overview
An agent typically runs a loop: plan → call tools → read results → repeat. Security risks compound across iterations: a small mistake in tool choice or arguments has real side effects, and prompt injection (untrusted content steering the model) can hijack the loop.
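A minimal sketch of that loop, with a bounded step budget and a refusal path for unknown tools. `call_model`, the tool registry, and the message shapes are illustrative stand-ins, not any specific framework's API:

```python
import json

def search_docs(query: str) -> str:
    # Stand-in tool; a real one would hit a docs index.
    return f"results for {query!r}"

TOOLS = {"search_docs": search_docs}

def call_model(messages: list[dict]) -> dict:
    # Trivial deterministic stand-in for an LLM call: request one search,
    # then answer. A real agent would call a model API here.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_docs", "args": {"query": messages[0]["content"]}}
    return {"answer": "done"}

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):               # bound the loop: no unbounded agency
        action = call_model(messages)
        if "answer" in action:               # model is done
            return action["answer"]
        tool = TOOLS.get(action["tool"])
        if tool is None:                     # refuse unknown tools, don't guess
            messages.append({"role": "system", "content": "unknown tool"})
            continue
        result = tool(**action["args"])      # tool output is untrusted input
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "stopped: step budget exhausted"  # fail closed rather than loop forever

print(run_agent("how do I rotate API keys?"))
```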
Industry taxonomies such as the OWASP Top 10 for LLM Applications (2025) label related failures (excessive agency, improper output handling, sensitive information disclosure). See AI Security — Risk landscape for the full list; the articles here focus on what to implement.
Core ideas
- Trust boundary — Anything the model reads (web, files, tools) is untrusted unless you have a cryptographic or procedural guarantee of its provenance.
- Least privilege — Fewer, narrower tools beat a general-purpose shell or “one HTTP node that can reach anything.”
- Human gates — Required for irreversible or regulated actions; automation should fail closed when uncertain (see the first sketch after this list).
- Observability — Log tool invocations with correlation IDs so incidents are reconstructible (data/logging posture); the second sketch below shows one shape this can take.
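First, a sketch of a narrow tool behind a fail-closed human gate. The `require_approval` hook, the refund cap, and the tool itself are hypothetical illustrations of the policy, not a real API:

```python
REFUND_CAP_CENTS = 5_000  # illustrative policy threshold

def require_approval(action: str, details: dict) -> bool:
    # Hypothetical human-approval hook (ticket, Slack prompt, admin UI).
    # Defaults to deny so the gate fails closed if no one answers.
    return False

def refund_order(order_id: str, amount_cents: int) -> str:
    # Narrow tool: one action, typed and validated arguments, instead of
    # a general-purpose shell or an HTTP client that can reach anything.
    if amount_cents <= 0:
        raise ValueError("refund amount must be positive")
    if amount_cents > REFUND_CAP_CENTS and not require_approval(
        "refund_order", {"order_id": order_id, "amount_cents": amount_cents}
    ):
        return "denied: above cap and no human approval"  # fail closed
    # ... call the payments API here ...
    return f"refunded {amount_cents} cents on order {order_id}"
```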
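Second, a sketch of correlation-ID logging around tool invocations; the `logged_call` wrapper and the field names are illustrative assumptions:

```python
import json
import logging
import time
import uuid

log = logging.getLogger("agent.tools")

def logged_call(run_id: str, tool_name: str, fn, /, **args):
    # One ID per invocation, tied to the run, so an incident can be
    # reconstructed from logs alone.
    call_id = str(uuid.uuid4())
    log.info(json.dumps({"run_id": run_id, "call_id": call_id,
                         "tool": tool_name, "args": args, "event": "start"}))
    start = time.monotonic()
    try:
        result = fn(**args)
        log.info(json.dumps({"run_id": run_id, "call_id": call_id, "event": "ok",
                             "ms": int((time.monotonic() - start) * 1000)}))
        return result
    except Exception as exc:
        log.error(json.dumps({"run_id": run_id, "call_id": call_id,
                              "event": "error", "error": repr(exc)}))
        raise

# usage: logged_call(run_id, "search_docs", search_docs, query="rotate keys")
```

Emitting one JSON line per event keeps the log greppable by `run_id` when reconstructing an incident.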
Articles
| Topic | Article |
|---|---|
| Tool design | Tool use safely |
| Orchestration | Multi-agent |
| Approvals | Human-in-the-loop |
| External servers | MCP security |
| Execution | Sandboxing |
Related
- Tool use safely — narrowing tools and authorization
- n8n — Workflow automation calling LLMs and tools
- Threat modeling for LLM apps — design-review framing