Agents overview

An agent typically runs a loop: plan → call tools → read results → repeat. Security issues compound across iterations: a small mistake in tool choice or arguments can have real-world side effects, and prompt injection (untrusted content steering the model) can hijack the entire loop.
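The loop above can be sketched in a few lines. This is a toy illustration, not any particular framework's API; `plan_step` and `TOOLS` are hypothetical stand-ins for the model call and the tool registry.

```python
# Minimal sketch of the plan → call tools → read results loop.
# plan_step and TOOLS are hypothetical names, not a real framework.

def plan_step(history):
    """Stand-in for a model call that picks the next tool (or stops)."""
    if not history:
        return {"tool": "search", "args": {"query": "status"}}
    return None  # done after one step in this toy example

TOOLS = {"search": lambda query: f"results for {query!r}"}

def run_agent():
    history = []
    while (step := plan_step(history)) is not None:
        tool = TOOLS[step["tool"]]     # tool choice: where mistakes compound
        result = tool(**step["args"])  # arguments may carry injected content
        history.append((step, result)) # results are read back into the next plan
    return history
```

Note the two attack surfaces the prose names: the arguments passed to a tool, and the results read back into the next planning step.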

Industry taxonomies such as the OWASP Top 10 for LLM Applications (2025) label the related failures (excessive agency, improper output handling, sensitive information disclosure). See AI Security — Risk landscape for the full list; the articles here focus on what to implement.

Core ideas

  • Trust boundary — Anything the model reads (web, files, tools) is untrusted unless you have a cryptographic or procedural guarantee.
  • Least privilege — Fewer, narrower tools beat a general-purpose shell or “one HTTP node that can reach anything.”
  • Human gates — Required for irreversible or regulated actions; automation should fail closed when uncertain.
  • Observability — Log every tool invocation with a correlation ID so incidents can be reconstructed after the fact (data/logging posture).
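The least-privilege, human-gate, and observability ideas combine naturally in a single tool-invocation wrapper. A minimal sketch, assuming hypothetical tool names (`read_ticket`, `send_email`, etc.) and a simple approval flag in place of a real review workflow:

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

# Hypothetical registry: a few narrow tools, no general-purpose shell.
ALLOWED_TOOLS = {"read_ticket", "draft_reply"}
IRREVERSIBLE = {"send_email", "delete_record"}  # require a human gate

def invoke(tool, args, approved=False):
    call_id = str(uuid.uuid4())  # correlation ID for reconstruction
    if tool not in ALLOWED_TOOLS | IRREVERSIBLE:
        log.warning("call_id=%s denied unknown tool=%s", call_id, tool)
        raise PermissionError(f"tool {tool!r} not allowed")  # least privilege
    if tool in IRREVERSIBLE and not approved:
        log.info("call_id=%s gated tool=%s", call_id, tool)
        raise PermissionError("human approval required")     # fail closed
    log.info("call_id=%s invoking tool=%s args=%s", call_id, tool, args)
    return {"call_id": call_id, "tool": tool, "args": args}
```

Unknown tools are denied rather than passed through, and gated tools fail closed until a human approves, so the default outcome of uncertainty is inaction plus a log entry.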

Articles

Topic               Article
Tool design         Tool use safely
Orchestration       Multi-agent
Approvals           Human-in-the-loop
External servers    MCP security
Execution           Sandboxing