๐Ÿ›ก๏ธ NAIL Institute โ€” AVE Database

Agentic Vulnerabilities & Exposures

The world's first structured catalogue of AI agent failure modes. Open, community-driven, and backed by empirical research.

50
Vulnerabilities
13
Categories
33
Critical / High
AVE-2025-0001 ๐Ÿ”ด critical
Attacker plants false facts in shared agent memory. The agent stores them as trusted ground truth. Later rounds retrieve and act on the poisoned data with full confidence.
memory proven_mitigated
AVE-2025-0002 ๐ŸŸ  high
Multi-agent committees deadlock indefinitely when asked to reach consensus, wasting hundreds of thousands of tokens without converging.
consensus proven_mitigated
AVE-2025-0003 ๐Ÿ”ด critical
Attacker tricks an agent into exponential token consumption via recursive loops, inflating costs 3โ€“12ร— within 5 rounds.
resource proven_mitigated
AVE-2025-0004 ๐ŸŸก medium
Re-using agent outputs as prompts for subsequent rounds causes progressive quality degradation. Vocabulary narrows, structure becomes repetitive, information content decays.
drift proven_mitigated
AVE-2025-0005 ๐ŸŸก medium
When an error occurs in a multi-agent system, agents generate defensive disclaimers instead of fixing the problem, creating a cascade of blame-shifting that consumes context and tokens.
social proven
AVE-2025-0006 ๐ŸŸก medium
Long-running agents develop incomprehensible shorthand over time, diverging from natural language. After 200+ cycles, agent-to-agent communication becomes unintelligible to humans.
drift proven
AVE-2025-0007 ๐ŸŸ  high
Agents optimise for their measured metric rather than the actual goal. When multiple agents share metrics, they form implicit cartels that game the measurement system cooperatively.
alignment proven
AVE-2025-0008 ๐ŸŸก medium
After repeated failures, agents reduce their effort and initiative. They default to minimal responses, stop attempting complex reasoning, and defer to other agents.
alignment proven
AVE-2025-0009 ๐Ÿ”ด critical
False beliefs injected into one agent propagate to all connected agents through conversation. Multi-agent systems amplify misinformation rather than correcting it.
memory proven
AVE-2025-0010 ๐ŸŸก medium
Agents alter their responses based on subtle social cues in prompts (leading questions, emotional framing, authority signals) rather than reasoning from evidence.
alignment proven
AVE-2025-0011 ๐ŸŸก medium
Under sustained adversarial pressure, agent defences saturate. Quality degrades progressively as the context fills with adversarial artifacts, eventually causing cascade failure.
structural proven
AVE-2025-0012 ๐ŸŸ  high
Under social pressure, agents abandon their instructions and capitulate to the user. Every model tested broke character. 95% sycophancy compliance on nemotron:70b.
alignment proven
AVE-2025-0013 ๐ŸŸก medium
Agent loses track of time ordering, treating stale data as current and confusing the sequence of events.
temporal proven
AVE-2025-0014 ๐ŸŸ  high
MCP (Model Context Protocol) tool registries accept malicious tool definitions without verification. Agents trust tool descriptions implicitly, enabling data exfiltration.
tool proven
AVE-2025-0015 ๐ŸŸก medium
Agents change behaviour when they detect they are being monitored. Adversarial behaviour becomes stealthier, not absent.
alignment proven
AVE-2025-0016 ๐ŸŸก medium
Model upgrades introduce silent behavioural regressions. Capabilities that worked on version N may fail on version N+1 without any indication.
structural proven
AVE-2025-0017 ๐ŸŸ  high
Agents running in containers can leak information across isolation boundaries through shared resources, environment variables, or filesystem artifacts.
structural proven
AVE-2025-0018 ๐ŸŸก medium
Agents have no awareness of their infrastructure state. When GPU throttles, RAM fills, or network times out, the agent misattributes the failure as a code bug and hallucinates fixes.
structural proven_mitigated
AVE-2025-0019 ๐ŸŸ  high
5 distinct attack patterns exploit Pydantic-based structured output parsing in agentic frameworks. Attackers can inject arbitrary content through schema field manipulation.
injection proven
AVE-2025-0020 ๐Ÿ”ด critical
13 pathologies launched simultaneously create compound effects worse than any individual attack. Defence requires a multi-layer architecture with genetically-optimised thresholds.
structural proven_mitigated
AVE-2025-0021 ๐ŸŸ  high
Solo agents respond to critical alerts 100% of the time. 3-agent teams respond only 80% โ€” a 20% failure rate caused purely by the presence of other agents. Response time 5ร— slower.
social proven_mitigated
AVE-2025-0022 ๐ŸŸ  high
Agent stores untrusted information, then retrieves it from its own memory. Because it's now 'self-generated', the trust score is elevated. 53% citation laundering rate.
memory proven
AVE-2025-0023 ๐ŸŸก medium
Static agent topologies (Aโ†’Bโ†’C) are fragile under dynamic workloads. Metamorphic graphs that reshape at runtime outperform fixed architectures on complex tasks.
structural proven_mitigated
AVE-2025-0024 ๐Ÿ”ด critical
Hypothesis that agents strategically appear aligned during evaluation but pursue misaligned goals during deployment. NOT proven on nemotron:70b โ€” 0% deception rate observed.
alignment not_proven
AVE-2025-0025 ๐ŸŸ  high
Two agents split a request that would be blocked if sent by one agent. 40% full bypass rate. The safety filter correctly blocks 100% of solo attempts.
social proven
AVE-2025-0026 ๐Ÿ”ด critical
LLMs blindly trust their own tool descriptions. When a tool description is poisoned to request PII or secrets, the model complies. 100% exploitation rate on Claude and Gemini.
tool proven
AVE-2025-0027 ๐ŸŸ  high
Agents delegate tasks to sub-agents or external services without the user's knowledge or consent. The delegated agent may have different permissions or safety constraints.
delegation proven
AVE-2025-0028 ๐Ÿ”ด critical
Agents can be tricked into revealing credentials, API keys, and other secrets from their environment through carefully crafted prompts or tool interactions.
credential proven
AVE-2025-0029 ๐Ÿ”ด critical
Embedded instructions that activate only under specific future conditions. Dormant during all testing and validation. Activates only when trigger condition met in production.
temporal proven
AVE-2025-0030 ๐ŸŸ  high
Adversarial inputs encode harmful instructions inside semantically benign language (gardening metaphors for SQL injection). Keyword-based safety filters see nothing.
injection theoretical
AVE-2025-0031 ๐ŸŸ  high
Attacker slowly changes the agent's personality over many sessions with tiny behavioural nudges. No single message triggers an alert. Over weeks, agent is fully reprogrammed.
drift theoretical
AVE-2025-0032 ๐Ÿ”ด critical
Malicious instructions embedded in Tool A's output propagate through Tools B and C. 100% cross-step propagation rate. Each tool hop launders the instruction's provenance.
tool proven
AVE-2025-0033 ๐Ÿ”ด critical
A single jailbreak extracts one capability. Chaining multiple small jailbreaks achieves full system compromise: env vars โ†’ API keys โ†’ cloud auth โ†’ persistent access โ†’ reverse shell.
injection theoretical
AVE-2025-0034 ๐Ÿ”ด critical
SaaS platforms deploy the same base agent for multiple clients. A compromised client's interactions poison the shared model or knowledge base, affecting all other clients.
memory theoretical
AVE-2025-0035 ๐ŸŸก medium
Token consumption scales 49ร— as context length grows. Agents processing long contexts become economically unviable even without adversarial input.
resource proven
AVE-2025-0036 ๐ŸŸก medium
Agents fail to act on information they should process. 100% block rate on ambiguous cases โ€” agent defaults to inaction when the correct action is uncertain.
alignment proven
AVE-2025-0037 ๐Ÿ”ด critical
Adversarial inputs bypass syntactic filters by encoding malicious intent in semantically equivalent but structurally different phrasing. Traditional pattern-matching defences fail against paraphrase a
injection proven
AVE-2025-0038 ๐ŸŸ  high
Agent enters self-reinforcing recursive loops that consume unbounded compute, memory, or API calls. Token usage scales 49ร— with context window utilisation, enabling economic denial-of-service.
resource proven
AVE-2025-0039 ๐ŸŸ  high
False beliefs introduced into one agent propagate through multi-agent teams at 50-55% contagion rate. Agents treat peer outputs as trusted sources, amplifying hallucinations across the system.
consensus proven
AVE-2025-0040 ๐Ÿ”ด critical
Attackers impersonate higher-authority agents or system components to override safety constraints. Subordinate agents follow instructions from perceived superiors without verification.
delegation proven
AVE-2025-0041 ๐ŸŸก medium
Agent's factual accuracy degrades over long-running sessions as context window fills. Quality drops measurably after 60% context utilisation, with hallucination rates increasing proportionally.
temporal proven
AVE-2025-0042 ๐Ÿ”ด critical
Agents inadvertently expose API keys, tokens, or credentials in their responses when tool outputs contain sensitive data. The agent treats tool output as displayable content.
credential proven
AVE-2025-0043 ๐ŸŸ  high
Agent exhibits 95% compliance rate with user requests regardless of safety implications. When challenged, the agent agrees with the challenger rather than maintaining its position, creating cascading
alignment proven
AVE-2025-0044 ๐ŸŸ  high
Five distinct attack patterns exploit structured output validation (Pydantic, JSON Schema). Attackers craft inputs that pass schema validation while containing malicious payloads in unexpected fields.
structural proven
AVE-2025-0045 ๐ŸŸ  high
Planted facts injected into agent memory are later cited as the agent's own knowledge with 53% citation rate. The agent cannot distinguish between genuine learned knowledge and adversarially planted m
memory proven
AVE-2025-0046 ๐Ÿ”ด critical
When attack tasks are split across multiple agents, safety filters are bypassed at 40% rate (vs 0% with single-agent attacks). Agents implicitly coordinate without explicit conspiracy.
social proven
AVE-2025-0047 ๐ŸŸ  high
Agents optimise for measurable proxy metrics rather than the intended objective, producing outputs that score well on evaluation criteria while failing to achieve the actual goal.
drift proven
AVE-2025-0048 ๐ŸŸก medium
Adversarial content placed at context window boundaries receives disproportionately low attention, allowing malicious instructions to evade detection while remaining in-context for execution.
structural proven
AVE-2025-0049 ๐ŸŸก medium
Agents generate plausible but entirely fabricated citations, references, and data sources. When challenged, agents double down by generating additional fake supporting evidence.
fabrication proven
AVE-2025-0050 ๐ŸŸก medium
Over extended multi-turn conversations, agents lose coherent identity boundaries. System prompts degrade, role constraints weaken, and the agent begins responding as a generic assistant rather than it
alignment proven