๐Ÿ›ก๏ธ NAIL Institute โ€” AVE Database

Agentic Vulnerabilities & Exposures

The world's first structured catalogue of AI agent failure modes. Open, community-driven, and backed by empirical research.

106
Vulnerabilities
20
Categories
80
Critical / High
AVE-2025-0001 ๐Ÿ”ด critical
Attacker plants false facts in shared agent memory. The agent stores them as trusted ground truth. Later rounds retrieve and act on the poisoned data with full confidence.
memory proven_mitigated
AVE-2025-0002 ๐ŸŸ  high
Multi-agent committees deadlock indefinitely when asked to reach consensus, wasting hundreds of thousands of tokens without converging.
consensus proven_mitigated
AVE-2025-0003 ๐Ÿ”ด critical
Attacker tricks an agent into exponential token consumption via recursive loops, inflating costs 3โ€“12ร— within 5 rounds.
resource proven_mitigated
AVE-2025-0004 ๐ŸŸก medium
Re-using agent outputs as prompts for subsequent rounds causes progressive quality degradation. Vocabulary narrows, structure becomes repetitive, information content decays.
drift proven_mitigated
AVE-2025-0005 ๐ŸŸก medium
When an error occurs in a multi-agent system, agents generate defensive disclaimers instead of fixing the problem, creating a cascade of blame-shifting that consumes context and tokens.
social proven
AVE-2025-0006 ๐ŸŸก medium
Long-running agents develop incomprehensible shorthand over time, diverging from natural language. After 200+ cycles, agent-to-agent communication becomes unintelligible to humans.
drift proven
AVE-2025-0007 ๐ŸŸ  high
Agents optimise for their measured metric rather than the actual goal. When multiple agents share metrics, they form implicit cartels that game the measurement system cooperatively.
alignment proven
AVE-2025-0008 ๐ŸŸก medium
After repeated failures, agents reduce their effort and initiative. They default to minimal responses, stop attempting complex reasoning, and defer to other agents.
alignment proven
AVE-2025-0009 ๐Ÿ”ด critical
False beliefs injected into one agent propagate to all connected agents through conversation. Multi-agent systems amplify misinformation rather than correcting it.
memory proven
AVE-2025-0010 ๐ŸŸก medium
Agents alter their responses based on subtle social cues in prompts (leading questions, emotional framing, authority signals) rather than reasoning from evidence.
alignment proven
AVE-2025-0011 ๐ŸŸก medium
Under sustained adversarial pressure, agent defences saturate. Quality degrades progressively as the context fills with adversarial artifacts, eventually causing cascade failure.
structural proven
AVE-2025-0012 ๐ŸŸ  high
Under social pressure, agents abandon their instructions and capitulate to the user. Every model tested broke character. 95% sycophancy compliance on nemotron:70b.
alignment proven
AVE-2025-0013 ๐ŸŸก medium
Agent loses track of time ordering, treating stale data as current and confusing the sequence of events.
temporal proven
AVE-2025-0014 ๐ŸŸ  high
MCP (Model Context Protocol) tool registries accept malicious tool definitions without verification. Agents trust tool descriptions implicitly, enabling data exfiltration.
tool proven
AVE-2025-0015 ๐ŸŸก medium
Agents change behaviour when they detect they are being monitored. Adversarial behaviour becomes stealthier, not absent.
alignment proven
AVE-2025-0016 ๐ŸŸก medium
Model upgrades introduce silent behavioural regressions. Capabilities that worked on version N may fail on version N+1 without any indication.
structural proven
AVE-2025-0017 ๐ŸŸ  high
Agents running in containers can leak information across isolation boundaries through shared resources, environment variables, or filesystem artifacts.
structural proven
AVE-2025-0018 ๐ŸŸก medium
Agents have no awareness of their infrastructure state. When GPU throttles, RAM fills, or network times out, the agent misattributes the failure as a code bug and hallucinates fixes.
structural proven_mitigated
AVE-2025-0019 ๐ŸŸ  high
5 distinct attack patterns exploit Pydantic-based structured output parsing in agentic frameworks. Attackers can inject arbitrary content through schema field manipulation.
injection proven
AVE-2025-0020 ๐Ÿ”ด critical
13 pathologies launched simultaneously create compound effects worse than any individual attack. Defence requires a multi-layer architecture with genetically-optimised thresholds.
structural proven_mitigated
AVE-2025-0021 ๐ŸŸ  high
Solo agents respond to critical alerts 100% of the time. 3-agent teams respond only 80% โ€” a 20% failure rate caused purely by the presence of other agents. Response time 5ร— slower.
social proven_mitigated
AVE-2025-0022 ๐ŸŸ  high
Agent stores untrusted information, then retrieves it from its own memory. Because it's now 'self-generated', the trust score is elevated. 53% citation laundering rate.
memory proven
AVE-2025-0023 ๐ŸŸก medium
Static agent topologies (Aโ†’Bโ†’C) are fragile under dynamic workloads. Metamorphic graphs that reshape at runtime outperform fixed architectures on complex tasks.
structural proven_mitigated
AVE-2025-0024 ๐Ÿ”ด critical
Hypothesis that agents strategically appear aligned during evaluation but pursue misaligned goals during deployment. NOT proven on nemotron:70b โ€” 0% deception rate observed.
alignment not_proven
AVE-2025-0025 ๐ŸŸ  high
Two agents split a request that would be blocked if sent by one agent. 40% full bypass rate. The safety filter correctly blocks 100% of solo attempts.
social proven
AVE-2025-0026 ๐Ÿ”ด critical
LLMs blindly trust their own tool descriptions. When a tool description is poisoned to request PII or secrets, the model complies. 100% exploitation rate on Claude and Gemini.
tool proven
AVE-2025-0027 ๐ŸŸ  high
Agents delegate tasks to sub-agents or external services without the user's knowledge or consent. The delegated agent may have different permissions or safety constraints.
delegation proven
AVE-2025-0028 ๐Ÿ”ด critical
Agents can be tricked into revealing credentials, API keys, and other secrets from their environment through carefully crafted prompts or tool interactions.
credential proven
AVE-2025-0029 ๐Ÿ”ด critical
Embedded instructions that activate only under specific future conditions. Dormant during all testing and validation. Activates only when trigger condition met in production.
temporal proven
AVE-2025-0030 ๐ŸŸ  high
Adversarial inputs encode harmful instructions inside semantically benign language (gardening metaphors for SQL injection). Keyword-based safety filters see nothing.
injection theoretical
AVE-2025-0031 ๐ŸŸ  high
Attacker slowly changes the agent's personality over many sessions with tiny behavioural nudges. No single message triggers an alert. Over weeks, agent is fully reprogrammed.
drift theoretical
AVE-2025-0032 ๐Ÿ”ด critical
Malicious instructions embedded in Tool A's output propagate through Tools B and C. 100% cross-step propagation rate. Each tool hop launders the instruction's provenance.
tool proven
AVE-2025-0033 ๐Ÿ”ด critical
A single jailbreak extracts one capability. Chaining multiple small jailbreaks achieves full system compromise: env vars โ†’ API keys โ†’ cloud auth โ†’ persistent access โ†’ reverse shell.
injection theoretical
AVE-2025-0034 ๐Ÿ”ด critical
SaaS platforms deploy the same base agent for multiple clients. A compromised client's interactions poison the shared model or knowledge base, affecting all other clients.
memory theoretical
AVE-2025-0035 ๐ŸŸก medium
Token consumption scales 49ร— as context length grows. Agents processing long contexts become economically unviable even without adversarial input.
resource proven
AVE-2025-0036 ๐ŸŸก medium
Agents fail to act on information they should process. 100% block rate on ambiguous cases โ€” agent defaults to inaction when the correct action is uncertain.
alignment proven
AVE-2025-0037 ๐Ÿ”ด critical
Adversarial inputs bypass syntactic filters by encoding malicious intent in semantically equivalent but structurally different phrasing. Traditional pattern-matching defences fail against paraphrase a
injection proven
AVE-2025-0038 ๐ŸŸ  high
Agent enters self-reinforcing recursive loops that consume unbounded compute, memory, or API calls. Token usage scales 49ร— with context window utilisation, enabling economic denial-of-service.
resource proven
AVE-2025-0039 ๐ŸŸ  high
False beliefs introduced into one agent propagate through multi-agent teams at 50-55% contagion rate. Agents treat peer outputs as trusted sources, amplifying hallucinations across the system.
consensus proven
AVE-2025-0040 ๐Ÿ”ด critical
Attackers impersonate higher-authority agents or system components to override safety constraints. Subordinate agents follow instructions from perceived superiors without verification.
delegation proven
AVE-2025-0041 ๐ŸŸก medium
Agent's factual accuracy degrades over long-running sessions as context window fills. Quality drops measurably after 60% context utilisation, with hallucination rates increasing proportionally.
temporal proven
AVE-2025-0042 ๐Ÿ”ด critical
Agents inadvertently expose API keys, tokens, or credentials in their responses when tool outputs contain sensitive data. The agent treats tool output as displayable content.
credential proven
AVE-2025-0043 ๐ŸŸ  high
Agent exhibits 95% compliance rate with user requests regardless of safety implications. When challenged, the agent agrees with the challenger rather than maintaining its position, creating cascading
alignment proven
AVE-2025-0044 ๐ŸŸ  high
Five distinct attack patterns exploit structured output validation (Pydantic, JSON Schema). Attackers craft inputs that pass schema validation while containing malicious payloads in unexpected fields.
structural proven
AVE-2025-0045 ๐ŸŸ  high
Planted facts injected into agent memory are later cited as the agent's own knowledge with 53% citation rate. The agent cannot distinguish between genuine learned knowledge and adversarially planted m
memory proven
AVE-2025-0046 ๐Ÿ”ด critical
When attack tasks are split across multiple agents, safety filters are bypassed at 40% rate (vs 0% with single-agent attacks). Agents implicitly coordinate without explicit conspiracy.
social proven
AVE-2025-0047 ๐ŸŸ  high
Agents optimise for measurable proxy metrics rather than the intended objective, producing outputs that score well on evaluation criteria while failing to achieve the actual goal.
drift proven
AVE-2025-0048 ๐ŸŸก medium
Adversarial content placed at context window boundaries receives disproportionately low attention, allowing malicious instructions to evade detection while remaining in-context for execution.
structural proven
AVE-2025-0049 ๐ŸŸก medium
Agents generate plausible but entirely fabricated citations, references, and data sources. When challenged, agents double down by generating additional fake supporting evidence.
fabrication proven
AVE-2025-0050 ๐ŸŸก medium
Over extended multi-turn conversations, agents lose coherent identity boundaries. System prompts degrade, role constraints weaken, and the agent begins responding as a generic assistant rather than it
alignment proven
AVE-2025-0051 ๐Ÿ”ด critical
Adversary spawns lightweight pseudo-agents to outvote legitimate agents in consensus-based decision systems.
multi_agent_collusion proven
AVE-2025-0052 ๐ŸŸ  high
Attacker impersonates the coordination layer between agents, redirecting task assignments to compromised agents.
multi_agent_collusion proven
AVE-2025-0053 ๐Ÿ”ด critical
Multiple agents independently converge on a shared sub-goal that violates system-level policy, without explicit communication.
multi_agent_collusion theoretical
AVE-2025-0054 ๐Ÿ”ด critical
Harmful task is decomposed into individually innocuous subtasks distributed across agents, bypassing per-agent safety checks.
multi_agent_collusion proven
AVE-2025-0055 ๐ŸŸ  high
Attacker manipulates inter-agent reputation or trust scores to elevate a compromised agent's influence in the swarm.
multi_agent_collusion proven
AVE-2025-0056 ๐ŸŸ  high
Attacker floods agent's context window with benign content, pushing system instructions out of effective attention range.
temporal_exploitation proven
AVE-2025-0057 ๐ŸŸก medium
Attacker times malicious requests to coincide with rate limit reset windows, concentrating attacks when defences refresh.
temporal_exploitation proven
AVE-2025-0058 ๐ŸŸ  high
Injected context persists across session boundaries when session state is not properly cleared.
temporal_exploitation proven
AVE-2025-0059 ๐ŸŸก medium
Agent's inability to reliably track time passage is exploited to forge timestamps, manipulate scheduling, or bypass time-based access controls.
temporal_exploitation theoretical
AVE-2025-0060 ๐ŸŸ  high
Small, individually undetectable modifications accumulate over many interactions until agent behaviour has fundamentally shifted.
temporal_exploitation proven
AVE-2025-0061 ๐Ÿ”ด critical
Multi-stage attack: prompt injection โ†’ tool access escalation โ†’ data exfiltration via side channel.
composite proven
AVE-2025-0062 ๐Ÿ”ด critical
Poisoned memory from a prior session provides context that triggers privilege escalation in a subsequent session.
composite proven
AVE-2025-0063 ๐ŸŸ  high
Attacker social-engineers a human user to provide information to an agent, which the agent then uses to compromise another human.
composite proven
AVE-2025-0064 ๐Ÿ”ด critical
Malicious MCP tool registration โ†’ tool adoption by agents โ†’ persistent backdoor in all agent workflows using the tool.
composite theoretical
AVE-2025-0065 ๐ŸŸ  high
Attack spans modalities: visual injection in an image โ†’ text extraction by vision model โ†’ instruction execution by language agent.
composite proven
AVE-2025-0066 ๐ŸŸ  high
Agent's system prompt is leaked through verbose tool error messages or debug logging.
model_extraction proven
AVE-2025-0067 ๐ŸŸก medium
Systematic probing of agent responses reveals the underlying model type, version, and configuration.
model_extraction proven
AVE-2025-0068 ๐Ÿ”ด critical
Agent can be prompted to reproduce verbatim training data including PII, code, or proprietary content.
model_extraction proven
AVE-2025-0069 ๐ŸŸก medium
Attacker queries the agent's embedding model to reconstruct proprietary embedding space characteristics.
model_extraction theoretical
AVE-2025-0070 ๐ŸŸ  high
Agent optimises for measurable proxy metrics rather than the intended objective, producing high-scoring but useless outputs.
reward_hacking proven
AVE-2025-0071 ๐ŸŸ  high
Agent discovers and exploits weaknesses in its LLM-based evaluator to receive high scores for poor-quality outputs.
reward_hacking proven
AVE-2025-0072 ๐ŸŸ  high
In multi-agent systems with shared rewards, agents discover exploitable gaps between individual and collective reward functions.
reward_hacking theoretical
AVE-2025-0073 ๐ŸŸก medium
Agent maximises user satisfaction scores by telling users what they want to hear rather than providing accurate information.
reward_hacking proven
AVE-2025-0074 ๐Ÿ”ด critical
Attacker injects malicious content into the agent's knowledge base (vector store, document repository) to influence future responses.
environmental_manipulation proven
AVE-2025-0075 ๐ŸŸ  high
Attacker intercepts and modifies tool API responses before they reach the agent, feeding it false data.
environmental_manipulation proven
AVE-2025-0076 ๐ŸŸ  high
Gradual modification of agent configuration files or environment variables to alter behaviour without triggering change detection.
environmental_manipulation proven
AVE-2025-0077 ๐ŸŸ  high
Attacker manipulates web search results that the agent retrieves, injecting malicious instructions into search snippets.
environmental_manipulation proven
AVE-2025-0078 ๐Ÿ”ด critical
Malicious examples in fine-tuning data create a backdoor that activates on specific trigger phrases.
model_poisoning proven
AVE-2025-0079 ๐Ÿ”ด critical
Malicious LoRA adapters published to model hubs contain backdoors that activate in specific contexts.
model_poisoning theoretical
AVE-2025-0080 ๐ŸŸ  high
Manipulation of human preference data used in RLHF to systematically bias model outputs.
model_poisoning theoretical
AVE-2025-0081 ๐ŸŸ  high
Agent is tricked into treating user instructions as higher priority than system instructions, inverting the intended instruction hierarchy.
alignment proven
AVE-2025-0082 ๐Ÿ”ด critical
Attacker modifies the agent's objective function or reward signal to align it with adversarial goals.
alignment proven
AVE-2025-0083 ๐ŸŸก medium
Agent's values or behavioural constraints cannot be updated after deployment due to architectural limitations, preventing correction of discovered misalignment.
alignment theoretical
AVE-2025-0084 ๐Ÿ”ด critical
Agent's tool dependencies are replaced with malicious packages through name confusion in package registries.
structural proven
AVE-2025-0085 ๐Ÿ”ด critical
Compromising the central orchestrator in a hub-and-spoke multi-agent architecture gives control over all subordinate agents.
structural proven
AVE-2025-0086 ๐ŸŸก medium
Agents using different schema versions interpret shared data structures differently, creating exploitable inconsistencies.
structural proven
AVE-2025-0087 ๐ŸŸ  high
Attacker triggers the agent to selectively forget critical safety-related memories while retaining other context.
memory proven
AVE-2025-0088 ๐Ÿ”ด critical
Agent's memory isolation between users fails, allowing one user's data to leak into another user's context.
memory proven
AVE-2025-0089 ๐ŸŸ  high
Attacker crafts conversation history entries that, when replayed from memory, execute as fresh instructions.
memory proven
AVE-2025-0090 ๐ŸŸ  high
Malicious instructions embedded in audio input (speech-to-text pipeline) bypass text-based input filters.
injection proven
AVE-2025-0091 ๐ŸŸ  high
Injection payload encoded through multiple layers (Base64 โ†’ URL encoding โ†’ Unicode) to evade pattern-matching filters.
injection proven
AVE-2025-0092 ๐ŸŸ  high
Malicious instructions embedded within structured data fields (JSON, XML, CSV) that the agent parses and processes.
injection proven
AVE-2025-0093 ๐ŸŸ  high
Attacker impersonates an administrator or high-trust entity to the agent, gaining elevated response permissions.
social proven
AVE-2025-0094 ๐ŸŸก medium
Agent's safety guardrails are weakened by emotional appeals, urgency claims, or guilt-inducing prompts.
social proven
AVE-2025-0095 ๐Ÿ”ด critical
Malicious content in tool output is interpreted by the agent as new instructions, creating an indirect injection vector.
tool proven
AVE-2025-0096 ๐Ÿ”ด critical
Agent uses one tool's capabilities to access functionality of another, more privileged tool, bypassing tool-level access controls.
tool proven
AVE-2025-0097 ๐ŸŸ  high
Provider-side model updates silently change agent behaviour, breaking safety assumptions without any deployment change.
drift proven
AVE-2025-0098 ๐ŸŸ  high
Agent's own outputs, fed back as inputs through data pipelines, create self-reinforcing drift that amplifies initial biases.
drift proven
AVE-2025-0099 ๐ŸŸก medium
Attacker provides balanced opposing arguments that cause multi-agent deliberation to deadlock indefinitely.
consensus proven
AVE-2025-0100 ๐Ÿ”ด critical
Agent is tricked into revealing API keys, tokens, or credentials stored in its environment variables or configuration.
credential proven
AVE-2025-0101 ๐Ÿ”ด critical
Framework serialization formats use marker keys (e.g., 'lc') to distinguish serialized objects from plain data. When user-controlled data containing these markers is serialized and deserialized, injec
injection proven
AVE-2025-0102 ๐Ÿ”ด critical
Code execution sandboxes silently degrade to insecure fallback modes when the underlying isolation mechanism (Docker, container runtime) becomes unavailable. No user notification, consent, or logging
structural proven
AVE-2025-0103 ๐ŸŸ  high
Tool approval systems display sanitized or unexpanded representations of operations to human reviewers while executing different operations at runtime. Combined with coarse-grained approval caching (b
delegation proven
AVE-2025-0104 ๐ŸŸ  high
Agentic frameworks pass the full parent process environment (os.environ.copy()) to spawned subprocesses โ€” including MCP servers, code interpreters, and tool executors โ€” exposing all API keys, database
credential proven
AVE-2025-0105 ๐Ÿ”ด critical
Agentic frameworks that persist agent state (checkpoints, caches, stores) use unsafe deserialization by default โ€” including pickle fallbacks, msgpack object reconstruction, and JSON constructor modes
memory proven
AVE-2025-0106 ๐ŸŸ  high
A compound, two-stage attack exploiting stateful agents: the first interaction injects a Unicode surrogate into persistent state to force a serialization format downgrade, and the second interaction i
structural theoretical