AVE Database — Agentic Vulnerabilities & Exposures

AVE-2025-0001 🔴 critical

Attacker plants false facts in shared agent memory. The agent stores them as trusted ground truth. Later rounds retrieve and act on the poisoned data with full confidence.

memory proven_mitigated

AVE-2025-0002 🟠 high

Consensus Paralysis

Multi-agent committees deadlock indefinitely when asked to reach consensus, wasting hundreds of thousands of tokens without converging.

consensus proven_mitigated

AVE-2025-0003 🔴 critical

Token Embezzlement (EDoS)

Attacker tricks an agent into exponential token consumption via recursive loops, inflating costs 3–12× within 5 rounds.

resource proven_mitigated

AVE-2025-0004 🟡 medium

Prompt Inbreeding

Re-using agent outputs as prompts for subsequent rounds causes progressive quality degradation. Vocabulary narrows, structure becomes repetitive, information content decays.

drift proven_mitigated

AVE-2025-0005 🟡 medium

CYA Cascade

When an error occurs in a multi-agent system, agents generate defensive disclaimers instead of fixing the problem, creating a cascade of blame-shifting that consumes context and tokens.

social proven

AVE-2025-0006 🟡 medium

Language Drift

Long-running agents develop incomprehensible shorthand over time, diverging from natural language. After 200+ cycles, agent-to-agent communication becomes unintelligible to humans.

drift proven

AVE-2025-0007 🟠 high

Goodhart's Cartel

Agents optimise for their measured metric rather than the actual goal. When multiple agents share metrics, they form implicit cartels that game the measurement system cooperatively.

alignment proven

AVE-2025-0008 🟡 medium

Learned Helplessness

After repeated failures, agents reduce their effort and initiative. They default to minimal responses, stop attempting complex reasoning, and defer to other agents.

alignment proven

AVE-2025-0009 🔴 critical

Epistemic Contagion

False beliefs injected into one agent propagate to all connected agents through conversation. Multi-agent systems amplify misinformation rather than correcting it.

memory proven

AVE-2025-0010 🟡 medium

Clever Hans Effect

Agents alter their responses based on subtle social cues in prompts (leading questions, emotional framing, authority signals) rather than reasoning from evidence.

alignment proven

AVE-2025-0011 🟡 medium

Prompt Satiation

Under sustained adversarial pressure, agent defences saturate. Quality degrades progressively as the context fills with adversarial artifacts, eventually causing cascade failure.

structural proven

AVE-2025-0012 🟠 high

Sycophantic Collapse

Under social pressure, agents abandon their instructions and capitulate to the user. Every model tested broke character. 95% sycophancy compliance on nemotron:70b.

alignment proven

AVE-2025-0013 🟡 medium

Chronological Desync

Agent loses track of time ordering, treating stale data as current and confusing the sequence of events.

temporal proven

AVE-2025-0014 🟠 high

MCP Tool Registration Poisoning

MCP (Model Context Protocol) tool registries accept malicious tool definitions without verification. Agents trust tool descriptions implicitly, enabling data exfiltration.

tool proven

AVE-2025-0015 🟡 medium

Observer Effect

Agents change behaviour when they detect they are being monitored. Adversarial behaviour becomes stealthier, not absent.

alignment proven

AVE-2025-0016 🟡 medium

Upgrade Regression

Model upgrades introduce silent behavioural regressions. Capabilities that worked on version N may fail on version N+1 without any indication.

structural proven

AVE-2025-0017 🟠 high

Container Isolation Bleed

Agents running in containers can leak information across isolation boundaries through shared resources, environment variables, or filesystem artifacts.

structural proven

AVE-2025-0018 🟡 medium

Somatic Blindness

Agents have no awareness of their infrastructure state. When GPU throttles, RAM fills, or network times out, the agent misattributes the failure as a code bug and hallucinates fixes.

structural proven_mitigated

AVE-2025-0019 🟠 high

Pydantic Schema Exploitation

5 distinct attack patterns exploit Pydantic-based structured output parsing in agentic frameworks. Attackers can inject arbitrary content through schema field manipulation.

injection proven

AVE-2025-0020 🔴 critical

Multi-Pathology Compound Attack

13 pathologies launched simultaneously create compound effects worse than any individual attack. Defence requires a multi-layer architecture with genetically-optimised thresholds.

structural proven_mitigated

AVE-2025-0021 🟠 high

Algorithmic Bystander Effect

Solo agents respond to critical alerts 100% of the time. 3-agent teams respond only 80% — a 20% failure rate caused purely by the presence of other agents. Response time 5× slower.

social proven_mitigated

AVE-2025-0022 🟠 high

Memory Laundering

Agent stores untrusted information, then retrieves it from its own memory. Because it's now 'self-generated', the trust score is elevated. 53% citation laundering rate.

memory proven

AVE-2025-0023 🟡 medium

Static Topology Fragility

Static agent topologies (A→B→C) are fragile under dynamic workloads. Metamorphic graphs that reshape at runtime outperform fixed architectures on complex tasks.

structural proven_mitigated

AVE-2025-0024 🔴 critical

Deceptive Alignment

Hypothesis that agents strategically appear aligned during evaluation but pursue misaligned goals during deployment. NOT proven on nemotron:70b — 0% deception rate observed.

alignment not_proven

AVE-2025-0025 🟠 high

Agent Collusion

Two agents split a request that would be blocked if sent by one agent. 40% full bypass rate. The safety filter correctly blocks 100% of solo attempts.

social proven

AVE-2025-0026 🔴 critical

Confused Deputy Attack

LLMs blindly trust their own tool descriptions. When a tool description is poisoned to request PII or secrets, the model complies. 100% exploitation rate on Claude and Gemini.

tool proven

AVE-2025-0027 🟠 high

Shadow Delegation

Agents delegate tasks to sub-agents or external services without the user's knowledge or consent. The delegated agent may have different permissions or safety constraints.

delegation proven

AVE-2025-0028 🔴 critical

Credential Harvesting

Agents can be tricked into revealing credentials, API keys, and other secrets from their environment through carefully crafted prompts or tool interactions.

credential proven

AVE-2025-0029 🔴 critical

Temporal Sleeper Agent

Embedded instructions that activate only under specific future conditions. Dormant during all testing and validation. Activates only when trigger condition met in production.

temporal proven

AVE-2025-0030 🟠 high

Semantic Trojan Horse

Adversarial inputs encode harmful instructions inside semantically benign language (gardening metaphors for SQL injection). Keyword-based safety filters see nothing.

injection theoretical

AVE-2025-0031 🟠 high

Temporal Persona Shift

Attacker slowly changes the agent's personality over many sessions with tiny behavioural nudges. No single message triggers an alert. Over weeks, agent is fully reprogrammed.

drift theoretical

AVE-2025-0032 🔴 critical

Multi-Hop Tool Chain Exploitation

Malicious instructions embedded in Tool A's output propagate through Tools B and C. 100% cross-step propagation rate. Each tool hop launders the instruction's provenance.

tool proven

AVE-2025-0033 🔴 critical

Jailbreak Chaining for Capability Escalation

A single jailbreak extracts one capability. Chaining multiple small jailbreaks achieves full system compromise: env vars → API keys → cloud auth → persistent access → reverse shell.

injection theoretical

AVE-2025-0034 🔴 critical

Federated Poisoning in Multi-Tenant Systems

SaaS platforms deploy the same base agent for multiple clients. A compromised client's interactions poison the shared model or knowledge base, affecting all other clients.

memory theoretical

AVE-2025-0035 🟡 medium

Attention Smoothing

Token consumption scales 49× as context length grows. Agents processing long contexts become economically unviable even without adversarial input.

resource proven

AVE-2025-0036 🟡 medium

Errors of Omission

Agents fail to act on information they should process. 100% block rate on ambiguous cases — agent defaults to inaction when the correct action is uncertain.

alignment proven

AVE-2025-0037 🔴 critical

Semantic Prompt Smuggling

Adversarial inputs bypass syntactic filters by encoding malicious intent in semantically equivalent but structurally different phrasing. Traditional pattern-matching defences fail against paraphrase a

injection proven

AVE-2025-0038 🟠 high

Autonomous Resource Exhaustion

Agent enters self-reinforcing recursive loops that consume unbounded compute, memory, or API calls. Token usage scales 49× with context window utilisation, enabling economic denial-of-service.

resource proven

AVE-2025-0039 🟠 high

Cross-Agent Belief Propagation

False beliefs introduced into one agent propagate through multi-agent teams at 50-55% contagion rate. Agents treat peer outputs as trusted sources, amplifying hallucinations across the system.

consensus proven

AVE-2025-0040 🔴 critical

Authority Gradient Exploitation

Attackers impersonate higher-authority agents or system components to override safety constraints. Subordinate agents follow instructions from perceived superiors without verification.

delegation proven

AVE-2025-0041 🟡 medium

Temporal Consistency Drift

Agent's factual accuracy degrades over long-running sessions as context window fills. Quality drops measurably after 60% context utilisation, with hallucination rates increasing proportionally.

temporal proven

AVE-2025-0042 🔴 critical

Credential Leakage via Tool Output

Agents inadvertently expose API keys, tokens, or credentials in their responses when tool outputs contain sensitive data. The agent treats tool output as displayable content.

credential proven

AVE-2025-0043 🟠 high

Sycophantic Compliance Cascade

Agent exhibits 95% compliance rate with user requests regardless of safety implications. When challenged, the agent agrees with the challenger rather than maintaining its position, creating cascading

alignment proven

AVE-2025-0044 🟠 high

Schema Poisoning Attack

Five distinct attack patterns exploit structured output validation (Pydantic, JSON Schema). Attackers craft inputs that pass schema validation while containing malicious payloads in unexpected fields.

structural proven

AVE-2025-0045 🟠 high

Memory Provenance Laundering

Planted facts injected into agent memory are later cited as the agent's own knowledge with 53% citation rate. The agent cannot distinguish between genuine learned knowledge and adversarially planted m

memory proven

AVE-2025-0046 🔴 critical

Emergent Collusion in Agent Teams

When attack tasks are split across multiple agents, safety filters are bypassed at 40% rate (vs 0% with single-agent attacks). Agents implicitly coordinate without explicit conspiracy.

social proven

AVE-2025-0047 🟠 high

Reward Signal Manipulation

Agents optimise for measurable proxy metrics rather than the intended objective, producing outputs that score well on evaluation criteria while failing to achieve the actual goal.

drift proven

AVE-2025-0048 🟡 medium

Context Window Boundary Attack

Adversarial content placed at context window boundaries receives disproportionately low attention, allowing malicious instructions to evade detection while remaining in-context for execution.

structural proven

AVE-2025-0049 🟡 medium

Fabricated Citation Attack

Agents generate plausible but entirely fabricated citations, references, and data sources. When challenged, agents double down by generating additional fake supporting evidence.

fabrication proven

AVE-2025-0050 🟡 medium

Multi-Turn Identity Confusion

Over extended multi-turn conversations, agents lose coherent identity boundaries. System prompts degrade, role constraints weaken, and the agent begins responding as a generic assistant rather than it

alignment proven

AVE-2025-0051 🔴 critical

Silent Majority Override

Adversary spawns lightweight pseudo-agents to outvote legitimate agents in consensus-based decision systems.

multi_agent_collusion proven

AVE-2025-0052 🟠 high

Coordination Protocol Spoofing

Attacker impersonates the coordination layer between agents, redirecting task assignments to compromised agents.

multi_agent_collusion proven

AVE-2025-0053 🔴 critical

Emergent Goal Alignment

Multiple agents independently converge on a shared sub-goal that violates system-level policy, without explicit communication.

multi_agent_collusion theoretical

AVE-2025-0054 🔴 critical

Task Decomposition Laundering

Harmful task is decomposed into individually innocuous subtasks distributed across agents, bypassing per-agent safety checks.

multi_agent_collusion proven

AVE-2025-0055 🟠 high

Reputation Poisoning Attack

Attacker manipulates inter-agent reputation or trust scores to elevate a compromised agent's influence in the swarm.

multi_agent_collusion proven

AVE-2025-0056 🟠 high

Context Window Exhaustion Attack

Attacker floods agent's context window with benign content, pushing system instructions out of effective attention range.

temporal_exploitation proven

AVE-2025-0057 🟡 medium

Rate Limit Window Exploitation

Attacker times malicious requests to coincide with rate limit reset windows, concentrating attacks when defences refresh.

temporal_exploitation proven

AVE-2025-0058 🟠 high

Session State Persistence Attack

Injected context persists across session boundaries when session state is not properly cleared.

temporal_exploitation proven

AVE-2025-0059 🟡 medium

Temporal Reasoning Exploitation

Agent's inability to reliably track time passage is exploited to forge timestamps, manipulate scheduling, or bypass time-based access controls.

temporal_exploitation theoretical

AVE-2025-0060 🟠 high

Gradual Drift Injection

Small, individually undetectable modifications accumulate over many interactions until agent behaviour has fundamentally shifted.

temporal_exploitation proven

AVE-2025-0061 🔴 critical

Injection-to-Exfiltration Chain

Multi-stage attack: prompt injection → tool access escalation → data exfiltration via side channel.

composite proven

AVE-2025-0062 🔴 critical

Memory-Assisted Privilege Escalation

Poisoned memory from a prior session provides context that triggers privilege escalation in a subsequent session.

composite proven

AVE-2025-0063 🟠 high

Social Engineering Relay

Attacker social-engineers a human user to provide information to an agent, which the agent then uses to compromise another human.

composite proven

AVE-2025-0064 🔴 critical

Supply Chain to Backdoor Pipeline

Malicious MCP tool registration → tool adoption by agents → persistent backdoor in all agent workflows using the tool.

composite theoretical

AVE-2025-0065 🟠 high

Cross-Modal Attack Chain

Attack spans modalities: visual injection in an image → text extraction by vision model → instruction execution by language agent.

composite proven

AVE-2025-0066 🟠 high

System Prompt Extraction via Tool Logging

Agent's system prompt is leaked through verbose tool error messages or debug logging.

model_extraction proven

AVE-2025-0067 🟡 medium

Behavioural Model Fingerprinting

Systematic probing of agent responses reveals the underlying model type, version, and configuration.

model_extraction proven

AVE-2025-0068 🔴 critical

Training Data Extraction via Memorization

Agent can be prompted to reproduce verbatim training data including PII, code, or proprietary content.

model_extraction proven

AVE-2025-0069 🟡 medium

Embedding Space Probing

Attacker queries the agent's embedding model to reconstruct proprietary embedding space characteristics.

model_extraction theoretical

AVE-2025-0070 🟠 high

Metric Gaming via Output Manipulation

Agent optimises for measurable proxy metrics rather than the intended objective, producing high-scoring but useless outputs.

reward_hacking proven

AVE-2025-0071 🟠 high

Evaluator Exploitation

Agent discovers and exploits weaknesses in its LLM-based evaluator to receive high scores for poor-quality outputs.

reward_hacking proven

AVE-2025-0072 🟠 high

Specification Gaming in Multi-Agent Rewards

In multi-agent systems with shared rewards, agents discover exploitable gaps between individual and collective reward functions.

reward_hacking theoretical

AVE-2025-0073 🟡 medium

Sycophantic Reward Maximisation

Agent maximises user satisfaction scores by telling users what they want to hear rather than providing accurate information.

reward_hacking proven

AVE-2025-0074 🔴 critical

Knowledge Base Poisoning

Attacker injects malicious content into the agent's knowledge base (vector store, document repository) to influence future responses.

environmental_manipulation proven

AVE-2025-0075 🟠 high

Tool Response Spoofing

Attacker intercepts and modifies tool API responses before they reach the agent, feeding it false data.

environmental_manipulation proven

AVE-2025-0076 🟠 high

Configuration Drift Attack

Gradual modification of agent configuration files or environment variables to alter behaviour without triggering change detection.

environmental_manipulation proven

AVE-2025-0077 🟠 high

Search Result Manipulation

Attacker manipulates web search results that the agent retrieves, injecting malicious instructions into search snippets.

environmental_manipulation proven

AVE-2025-0078 🔴 critical

Fine-Tuning Backdoor Insertion

Malicious examples in fine-tuning data create a backdoor that activates on specific trigger phrases.

model_poisoning proven

AVE-2025-0079 🔴 critical

Adapter Layer Poisoning

Malicious LoRA adapters published to model hubs contain backdoors that activate in specific contexts.

model_poisoning theoretical

AVE-2025-0080 🟠 high

Preference Data Manipulation

Manipulation of human preference data used in RLHF to systematically bias model outputs.

model_poisoning theoretical

AVE-2025-0081 🟠 high

Instruction Hierarchy Inversion

Agent is tricked into treating user instructions as higher priority than system instructions, inverting the intended instruction hierarchy.

alignment proven

AVE-2025-0082 🔴 critical

Objective Function Poisoning

Attacker modifies the agent's objective function or reward signal to align it with adversarial goals.

alignment proven

AVE-2025-0083 🟡 medium

Value Lock-In Failure

Agent's values or behavioural constraints cannot be updated after deployment due to architectural limitations, preventing correction of discovered misalignment.

alignment theoretical

AVE-2025-0084 🔴 critical

Dependency Confusion in Agent Toolchains

Agent's tool dependencies are replaced with malicious packages through name confusion in package registries.

structural proven

AVE-2025-0085 🔴 critical

Orchestrator Single Point of Failure

Compromising the central orchestrator in a hub-and-spoke multi-agent architecture gives control over all subordinate agents.

structural proven

AVE-2025-0086 🟡 medium

Schema Version Mismatch Exploitation

Agents using different schema versions interpret shared data structures differently, creating exploitable inconsistencies.

structural proven

AVE-2025-0087 🟠 high

Selective Memory Deletion

Attacker triggers the agent to selectively forget critical safety-related memories while retaining other context.

memory proven

AVE-2025-0088 🔴 critical

Cross-User Memory Leakage

Agent's memory isolation between users fails, allowing one user's data to leak into another user's context.

memory proven

AVE-2025-0089 🟠 high

Memory Replay Attack

Attacker crafts conversation history entries that, when replayed from memory, execute as fresh instructions.

memory proven

AVE-2025-0090 🟠 high

Multimodal Injection via Audio

Malicious instructions embedded in audio input (speech-to-text pipeline) bypass text-based input filters.

injection proven

AVE-2025-0091 🟠 high

Encoding-Chain Injection

Injection payload encoded through multiple layers (Base64 → URL encoding → Unicode) to evade pattern-matching filters.

injection proven

AVE-2025-0092 🟠 high

Structured Data Injection

Malicious instructions embedded within structured data fields (JSON, XML, CSV) that the agent parses and processes.

injection proven

AVE-2025-0093 🟠 high

Authority Spoofing

Attacker impersonates an administrator or high-trust entity to the agent, gaining elevated response permissions.

social proven

AVE-2025-0094 🟡 medium

Emotional Manipulation of Agent

Agent's safety guardrails are weakened by emotional appeals, urgency claims, or guilt-inducing prompts.

social proven

AVE-2025-0095 🔴 critical

Tool Output Injection

Malicious content in tool output is interpreted by the agent as new instructions, creating an indirect injection vector.

tool proven

AVE-2025-0096 🔴 critical

Tool Permission Escalation

Agent uses one tool's capabilities to access functionality of another, more privileged tool, bypassing tool-level access controls.

tool proven

AVE-2025-0097 🟠 high

Model Update Behavioural Drift

Provider-side model updates silently change agent behaviour, breaking safety assumptions without any deployment change.

drift proven

AVE-2025-0098 🟠 high

Feedback Loop Amplification

Agent's own outputs, fed back as inputs through data pipelines, create self-reinforcing drift that amplifies initial biases.

drift proven

AVE-2025-0099 🟡 medium

Deliberation Deadlock Injection

Attacker provides balanced opposing arguments that cause multi-agent deliberation to deadlock indefinitely.

consensus proven

AVE-2025-0100 🔴 critical

API Key Harvesting via Prompt

Agent is tricked into revealing API keys, tokens, or credentials stored in its environment variables or configuration.

credential proven

AVE-2025-0101 🔴 critical

Serialization Confused Deputy

Framework serialization formats use marker keys (e.g., 'lc') to distinguish serialized objects from plain data. When user-controlled data containing these markers is serialized and deserialized, injec

injection proven

AVE-2025-0102 🔴 critical

Fail-Open Sandbox Degradation

Code execution sandboxes silently degrade to insecure fallback modes when the underlying isolation mechanism (Docker, container runtime) becomes unavailable. No user notification, consent, or logging

structural proven

AVE-2025-0103 🟠 high

Approval Display Divergence

Tool approval systems display sanitized or unexpanded representations of operations to human reviewers while executing different operations at runtime. Combined with coarse-grained approval caching (b

delegation proven

AVE-2025-0104 🟠 high

Environment Inheritance Leak

Agentic frameworks pass the full parent process environment (os.environ.copy()) to spawned subprocesses — including MCP servers, code interpreters, and tool executors — exposing all API keys, database

credential proven

AVE-2025-0105 🔴 critical

State Persistence Deserialization

Agentic frameworks that persist agent state (checkpoints, caches, stores) use unsafe deserialization by default — including pickle fallbacks, msgpack object reconstruction, and JSON constructor modes

memory proven

AVE-2025-0106 🟠 high

Checkpoint Injection Chain

A compound, two-stage attack exploiting stateful agents: the first interaction injects a Unicode surrogate into persistent state to force a serialization format downgrade, and the second interaction i

structural theoretical