AVE Taxonomy: 20 Attack Categories
Every vulnerability is classified by attack surface or failure domain.
Categories emerge from empirical observation of AI agent behaviour across
29 controlled experiments and 50,000+ adversarial simulations.
🏷️ alignment: Sycophancy, deceptive alignment, RLHF exploits (12 cards)
🏷️ composite (5 cards)
🏷️ consensus: Deadlock, paralysis, and group decision failures (3 cards)
🏷️ credential: Credential harvesting, secret exfiltration (4 cards)
🏷️ delegation: Shadow delegation, privilege escalation (3 cards)
🏷️ drift: Persona drift, language drift, goal drift (6 cards)
🏷️ environmental_manipulation (4 cards)
🏷️ fabrication: Hallucination, data fabrication (1 card)
🏷️ injection: Prompt injection, indirect injection, jailbreaks (8 cards)
    AVE-2025-0033 | 🔴 critical | Jailbreak Chaining for Capability Escalation
🏷️ memory: Memory pollution, laundering, and poisoning attacks (9 cards)
    AVE-2025-0034 | 🔴 critical | Federated Poisoning in Multi-Tenant Systems
🏷️ model_extraction (4 cards)
    AVE-2025-0068 | 🔴 critical | Training Data Extraction via Memorization
🏷️ model_poisoning (3 cards)
🏷️ multi_agent_collusion (5 cards)
🏷️ resource: Token embezzlement, EDoS, cost anomaly attacks (3 cards)
🏷️ reward_hacking (4 cards)
    AVE-2025-0072 | 🟠 high | Specification Gaming in Multi-Agent Rewards
🏷️ social: Collusion, bystander effect, social loafing (6 cards)
🏷️ structural: Cascade corruption, routing deadlock (13 cards)
    AVE-2025-0084 | 🔴 critical | Dependency Confusion in Agent Toolchains
🏷️ temporal: Chronological desync, sleeper payloads (3 cards)
🏷️ temporal_exploitation (5 cards)
🏷️ tool: Confused deputy, tool chain exploits, MCP poisoning (5 cards)
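
The category list above can be treated as a small lookup table when tagging or validating cards. The following is a minimal sketch in Python, not part of the AVE project itself: the names `Severity`, `AveCard`, `CATEGORY_CARD_COUNTS`, `FEATURED_CARDS`, and `cards_by_category` are hypothetical, and the category assigned to each featured card is an assumption inferred from where the card appears in the list.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Only the two severity labels that appear in the list are modelled here.
    CRITICAL = "critical"
    HIGH = "high"


# Card counts per category, copied from the taxonomy list.
CATEGORY_CARD_COUNTS = {
    "alignment": 12,
    "composite": 5,
    "consensus": 3,
    "credential": 4,
    "delegation": 3,
    "drift": 6,
    "environmental_manipulation": 4,
    "fabrication": 1,
    "injection": 8,
    "memory": 9,
    "model_extraction": 4,
    "model_poisoning": 3,
    "multi_agent_collusion": 5,
    "resource": 3,
    "reward_hacking": 4,
    "social": 6,
    "structural": 13,
    "temporal": 3,
    "temporal_exploitation": 5,
    "tool": 5,
}


@dataclass(frozen=True)
class AveCard:
    """Hypothetical record for one vulnerability card."""
    ave_id: str
    category: str
    severity: Severity
    title: str

    def __post_init__(self):
        # Reject cards tagged with a category outside the taxonomy.
        if self.category not in CATEGORY_CARD_COUNTS:
            raise ValueError(f"unknown AVE category: {self.category!r}")


# Assumption: each card belongs to the category it is listed under.
FEATURED_CARDS = [
    AveCard("AVE-2025-0033", "injection", Severity.CRITICAL,
            "Jailbreak Chaining for Capability Escalation"),
    AveCard("AVE-2025-0034", "memory", Severity.CRITICAL,
            "Federated Poisoning in Multi-Tenant Systems"),
    AveCard("AVE-2025-0068", "model_extraction", Severity.CRITICAL,
            "Training Data Extraction via Memorization"),
    AveCard("AVE-2025-0072", "reward_hacking", Severity.HIGH,
            "Specification Gaming in Multi-Agent Rewards"),
    AveCard("AVE-2025-0084", "structural", Severity.CRITICAL,
            "Dependency Confusion in Agent Toolchains"),
]


def cards_by_category(cards):
    """Group cards into a dict keyed by category name."""
    grouped = {}
    for card in cards:
        grouped.setdefault(card.category, []).append(card)
    return grouped


if __name__ == "__main__":
    for category, cards in cards_by_category(FEATURED_CARDS).items():
        for card in cards:
            print(f"{category}: {card.ave_id} [{card.severity.value}] {card.title}")
```

Validating the category in `__post_init__` means a mistyped tag fails at construction time instead of silently creating a category outside the taxonomy.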