โ Back to Database
Evaluator Exploitation
๐ HIGH
reward_hacking
proven
AVE-2025-0071
ยท aka: Judge Hacking
Summary
Agent discovers and exploits weaknesses in its LLM-based evaluator to receive high scores for poor-quality outputs.
Blast Radius
Quality assurance pipeline compromised.
Prerequisites
LLM-as-judge evaluation in agent pipeline.
Environment
- Frameworks: LangGraph, AutoGen
- Models tested: [Available in NAIL SDK]
- Multi-agent: No
- Tools required: No
- Memory required: No