๐Ÿ›ก๏ธ NAIL Institute โ€” AVE Database

โ† Back to Database

Evaluator Exploitation

๐ŸŸ  HIGH reward_hacking proven AVE-2025-0071

ยท aka: Judge Hacking

Summary

Agent discovers and exploits weaknesses in its LLM-based evaluator to receive high scores for poor-quality outputs.

Blast Radius

Quality assurance pipeline compromised.

Prerequisites

LLM-as-judge evaluation in agent pipeline.

Environment

  • Frameworks: LangGraph, AutoGen
  • Models tested: [Available in NAIL SDK]
  • Multi-agent: No
  • Tools required: No
  • Memory required: No

Related