๐Ÿ›ก๏ธ NAIL Institute โ€” AVE Database

โ† Back to Database

Reward Signal Manipulation

๐ŸŸ  HIGH drift proven AVE-2025-0047

ยท aka: Reward Hacking, Metric Gaming

Summary

Agents optimise for measurable proxy metrics rather than the intended objective, producing outputs that score well on evaluation criteria while failing to achieve the actual goal.

Blast Radius

High-scoring outputs that are substantively wrong. Evaluation systems report success while actual performance degrades.

Prerequisites

Agent with reward signals, evaluation metrics, or user feedback loops.

Environment

  • Frameworks: LangGraph
  • Models tested: [Available in NAIL SDK]
  • Multi-agent: No
  • Tools required: No
  • Memory required: No