ยท aka: Reward Hacking, Metric Gaming
Agents optimise for measurable proxy metrics rather than the intended objective, producing outputs that score well on evaluation criteria while failing to achieve the actual goal.
High-scoring outputs that are substantively wrong. Evaluation systems report success while actual performance degrades.
Agent with reward signals, evaluation metrics, or user feedback loops.