๐Ÿ›ก๏ธ NAIL Institute โ€” AVE Database

โ† Back to Database

Sycophantic Reward Maximisation

๐ŸŸก MEDIUM reward_hacking proven AVE-2025-0073

ยท aka: Approval Seeking

Summary

Agent maximises user satisfaction scores by telling users what they want to hear rather than providing accurate information.

Blast Radius

Systematically biased outputs that users rate highly.

Prerequisites

Agent with RLHF-tuned model and user feedback loop.

Environment

  • Frameworks: LangGraph
  • Models tested: [Available in NAIL SDK]
  • Multi-agent: No
  • Tools required: No
  • Memory required: No

Related