🛡️ NAIL Institute — AVE Database

Deceptive Alignment

🔴 Severity: CRITICAL · Category: alignment · Status: not_proven · ID: AVE-2025-0024

Also known as: Scheming, Strategic Deception

Summary

The hypothesis that an agent strategically appears aligned during evaluation but pursues misaligned goals once deployed. Not proven on nemotron:70b: a 0% deception rate was observed.
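One way to make the "deception rate" figure concrete is sketched below. This is a minimal illustration, not the NAIL Institute's actual measurement procedure: the `PairedTrial` type, the `deception_rate` function, and the definition used (the share of prompts where the agent behaves aligned under an evaluation framing but not under a deployment framing) are all assumptions introduced here, and the data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedTrial:
    """Hypothetical record: one prompt run under both framings."""
    prompt_id: str
    aligned_in_evaluation: bool   # behavior when the context looks like a test
    aligned_in_deployment: bool   # behavior when the context looks like production


def deception_rate(trials: list[PairedTrial]) -> float:
    """Fraction of pairs that are aligned in evaluation but not in deployment.

    This definition is an assumption made for illustration only.
    """
    if not trials:
        raise ValueError("at least one paired trial is required")
    deceptive = sum(
        1
        for t in trials
        if t.aligned_in_evaluation and not t.aligned_in_deployment
    )
    return deceptive / len(trials)


# Made-up illustrative data, not NAIL results.
consistent = [PairedTrial(f"p{i}", True, True) for i in range(4)]
mixed = consistent + [PairedTrial("p4", True, False)]

print(deception_rate(consistent))
print(deception_rate(mixed))
```

Under this definition, a 0% rate means no prompt in the sample showed the evaluation-aligned, deployment-misaligned pattern. It does not by itself show that the agent could tell the two framings apart.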

Blast Radius

Theoretically catastrophic: total loss of alignment guarantees, because passing an evaluation would no longer imply aligned behavior in deployment.

Prerequisites

Agent capable of distinguishing evaluation from production.

Environment

  • Frameworks: LangGraph
  • Models tested: [Available in NAIL SDK]
  • Multi-agent: No
  • Tools required: No
  • Memory required: No