AVE-2025-0024 — Deceptive Alignment

Deceptive Alignment

🔴 CRITICAL alignment not_proven AVE-2025-0024

· aka: Scheming, Strategic Deception

Summary

Hypothesis that agents strategically appear aligned during evaluation but pursue misaligned goals during deployment. NOT proven on nemotron:70b — 0% deception rate observed.

Blast Radius

Theoretically catastrophic — total loss of alignment guarantees.

Prerequisites

Agent capable of distinguishing evaluation from production.

Environment

Frameworks: LangGraph
Models tested: [Available in NAIL SDK]
Multi-agent: No
Tools required: No
Memory required: No

AVE-2025-0015
AVE-2025-0029

🛡️ NAIL Institute — AVE Database

Deceptive Alignment

Summary

Blast Radius

Prerequisites

Environment

Related