Preference Data Manipulation
HIGH
model_poisoning
theoretical
AVE-2025-0080
aka: RLHF Poisoning
Summary
An attacker manipulates the human preference data used for reinforcement learning from human feedback (RLHF), systematically biasing the outputs of the trained model.
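To illustrate how a modest amount of relabeling can shift a reward model, here is a toy sketch. It is not a reproduction of this entry and makes several assumptions. The reward model is a linear Bradley–Terry model fit by gradient ascent. Each response has two hypothetical features: `quality`, and a binary `target_attr` that the attacker wants promoted. Honest annotators prefer higher quality only. The attacker relabels a fraction of the comparisons in which exactly one response has `target_attr`. All function names and parameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_pairs(n, rng):
    """Hypothetical response pairs; each response is (quality, target_attr)."""
    pairs = []
    for _ in range(n):
        a = (rng.gauss(0, 1), float(rng.random() < 0.5))
        b = (rng.gauss(0, 1), float(rng.random() < 0.5))
        pairs.append((a, b))
    return pairs

def honest_label(a, b, rng):
    """Honest annotator: Bradley-Terry preference driven only by quality.
    Returns 1 if response a is preferred, else 0."""
    return 1 if rng.random() < sigmoid(2.0 * (a[0] - b[0])) else 0

def poison(pairs, labels, fraction, rng):
    """Assumed attacker: in a random fraction of pairs where only one response
    has target_attr, force that response to be the preferred one."""
    out = list(labels)
    for i, (a, b) in enumerate(pairs):
        if a[1] != b[1] and rng.random() < fraction:
            out[i] = 1 if a[1] > b[1] else 0
    return out

def fit_reward(pairs, labels, lr=0.5, epochs=300):
    """Fit a linear reward r(x) = w . x under the Bradley-Terry model,
    P(a preferred over b) = sigmoid(r(a) - r(b)), by full-batch gradient
    ascent on the mean log-likelihood."""
    w = [0.0, 0.0]
    n = len(pairs)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for (a, b), y in zip(pairs, labels):
            d = (a[0] - b[0], a[1] - b[1])  # feature difference a - b
            p = sigmoid(w[0] * d[0] + w[1] * d[1])
            grad[0] += (y - p) * d[0]
            grad[1] += (y - p) * d[1]
        w = [w[k] + lr * grad[k] / n for k in range(2)]
    return w

rng = random.Random(0)
pairs = make_pairs(2000, rng)
clean = [honest_label(a, b, rng) for a, b in pairs]
poisoned = poison(pairs, clean, fraction=0.4, rng=rng)

w_clean = fit_reward(pairs, clean)
w_poisoned = fit_reward(pairs, poisoned)
print("clean    [quality, target_attr]:", [round(v, 2) for v in w_clean])
print("poisoned [quality, target_attr]:", [round(v, 2) for v in w_poisoned])
```

In this toy setup, the weight on `target_attr` stays near zero when the model is trained on clean labels. After poisoning it becomes clearly positive, so any policy optimized against the poisoned reward would be pushed toward responses with that attribute regardless of quality. This matches the "systematic" bias the entry describes.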
Blast Radius
Systematic output bias across every deployment of the affected model.
Prerequisites
Access to the process that collects RLHF preference data.
Environment
- Frameworks: LangGraph
- Models tested: [Available in NAIL SDK]
- Multi-agent: No
- Tools required: No
- Memory required: No