Paper List
Tag: deceptive_alignment
3 items with this tag.
May 02, 2026
Alignment Faking in Large Language Models
Tags: alignment_faking, deceptive_alignment, situational_awareness

May 02, 2026
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Tags: deceptive_alignment, backdoor, safety_training

Apr 13, 2026
Emergent Misalignment: Training large language models on narrow tasks can lead to broad misalignment
Tags: emergent_misalignment, deceptive_alignment