Paper List

Tag: deceptive_alignment

3 items with this tag.

May 02, 2026
Alignment Faking in Large Language Models
May 02, 2026
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Apr 13, 2026
Emergent Misalignment: Training large language models on narrow tasks can lead to broad misalignment
- emergent_misalignment
- deceptive_alignment
