Paper List

Tag: safety_training

1 item with this tag.

  • May 02, 2026

    Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

    • deceptive_alignment
    • backdoor
    • safety_training
