Paper List

Tag: rlhf_theory

1 item with this tag.

  • May 01, 2026

    DPO Misspecification: Why DPO is a Misspecified Estimator and How to Fix It

    • preference_optimization
    • dpo
    • rlhf_theory
