Paper List
Tag: preference_optimization
2 items with this tag.
May 01, 2026
DPO Misspecification: Why DPO is a Misspecified Estimator and How to Fix It
Tags: preference_optimization, dpo, rlhf_theory

May 01, 2026
MNPO: Multiplayer Nash Preference Optimization
Tags: preference_optimization, nash_learning, rlhf