Reading Queue
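This queue is ordered reasoning scaffold → reasoning post-training → faithful reasoning. Reading in this order flows better: you first see explicit search and CoT structure, then move into the debate over why RLVR and distillation work and why they may give a distorted picture, and finally return to the sharper question of whether a reasoning trace is a trustworthy window at all.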
- NeurIPS 2023 Oral: Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arXiv, Note
- arXiv 2025: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, arXiv
- NeurIPS 2025: DAPO: An Open-Source LLM Reinforcement Learning System at Scale, arXiv
- COLM 2025: Understanding R1-Zero-Like Training: A Critical Perspective (Dr. GRPO), arXiv
- ICLR 2026: GRPO’s Effective Loss, Dynamics, and Success Amplification, arXiv
- NeurIPS 2025 Oral: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?, arXiv
- COLM 2025: LIMO: Less is More for Reasoning, arXiv
- EMNLP 2025: s1: Simple Test-Time Scaling, arXiv
- NeurIPS 2025 Spotlight: Absolute Zero: Reinforced Self-play Reasoning with Zero Data, arXiv
- arXiv 2025: Reasoning Models Don’t Always Say What They Think, arXiv