Reading Queue

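This queue is not stacked by publication year. It is ordered along one main thread that runs from agent scaffolds through online RL and system scaling to training pathology. The benefit of reading in this order is direct: you first work out what interaction object the agent is actually optimizing, then see how environment feedback becomes a learning signal, and only then move on to the systems and stability problems that really decide whether scaling succeeds. If you reach the reward design or RLVR layer and find your background thin, read Textual Reasoning in parallel; if you care more about supervision failure, faithfulness, or broader alignment failure, go back to Safety & Alignment. If you are stuck on the basic abstractions behind PPO, GAE, value functions, or hierarchical RL, returning to Classical & Deep RL to fill in the algorithmic foundations will save effort.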

  • arXiv: OpenClaw-RL, arXiv, Note
  • ICLR 2023: ReAct: Synergizing Reasoning and Acting in Language Models, arXiv
  • NeurIPS 2023: Reflexion: Language Agents with Verbal Reinforcement Learning, arXiv
  • ICML 2024: Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models, arXiv
  • NeurIPS 2024: SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, arXiv
  • ICLR 2024 Spotlight: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, arXiv
  • ICML 2024: ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL, arXiv
  • NeurIPS 2024: DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning, arXiv
  • ICLR 2025: WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning, arXiv
  • ICLR 2025: Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents, arXiv
  • arXiv 2025: Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, arXiv
  • arXiv 2025: RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning, arXiv
  • ICLR 2026 Oral: AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning, arXiv
  • arXiv 2025: AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework, arXiv
  • arXiv 2025: ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents, arXiv
  • arXiv 2025: A Practitioner’s Guide to Multi-turn Agentic Reinforcement Learning, arXiv