Tutorial & Overview

  • Book 2018: Sutton & Barto: Reinforcement Learning: An Introduction, Book, Note
  • arXiv 2018: An Introduction to Deep Reinforcement Learning, arXiv, Note
  • arXiv 2024: Reinforcement Learning: An Overview, arXiv
  • INFORMS Tutorial 2025: Statistical and Algorithmic Foundations of Reinforcement Learning, arXiv, Slides

Model-Free RL

  • arXiv 2018: Reinforcement Learning and Control as Probabilistic Inference, arXiv
  • ICLR 2021 Oral: What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study, arXiv

Value-Based Methods

  • Nature 2015, DQN: Human-level Control through Deep Reinforcement Learning, Nature
  • AAAI 2016, Double DQN: Deep Reinforcement Learning with Double Q-learning, arXiv
  • ICML 2017, Soft Q-Learning: Reinforcement Learning with Deep Energy-Based Policies, arXiv

Policy Gradient & On-Policy Methods

Tutorial:

  • arXiv 2024: The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations, arXiv, Note

Papers:

  • NIPS 1999: Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS
  • ICML 2002, CPI / Kakade & Langford: Approximately Optimal Approximate Reinforcement Learning, PDF
  • NIPS 2001, NPG: A Natural Policy Gradient, NIPS
  • ICML 2016, A3C: Asynchronous Methods for Deep Reinforcement Learning, arXiv
  • ICML 2015, TRPO: Trust Region Policy Optimization, arXiv
  • arXiv 2017, PPO: Proximal Policy Optimization Algorithms, arXiv

Policy Gradient & Off-Policy Methods

  • ICLR 2016, DDPG: Continuous Control with Deep Reinforcement Learning, arXiv
  • ICML 2018, SAC: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, arXiv

Offline RL

Exploration Bonus

  • ICML 2017, ICM: Curiosity-driven Exploration by Self-supervised Prediction, arXiv, Note
  • ICLR 2019, RND: Exploration by Random Network Distillation, arXiv

Model-Based RL

  • arXiv 2017: Learning Model-based Planning from Scratch, arXiv
  • ICML 2013: Guided Policy Search, Online PDF
  • ICML 2017, Predictron: The Predictron: End-To-End Learning and Planning, arXiv
  • NIPS 2017, VPN: Value Prediction Network, arXiv
  • AAAI 2019, CRAR: Combined Reinforcement Learning via Abstract Representations, arXiv
  • ICML 2019, DeepMDP: DeepMDP: Learning Continuous Latent Space Models for Representation Learning, arXiv

Imitation Learning

Tutorial:

  • arXiv 2018: An Algorithmic Perspective on Imitation Learning, arXiv

Papers:

  • AISTATS 2011, DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, arXiv, Note
  • ICML 2022, ILEED: Imitation Learning by Estimating Expertise of Demonstrators, arXiv, Note
  • NIPS 2016, GAIL: Generative Adversarial Imitation Learning, arXiv, Note
  • NIPS 2017, InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations, arXiv, Note
  • IJCAI 2020, Triple-GAIL: Triple-GAIL: A Multi-Modal Imitation Learning Framework, arXiv
  • IJCAI 2021, SAIL: Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations, IJCAI
  • ICLR 2023, HOIL: Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning, arXiv
  • ICML 2023, PCIL: Policy Contrastive Imitation Learning, arXiv

Inverse Reinforcement Learning

Tutorial:

  • arXiv 2018: A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress, arXiv

Papers:

  • AAAI 2008, MaxEnt IRL: Maximum Entropy Inverse Reinforcement Learning, AAAI, Note
  • ICML 2016, MaxEnt IOC: Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, arXiv
  • ICLR 2018, AIRL: Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, arXiv

Generalization & Overfitting

  • arXiv 2018: A Study on Overfitting in Deep Reinforcement Learning, arXiv
  • arXiv 2018: A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning, arXiv

Representation Learning & Transfer

  • ICML 2017, DARLA: Improving Zero-Shot Transfer in Reinforcement Learning, arXiv
  • ICLR 2018 Workshop: Decoupling Dynamics and Reward for Transfer Learning, arXiv
  • ICLR 2021, DBC: Learning Invariant Representations for Reinforcement Learning without Reconstruction, arXiv
  • ICLR 2021, HiP-BMDP: Learning Robust State Abstractions for Hidden-Parameter Block MDPs, arXiv

Hierarchical RL / Temporal Abstraction

  • AIJ 1999, Options framework: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, AIJ/ScienceDirect
  • AAAI 2017, Option-Critic: The Option-Critic Architecture, arXiv
  • NIPS 2016, STRAW: Strategic Attentive Writer for Learning Macro-Actions, arXiv