Tutorial & Overview

  • Book 2018: Sutton & Barto: Reinforcement Learning: An Introduction (2nd ed.), Book, Note
  • arXiv 2018: An Introduction to Deep Reinforcement Learning, arXiv, Note
  • arXiv 2024: Reinforcement Learning: An Overview, arXiv
  • INFORMS Tutorial 2025: Statistical and Algorithmic Foundations of Reinforcement Learning, arXiv, Slides

Model-Free RL

  • arXiv 2018: Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, arXiv
  • ICLR 2021 Oral: What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study, arXiv

Value-Based Methods

  • Nature 2015, DQN: Human-level Control through Deep Reinforcement Learning, Nature
  • AAAI 2016, Double DQN: Deep Reinforcement Learning with Double Q-learning, arXiv
  • ICML 2017, Soft Q-Learning: Reinforcement Learning with Deep Energy-Based Policies, arXiv
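The DQN and Double DQN entries above differ only in how the bootstrap target is built. As a minimal NumPy sketch (function and argument names are illustrative, not from any paper's code): Double DQN lets the online network pick the greedy next action and the target network evaluate it, curbing the overestimation bias of vanilla Q-learning's max operator.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN bootstrap targets (van Hasselt et al., 2016).

    The online network selects the greedy next action; the target
    network evaluates it.  `next_q_*` are (batch, n_actions) arrays.
    """
    best_actions = np.argmax(next_q_online, axis=1)       # selection: online net
    next_values = next_q_target[np.arange(len(rewards)), best_actions]  # evaluation: target net
    return rewards + gamma * (1.0 - dones) * next_values  # no bootstrap at terminals
```

Vanilla DQN would instead take `next_q_target.max(axis=1)`, coupling selection and evaluation in the same (noisy) estimates.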

Policy Gradient & On-Policy Methods

Tutorial:

  • arXiv 2024: The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations, arXiv, Note

Papers:

  • NIPS 1999: Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS
  • NIPS 2001, NPG: A Natural Policy Gradient, NIPS
  • ICML 2002, CPI / Kakade & Langford: Approximately Optimal Approximate Reinforcement Learning, PDF
  • ICML 2015, TRPO: Trust Region Policy Optimization, arXiv
  • ICLR 2016, GAE: High-Dimensional Continuous Control Using Generalized Advantage Estimation, arXiv, Note
  • ICML 2016, A3C: Asynchronous Methods for Deep Reinforcement Learning, arXiv
  • arXiv 2017, PPO: Proximal Policy Optimization Algorithms, arXiv
  • NeurIPS 2022, DAE: Direct Advantage Estimation, arXiv, Note
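Two of the listed methods combine neatly in practice: GAE supplies the advantage estimates that PPO's clipped surrogate consumes. A minimal NumPy sketch of both computations (names and the toy inputs are illustrative):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al., 2016).

    `values` has one extra entry, V(s_T), for bootstrapping.
    Runs the recursion A_t = delta_t + gamma*lam*A_{t+1} backwards.
    """
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def ppo_clip_objective(ratio, adv, eps=0.2):
    """PPO clipped surrogate (Schulman et al., 2017): pessimistic
    minimum of the unclipped and clipped importance-weighted advantage."""
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv).mean()
```

With `lam=1` GAE reduces to Monte-Carlo advantages; with `lam=0` it is the one-step TD residual, trading variance for bias.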

Policy Gradient & Off-Policy Methods

  • ICLR 2016, DDPG: Continuous Control with Deep Reinforcement Learning, arXiv
  • ICML 2018, SAC: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, arXiv
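SAC's key departure from DDPG is the entropy-augmented Bellman backup. A minimal NumPy sketch of the soft target in the common twin-critic formulation (the min over two target critics follows later SAC variants/TD3 rather than the original ICML 2018 value-network version; names are illustrative):

```python
import numpy as np

def sac_q_targets(rewards, dones, q1_next, q2_next, logp_next,
                  alpha=0.2, gamma=0.99):
    """Soft Bellman backup used by SAC (Haarnoja et al., 2018).

    Twin target Q-values are min-combined to curb overestimation, and
    the entropy term -alpha * log pi(a'|s') augments the reward, so the
    policy is rewarded for staying stochastic.
    """
    soft_value = np.minimum(q1_next, q2_next) - alpha * logp_next
    return rewards + gamma * (1.0 - dones) * soft_value
```

Setting `alpha=0` recovers a plain (clipped double) Q-learning target, which is one way to see SAC as maximum-entropy RL layered on an off-policy actor-critic.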

Offline RL

Exploration Bonus

  • ICML 2017, ICM: Curiosity-driven Exploration by Self-supervised Prediction, arXiv, Note
  • ICLR 2019, RND: Exploration by Random Network Distillation, arXiv
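RND's bonus mechanism fits in a few lines. A toy sketch with linear maps standing in for the two networks (the tiny dimensions and function names are illustrative, not from the paper's code): a frozen random target network and a predictor trained to match it; prediction error is high on states unlike those seen in training, and that error is the intrinsic reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rnd(obs_dim, feat_dim=16):
    """Tiny linear stand-ins for RND's two networks (Burda et al., 2019):
    a frozen random target and a trainable predictor."""
    target = rng.normal(size=(obs_dim, feat_dim))  # fixed, never trained
    predictor = np.zeros((obs_dim, feat_dim))      # regressed toward target outputs
    return target, predictor

def rnd_bonus(obs, target, predictor):
    """Intrinsic reward = squared prediction error against the random target."""
    err = obs @ predictor - obs @ target
    return np.sum(err ** 2, axis=-1)
```

Unlike ICM, the target is state-only and random, so the bonus cannot be gamed by environment stochasticity in the forward dynamics ("noisy TV" problem).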

Model-Based RL

  • arXiv 2017: Learning Model-based Planning from Scratch, arXiv
  • ICML 2013: Guided Policy Search, Online PDF
  • NIPS 2015: Data Generation as Sequential Decision Making, arXiv, Note
  • ICML 2017, Predictron: The Predictron: End-To-End Learning and Planning, arXiv
  • NIPS 2017, VPN: Value Prediction Network, arXiv
  • AAAI 2019, CRAR: Combined Reinforcement via Abstract Representations, arXiv
  • ICML 2019, DeepMDP: DeepMDP: Learning Continuous Latent Space Models for Representation Learning, arXiv
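The common thread in the papers above is planning with a learned model. In miniature, that idea is Dyna-style planning: replay simulated transitions from a learned model to refine the value function between real environment steps. A hedged tabular sketch (names and the dictionary model are illustrative, far simpler than the learned latent models in these papers):

```python
import numpy as np

def dyna_planning_steps(Q, model, rng, n_steps=10, gamma=0.95, lr=0.1):
    """Dyna-style planning: Q-learning backups on imagined transitions.

    `model` maps (s, a) -> (r, s_next), built from previously observed
    real transitions; `Q` is a (n_states, n_actions) table.
    """
    keys = list(model.keys())
    for _ in range(n_steps):
        s, a = keys[rng.integers(len(keys))]   # sample a known (s, a) pair
        r, s_next = model[(s, a)]              # query the learned model
        Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

The listed papers replace the lookup table with learned latent dynamics (Predictron, VPN, DeepMDP) but keep the same compute-for-samples trade.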

Inverse Reinforcement Learning

Tutorial:

  • arXiv 2018: A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress, arXiv

Papers:

  • AAAI 2008, MaxEnt IRL: Maximum Entropy Inverse Reinforcement Learning, AAAI, Note
  • ICML 2016, MaxEnt IOC: Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, arXiv
  • ICLR 2018, AIRL: Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, arXiv
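For the MaxEnt IRL entry, the core gradient is easy to state. Assuming a linear reward r(s) = wᵀφ(s) (the linear case from Ziebart et al.; names are illustrative), the log-likelihood gradient w.r.t. w is the expert's empirical feature expectations minus the feature expectations under the current policy's state distribution:

```python
import numpy as np

def maxent_irl_grad(expert_feats, policy_feats, policy_probs):
    """MaxEnt IRL log-likelihood gradient (Ziebart et al., 2008),
    linear-reward case.

    expert_feats : (n_demos, d) features of expert-visited states
    policy_feats : (n_states, d) features of all states
    policy_probs : (n_states,) state-visitation distribution of the
                   current soft-optimal policy
    """
    expert_expectation = expert_feats.mean(axis=0)
    policy_expectation = policy_probs @ policy_feats  # visitation-weighted average
    return expert_expectation - policy_expectation
```

Guided Cost Learning (the MaxEnt IOC entry) generalizes this to nonlinear rewards by estimating the policy expectation with importance-weighted samples instead of exact visitation counts.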

Generalization & Overfitting

  • arXiv 2018: A Study on Overfitting in Deep RL, arXiv
  • arXiv 2018: A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning, arXiv

Representation Learning & Transfer

  • ICML 2017, DARLA: Improving Zero-Shot Transfer in Reinforcement Learning, arXiv
  • ICLR 2018 Workshop: Decoupling Dynamics and Reward for Transfer Learning, arXiv
  • ICLR 2021, DBC: Learning Invariant Representations for Reinforcement Learning without Reconstruction, arXiv
  • ICLR 2021, HiP-BMDP: Learning Robust State Abstractions for Hidden-Parameter Block MDPs, arXiv

Explainable RL

  • NeurIPS 2019: Causal Confusion in Imitation Learning, arXiv
  • ICLR 2018: Learning Sparse Neural Networks through L0 Regularization, arXiv
  • NeurIPS 2023: StateMask: Explaining Deep Reinforcement Learning through State Mask, OpenReview, GitHub

Hierarchical RL / Temporal Abstraction

  • AIJ 1999, Options framework: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, AIJ/ScienceDirect
  • AAAI 2017, Option-Critic: The Option-Critic Architecture, arXiv
  • NIPS 2016, STRAW: Strategic Attentive Writer for Learning Macro-Actions, arXiv