Tutorial & Overview

  • Book 2018: Sutton & Barto: Reinforcement Learning: An Introduction, Book, Note
  • arXiv 2018: An Introduction to Deep Reinforcement Learning, arXiv, Note
  • arXiv 2024: Reinforcement Learning: An Overview, arXiv, Note
  • INFORMS Tutorial 2025: Statistical and Algorithmic Foundations of Reinforcement Learning, arXiv, Slides, Note

Model-Free RL

  • arXiv 2018: Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, arXiv, Note
  • ICLR 2021 Oral: What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study, arXiv, Note

Value-Based Methods

  • Nature 2015, DQN: Human-level Control through Deep Reinforcement Learning, Nature, Note
  • AAAI 2016, Double DQN: Deep Reinforcement Learning with Double Q-learning, arXiv, Note
  • ICML 2017, Soft Q-Learning: Reinforcement Learning with Deep Energy-Based Policies, arXiv, Note
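
The key difference between DQN and Double DQN above is how the bootstrap target is computed. A minimal sketch (toy numpy stand-ins for the networks; not taken from either paper's code):

```python
import numpy as np

def dqn_target(q_target, rewards, next_states, gamma=0.99):
    """Standard DQN target: max over the target network's own estimates,
    which tends to over-estimate action values."""
    q_next = q_target(next_states)                  # shape (batch, n_actions)
    return rewards + gamma * q_next.max(axis=1)

def double_dqn_target(q_online, q_target, rewards, next_states, gamma=0.99):
    """Double DQN target: the online network selects the action, the target
    network evaluates it, decoupling selection from evaluation."""
    best = q_online(next_states).argmax(axis=1)     # action selection
    q_next = q_target(next_states)                  # action evaluation
    return rewards + gamma * q_next[np.arange(len(best)), best]
```

Since the Double DQN target evaluates the online network's argmax rather than taking the target network's own max, it is never larger than the DQN target on the same batch.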

Policy Gradient & On-Policy Methods

Tutorial:

  • arXiv 2024: The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations, arXiv, Note

Papers:

  • NIPS 1999: Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, Note
  • NIPS 2001, NPG: A Natural Policy Gradient, NIPS, Note
  • ICML 2015, TRPO: Trust Region Policy Optimization, arXiv, Note
  • ICML 2016, A3C: Asynchronous Methods for Deep Reinforcement Learning, arXiv, Note
  • arXiv 2017, PPO: Proximal Policy Optimization Algorithms, arXiv, Note
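
PPO's central idea is the clipped surrogate objective, which bounds how far the new policy's probability ratio can move the update. A per-sample sketch (numpy, illustrative only):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: the minimum of the unclipped and clipped
    importance-weighted advantage removes any incentive to push the
    ratio outside [1 - eps, 1 + eps]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)
```

Note that taking the minimum makes the bound pessimistic in both directions: large ratios cannot inflate positive advantages, and small ratios cannot hide negative ones.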

Policy Gradient & Off-Policy Methods

  • ICLR 2016, DDPG: Continuous Control with Deep Reinforcement Learning, arXiv, Note
  • ICML 2018, SAC: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, arXiv, Note
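
SAC replaces the standard Bellman backup with a "soft" one that adds an entropy bonus to the next-state value. A scalar sketch of the target computation (simplified; the paper uses twin critics and a squashed Gaussian policy):

```python
import numpy as np

def soft_q_target(rewards, q_next, log_prob_next, alpha=0.2, gamma=0.99):
    """Soft Bellman backup used by SAC: the next-state value is the
    Q-estimate minus alpha * log pi(a'|s'), i.e. value plus an
    entropy bonus weighted by the temperature alpha."""
    v_next = q_next - alpha * log_prob_next
    return rewards + gamma * v_next
```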

Exploration Bonus

  • ICML 2017, ICM: Curiosity-driven Exploration by Self-supervised Prediction, arXiv, Note
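
ICM's curiosity bonus is the forward model's prediction error in a learned feature space: states the agent cannot yet predict yield higher intrinsic reward. A sketch of just the bonus computation (feature embeddings assumed given; `eta` is a scaling constant, names are illustrative):

```python
import numpy as np

def icm_intrinsic_reward(phi_next_pred, phi_next, eta=0.01):
    """ICM-style curiosity bonus: scaled squared error between the forward
    model's predicted embedding of the next state and the actual one."""
    return 0.5 * eta * np.sum((phi_next_pred - phi_next) ** 2, axis=-1)
```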

Model-Based RL

  • ICML 2013: Guided Policy Search, Online PDF, Note
  • arXiv 2017: Learning Model-based Planning from Scratch, arXiv, Note
  • ICML 2017, Predictron: The Predictron: End-To-End Learning and Planning, arXiv, Note
  • NIPS 2017, VPN: Value Prediction Network, arXiv, Note
  • AAAI 2019, CRAR: Combined Reinforcement Learning via Abstract Representations, arXiv, Note
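
The common thread in these papers is planning with a learned dynamics model. The simplest such planner, useful as a mental baseline (a hypothetical random-shooting sketch, not taken from any of the papers above), scores random action sequences under the model and executes the first action of the best one:

```python
import numpy as np

def random_shooting(model, reward_fn, state, horizon=5, n_candidates=64, rng=None):
    """Random-shooting planner: sample action sequences, roll each through
    the (learned) transition model, and return the first action of the
    highest-return sequence."""
    rng = rng or np.random.default_rng(0)
    best_ret, best_a0 = -np.inf, None
    for _ in range(n_candidates):
        s, ret = state, 0.0
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        for a in actions:
            s = model(s, a)            # learned (or true) one-step dynamics
            ret += reward_fn(s, a)     # reward accumulated along the rollout
        if ret > best_ret:
            best_ret, best_a0 = ret, actions[0]
    return best_a0
```

In practice this is replanned at every step (model-predictive control); the papers above learn richer abstract models but keep the same plan-then-act structure.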

Imitation Learning

Tutorial:

  • arXiv 2018: An Algorithmic Perspective on Imitation Learning, arXiv, Note

Papers:

  • AISTATS 2011, DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, arXiv, Note
  • NIPS 2016, GAIL: Generative Adversarial Imitation Learning, arXiv, Note
  • NIPS 2017, InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations, arXiv, Note
  • ICLR 2023, HOIL: Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning, arXiv, Note
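
DAgger's loop is short enough to sketch in full: roll out the current learner, relabel the visited states with expert actions, aggregate, and retrain. `expert_policy`, `train`, and `rollout` below are user-supplied stand-ins, not API from the paper:

```python
def dagger(expert_policy, train, rollout, n_iters=5):
    """DAgger sketch: dataset aggregation of expert-labeled states
    visited under the learner's own state distribution."""
    dataset = []
    policy = train(dataset)                 # initial policy (e.g. BC or random)
    for _ in range(n_iters):
        states = rollout(policy)            # states the learner actually visits
        dataset += [(s, expert_policy(s)) for s in states]  # expert relabels
        policy = train(dataset)             # retrain on the aggregate dataset
    return policy
```

The point of querying the expert on the learner's own trajectories, rather than the expert's, is to avoid the compounding-error problem of plain behavior cloning.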

Inverse Reinforcement Learning

Tutorial:

  • arXiv 2018: A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress, arXiv, Note

Papers:

  • AAAI 2008, MaxEnt IRL: Maximum Entropy Inverse Reinforcement Learning, AAAI, Note
  • ICML 2016, MaxEnt IOC: Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, arXiv, Note
  • ICLR 2018, AIRL: Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, arXiv, Note
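
The MaxEnt IRL entry above rests on one modeling assumption that the later papers (guided cost learning, AIRL) inherit: trajectories are exponentially more likely the higher their cumulative reward. In the notation of Ziebart et al.:

```latex
% Maximum-entropy trajectory model: reward r_theta is learned so that
% demonstrated trajectories tau have high likelihood under
P(\tau \mid \theta) = \frac{1}{Z(\theta)} \exp\!\Big(\sum_{t} r_\theta(s_t, a_t)\Big),
\qquad
Z(\theta) = \sum_{\tau'} \exp\!\Big(\sum_{t} r_\theta(s'_t, a'_t)\Big)
```

The partition function $Z(\theta)$ is intractable in large state spaces; guided cost learning estimates it with samples from an adaptively trained policy, and AIRL recasts the same objective as an adversarial discriminator.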

Generalization & Overfitting

  • arXiv 2018: A Study on Overfitting in Deep Reinforcement Learning, arXiv, Note
  • arXiv 2018: A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning, arXiv, Note

Representation Learning & Transfer

  • ICML 2017, DARLA: Improving Zero-Shot Transfer in Reinforcement Learning, arXiv, Note
  • ICLR 2018 Workshop: Decoupling Dynamics and Reward for Transfer Learning, arXiv, Note

Hierarchical RL / Temporal Abstraction

  • AIJ 1999, Options framework: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, AIJ/ScienceDirect, Note
  • AAAI 2017, Option-Critic: The Option-Critic Architecture, arXiv, Note
  • NIPS 2016, STRAW: Strategic Attentive Writer for Learning Macro-Actions, arXiv, Note