Tutorial & Overview
- Book 2018: Sutton & Barto: Reinforcement Learning: An Introduction, Book, Note
- arXiv 2018: An Introduction to Deep Reinforcement Learning, arXiv, Note
- arXiv 2024: Reinforcement Learning: An Overview, arXiv
- INFORMS Tutorial 2025: Statistical and Algorithmic Foundations of Reinforcement Learning, arXiv, Slides
Model-Free RL
- arXiv 2018: Reinforcement Learning and Control as Probabilistic Inference, arXiv
- ICLR 2021 Oral: What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study, arXiv
Value-Based Methods
- Nature 2015, DQN: Human-level Control through Deep Reinforcement Learning, Nature
- AAAI 2016, Double DQN: Deep Reinforcement Learning with Double Q-learning, arXiv
- ICML 2017, Soft Q-Learning: Reinforcement Learning with Deep Energy-Based Policies, arXiv
Policy Gradient & On-Policy Methods
Tutorial:
- arXiv 2024: The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations, arXiv, Note
Papers:
- NIPS 1999: Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS
- ICML 2002, CPI / Kakade & Langford: Approximately Optimal Approximate Reinforcement Learning, PDF
- NIPS 2001, NPG: A Natural Policy Gradient, NIPS
- ICML 2016, A3C: Asynchronous Methods for Deep Reinforcement Learning, arXiv
- ICML 2015, TRPO: Trust Region Policy Optimization, arXiv
- arXiv 2017, PPO: Proximal Policy Optimization Algorithms, arXiv
Policy Gradient & Off-Policy Methods
- ICLR 2016, DDPG: Continuous Control with Deep Reinforcement Learning, arXiv
- ICML 2018, SAC: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, arXiv
Offline RL
Exploration Bonus
- ICML 2017, ICM: Curiosity-driven Exploration by Self-supervised Prediction, arXiv, Note
- ICLR 2019, RND: Exploration by Random Network Distillation, arXiv
Model-Based RL
- arXiv 2017: Learning Model-based Planning from Scratch, arXiv
- ICML 2013: Guided Policy Search, Online PDF
- ICML 2017, Predictron: The Predictron: End-To-End Learning and Planning, arXiv
- NIPS 2017, VPN: Value Prediction Network, arXiv
- AAAI 2019, CRAR: Combined Reinforcement Learning via Abstract Representations, arXiv
- ICML 2019, DeepMDP: DeepMDP: Learning Continuous Latent Space Models for Representation Learning, arXiv
Imitation Learning
Tutorial:
- arXiv 2018: An Algorithmic Perspective on Imitation Learning, arXiv
Papers:
- AISTATS 2011, DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, arXiv, Note
- ICML 2022, ILEED: Imitation Learning by Estimating Expertise of Demonstrators, arXiv, Note
- NIPS 2016, GAIL: Generative Adversarial Imitation Learning, arXiv, Note
- NIPS 2017, InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations, arXiv, Note
- IJCAI 2020, Triple-GAIL: Triple-GAIL: A Multi-Modal Imitation Learning Framework, arXiv
- IJCAI 2021, SAIL: Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations, IJCAI
- ICLR 2023, HOIL: Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning, arXiv
- ICML 2023, PCIL: Policy Contrastive Imitation Learning, arXiv
Inverse Reinforcement Learning
Tutorial:
- arXiv 2018: A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress, arXiv
Papers:
- AAAI 2008, MaxEnt IRL: Maximum Entropy Inverse Reinforcement Learning, AAAI, Note
- ICML 2016, MaxEnt IOC: Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, arXiv
- ICLR 2018, AIRL: Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, arXiv
Generalization & Overfitting
- arXiv 2018: A Study on Overfitting in Deep Reinforcement Learning, arXiv
- arXiv 2018: A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning, arXiv
Representation Learning & Transfer
- ICML 2017, DARLA: Improving Zero-Shot Transfer in Reinforcement Learning, arXiv
- ICLR 2018 Workshop: Decoupling Dynamics and Reward for Transfer Learning, arXiv
- ICLR 2021, DBC: Learning Invariant Representations for Reinforcement Learning without Reconstruction, arXiv
- ICLR 2021, HiP-BMDP: Learning Robust State Abstractions for Hidden-Parameter Block MDPs, arXiv
Hierarchical RL / Temporal Abstraction
- AIJ 1999, Options framework: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, AIJ/ScienceDirect
- AAAI 2017, Option-Critic: The Option-Critic Architecture, arXiv
- NIPS 2016, STRAW: Strategic Attentive Writer for Learning Macro-Actions, arXiv