Spring 2024, Tue. and Thu., 9:30 am - 10:50 am, Equad B205
Instructor: Chi Jin. Office hours: Fri. 3:00-4:00 pm, Equad C332.
TA: Wenhao Zhan. Office hours: Mon. 3:00-4:00 pm, Friend Center 308.
Contents: Mathematical foundations of RL, mostly about theorems and proofs.
Grades: 5 problem sets (60%), 1 final exam (40%).
Late homework is not accepted.
See past versions of the course here: 2022 Version, 2020 Version
*Please see the preliminary version of the scribed notes in the 2020 version of the course.
2/8. Concentration inequalities. [video4]
2/15. Martingale concentrations. [video5]
2/20. Generative models. [video6]
2/22. Introduction to exploration. [video7]
2/27. Exploration in multi-armed bandits. [video8]
2/29. Exploration in RL. [video9]
3/5. Lower bounds for multi-armed bandits. [video10]
3/7. Lower bounds for MDPs. [video11]
3/21. Offline RL. [video12]
3/26. RL in Large State Space. [video13]
3/29. Least-Squares Value Iteration. [video14]
4/2. Exploration in Large State Space. [video15]
4/4. General Function Approximation. [video16]
4/9. Exploration in General Function Approximation. [video17]
4/11. Multiagent Reinforcement Learning. [video18]
4/16. Two-Player Zero-Sum Games. [video19]
4/23. Multiplayer General-Sum Games. [video20]
4/25. Partially Observable Reinforcement Learning I. [video21]
5/2. Partially Observable Reinforcement Learning II. [video22]
Basics (tabular MDP):
Intro, MDP basics and planning.
Concentration inequalities.
Generative models, value iteration.
Online RL, exploration, optimism. [Homework 1 due]
Minimax lower bound.
Offline RL, pessimism. [Homework 2 due]
Advanced Topics:
Policy optimization.
Large state space, linear function approximation. [Homework 3 due]
General function approximation.
Game theory and multiagent RL. [Homework 4 due]
Learning Markov games.
Partially observable MDPs. [Homework 5 due]
Reinforcement Learning: Theory and Algorithms (draft), by Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun
Reinforcement Learning: An Introduction, by Richard S. Sutton, Andrew G. Barto
Algorithms for Reinforcement Learning, by Csaba Szepesvári
Bandit Algorithms, by Tor Lattimore, Csaba Szepesvári
Mathematical Tools
High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin
Concentration Inequalities and Martingale Inequalities: A Survey, by Fan Chung, Linyuan Lu
Nan Jiang, Statistical Reinforcement Learning
Wen Sun and Sham Kakade, Foundations of Reinforcement Learning
Dylan J. Foster and Alexander Rakhlin, Statistical Reinforcement Learning and Decision Making
Alekh Agarwal and Alex Slivkins, Bandits and Reinforcement Learning
More practical/empirical version (will not be covered in this course)
Sergey Levine, Deep Reinforcement Learning