Spring 2026

Reinforcement Learning (1 Div.)

Reinforcement learning (RL) is a popular machine learning paradigm for solving sequential decision-making problems. In this paradigm, an agent learns an optimal policy by repeatedly interacting with an environment to maximize its cumulative reward. This course covers the foundational concepts of RL, including states, actions, and rewards, the Markov decision process, and the trade-off between exploration and exploitation. We will also study key RL methods, such as Monte Carlo methods, temporal difference learning, function approximation, and policy gradient methods. Finally, you will work in small teams to implement RL agents that solve problems of increasing difficulty, from simple to complex.
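
As a minimal sketch that is not part of the official course materials, the agent-environment loop described above can be illustrated with tabular Q-learning, a temporal difference method, on a hypothetical five-state corridor. The `Corridor` environment, its reward of 1 at the right end, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions. The epsilon-greedy action choice shows exploration versus exploitation in its simplest form.

```python
import random


class Corridor:
    """Hypothetical toy environment (an assumption for illustration only).

    States are 0..n_states-1; the agent starts at 0 and the goal is the last state.
    Action 0 moves left, action 1 moves right (moves are clamped at the walls).
    Reaching the goal gives reward 1 and ends the episode; other steps give 0.
    """

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon; otherwise exploit, breaking ties at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def q_learning(env, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1,
               max_steps=100, seed=0):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = env.step(action)
            # A terminal transition has no future value to bootstrap from.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning(Corridor())
    # Greedy action for each non-terminal state (0 = left, 1 = right).
    greedy_policy = [max(range(2), key=lambda a: q[s][a]) for s in range(len(q) - 1)]
    print(greedy_policy)
```

With these settings the greedy policy will typically favor action 1 (move right) in every non-terminal state, but the learned values depend on the seed and hyperparameters. Plain Python is used only to keep the sketch dependency-free; the schedule includes a separate Gymnasium tutorial.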


Instruction

Course Staff
Time & Location
  • Mon./Thu. 09:00 - 10:15, #609, College of Engineering #6
Office Hours
  • Tue. 13:00 - 15:00
Textbook
  • [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
  • Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
  • (10%) Competition Round 0: Grid Crossing
  • (20%) Competition Round 1: Zelda's Adventure
  • (30%) Competition Round 2: Avoid Blurps
  • (30%) Competition Round 3: Bullet Bills
Attendance (10%)
  • 1% of the final grade is deducted for each absence and for every three late arrivals
  • 11 or more absences result in an F grade

Schedule

Week 01
March 05 — Overview & Logistics
March 09 — Basic Math

Week 02
March 12 — Introduction to Reinforcement Learning
March 16 — Multi-Armed Bandits

Week 03
March 19 — Markov Process
March 23 — Dynamic Programming

Week 04
March 26 — Tutorial on Gymnasium & Dynamic Programming
March 30 — On-Policy Monte-Carlo Methods

Week 05
April 02 — Off-Policy Monte-Carlo Methods
April 06 — Temporal Difference Learning

Week 06
April 09 — Competition Round 0: Grid Crossing
April 13 — Practice on Monte Carlo Methods & Temporal Difference Learning

Week 07
April 16 — n-Step Bootstrapping
April 20 — Planning & Learning

Week 08
April 23 — Linear Function Approximation
  • Lecture
  • Reference
    • [Ri20] Chap. 9 - 10
April 27 — Competition Round 1: Zelda's Adventure

Week 09
April 30 — Nonlinear Function Approximation
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11, 14
    • [Ri20] Chap. 9 - 10
May 04 — Practice on Function Approximation
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11, 14
    • [Ri20] Chap. 9 - 10

Week 10
May 07 — Deep Q-Network
May 11 — Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13

Week 11
May 14 — Practice on Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
May 18 — Advanced Topics: Variants of DQN
  • Lecture
  • Reference
    • Hado van Hasselt, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." Proceedings of the AAAI Conference on Artificial Intelligence, 30(1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
    • Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952
    • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." In International conference on machine learning, pp. 1995-2003. PMLR (2016). https://proceedings.mlr.press/v48/wangf16.html

Week 12
May 21 — Competition Round 2: Avoid Blurps
May 25 — Buddha's Birthday
  • No Class

Week 13
May 28 — Advanced Topics: Deterministic Policy Gradient Methods
  • Lecture
  • Reference
    • Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
    • Scott Fujimoto, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." In International conference on machine learning, pp. 1587-1596. PMLR (2018).
June 01 — Advanced Topics: Entropy Maximization
  • Lecture
  • Reference
    • Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor." International Conference on Machine Learning (2018).
    • Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905

Week 14
June 04 — Advanced Topics: Trust Region Constraint Methods
  • Lecture
  • Reference
    • John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. PMLR (2015).
June 08 — Advanced Topics: Proximal Policy Optimization
  • Lecture
  • Reference
    • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347
    • John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. "High-Dimensional Continuous Control Using Generalized Advantage Estimation." arXiv preprint arXiv:1506.02438 (2015). https://arxiv.org/abs/1506.02438
    • Volodymyr Mnih et al. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. PMLR (2016).
    • Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. "Curiosity-driven exploration by self-supervised prediction." In International conference on machine learning, pp. 2778-2787. PMLR (2017).
    • Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. "Exploration by random network distillation." arXiv preprint arXiv:1810.12894 (2018).

Week 15
June 11 — Focus on Final Competition
  • No Class
June 15 — Competition Round 3: Bullet Bills

Week 16
June 18 — Final Remark