Spring 2026

Reinforcement Learning (1 Div.)

Reinforcement learning (RL) is a popular machine learning paradigm for solving sequential decision-making problems. In this paradigm, an agent learns an optimal policy by repeatedly interacting with an environment to maximize cumulative reward. This course will cover the foundational concepts of RL, including state-action-reward tuples, Markov decision processes, and the trade-off between exploration and exploitation. In addition, we will study key RL algorithms, such as Monte Carlo methods, temporal difference learning, function approximation, and policy gradients. Furthermore, you will work on a small team project to implement RL agents that solve problems of increasing difficulty, from simple to complex.
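As a first taste of the exploration-versus-exploitation trade-off mentioned above, a minimal epsilon-greedy agent on a Bernoulli multi-armed bandit can be sketched as follows (the function name and constants are illustrative, not course code):

```python
import random

def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: arm_probs[i] is the
    probability that arm i pays reward 1."""
    rng = random.Random(seed)
    k = len(arm_probs)
    q = [0.0] * k          # incremental sample-average value estimates
    n = [0] * k            # pull counts per arm
    for _ in range(steps):
        if rng.random() < epsilon:              # explore: pick a random arm
            a = rng.randrange(k)
        else:                                   # exploit: pick the current best estimate
            a = max(range(k), key=lambda i: q[i])
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        n[a] += 1
        q[a] += (r - q[a]) / n[a]               # incremental mean update
    return q, n
```

With a small epsilon the agent mostly exploits the best-looking arm while still sampling the others often enough to correct early mistakes.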


Instruction

Course Staff
Time & Location
  • Mon./Thu. 09:00 - 10:15, #600, College of Engineering #6
Office Hours
  • Tue. 13:00 - 15:00
Textbook
  • [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
  • Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
  • (5%) Competition Round 0: TBA
  • (15%) Competition Round 1: TBA
  • (20%) Competition Round 2: TBA
  • (20%) Competition Round 3: TBA
  • (30%) Competition Round 4: TBA
Attendance (10%)
  • 1% of the attendance credit is deducted for each absence; every three late arrivals count as one absence
  • 11 or more absences result in an F grade

Schedule

Week 01
March 05 — Overview & Logistics
March 09 — Basic Math

Week 02
March 12 — Introduction to Reinforcement Learning
March 16 — Multi-Armed Bandits

Week 03
March 19 — Markov Process
March 23 — Dynamic Programming

Week 04
March 26 — Tutorial on Gymnasium
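The interaction loop covered in the Gymnasium tutorial can be sketched with a hand-rolled stand-in environment. `CorridorEnv` and `random_rollout` below are invented for illustration and are not part of the gymnasium package, but the `reset()`/`step()` return signatures mirror Gymnasium's API:

```python
import random

class CorridorEnv:
    """Toy corridor: start at cell 0; action 1 moves right toward the
    goal at the last cell, action 0 moves left (floored at 0).
    Illustrative only; mirrors Gymnasium's reset()/step() signatures."""
    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self, seed=None):
        self.pos = 0
        return self.pos, {}                     # observation, info

    def step(self, action):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos == self.length - 1
        reward = 1.0 if terminated else 0.0
        # observation, reward, terminated, truncated, info
        return self.pos, reward, terminated, False, {}

def random_rollout(env, seed=0, max_steps=100):
    """Run one episode with a uniformly random policy."""
    rng = random.Random(seed)
    obs, info = env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        action = rng.randrange(2)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        steps += 1
        if terminated or truncated:
            break
    return total, steps
```

The same loop works unchanged against a real `gymnasium.make(...)` environment, since only the five-tuple returned by `step` is relied upon.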
March 30 — Monte Carlo Methods: On-Policy Methods

Week 05
April 02 — Monte Carlo Methods: Off-Policy Methods
April 06 — Competition Round 0: TBA

Week 06
April 09 — Temporal Difference Learning
April 13 — n-Step Bootstrapping

Week 07
April 16 — Planning & Learning
April 20 — Linear Function Approximation
  • Lecture
  • Reference
    • [Ri20] Chap. 9 - 10
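The semi-gradient TD(0) update with a linear value function ([Ri20] Chap. 9) can be sketched on the classic 5-state random walk; the function name, step size, and episode count are illustrative choices:

```python
import random

def td0_random_walk(episodes=3000, alpha=0.05, seed=0):
    """Semi-gradient TD(0) with a linear value function v(s) = w . x(s),
    using one-hot features x(s). States 0..4 are nonterminal; stepping
    off the right end pays +1, off the left end pays 0 (gamma = 1)."""
    rng = random.Random(seed)
    w = [0.0] * 5                      # one weight per one-hot feature
    for _ in range(episodes):
        s = 2                          # start in the middle state
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:                 # left terminal: target is reward 0
                w[s] += alpha * (0.0 - w[s])
                break
            if s2 > 4:                 # right terminal: target is reward 1
                w[s] += alpha * (1.0 - w[s])
                break
            # TD(0): w += alpha * (r + v(s') - v(s)) * x(s), with r = 0
            w[s] += alpha * (0.0 + w[s2] - w[s])
            s = s2
    return w
```

With one-hot features the linear case reduces to a tabular update, but the same rule `w += alpha * (r + v(s') - v(s)) * x(s)` applies to any feature vector `x(s)`; the true values here are 1/6 through 5/6.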

Week 08
April 23 — Competition Round 1: TBA
April 27 — Nonlinear Function Approximation: Deep Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11
    • [Ri20] Chap. 9 - 10

Week 09
April 30 — Nonlinear Function Approximation: Convolutional Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 14
    • [Ri20] Chap. 9 - 10
May 04 — Practice on Function Approximation
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11, 14
    • [Ri20] Chap. 9 - 10

Week 10
May 07 — Deep Q-Network
May 11 — Competition Round 2: TBA

Week 11
May 14 — Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
May 18 — Practice on Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
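The REINFORCE update ([Ri20] Chap. 13) can be sketched with a softmax policy on a deterministic two-armed bandit; the function name and constants are illustrative. For a softmax policy over preferences theta, the score is grad log pi(a) = onehot(a) - pi:

```python
import math
import random

def reinforce_bandit(arm_rewards, steps=2000, alpha=0.1, seed=0):
    """REINFORCE with a softmax policy on a deterministic 2-armed bandit;
    the return G is just the immediate reward of the chosen arm."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]                           # action preferences
    for _ in range(steps):
        exps = [math.exp(t) for t in theta]
        z = sum(exps)
        pi = [e / z for e in exps]               # softmax policy
        a = 0 if rng.random() < pi[0] else 1     # sample an action
        g = arm_rewards[a]
        for i in range(2):                       # theta += alpha * G * grad log pi(a)
            grad = (1.0 if i == a else 0.0) - pi[i]
            theta[i] += alpha * g * grad
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]
```

Because arm 0 pays zero, only pulls of arm 1 move the preferences, so the policy concentrates on the better arm.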

Week 12
May 21 — Advanced Topics: Variants of DQN / Asynchronous Methods
  • Lecture
  • Reference
    • Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. "Asynchronous Methods for Deep Reinforcement Learning." Proceedings of the 33rd International Conference on Machine Learning, PMLR 48:1928-1937 (2016). https://proceedings.mlr.press/v48/mniha16.html
    • van Hasselt, H., Guez, A., and Silver, D. "Deep Reinforcement Learning with Double Q-Learning." Proceedings of the AAAI Conference on Artificial Intelligence, 30(1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
    • Schaul, T., Quan, J., Antonoglou, I., and Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952
    • Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., and de Freitas, N. "Dueling Network Architectures for Deep Reinforcement Learning." Proceedings of the 33rd International Conference on Machine Learning, PMLR 48:1995-2003 (2016). https://proceedings.mlr.press/v48/wangf16.html
May 25 — Buddha's Birthday
  • No Class

Week 13
May 28 — Competition Round 3: TBA
June 01 — Advanced Topics: Deterministic Policy Gradient Methods
  • Lecture
  • Reference
    • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. "Continuous Control with Deep Reinforcement Learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
    • Dankwa, S. and Zheng, W. "Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent." Proceedings of the 3rd International Conference on Vision, Image and Signal Processing (ICVISP 2019), Article 66, 1-5. ACM (2020). https://doi.org/10.1145/3387168.3387199

Week 14
June 04 — Advanced Topics: Entropy-Regularized Methods
  • Lecture
  • Reference
    • Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., and Kumar, V. "Soft Actor-Critic Algorithms and Applications." arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905
June 08 — Advanced Topics: Trust Region Constraint Methods
  • Lecture
  • Reference
    • Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. "Trust Region Policy Optimization." Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1889-1897 (2015).
    • Schulman, J., et al. "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347

Week 15
June 11 — Focus on RL Final Competition
  • No Class
June 15 — Competition Round 4: TBA

Week 16
June 18 — Final Remark