2025 Fall

Artificial Intelligence (2 Div.)

Reinforcement learning (RL) is a popular machine learning paradigm for solving sequential decision-making problems. In this paradigm, an agent learns an optimal policy by repeatedly interacting with an environment to maximize cumulative reward. This course covers the foundational concepts of RL, including the state-action-reward interaction, Markov decision processes, and the trade-off between exploration and exploitation. We then study key RL algorithms such as Monte Carlo methods, temporal-difference learning, function approximation, and policy gradients. Finally, you will work in a small team to implement RL agents that solve problems of increasing difficulty, from simple to complex.
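To make the agent-environment loop described above concrete, here is a minimal sketch using the Gymnasium library covered in the Week 04 tutorial. The environment name and the random policy are illustrative assumptions only; a learned policy from the later lectures would replace the random action choice.

  import gymnasium as gym

  # One episode with a random policy (illustrative only; a learned policy would go here instead).
  env = gym.make("FrozenLake-v1")      # assumed example environment, not part of the course material
  state, info = env.reset(seed=0)

  total_reward = 0.0
  terminated = truncated = False
  while not (terminated or truncated):
      action = env.action_space.sample()                              # explore: sample a random action
      state, reward, terminated, truncated, info = env.step(action)   # environment returns next state and reward
      total_reward += reward                                          # accumulate the return the agent maximizes

  env.close()
  print("Episode return:", total_reward)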


Instruction

Course Staff
Time & Location
  • Mon./Thu. 09:00 - 10:15, #609, College of Engineering #6
Office Hours
  • Tue. 13:00 - 15:00
Textbook
  • [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
  • Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
  • Round 1: Grid Crossing! (10%)
  • Round 2: Grid Adventure! (20%)
  • Round 3: Avoid Blurp! (20%)
  • Round 4: Al-Kka-Gi! (40%)
Attendance (10%)
  • 1% of the final grade is deducted for each absence
  • Three instances of lateness count as one absence
  • Eleven or more absences result in an F grade

Schedule

Week 01
September 01 — Overview & Logistics
September 04 — Basic Math

Week 02
September 08 — Introduction to Reinforcement Learning
September 11 — Multi-Armed Bandits

Week 03
September 15 — Markov Process
September 18 — Dynamic Programming

Week 04
September 22 — Tutorial on Gymnasium
September 25 — Monte-Carlo Methods: On-Policy Methods

Week 05
September 29 — Monte-Carlo Methods: Off-Policy Methods
October 02 — Temporal Difference Learning

Week 06
October 06 — Chuseok Holiday
  • No Class
October 09 — Hangul Day
  • No Class

Week 07
October 13 — Competition Round 1: Grid Crossing!
October 16 — n-Step Bootstrapping

Week 08
October 20 — Planning & Learning
October 23 — Focus on Midterm Exam
  • No Class

Week 09
October 27 — Linear Function Approximation
  • Lecture
  • Reference
    • [Ri20] Chap. 9 - 10
October 30 — Competition Round 2: Grid Adventure!

Week 10
November 03 — Nonlinear Function Approximation: Deep Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11
    • [Ri20] Chap. 9 - 10
November 06 — Nonlinear Function Approximation: Convolutional Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 14
    • [Ri20] Chap. 9 - 10

Week 11
November 10 — Practice on Function Approximation
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11, 14
    • [Ri20] Chap. 9 - 10
November 13 — Deep Q-Network
  • Lecture
  • Reference
    • [Ri20] Chap. 11
    • Mnih, V., Kavukcuoglu, K., Silver, D. et al. “Human-level control through deep reinforcement learning”. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
    • van Hasselt, H., Guez, A., & Silver, D. "Deep Reinforcement Learning with Double Q-Learning". Proceedings of the AAAI Conference on Artificial Intelligence, 30 (1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
    • Schaul, T., Quan, J., Antonoglou, I., Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952

Week 12
November 17 — Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
November 20 — Competition Round 3: Avoid Blurp!

Week 13
November 24 — Practice on Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
November 27 — Advanced Algorithms: Distributed Reinforcement Learning
  • Lecture
  • Reference
    • [Ri20] Chap. 13
    • Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. "Asynchronous Methods for Deep Reinforcement Learning". Proceedings of The 33rd International Conference on Machine Learning, 48:1928-1937 (2016). https://proceedings.mlr.press/v48/mniha16.html

Week 14
December 01 — Advanced Algorithms: Policy Optimization
  • Lecture
  • Reference
    • Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. "Trust Region Policy Optimization". Proceedings of the 32nd International Conference on Machine Learning, 1889-1897. PMLR (2015).
    • Schulman, J. et al. "Proximal Policy Optimization Algorithms". arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347
December 04 — Advanced Algorithms: Deterministic Policy Gradient
  • Lecture
  • Reference
    • Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. "Continuous Control with Deep Reinforcement Learning". arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
    • Dankwa, S. & Zheng, W. "Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent". In Proceedings of the 3rd International Conference on Vision, Image and Signal Processing (ICVISP 2019), Article 66, 1-5. Association for Computing Machinery (2020). https://doi.org/10.1145/3387168.3387199
    • Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., & Kumar, V. "Soft Actor-Critic Algorithms and Applications". arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905

Week 15
December 08 — Preparing for Final Competition
  • No Class
December 11 — Competition Round 4: Al-Kka-Gi! - League Stage (Rise Group)

Week 16
December 15 — Competition Round 4: Al-Kka-Gi! - League Stage (Legend Group)
December 18 — Competition Round 4: Al-Kka-Gi! - Playoff