Spring 2026

Reinforcement Learning (1 Div.)

Reinforcement learning (RL) is a popular machine learning paradigm for solving sequential decision-making problems. In this paradigm, an agent learns an optimal policy by repeatedly interacting with an environment to maximize cumulative reward. This course will cover the foundational concepts of RL, including state-action-reward tuples, Markov decision processes, and the trade-off between exploration and exploitation. In addition, we will study key RL algorithms, such as Monte Carlo methods, temporal difference learning, function approximation, and policy gradients. Furthermore, you will work on a small team project to implement RL agents that solve problems of increasing difficulty, from simple to complex.
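As a first taste of the exploration-versus-exploitation trade-off mentioned above, a minimal epsilon-greedy agent on a Bernoulli multi-armed bandit can be sketched as follows (the function name and constants are illustrative, not course code):

```python
import random

def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: arm_probs[i] is the
    probability that arm i pays reward 1."""
    rng = random.Random(seed)
    k = len(arm_probs)
    q = [0.0] * k          # incremental sample-average value estimates
    n = [0] * k            # pull counts per arm
    for _ in range(steps):
        if rng.random() < epsilon:              # explore: pick a random arm
            a = rng.randrange(k)
        else:                                   # exploit: pick the current best estimate
            a = max(range(k), key=lambda i: q[i])
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        n[a] += 1
        q[a] += (r - q[a]) / n[a]               # incremental mean update
    return q, n
```

With a small epsilon the agent mostly exploits the best-looking arm while still sampling the others often enough to correct early mistakes.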


Instruction

Course Staff
Time & Location
  • Mon./Thu. 09:00 - 10:15, #600, College of Engineering #6
Office Hours
  • Tue. 13:00 - 15:00
Textbook
  • [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
  • Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
  • (5%) Competition Round 0: TBA
  • (15%) Competition Round 1: TBA
  • (20%) Competition Round 2: TBA
  • (20%) Competition Round 3: TBA
  • (30%) Competition Round 4: TBA
Attendance (10%)
  • 1% of the attendance credit is deducted for each absence; every three late arrivals count as one absence
  • 11 or more absences result in an F grade

Schedule

Week 01
March 05 — Overview & Logistics
March 09 — Basic Math

Week 02
March 12 — Introduction to Reinforcement Learning
March 16 — Multi-Armed Bandits

Week 03
March 19 — Markov Process
March 23 — Dynamic Programming

Week 04
March 26 — Tutorial on Gymnasium
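The interaction loop covered in the Gymnasium tutorial can be sketched with a hand-rolled stand-in environment. `CorridorEnv` and `random_rollout` below are invented for illustration and are not part of the gymnasium package, but the `reset()`/`step()` return signatures mirror Gymnasium's API:

```python
import random

class CorridorEnv:
    """Toy corridor: start at cell 0; action 1 moves right toward the
    goal at the last cell, action 0 moves left (floored at 0).
    Illustrative only; mirrors Gymnasium's reset()/step() signatures."""
    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self, seed=None):
        self.pos = 0
        return self.pos, {}                     # observation, info

    def step(self, action):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos == self.length - 1
        reward = 1.0 if terminated else 0.0
        # observation, reward, terminated, truncated, info
        return self.pos, reward, terminated, False, {}

def random_rollout(env, seed=0, max_steps=100):
    """Run one episode with a uniformly random policy."""
    rng = random.Random(seed)
    obs, info = env.reset()
    total, steps = 0.0, 0
    for _ in range(max_steps):
        action = rng.randrange(2)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        steps += 1
        if terminated or truncated:
            break
    return total, steps
```

The same loop works unchanged against a real `gymnasium.make(...)` environment, since only the five-tuple returned by `step` is relied upon.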
March 30 — Monte Carlo Methods: On-Policy Methods

Week 05
April 02 — Monte Carlo Methods: Off-Policy Methods
April 06 — Competition Round 0: TBA

Week 06
April 09 — Temporal Difference Learning
April 13 — n-Step Bootstrapping

Week 07
April 16 — Planning & Learning
April 20 — Linear Function Approximation
  • Lecture
  • Reference
    • [Ri20] Chap. 9 - 10
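The semi-gradient TD(0) update with a linear value function ([Ri20] Chap. 9) can be sketched on the classic 5-state random walk; the function name, step size, and episode count are illustrative choices:

```python
import random

def td0_random_walk(episodes=3000, alpha=0.05, seed=0):
    """Semi-gradient TD(0) with a linear value function v(s) = w . x(s),
    using one-hot features x(s). States 0..4 are nonterminal; stepping
    off the right end pays +1, off the left end pays 0 (gamma = 1)."""
    rng = random.Random(seed)
    w = [0.0] * 5                      # one weight per one-hot feature
    for _ in range(episodes):
        s = 2                          # start in the middle state
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:                 # left terminal: target is reward 0
                w[s] += alpha * (0.0 - w[s])
                break
            if s2 > 4:                 # right terminal: target is reward 1
                w[s] += alpha * (1.0 - w[s])
                break
            # TD(0): w += alpha * (r + v(s') - v(s)) * x(s), with r = 0
            w[s] += alpha * (0.0 + w[s2] - w[s])
            s = s2
    return w
```

With one-hot features the linear case reduces to a tabular update, but the same rule `w += alpha * (r + v(s') - v(s)) * x(s)` applies to any feature vector `x(s)`; the true values here are 1/6 through 5/6.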

Week 08
April 23 — Competition Round 1: TBA
April 27 — Nonlinear Function Approximation: Deep Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11
    • [Ri20] Chap. 9 - 10

Week 09
April 30 — Nonlinear Function Approximation: Convolutional Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 14
    • [Ri20] Chap. 9 - 10
May 04 — Practice on Function Approximation
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11, 14
    • [Ri20] Chap. 9 - 10

Week 10
May 07 — Deep Q-Network
May 11 — Competition Round 2: TBA

Week 11
May 14 — Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
May 18 — Practice on Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
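The REINFORCE update ([Ri20] Chap. 13) can be sketched with a softmax policy on a deterministic two-armed bandit; the function name and constants are illustrative. For a softmax policy over preferences theta, the score is grad log pi(a) = onehot(a) - pi:

```python
import math
import random

def reinforce_bandit(arm_rewards, steps=2000, alpha=0.1, seed=0):
    """REINFORCE with a softmax policy on a deterministic 2-armed bandit;
    the return G is just the immediate reward of the chosen arm."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]                           # action preferences
    for _ in range(steps):
        exps = [math.exp(t) for t in theta]
        z = sum(exps)
        pi = [e / z for e in exps]               # softmax policy
        a = 0 if rng.random() < pi[0] else 1     # sample an action
        g = arm_rewards[a]
        for i in range(2):                       # theta += alpha * G * grad log pi(a)
            grad = (1.0 if i == a else 0.0) - pi[i]
            theta[i] += alpha * g * grad
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]
```

Because arm 0 pays zero, only pulls of arm 1 move the preferences, so the policy concentrates on the better arm.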

Week 12
May 21 — Advanced Topics: Variants of DQN / Asynchronous Methods
  • Lecture
  • Reference
    • Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. "Asynchronous Methods for Deep Reinforcement Learning." Proceedings of the 33rd International Conference on Machine Learning, PMLR 48:1928-1937 (2016). https://proceedings.mlr.press/v48/mniha16.html
    • van Hasselt, H., Guez, A., and Silver, D. "Deep Reinforcement Learning with Double Q-Learning." Proceedings of the AAAI Conference on Artificial Intelligence, 30(1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
    • Schaul, T., Quan, J., Antonoglou, I., and Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952
    • Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., and de Freitas, N. "Dueling Network Architectures for Deep Reinforcement Learning." Proceedings of the 33rd International Conference on Machine Learning, PMLR 48:1995-2003 (2016). https://proceedings.mlr.press/v48/wangf16.html
May 25 — Buddha's Birthday
  • No Class

Week 13
May 28 — Competition Round 3: TBA
June 01 — Advanced Topics: Deterministic Policy Gradient Methods
  • Lecture
  • Reference
    • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. "Continuous Control with Deep Reinforcement Learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
    • Dankwa, S. and Zheng, W. "Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent." Proceedings of the 3rd International Conference on Vision, Image and Signal Processing (ICVISP 2019), Article 66, 1-5. ACM (2020). https://doi.org/10.1145/3387168.3387199

Week 14
June 04 — Advanced Topics: Entropy-Regularized Methods
  • Lecture
  • Reference
    • Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., and Kumar, V. "Soft Actor-Critic Algorithms and Applications." arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905
June 08 — Advanced Topics: Trust Region Constraint Methods
  • Lecture
  • Reference
    • Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. "Trust Region Policy Optimization." Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1889-1897 (2015).
    • Schulman, J., et al. "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347

Week 15
June 11 — Focus on RL Final Competition
  • No Class
June 15 — Competition Round 4: TBA

Week 16
June 18 — Final Remark