Lecture - 2026 Reinforcement Learning | P4C Lab

2026 spring

Reinforcement Learning (1 Div.)

Reinforcement learning (RL) is a popular machine learning paradigm for solving sequential decision-making problems. In this paradigm, an agent learns an optimal policy by repeatedly interacting with an environment to maximize its cumulative reward. This course covers the foundational concepts of RL, including states, actions, and rewards, the Markov decision process, and the trade-off between exploration and exploitation. We will also study key RL algorithms, such as Monte Carlo methods, temporal difference learning, function approximation, and policy gradients. In addition, you will work on a small team project to implement RL agents that solve problems of increasing difficulty, from simple to complex.
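To make the interaction loop and the exploration versus exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a toy multi-armed bandit. It is an illustration only and not course material: the function name `run_bandit`, the arm means, the step count, and the value of `epsilon` are all assumptions chosen for the example.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical example)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms       # number of times each arm was pulled
    total_reward = 0.0          # cumulative reward collected by the agent

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pull the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment answers the action with a noisy reward.
        reward = rng.gauss(true_means[arm], 1.0)
        total_reward += reward

        # Incremental sample-average update of the chosen arm's estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts, total_reward


# Assumed arm means; the last arm is the best one.
estimates, counts, total_reward = run_bandit([0.0, 0.5, 2.0])
```

Setting `epsilon=0` makes the agent purely greedy, so it may lock onto the first arm that happens to look good; a small positive `epsilon` keeps sampling the other arms, which is what lets the estimates find the best one.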

Instruction

Course Staff

Lecturer: Woohyeok Choi
- Office: #407, College of Engineering #6
- Mail: woohyeok.choi@kangwon.ac.kr
Teaching Assistant: TBA

Time & Location

Mon./Thu. 09:00 - 10:15, #609, College of Engineering #6

Office Hours

Tue. 13:00 - 15:00

Textbook

[Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.

Prerequisite

Python Programming

Grading Policy

Reinforcement Learning Competitions (90%)

(10%) Competition Round 0: Grid Crossing
(20%) Competition Round 1: Zelda's Adventure
(30%) Competition Round 2: Avoid Blurps
(30%) Competition Round 3: Bullet Bills

Attendance (10%)

1% of the credit is deducted for each absence or for every three late arrivals
11 or more absences result in an F grade

Schedule

Week 01

March 05 — Overview & Logistics

Lecture

March 09 — Basic Math

Lecture

Week 02

March 12 — Introduction to Reinforcement Learning

Lecture
Reference
- [Ri20] Chap. 1

March 16 — Multi-Armed Bandits

Lecture
Reference
- [Ri20] Chap. 2

Week 03

March 19 — Markov Process

Lecture
Reference
- [Ri20] Chap. 3

March 23 — Dynamic Programming

Lecture
Reference
- [Ri20] Chap. 4

Week 04

March 26 — Tutorial on Gymnasium & Dynamic Programming

Lecture
(Announce) Competition Round 0: Grid Crossing!
- Due: April 09
Readings
- Gymnasium (maintained by the Farama Foundation; successor to OpenAI Gym)

March 30 — On-Policy Monte-Carlo Methods

Lecture
Reference
- [Ri20] Chap. 5

Week 05

April 02 — Off-Policy Monte-Carlo Methods

Lecture
Reference
- [Ri20] Chap. 5

April 06 — Temporal Difference Learning

Lecture
Reference
- [Ri20] Chap. 6

Week 06

April 09 — Competition Round 0: Grid Crossing

Leaderboard
(Announce) Competition Round 1: Zelda's Adventure
- Due: April 27

April 13 — Practice on Monte Carlo Methods & Temporal Difference Learning

Lecture

Week 07

April 16 — n-Step Bootstrapping

Lecture
Reference
- [Ri20] Chap. 7

April 20 — Planning & Learning

Lecture
Reference
- [Ri20] Chap. 8

Week 08

April 23 — Linear Function Approximation

Lecture
Reference
- [Ri20] Chap. 9 - 10

April 27 — Competition Round 1: Zelda's Adventure

Leaderboard
(Announce) Competition Round 2: Avoid Blurps
- Due: May 21

Week 09

April 30 — Nonlinear Function Approximation

Lecture
Reference
- [Ge23] Chap. 10, 11, 14
- [Ri20] Chap. 9 - 10

May 04 — Practice on Function Approximation

Lecture
Reference
- [Ge23] Chap. 10, 11, 14
- [Ri20] Chap. 9 - 10

Week 10

May 07 — Deep Q-Network

Lecture
Reference
- [Ri20] Chap. 11
- Mnih, V., Kavukcuoglu, K., Silver, D. et al. “Human-level control through deep reinforcement learning”. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236

May 11 — Policy Gradient Methods

Lecture
Reference
- [Ri20] Chap. 13

Week 11

May 14 — Practice on Policy Gradient Methods

Lecture
Reference
- [Ri20] Chap. 13

May 18 — Advanced Topics: Variants of DQN

Lecture
Reference
- van Hasselt, H., Guez, A., & Silver, D. "Deep Reinforcement Learning with Double Q-Learning". Proceedings of the AAAI Conference on Artificial Intelligence, 30 (1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
- Schaul, T., Quan, J., Antonoglou, I., Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952
- Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." In International conference on machine learning, pp. 1995-2003. PMLR, 2016. https://proceedings.mlr.press/v48/wangf16.html

Week 12

May 21 — Competition Round 2: Avoid Blurps

Leaderboard
(Announce) Competition Round 3: Bullet Bills
- Due: June 15

May 25 — Buddha's Birthday

No Class

Week 13

May 28 — Advanced Topics: Deterministic Policy Gradient Methods

Lecture
Reference
- Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
- Scott Fujimoto, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." In International conference on machine learning, pp. 1587-1596. PMLR (2018).

June 01 — Advanced Topics: Entropy Maximization

Lecture
Reference
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor". International Conference on Machine Learning (2018).
- Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905

Week 14

June 04 — Advanced Topics: Trust Region Constraint Methods

Lecture
Reference
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. PMLR (2015).

June 08 — Advanced Topics: Proximal Policy Optimization

Lecture
Reference
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347
- John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel. "High-Dimensional Continuous Control Using Generalized Advantage Estimation." arXiv preprint arXiv:1506.02438 (2018). https://arxiv.org/abs/1506.02438
- Volodymyr Mnih et al. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. PMLR. (2016).
- Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. "Curiosity-driven exploration by self-supervised prediction." In International conference on machine learning, pp. 2778-2787. PMLR. (2017).
- Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. "Exploration by random network distillation." arXiv preprint arXiv:1810.12894 (2018).

Week 15

June 11 — Focus on Final Competition

No Class

June 15 — Competition Round 3: Bullet Bills

Leaderboard

Week 16

June 18 — Final Remarks