2025 Fall

Artificial Intelligence (2 Div.)

Reinforcement learning (RL) is a popular machine learning paradigm for solving sequential decision-making problems. In this paradigm, an agent learns an optimal policy by repeatedly interacting with an environment to maximize cumulative reward. This course covers the foundational concepts of RL, including the state-action-reward interaction, Markov decision processes, and the trade-off between exploration and exploitation. We then study key RL algorithms such as Monte Carlo methods, temporal-difference learning, function approximation, and policy gradients. Finally, you will work in a small team to implement RL agents that solve problems of increasing difficulty, from simple to complex.
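To make the agent-environment loop described above concrete, here is a minimal sketch using the Gymnasium library covered in the Week 04 tutorial. The environment name and the random policy are illustrative assumptions only; a learned policy from the later lectures would replace the random action choice.

  import gymnasium as gym

  # One episode with a random policy (illustrative only; a learned policy would go here instead).
  env = gym.make("FrozenLake-v1")      # assumed example environment, not part of the course material
  state, info = env.reset(seed=0)

  total_reward = 0.0
  terminated = truncated = False
  while not (terminated or truncated):
      action = env.action_space.sample()                              # explore: sample a random action
      state, reward, terminated, truncated, info = env.step(action)   # environment returns next state and reward
      total_reward += reward                                          # accumulate the return the agent maximizes

  env.close()
  print("Episode return:", total_reward)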


Instruction

Course Staff
Time & Location
  • Mon./Thu. 09:00 - 10:15, #609, College of Engineering #6
Office Hours
  • Tue. 13:00 - 15:00
Textbook
  • [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
  • Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
  • Round 1: Grid Crossing! (10%)
  • Round 2: Grid Adventure! (20%)
  • Round 3: Avoid Blurp! (20%)
  • Round 4: Al-Kka-Gi! (40%)
Attendance (10%)
  • 1% of the final grade is deducted for each absence
  • Three instances of lateness count as one absence
  • Eleven or more absences result in an F grade

Schedule

Week 01
September 01 — Overview & Logistics
September 04 — Basic Math

Week 02
September 08 — Introduction to Reinforcement Learning
September 11 — Multi-Armed Bandits

Week 03
September 15 — Markov Process
September 18 — Dynamic Programming

Week 04
September 22 — Tutorial on Gymnasium
September 25 — Monte-Carlo Methods: On-Policy Methods

Week 05
September 29 — Monte-Carlo Methods: Off-Policy Methods
October 02 — Temporal Difference Learning

Week 06
October 06 — Chuseok Holiday
  • No Class
October 09 — Hangul Day
  • No Class

Week 07
October 13 — Competition Round 1: Grid Crossing!
October 16 — n-Step Bootstrapping

Week 08
October 20 — Planning & Learning
October 23 — Focus on Midterm Exam
  • No Class

Week 09
October 27 — Linear Function Approximation
  • Lecture
  • Reference
    • [Ri20] Chap. 9 - 10
October 30 — Competition Round 2: Grid Adventure!

Week 10
November 03 — Nonlinear Function Approximation: Deep Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11
    • [Ri20] Chap. 9 - 10
November 06 — Nonlinear Function Approximation: Convolutional Neural Network
  • Lecture
  • Reference
    • [Ge23] Chap. 14
    • [Ri20] Chap. 9 - 10

Week 11
November 10 — Practice on Function Approximation
  • Lecture
  • Reference
    • [Ge23] Chap. 10, 11, 14
    • [Ri20] Chap. 9 - 10
November 13 — Deep Q-Network
  • Lecture
  • Reference
    • [Ri20] Chap. 11
    • Mnih, V., Kavukcuoglu, K., Silver, D. et al. “Human-level control through deep reinforcement learning”. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
    • van Hasselt, H., Guez, A., & Silver, D. "Deep Reinforcement Learning with Double Q-Learning". Proceedings of the AAAI Conference on Artificial Intelligence, 30 (1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
    • Schaul, T., Quan, J., Antonoglou, I., Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952

Week 12
November 17 — Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
November 20 — Competition Round 3: Avoid Blurp!

Week 13
November 24 — Practice on Policy Gradient Methods
  • Lecture
  • Reference
    • [Ri20] Chap. 13
November 27 — Advanced Algorithms: Distributed Reinforcement Learning
  • Lecture
  • Reference
    • [Ri20] Chap. 13
    • Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. "Asynchronous Methods for Deep Reinforcement Learning". Proceedings of The 33rd International Conference on Machine Learning, 48:1928-1937 (2016). https://proceedings.mlr.press/v48/mniha16.html

Week 14
December 01 — Advanced Algorithms: Policy Optimization
  • Lecture
  • Reference
    • Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. "Trust Region Policy Optimization". Proceedings of the 32nd International Conference on Machine Learning, 1889-1897. PMLR (2015).
    • Schulman, J. et al. "Proximal Policy Optimization Algorithms". arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347
December 04 — Advanced Algorithms: Deterministic Policy Gradient
  • Lecture
  • Reference
    • Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. "Continuous Control with Deep Reinforcement Learning". arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
    • Dankwa, S. & Zheng, W. "Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent". In Proceedings of the 3rd International Conference on Vision, Image and Signal Processing (ICVISP 2019), Article 66, 1-5. Association for Computing Machinery (2020). https://doi.org/10.1145/3387168.3387199
    • Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., & Kumar, V. "Soft Actor-Critic Algorithms and Applications". arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905

Week 15
December 08 — Preparing for Final Competition
  • No Class
December 11 — Competition Round 4: Al-Kka-Gi! - League Stage (Rise Group)

Week 16
December 15 — Competition Round 4: Al-Kka-Gi! - League Stage (Legend Group)
December 18 — Competition Round 4: Al-Kka-Gi! - Playoff