Instruction
Course Staff
- Lecturer: Woohyeok Choi
- Office: #407, College of Engineering #6
- Mail: woohyeok.choi@kangwon.ac.kr
- Teaching Assistant: TBA
Time & Location
- Mon./Thu. 09:00 - 10:15, #600, College of Engineering #6
Office Hours
- Tue. 13:00 - 15:00
Textbook
- [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
- Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
- (5%) Competition Round 0: TBA
- (15%) Competition Round 1: TBA
- (20%) Competition Round 2: TBA
- (20%) Competition Round 3: TBA
- (30%) Competition Round 4: TBA
Attendance (10%)
- 1% of the grade is deducted for each absence and for every three late arrivals
- 11 or more absences result in an F grade
Schedule
Week 01
March 05 — Overview & Logistics
March 09 — Basic Math
Week 02
March 12 — Introduction to Reinforcement Learning
- Lecture
- Reference
- [Ri20] Chap. 1
March 16 — Multi-Armed Bandits
- Lecture
- Reference
- [Ri20] Chap. 2
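As a warm-up for the bandit lecture, the sketch below runs an ε-greedy agent on a toy k-armed testbed in the spirit of [Ri20] Chap. 2; the arm count, horizon, and reward distributions are illustrative assumptions, not course requirements.

```python
# Epsilon-greedy action selection with incremental sample-average estimates.
# Toy setup only: 10 arms, Gaussian rewards, 1000 steps (all assumed values).
import numpy as np

rng = np.random.default_rng(0)
k = 10
true_means = rng.normal(0.0, 1.0, k)   # hidden reward mean of each arm

q = np.zeros(k)        # action-value estimates Q(a)
n = np.zeros(k)        # pull counts N(a)
epsilon = 0.1          # exploration rate

for t in range(1000):
    # explore with probability epsilon, otherwise act greedily
    a = rng.integers(k) if rng.random() < epsilon else int(np.argmax(q))
    reward = rng.normal(true_means[a], 1.0)
    # incremental update: Q(a) <- Q(a) + (R - Q(a)) / N(a)
    n[a] += 1
    q[a] += (reward - q[a]) / n[a]

print("true best arm:", int(np.argmax(true_means)), "| estimated best arm:", int(np.argmax(q)))
```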
Week 03
March 19 — Markov Process
- Lecture
- Reference
- [Ri20] Chap. 3
March 23 — Dynamic Programming
- Lecture
- Reference
- [Ri20] Chap. 4
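The dynamic-programming material is about Bellman backups on a fully known model; the sketch below runs value iteration on a made-up 2-state, 2-action MDP purely to show the update rule.

```python
# Value iteration on a tiny hypothetical MDP ([Ri20] Chap. 4).
# P[s, a, s'] are transition probabilities, R[s, a] expected rewards (invented numbers).
import numpy as np

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
while True:
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("V* =", V, "| greedy policy =", Q.argmax(axis=1))
```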
Week 04
March 26 — Tutorial on Gymnasium
- Lecture
- (Announce) Competition Round 0: Grid Crossing!
- Due: April 06
- Readings
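Since the competitions run on Gymnasium, the basic interaction loop covered in this tutorial looks roughly like the sketch below; the environment id and the random placeholder policy are examples only.

```python
# Minimal Gymnasium rollout with a random policy (CartPole-v1 used as an example).
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
for _ in range(500):
    action = env.action_space.sample()        # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:               # episode ended, start a new one
        obs, info = env.reset()

env.close()
print("reward accumulated over the rollout:", total_reward)
```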
March 30 — Monte-Carlo Methods: On-Policy Methods
- Lecture
- Reference
- [Ri20] Chap. 5
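First-visit Monte-Carlo prediction from [Ri20] Chap. 5 averages full-episode returns; the sketch below shows the backward return computation, with a made-up episode generator standing in for real environment rollouts.

```python
# First-visit Monte-Carlo prediction of V(s) from complete episodes.
from collections import defaultdict
import random

gamma = 1.0
returns_sum = defaultdict(float)
returns_cnt = defaultdict(int)
V = defaultdict(float)

def generate_episode():
    """Hypothetical 2-state episode generator; replace with env + policy rollouts."""
    episode, state = [], 0
    for _ in range(random.randint(1, 5)):
        episode.append((state, random.choice([0.0, 1.0])))
        state = 1 - state
    return episode

for _ in range(10_000):
    episode = generate_episode()
    states = [s for s, _ in episode]
    G = 0.0
    for t in reversed(range(len(episode))):        # accumulate the return backwards
        state, reward = episode[t]
        G = gamma * G + reward
        if state not in states[:t]:                # only the first visit contributes
            returns_cnt[state] += 1
            returns_sum[state] += G
            V[state] = returns_sum[state] / returns_cnt[state]

print(dict(V))
```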
Week 05
April 02 — Monte-Carlo Methods: Off-Policy Methods
- Lecture
- Reference
- [Ri20] Chap. 5
April 06 — Competition Round 0: TBA
- Leaderboard
- (Announce) Competition Round 1: TBA
- Due: April 23
Week 06
April 09 — Temporal Difference Learning
- Lecture
- Reference
- [Ri20] Chap. 6
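A tabular Q-learning loop in the spirit of [Ri20] Chap. 6 might look like the sketch below; FrozenLake-v1 is just a convenient small Gymnasium environment, and the hyperparameters are illustrative.

```python
# Tabular Q-learning with an epsilon-greedy behaviour policy.
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        if rng.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning: bootstrap on the greedy value of the next state
        target = reward + (0.0 if terminated else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print("greedy policy:", np.argmax(Q, axis=1))
```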
April 13 — n-Step Bootstrapping
- Lecture
- Reference
- [Ri20] Chap. 7
Week 07
April 16 — Planning & Learning
- Lecture
- Reference
- [Ri20] Chap. 8
April 20 — Linear Function Approximation
- Lecture
- Reference
- [Ri20] Chap. 9 - 10
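With linear function approximation the value estimate is a dot product of a weight vector and a feature vector, and semi-gradient TD(0) nudges the weights along the TD error ([Ri20] Chap. 9); the one-hot features and small random walk below are stand-ins chosen only to keep the sketch short.

```python
# Semi-gradient TD(0) with a linear value function v(s) = w . x(s).
import numpy as np

rng = np.random.default_rng(0)
n_states, alpha, gamma = 5, 0.05, 1.0
w = np.zeros(n_states)

def features(s):
    """One-hot features; tile coding or polynomials would slot in the same way."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

for episode in range(2000):
    s = n_states // 2                     # start in the middle of the random walk
    while True:
        s_next = s + rng.choice([-1, 1])
        done = s_next < 0 or s_next >= n_states
        reward = 1.0 if s_next >= n_states else 0.0
        v_next = 0.0 if done else w @ features(s_next)
        # w <- w + alpha * [R + gamma * v(S') - v(S)] * grad_w v(S)
        td_error = reward + gamma * v_next - w @ features(s)
        w += alpha * td_error * features(s)
        if done:
            break
        s = s_next

print("estimated state values:", w)
```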
Week 08
April 23 — Competition Round 1: TBA
- Leaderboard
- (Announce) Competition Round 2: TBA
- Due: May 11
April 27 — Nonlinear Function Approximation: Deep Neural Network
- Lecture
- Reference
- [Ge23] Chap. 10, 11
- [Ri20] Chap. 9 - 10
Week 09
April 30 — Nonlinear Function Approximation: Convolutional Neural Network
- Lecture
- Reference
- [Ge23] Chap. 14
- [Ri20] Chap. 9 - 10
May 04 — Practice on Function Approximation
- Lecture
- Reference
- [Ge23] Chap. 10, 11, 14
- [Ri20] Chap. 9 - 10
Week 10
May 07 — Deep Q-Network
- Lecture
- Reference
- [Ri20] Chap. 11
- Mnih, V., Kavukcuoglu, K., Silver, D. et al. “Human-level control through deep reinforcement learning”. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
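A skeletal DQN update, showing the two ingredients the lecture and the Mnih et al. (2015) paper emphasize (experience replay and a periodically synced target network), might look like the PyTorch sketch below; the network size, buffer contents, and hyperparameters are placeholder assumptions, not the competition setup.

```python
# Minimal DQN-style update: replay buffer, online network, frozen target network.
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99   # assumed dimensions for the sketch

def make_q_net():
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

q_net = make_q_net()
target_net = make_q_net()
target_net.load_state_dict(q_net.state_dict())          # start from identical weights
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                            # stores (s, a, r, s', done)

def dqn_update(batch_size=32):
    batch = random.sample(list(replay), batch_size)
    states = torch.stack([b[0] for b in batch])
    actions = torch.tensor([b[1] for b in batch])
    rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    next_states = torch.stack([b[3] for b in batch])
    dones = torch.tensor([b[4] for b in batch], dtype=torch.float32)

    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a) taken
    with torch.no_grad():                                # target net is frozen here
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next

    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Fill the buffer with fake transitions just to exercise the update, then sync.
for _ in range(1000):
    replay.append((torch.randn(obs_dim), random.randrange(n_actions),
                   random.random(), torch.randn(obs_dim), False))
print("loss:", dqn_update())
target_net.load_state_dict(q_net.state_dict())           # periodic hard sync
```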
May 11 — Competition Round 2: TBA
- Leaderboard
- (Announce) Competition Round 3: TBA
- Due: June 18
Week 11
May 14 — Policy Gradient Methods
- Lecture
- Reference
- [Ri20] Chap. 13
May 18 — Practice on Policy Gradient Methods
- Lecture
- Reference
- [Ri20] Chap. 13
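A bare-bones REINFORCE (Monte-Carlo policy gradient) agent in the sense of [Ri20] Chap. 13 can be sketched as follows; CartPole-v1, the network size, and the learning rate are assumptions made only for illustration.

```python
# REINFORCE: sample an episode, then ascend sum_t G_t * log pi(A_t | S_t).
import torch
import torch.nn as nn
import gymnasium as gym

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(int(action))
        rewards.append(reward)
        done = terminated or truncated

    # discounted return-to-go G_t for every step of the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))

    loss = -(torch.stack(log_probs) * returns).sum()     # policy-gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```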
Week 12
May 21 — Advanced Topics: Variants of DQN / Asynchronous Methods
- Lecture
- Reference
- Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. & Kavukcuoglu, K. "Asynchronous Methods for Deep Reinforcement Learning". Proceedings of The 33rd International Conference on Machine Learning. 48:1928-1937 (2016). https://proceedings.mlr.press/v48/mniha16.html
- van Hasselt, H., Guez, A., & Silver, D. "Deep Reinforcement Learning with Double Q-Learning". Proceedings of the AAAI Conference on Artificial Intelligence, 30 (1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
- Schaul, T., Quan, J., Antonoglou, I., Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952
- Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." In International conference on machine learning, pp. 1995-2003. PMLR, 2016. https://proceedings.mlr.press/v48/wangf16.html
May 25 — Buddha's Birthday
- No Class
Week 13
May 28 — Competition Round 3: TBA
- Leaderboard
- (Announce) Competition Round 4: TBA
- Due: June 15
June 01 — Advanced Topics: Deterministic Policy Gradient Methods
- Lecture
- Reference
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
- Dankwa, Stephen and Zheng, Wenfeng. "Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent". In Proceedings of the 3rd International Conference on Vision, Image and Signal Processing (ICVISP 2019). Association for Computing Machinery, New York, NY, USA, Article 66, 1–5. (2020) https://doi.org/10.1145/3387168.3387199
Week 14
June 04 — Advanced Topics: Entropy-Regularized Methods
- Lecture
- Reference
- Haarnoja, Tuomas, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905
June 08 — Advanced Topics: Trust Region Constraint Methods
- Lecture
- Reference
- Schulman, John, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. PMLR, 2015.
- Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347
Week 15
June 11 — Focus on RL Final Competition
- No Class