Instruction
Course Staff
- Lecturer: Woohyeok Choi
- Office: #407, College of Engineering #6
- Mail: woohyeok.choi@kangwon.ac.kr
- Teaching Assistant: TBA
Time & Location
- Mon./Thu. 09:00 - 10:15, #609, College of Engineering #6
Office Hours
- Tue. 13:00 - 15:00
Textbook
- [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
- Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
- (10%) Competition Round 0: Grid Crossing
- (20%) Competition Round 1: Zelda's Adventure
- (30%) Competition Round 2: Avoid Blurps
- (30%) Competition Round 3: Bullet Bills
Attendance (10%)
- 1% of credit is deducted for each absence or for every three late arrivals
- 11 or more absences result in an F grade
Schedule
Week 01
March 05 — Overview & Logistics
March 09 — Basic Math
Week 02
March 12 — Introduction to Reinforcement Learning
- Lecture
- Reference
- [Ri20] Chap. 1
March 16 — Multi-Armed Bandits
- Lecture
- Reference
- [Ri20] Chap. 2
Week 03
March 19 — Markov Process
- Lecture
- Reference
- [Ri20] Chap. 3
March 23 — Dynamic Programming
- Lecture
- Reference
- [Ri20] Chap. 4
Week 04
March 26 — Tutorial on Gymnasium & Dynamic Programming
- Lecture
- (Announce) Competition Round 0: Grid Crossing!
- Due: April 09
- Readings
March 30 — On-Policy Monte-Carlo Methods
- Lecture
- Reference
- [Ri20] Chap. 5
Week 05
April 02 — Off-Policy Monte-Carlo Methods
- Lecture
- Reference
- [Ri20] Chap. 5
April 06 — Temporal Difference Learning
- Lecture
- Reference
- [Ri20] Chap. 6
Week 06
April 09 — Competition Round 0: Grid Crossing
- Leaderboard
- (Announce) Competition Round 1: Zelda's Adventure
- Due: April 27
April 13 — Practice on Monte Carlo Methods & Temporal Difference Learning
Week 07
April 16 — n-Step Bootstrapping
- Lecture
- Reference
- [Ri20] Chap. 7
April 20 — Planning & Learning
- Lecture
- Reference
- [Ri20] Chap. 8
Week 08
April 23 — Linear Function Approximation
- Lecture
- Reference
- [Ri20] Chap. 9 - 10
April 27 — Competition Round 1: Zelda's Adventure
- Leaderboard
- (Announce) Competition Round 2: Avoid Blurps
- Due: May 21
Week 09
April 30 — Nonlinear Function Approximation
- Lecture
- Reference
- [Ge23] Chap. 10, 11, 14
- [Ri20] Chap. 9 - 10
May 04 — Practice on Function Approximation
- Lecture
- Reference
- [Ge23] Chap. 10, 11, 14
- [Ri20] Chap. 9 - 10
Week 10
May 07 — Deep-Q Network
- Lecture
- Reference
- [Ri20] Chap. 11
- Mnih, V., Kavukcuoglu, K., Silver, D. et al. “Human-level control through deep reinforcement learning”. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
May 11 — Policy Gradient Methods
- Lecture
- Reference
- [Ri20] Chap. 13
Week 11
May 14 — Practice on Policy Gradient Methods
- Lecture
- Reference
- [Ri20] Chap. 13
May 18 — Advanced Topics: Variants of DQN
- Lecture
- Reference
- van Hasselt, H., Guez, A., & Silver, D. "Deep Reinforcement Learning with Double Q-Learning". Proceedings of the AAAI Conference on Artificial Intelligence, 30 (1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
- Schaul, T., Quan, J., Antonoglou, I., Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952
- Wang, Ziyu, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. "Dueling network architectures for deep reinforcement learning." In International conference on machine learning, pp. 1995-2003. PMLR, 2016. https://proceedings.mlr.press/v48/wangf16.html
Week 12
May 21 — Competition Round 2: Avoid Blurps
- Leaderboard
- (Announce) Competition Round 3: Bullet Bills
- Due: June 15
May 25 — Buddha's Birthday
- No Class
Week 13
May 28 — Advanced Topics: Deterministic Policy Gradient Methods
- Lecture
- Reference
- Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
- Scott Fujimoto, Herke van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." In International conference on machine learning, pp. 1587-1596. PMLR (2018).
June 01 — Advanced Topics: Entropy Maximization
- Lecture
- Reference
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor". International Conference on Machine Learning (2018).
- Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018). https://arxiv.org/abs/1812.05905
Week 14
June 04 — Advanced Topics: Trust Region Constraint Methods
- Lecture
- Reference
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. PMLR (2015).
June 08 — Advanced Topics: Proximal Policy Optimization
- Lecture
- Reference
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347
- John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel. "High-Dimensional Continuous Control Using Generalized Advantage Estimation." arXiv preprint arXiv:1506.02438 (2015). https://arxiv.org/abs/1506.02438
- Volodymyr Mnih et al. "Asynchronous methods for deep reinforcement learning." In International conference on machine learning, pp. 1928-1937. PMLR. (2016).
- Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. "Curiosity-driven exploration by self-supervised prediction." In International conference on machine learning, pp. 2778-2787. PMLR. (2017).
- Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. "Exploration by random network distillation." arXiv preprint arXiv:1810.12894 (2018).
Week 15
June 11 — Focus on Final Competition
- No Class