Instruction
Course Staff
- Lecturer: Woohyeok Choi
- Office: #407, College of Engineering #6
- Mail: woohyeok.choi@kangwon.ac.kr
- Teaching Assistant: Geonwoo Choi
- Office: #416, College of Engineering #6
- Mail: geonwoo.choi@kangwon.ac.kr
Time & Location
- Mon./Thu. 09:00 - 10:15, #609, College of Engineering #6
Office Hours
- Tue. 13:00 - 15:00
Textbook
- [Ri20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
- Python Programming
Grading Policy
Reinforcement Learning Competitions (90%)
- Round 1: Grid Crossing! (10%)
- Round 2: Grid Adventure! (20%)
- Round 3: Avoid Blurp! (20%)
- Round 4: Al-kka-gi! (40%)
Attendance (10%)
- 1% of credit is deducted for each absence
- Three late arrivals count as one absence
- 11 or more absences result in an F grade
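The attendance rules above can be worked through with a short Python sketch. It is illustrative only and not an official grade calculator: the function name `attendance_outcome` is hypothetical, and the handling of leftover late arrivals, the zero floor on the credit, and the counting of converted late arrivals toward the F threshold are assumptions that the policy does not state.

```python
def attendance_outcome(absences: int, lates: int) -> tuple[float, bool]:
    """Return (attendance credit in percent, whether the F rule applies)."""
    # Assumption: each full group of three late arrivals converts to one
    # absence; one or two leftover late arrivals carry no penalty.
    effective_absences = absences + lates // 3
    # Attendance is worth 10% of the grade; 1% is deducted per absence.
    # Assumption: the credit cannot drop below zero.
    credit = max(0.0, 10.0 - 1.0 * effective_absences)
    # Assumption: converted late arrivals also count toward the
    # 11-absence threshold for an F grade.
    fails = effective_absences >= 11
    return credit, fails


if __name__ == "__main__":
    # Two absences and four late arrivals: 2 + (4 // 3) = 3 effective absences.
    print(attendance_outcome(absences=2, lates=4))
```

Under these assumptions, a student with two absences and four late arrivals keeps 7 of the 10 attendance points and is not subject to the F rule.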
Schedule
Week 01
September 01 — Overview & Logistics
September 04 — Basic Math
Week 02
September 08 — Introduction to Reinforcement Learning
- Lecture
- Reference
- [Ri20] Chap. 1
September 11 — Multi-Armed Bandits
- Lecture
- Reference
- [Ri20] Chap. 2
Week 03
September 15 — Markov Process
- Lecture
- Reference
- [Ri20] Chap. 3
September 18 — Dynamic Programming
- Lecture
- Reference
- [Ri20] Chap. 4
Week 04
September 22 — Tutorial on Gymnasium
- Practice
- (Announce) Competition Round 1: Grid Crossing!
- Due: Oct. 13
- Readings
September 25 — Monte-Carlo Methods: On-Policy Methods
- Lecture
- Reference
- [Ri20] Chap. 5
Week 05
September 29 — Monte-Carlo Methods: Off-Policy Methods
- Lecture
- Reference
- [Ri20] Chap. 5
October 02 — Temporal Difference Learning
- Lecture
- Reference
- [Ri20] Chap. 6
Week 06
October 06 — Chuseok Holiday
- No Class
October 09 — Hangul Day
- No Class
Week 07
October 13 — Competition Round 1: Grid Crossing!
October 16 — n-Step Bootstrapping
- Lecture
- Reference
- [Ri20] Chap. 7
Week 08
October 20 — Planning & Learning
- Lecture
- Reference
- [Ri20] Chap. 8
October 23 — Focus on Midterm Exam
- No Class
Week 09
October 27 — Linear Function Approximation
- Lecture
- Reference
- [Ri20] Chap. 9 - 10
October 30 — Competition Round 2: Grid Adventure!
- Leaderboard
- (Announce) Competition Round 3: Avoid Blurp!
- Due: Nov. 20
Week 10
November 03 — Nonlinear Function Approximation: Deep Neural Network
- Lecture
- Reference
- [Ge23] Chap. 10, 11
- [Ri20] Chap. 9 - 10
November 06 — Nonlinear Function Approximation: Convolutional Neural Network
- Lecture
- Reference
- [Ge23] Chap. 14
- [Ri20] Chap. 9 - 10
Week 11
November 10 — Practice on Function Approximation
- Lectures
- Reference
- [Ge23] Chap. 10, 11, 14
- [Ri20] Chap. 9 - 10
November 13 — Deep Q-Network
- Lecture
- Reference
- [Ri20] Chap. 11
- Mnih, V., Kavukcuoglu, K., Silver, D. et al. "Human-level control through deep reinforcement learning". Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
- van Hasselt, H., Guez, A., & Silver, D. "Deep Reinforcement Learning with Double Q-Learning". Proceedings of the AAAI Conference on Artificial Intelligence, 30 (1) (2016). https://doi.org/10.1609/aaai.v30i1.10295
- Schaul, T., Quan, J., Antonoglou, I., Silver, D. "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952 (2015). https://arxiv.org/abs/1511.05952
Week 12
November 17 — Policy Gradient Methods
- Lecture
- Reference
- [Ri20] Chap. 13
November 20 — Competition Round 3: Avoid Blurp!
- Leaderboard
- (Announce) Competition Round 4: Al-kka-gi!
- Due: Dec. 11
Week 13
November 24 — Practice on Policy Gradient Methods
- Lecture
- Reference
- [Ri20] Chap. 13
November 27 — Advanced Algorithms: Distributed Reinforcement Learning
- Lecture
- Reference
- [Ri20] Chap. 13
- Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. & Kavukcuoglu, K. "Asynchronous Methods for Deep Reinforcement Learning". Proceedings of The 33rd International Conference on Machine Learning. 48:1928-1937 (2016). https://proceedings.mlr.press/v48/mniha16.html
Week 14
December 01 — Advanced Algorithms: Policy Optimization
- Lecture
- Reference
- Schulman, John, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. "Trust region policy optimization." In International conference on machine learning, pp. 1889-1897. PMLR, 2015.
- Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). https://arxiv.org/abs/1707.06347
December 04 — Advanced Algorithms: Deterministic Policy Gradient
- Lecture
- Reference
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/abs/1509.02971
- Dankwa, Stephen and Zheng, Wenfeng. "Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent". In Proceedings of the 3rd International Conference on Vision, Image and Signal Processing (ICVISP 2019). Association for Computing Machinery, New York, NY, USA, Article 66, 1–5. (2020) https://doi.org/10.1145/3387168.3387199
- Haarnoja, Tuomas, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018).
Week 15
December 08 — Preparing for Final Competition
- No Class