Fall 2024

Artificial Intelligence (Section 4)

Reinforcement learning (RL) is a popular machine learning paradigm for solving sequential decision-making problems. In this paradigm, an agent learns an optimal policy by repeatedly interacting with an environment to maximize its cumulative reward. This course covers the foundational concepts of RL, including states, actions, and rewards, the Markov decision process, and exploration versus exploitation. We will also study key RL algorithms and techniques, such as the Monte Carlo method, temporal difference learning, function approximation, and policy gradients. In addition, you will work on a small team project to implement an RL agent that solves problems of increasing difficulty, from simple to complex.
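To make the interaction loop and the exploration-versus-exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit. It is not course material: the function name `run_bandit`, the Gaussian reward model, and all parameter values are assumptions chosen only for illustration.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical Gaussian multi-armed bandit.

    true_means: assumed mean reward of each arm (unknown to the agent).
    Returns the value estimates, pull counts, and total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    q = [0.0] * n_arms      # agent's estimated value of each arm
    counts = [0] * n_arms   # how many times each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: q[a])

        # Environment step (assumed model): noisy reward around the arm's mean.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update of the chosen arm's estimate.
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]
        total_reward += reward

    return q, counts, total_reward
```

Calling `run_bandit([0.0, 0.5, 2.0])` returns the learned estimates and pull counts. With a small `epsilon`, most pulls should go to the arm with the highest true mean, while the occasional random pulls keep the estimates of the other arms from going stale.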


Instruction

Course Staff
Time & Location
  • Mon/Thu 09:00 - 10:15, Engineering Building 6, Room 609
Office Hours
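  • Tuesday 13:00 - 15:00
  • Notes
    • Post questions about lectures and assignments on the e루리 Q&A board instead of bringing them to office hours, so that all students can see the answers
    • For matters other than lectures and assignments, email in advance to schedule a meeting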
Textbook
  • [Ri 20] Reinforcement Learning: An Introduction, 2nd Ed., Richard S. Sutton and Andrew G. Barto, The MIT Press.
Prerequisite
  • (Required) Python programming
  • (Optional) Discrete mathematics, algorithms
Grading Policy
  • Team Competitions: 90%
    • Competition Round 1: 25%
    • Competition Round 2: 30%
    • Competition Round 3: 35%
  • Attendance: 10%
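    • 3 late arrivals = 1 absence
    • Each absence deducts 1% from the attendance score
    • Missing more than 1/3 of all class sessions (10 sessions) results in an F
      • That is, 11 or more absences result in an F
    • If you have an excusable reason (e.g., reserve forces training), email the professor and the TA before class
      • For sudden circumstances (e.g., acute illness, a death in the family), submit supporting documentation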

Schedule

W01: Overview
September 02: Course Overview & Logistics
September 05: Basic Math

W02: Introduction / Multi-Armed Bandits
September 09: Introduction to Reinforcement Learning
  • Lecture
  • Reference
    • [Ri 20] Chap. 1
September 12: Multi-Armed Bandits
  • Lecture
  • Reference
    • [Ri 20] Chap. 2

W03: Markov Process
September 16: Chuseok Holiday
  • No Class
September 19: Markov Process
  • Lecture
  • Reference
    • [Ri 20] Chap. 3

W04: Dynamic Programming
September 23: Dynamic Programming? / Value Iteration
  • Lecture
  • Reference
    • [Ri 20] Chap. 4
September 26: Policy Iteration / Dynamic Programming vs. Reinforcement Learning
  • Lecture
  • Reference
    • [Ri 20] Chap. 4

W05: Tutorial on Gymnasium
September 30: Tutorial on Gymnasium
October 03: National Foundation Day
  • No Class

W06: Monte-Carlo Method
October 07: Monte-Carlo Method #1
  • Lecture
  • Reference
    • [Ri 20] Chap. 5
October 10: Monte-Carlo Method #2
  • Lecture
  • Reference
    • [Ri 20] Chap. 5

W07: Temporal Difference Learning / n-Step Bootstrapping
October 14: Temporal Difference Learning
  • Lecture
  • Reference
    • [Ri 20] Chap. 6
October 17: n-Step Bootstrapping
  • Lecture
  • Reference
    • [Ri 20] Chap. 7

W08: Planning & Learning
October 21: Planning & Learning #1
October 24: No Class
  • Focus on other midterm exams

W09: Planning & Learning
October 28: Planning & Learning #2
  • Lecture
  • Reference
    • [Ri 20] Chap. 8
October 31: Function Approximation #1
  • Lecture
  • Reference
    • [Ri 20] Chap. 9 - 10

W10: Team Competition Round 1
November 04: Team Competition Round 1 (Early-Bird Slot)
November 07: Team Competition Round 1 (Regular-Bird Slot)

W11: Function Approximation
November 11: Function Approximation #2
November 14: Function Approximation #3
  • Lecture
  • Reference
    • [Ri 20] Chap. 9 - 10

W12: Function Approximation
November 18: Function Approximation #4
  • Lecture
  • Reference
    • [Ri 20] Chap. 9 - 10
November 21: Deep Q-Network

W13: Team Competition Round 2
November 25: Team Competition Round 2 (Early-Bird Slot)
November 28: Team Competition Round 2 (Regular-Bird Slot)

W14: Policy Gradient Methods #1
December 02: Policy Gradient?
December 05: REINFORCE / Actor-Critic Methods
  • Lecture
  • Reference
    • [Ri 20] Chap. 13

W15: Policy Gradient Methods #2
December 09: Asynchronous Advantage Actor-Critic
  • Lecture
  • Reference
    • Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. & Kavukcuoglu, K. "Asynchronous Methods for Deep Reinforcement Learning". Proceedings of The 33rd International Conference on Machine Learning. 48:1928-1937 (2016). https://proceedings.mlr.press/v48/mniha16.html.
December 12: No Class
  • Focus on (other) final exams

W16: Team Competition Round 3
December 16: Team Competition Round 3 (Early-Bird Slot)
December 19: Team Competition Round 3 (Regular-Bird Slot)