Bertsekas = Bertsekas, Dimitri P. Dynamic Programming and Optimal Control. 2 vols. Belmont, MA: Athena Scientific, 2007. ISBN: 9781886529083.
Bertsekas and Tsitsiklis = Bertsekas, Dimitri P., and John N. Tsitsiklis. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996. ISBN: 9781886529106.
LEC # | TOPICS | READINGS |
---|---|---|
1 | Markov Decision Processes; Finite-Horizon Problems: Backwards Induction; Discounted-Cost Problems: Cost-to-Go Function, Bellman's Equation | Bertsekas Vol. 1, Chapter 1. |
2 | Value Iteration; Existence and Uniqueness of the Solution to Bellman's Equation; Gauss-Seidel Value Iteration (see the value iteration sketch below the table) | Bertsekas Vol. 2, Chapter 1. Bertsekas and Tsitsiklis, Chapter 2. |
3 | Optimality of Policies Derived from the Cost-to-Go Function; Policy Iteration; Asynchronous Policy Iteration | Bertsekas Vol. 2, Chapter 1. Bertsekas and Tsitsiklis, Chapter 2. |
4 | Average-Cost Problems; Relationship with Discounted-Cost Problems; Bellman's Equation; Blackwell Optimality | Bertsekas Vol. 2, Chapter 4. |
5 | Average-Cost Problems: Computational Methods | Bertsekas Vol. 2, Chapter 4. |
6 | Application of Value Iteration to Optimization of Multiclass Queueing Networks; Introduction to Simulation-Based Methods; Real-Time Value Iteration | Chen, R. R., and S. P. Meyn. "Value Iteration and Optimization of Multiclass Queueing Networks." Queueing Systems 32 (1999): 65-97. Bertsekas and Tsitsiklis, Chapter 5. |
7 | Q-Learning; Stochastic Approximations (see the Q-learning sketch below the table) | Bertsekas and Tsitsiklis, Chapters 4 and 5. |
8 | Stochastic Approximations: Lyapunov Function Analysis; The ODE Method; Convergence of Q-Learning | Bertsekas and Tsitsiklis, Chapters 4 and 5. |
9 | Exploration versus Exploitation: The Complexity of Reinforcement Learning | Kearns, M., and S. Singh. "Near-Optimal Reinforcement Learning in Polynomial Time." Machine Learning 49, no. 2 (Nov 2002): 209-232. |
10 | Introduction to Value Function Approximation; Curse of Dimensionality; Approximation Architectures | Bertsekas and Tsitsiklis, Chapter 6. |
11 | Model Selection and Complexity | Hastie, Tibshirani, and Friedman. Chapter 7 in The Elements of Statistical Learning. New York: Springer, 2003. ISBN: 9780387952840. |
12 | Introduction to Value Function Approximation Algorithms; Performance Bounds | Bertsekas and Tsitsiklis, Chapter 6. |
13 | Temporal-Difference Learning with Value Function Approximation (see the TD(0) sketch below the table) | Bertsekas and Tsitsiklis, Chapter 6. |
14 | Temporal-Difference Learning with Value Function Approximation (cont.) | Bertsekas and Tsitsiklis, Chapter 6. de Farias, D. P., and B. Van Roy. "On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning." |
15 | Temporal-Difference Learning with Value Function Approximation (cont.); Optimal Stopping Problems; General Control Problems | Bertsekas and Tsitsiklis, Chapter 6. de Farias, D. P., and B. Van Roy. "On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning." Bertsekas, Borkar, and Nedic. "Improved Temporal Difference Methods with Linear Function Approximation." |
16 | Approximate Linear Programming | de Farias, D. P., and B. Van Roy. "The Linear Programming Approach to Approximate Dynamic Programming." |
17 | Approximate Linear Programming (cont.) | de Farias, D. P., and B. Van Roy. "The Linear Programming Approach to Approximate Dynamic Programming." |
18 | Efficient Solutions for Approximate Linear Programming | de Farias, D. P., and B. Van Roy. "On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming." Calafiore, and Campi. "Uncertain Convex Programs: Randomized Solutions and Confidence Levels." |
19 | Efficient Solutions for Approximate Linear Programming: Factored MDPs | Guestrin, et al. "Efficient Solution Algorithms for Factored MDPs." Schuurmans, and Patrascu. "Direct Value Approximation for Factored MDPs." |
20 | Policy Search Methods | Marbach, and Tsitsiklis. "Simulation-Based Optimization of Markov Reward Processes." |
21 | Policy Search Methods (cont.) | Baxter, and Bartlett. "Infinite-Horizon Policy-Gradient Estimation." |
22 | Policy Search Methods for POMDPs; Application: Call Admission Control; Actor-Critic Methods | Baxter, and Bartlett. "Infinite-Horizon Policy-Gradient Estimation." Baxter, and Bartlett. "Experiments with Infinite-Horizon Policy-Gradient Estimation." Konda, and Tsitsiklis. "Actor-Critic Algorithms." |
23 | Guest Lecture: Prof. Nick Roy; Approximate POMDP Compression | Roy, and Gordon. "Exponential Family PCA for Belief Compression in POMDPs." |
24 | Policy Search Methods: PEGASUS; Application: Helicopter Control | Ng, and Jordan. "PEGASUS: A policy search method for large MDPs and POMDPs." Ng, et al. "Autonomous Helicopter Flight via Reinforcement Learning." |
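The following is a minimal sketch of value iteration for a discounted-cost MDP, in the spirit of lectures 1-2 (Bertsekas Vol. 2, Chapter 1). The two-state, two-action MDP, the transition tensor `P`, the stage costs `g`, and the discount factor `alpha` are hypothetical illustration values, not taken from the readings.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted-cost MDP.
# P[a, s, t] = probability of moving from state s to state t under action a;
# g[s, a] = expected stage cost; alpha = discount factor.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
g = np.array([[1.0, 2.0],
              [0.5, 3.0]])
alpha = 0.9

J = np.zeros(2)  # initial cost-to-go guess
for _ in range(1000):
    # Bellman operator: (TJ)(s) = min_a [ g(s,a) + alpha * sum_t P(t|s,a) J(t) ]
    Q = g + alpha * np.einsum('ast,t->sa', P, J)
    TJ = Q.min(axis=1)
    if np.max(np.abs(TJ - J)) < 1e-10:  # sup-norm contraction => convergence
        break
    J = TJ

print("cost-to-go J*:", J)
print("greedy policy:", Q.argmin(axis=1))
```

Gauss-Seidel value iteration (also lecture 2) would instead update `J` one state at a time within a sweep, reusing already-updated components.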
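Next, a similarly hedged sketch of tabular Q-learning (lectures 7-8) viewed as a stochastic approximation: each update moves Q(s, a) toward a noisy sample of the right-hand side of Bellman's equation with a diminishing step size. The MDP is the same hypothetical one as above; the exploration rate and step-size schedule are illustrative choices, not prescriptions from the readings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same hypothetical MDP as in the value iteration sketch.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
g = np.array([[1.0, 2.0],
              [0.5, 3.0]])
alpha = 0.9

Q = np.zeros((2, 2))        # Q[s, a]: state-action cost-to-go estimate
visits = np.zeros((2, 2))   # per-pair visit counts for the step size
s = 0
for _ in range(200_000):
    # Epsilon-greedy exploration keeps every (s, a) pair visited infinitely often.
    a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmin())
    t = rng.choice(2, p=P[a, s])           # simulate one transition
    visits[s, a] += 1
    step = 1.0 / visits[s, a]              # Robbins-Monro step size
    target = g[s, a] + alpha * Q[t].min()  # sampled Bellman target
    Q[s, a] += step * (target - Q[s, a])
    s = t

print("Q:", Q)
print("greedy policy:", Q.argmin(axis=1))
```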
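Finally, a sketch of TD(0) with a linear approximation architecture, as in lectures 13-15: the cost-to-go of a fixed policy is approximated by J~(s) = phi(s)' r, and the weight vector r is adjusted along the temporal difference. The five-state chain, its uniform transitions, and the two-feature architecture are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 5-state chain under a fixed policy: P[s, t] transition
# probabilities, g[s] expected stage costs, phi[s] a 2-dim feature vector.
S = 5
P = np.full((S, S), 1.0 / S)                 # uniform transitions (assumed)
g = np.arange(S, dtype=float)                # stage cost grows with the state
phi = np.stack([np.ones(S), np.arange(S, dtype=float)], axis=1)  # features [1, s]
alpha = 0.9

r = np.zeros(2)   # weights of the approximation J~(s) = phi(s) . r
s = 0
for n in range(1, 100_001):
    t = rng.choice(S, p=P[s])
    # Temporal difference: d = g(s) + alpha * J~(t) - J~(s)
    d = g[s] + alpha * phi[t] @ r - phi[s] @ r
    r += (1.0 / n) * d * phi[s]   # gradient-like update, diminishing step
    s = t

print("weights:", r)
print("approximate cost-to-go:", phi @ r)
```

Because sampling here is on-policy, this is the setting in which TD(0) with linear function approximation is known to converge (the subject of the Chapter 6 readings); off-policy sampling can destabilize the iteration.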