Approximating Optimal Control with Value Gradient Learning
Fairbank, M., Prokhorov, D. & Alonso, E. (2013). Approximating Optimal Control with Value Gradient Learning. In: Lewis, F. & Liu, D. (Eds.), Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. (pp. 142-161). Hoboken, NJ, USA: Wiley-IEEE Press. doi: 10.1002/9781118453988
Abstract
In this chapter, we extend the ADP algorithm, dual heuristic programming (DHP), to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT while using a general function approximator for the critic. We show that this can lead to increased stability in the learning of control problems by a neural network.
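As a rough sketch of the λ-weighted recursion the abstract alludes to (paraphrasing the value-gradient learning literature rather than quoting the chapter's own equations, and using notation assumed here: \(\tilde{G}_t\) for the critic's value-gradient estimate, \(G'_t\) for its target, \(r\) for reward, \(f\) for the model dynamics, \(\gamma\) for the discount factor, \(\Omega_t\) for a positive-definite weighting matrix, and \(\alpha\) for the learning rate), the target gradient and weight update take a form along the lines of:

```latex
% Hedged sketch of a VGL(lambda)-style update, under the assumed notation above.
% D/Dx denotes a total derivative along the greedy trajectory, i.e. it includes
% the dependence of the greedy action on the state.
\[
  G'_t \;=\; \left(\frac{D r}{D x}\right)_t
           + \gamma \left(\frac{D f}{D x}\right)_t
             \Big( \lambda\, G'_{t+1} + (1-\lambda)\, \tilde{G}_{t+1} \Big),
\qquad
  \Delta w \;=\; \alpha \sum_t \frac{\partial \tilde{G}_t}{\partial w}\,
                 \Omega_t \left( G'_t - \tilde{G}_t \right).
\]
```

On this reading, λ=0 gives targets built from the critic's own next-step gradient estimate (as in DHP), while λ=1 roots the targets entirely in the unfolded trajectory, which is where the stated connection to BPTT on a greedy policy arises; the exact definitions and conditions are those given in the chapter itself.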
Publication Type: | Book Section
---|---
Additional Information: | Copyright © 2013 The Institute of Electrical and Electronics Engineers, Inc. Fairbank, M., Prokhorov, D. and Alonso, E. (2012) Approximating Optimal Control with Value Gradient Learning, in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (eds F. L. Lewis and D. Liu), John Wiley & Sons, Inc., Hoboken, NJ, USA. Published version can be found here: http://onlinelibrary.wiley.com/doi/10.1002/9781118453988.ch7/references.
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: | School of Science & Technology > Computer Science