City Research Online

Approximating Optimal Control with Value Gradient Learning

Fairbank, M., Prokhorov, D. and Alonso, E. (2013). Approximating Optimal Control with Value Gradient Learning. In: Lewis, F. and Liu, D. (Eds.), Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. (pp. 142-161). Hoboken, NJ, USA: Wiley-IEEE Press. ISBN 111810420X

Abstract

In this chapter, we extend the approximate dynamic programming (ADP) algorithm dual heuristic programming (DHP) to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to increased stability in the learning of control problems by a neural network.

Publication Type: Book Section
Additional Information: Copyright © 2013 The Institute of Electrical and Electronics Engineers, Inc. Fairbank, M., Prokhorov, D. and Alonso, E. (2012) Approximating Optimal Control with Value Gradient Learning, in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (eds F. L. Lewis and D. Liu), John Wiley & Sons, Inc., Hoboken, NJ, USA. Published version can be found here: http://onlinelibrary.wiley.com/doi/10.1002/9781118453988.ch7/references.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Mathematics, Computer Science & Engineering > Computer Science
URI: http://openaccess.city.ac.uk/id/eprint/5192
Text - Accepted Version