Approximating Optimal Control with Value Gradient Learning

Fairbank, M., Prokhorov, D. & Alonso, E. (2013). Approximating Optimal Control with Value Gradient Learning. In: F. Lewis & D. Liu (Eds.), Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. (pp. 142-161). Hoboken, NJ, USA: Wiley-IEEE Press. ISBN 111810420X


Abstract

In this chapter, we extend the approximate dynamic programming (ADP) algorithm dual heuristic programming (DHP) to include a "bootstrapping" parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT while using a general function approximator for the critic. We show that this can lead to increased stability when a neural network learns control problems.
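
To make the abstract's λ-weighted bootstrapping concrete, the sketch below shows one plausible way to compute target value-gradients G'_t backwards along a recorded trajectory, blending the critic's one-step estimate (as in DHP) with the fully propagated trajectory gradient (as in BPTT). This is a minimal illustrative sketch in Python, not the chapter's code: the function name vgl_lambda_targets, the variable names, and the deterministic episodic setting with derivatives taken through the greedy action are all assumptions.

    import numpy as np

    def vgl_lambda_targets(dr_dx, df_dx, critic_grads, lam, gamma=1.0):
        """Compute lambda-weighted target value-gradients G'_t for one
        trajectory of length T (hypothetical notation).

        dr_dx[t]        -- total derivative of reward w.r.t. state x_t,
                           taken through the greedy action; shape (n,)
        df_dx[t]        -- Jacobian of the next state w.r.t. x_t,
                           df_dx[t][i, j] = d x_{t+1}[i] / d x_t[j]; shape (n, n)
        critic_grads[t] -- critic's estimated value-gradient at x_t; shape (n,)
        """
        T = len(dr_dx)
        targets = [np.zeros_like(critic_grads[0]) for _ in range(T)]
        for t in reversed(range(T)):
            if t == T - 1:
                # Assume a terminal state with zero continuation gradient.
                g_next = np.zeros_like(critic_grads[0])
            else:
                # lam = 0 bootstraps fully on the critic (DHP-like);
                # lam = 1 propagates the exact trajectory gradient (BPTT-like).
                g_next = lam * targets[t + 1] + (1.0 - lam) * critic_grads[t + 1]
            # Chain rule: dV/dx_t = dr/dx_t + gamma * (dx_{t+1}/dx_t)^T dV/dx_{t+1}
            targets[t] = dr_dx[t] + gamma * df_dx[t].T @ g_next
        return targets

Under this sketch, the critic weights would then be nudged so that its gradient estimates G̃_t move toward these targets, in the spirit of DHP-style weight updates; at λ = 0 the recursion reduces to a one-step DHP bootstrap, while at λ = 1 it reproduces the gradient that BPTT would propagate along the trajectory, matching the equivalence the abstract describes.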

Item Type: Book Section
Additional Information: Copyright © 2013 The Institute of Electrical and Electronics Engineers, Inc. Fairbank, M., Prokhorov, D. and Alonso, E. (2012) Approximating Optimal Control with Value Gradient Learning, in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (eds F. L. Lewis and D. Liu), John Wiley & Sons, Inc., Hoboken, NJ, USA. The published version is available at http://onlinelibrary.wiley.com/doi/10.1002/9781118453988.ch7/references.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: School of Informatics > Department of Computing
URI: http://openaccess.city.ac.uk/id/eprint/5192
