An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time

Fairbank, M., Alonso, E. & Prokhorov, D. (2013). An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time. IEEE Transactions on Neural Networks and Learning Systems, 24(12), pp. 2088-2100. doi: 10.1109/TNNLS.2013.2271778

PDF - Accepted Version

Abstract

We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios, including some that prove divergence of DHP under a greedy policy, which contrasts with our proven-convergent algorithm.

Item Type: Article
Additional Information: (c) 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Uncontrolled Keywords: Adaptive Dynamic Programming, Dual Heuristic Programming, Value-Gradient Learning, Backpropagation Through Time, Neural Networks
Subjects: B Philosophy. Psychology. Religion > BF Psychology; Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
Divisions: School of Informatics > Department of Computing
URI: http://openaccess.city.ac.uk/id/eprint/5184
