City Research Online

An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time

Fairbank, M., Alonso, E. & Prokhorov, D. (2013). An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time. IEEE Transactions on Neural Networks and Learning Systems, 24(12), pp. 2088-2100. doi: 10.1109/tnnls.2013.2271778

Abstract

We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function, when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios including some that prove divergence of DHP under a greedy policy, which contrasts against our proven-convergent algorithm.

Publication Type: Article
Additional Information: (c) 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Publisher Keywords: Adaptive Dynamic Programming, Dual Heuristic Programming, Value-Gradient Learning, Backpropagation Through Time, Neural Networks
Subjects: B Philosophy. Psychology. Religion > BF Psychology; Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
Departments: School of Science & Technology > Computer Science
PDF - Accepted Version