City Research Online

The divergence of reinforcement learning algorithms with value-iteration and function approximation

Fairbank, M. and Alonso, E. (2012). The divergence of reinforcement learning algorithms with value-iteration and function approximation. Paper presented at The 2012 International Joint Conference on Neural Networks (IJCNN), 10-06-2012 - 15-06-2012, Brisbane, Australia.

Abstract

This paper gives specific divergence examples of value iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms when a function approximator is used for the value function. These divergence examples differ from previous examples in the literature in that they apply under a greedy policy, i.e. in a “value iteration” scenario. Perhaps surprisingly, with a greedy policy it is also possible to obtain divergence for the algorithms TD(1) and Sarsa(1). We also obtain divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
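To illustrate the general phenomenon the abstract refers to (value-function divergence under linear function approximation), the sketch below reproduces a classic two-state example from the earlier literature in the style of Tsitsiklis and Van Roy — it is NOT one of this paper's own greedy-policy examples. Two states share a single weight w, with approximate values V(s1) = w and V(s2) = 2w; both states transition deterministically to s2 with reward 0, so the true values are zero, yet synchronous TD(0) updates drive w to infinity for a discount factor near 1. The discount factor and learning rate below are illustrative assumptions.

```python
# Classic two-state divergence illustration (Tsitsiklis & Van Roy style).
# Linear value function with one weight: V(s1) = 1*w, V(s2) = 2*w.
# Both states transition to s2 with reward 0, so the true values are 0.
gamma = 0.99   # discount factor (assumed for illustration)
alpha = 0.1    # learning rate (assumed for illustration)
w = 1.0        # single linear weight, initialised away from 0

for sweep in range(100):
    # TD(0) update from s1: error = r + gamma*V(s2) - V(s1), feature = 1
    w += alpha * 1.0 * (gamma * 2.0 * w - 1.0 * w)
    # TD(0) update from s2: error = r + gamma*V(s2) - V(s2), feature = 2
    w += alpha * 2.0 * (gamma * 2.0 * w - 2.0 * w)

print(w)  # |w| grows without bound instead of converging to 0
```

The per-sweep update is approximately w ← w + α(6γ − 5)w, so the iteration is unstable whenever γ > 5/6, even though every individual TD error is computed correctly. The paper's contribution is showing that related divergences occur even under a greedy (value-iteration) policy, which this fixed-policy sketch does not capture.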

Publication Type: Conference or Workshop Item (Paper)
Additional Information: © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Publisher Keywords: Adaptive Dynamic Programming, Reinforcement Learning, Greedy Policy, Value Iteration, Divergence
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Mathematics, Computer Science & Engineering > Computer Science
URI: http://openaccess.city.ac.uk/id/eprint/5203
Full text: Accepted Version (406kB)
