The divergence of reinforcement learning algorithms with value-iteration and function approximation

Fairbank, M. & Alonso, E. (2012). The divergence of reinforcement learning algorithms with value-iteration and function approximation. Paper presented at The 2012 International Joint Conference on Neural Networks (IJCNN), 10-06-2012 to 15-06-2012, Brisbane, Australia.


Abstract

This paper gives specific divergence examples of value iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms when a function approximator is used for the value function. These examples differ from previous divergence examples in the literature in that they apply under a greedy policy, i.e. in a “value iteration” scenario. Perhaps surprisingly, with a greedy policy it is also possible to obtain divergence for the algorithms TD(1) and Sarsa(1). In addition, we demonstrate divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
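For readers unfamiliar with the setting, the sketch below illustrates the kind of update the paper analyses: value iteration (a greedy, max-based backup) combined with a linear function approximator for the value function. It is only a generic illustration under assumed transitions, rewards, features and step size; it is not one of the paper's divergence examples.

```python
import numpy as np

# Minimal sketch of the setting studied (not the paper's counterexamples):
# value iteration with a greedy policy and a linear value-function
# approximator V(s) = phi(s) . w.  The MDP, features, rewards and step
# size below are illustrative assumptions, not taken from the paper.

gamma = 0.9   # discount factor (assumed)
alpha = 0.1   # learning-rate / step size (assumed)

# Hypothetical 3-state, 2-action deterministic MDP:
# transitions[s][a] -> next state, rewards[s][a] -> immediate reward
transitions = [[1, 2], [2, 0], [0, 1]]
rewards     = [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]

# Hand-picked feature vectors for each state; the approximator cannot
# necessarily represent the true value function exactly.
phi = np.array([[1.0, 0.0],
                [2.0, 0.0],
                [0.0, 1.0]])
w = np.zeros(2)

for sweep in range(50):
    for s in range(3):
        # Greedy (max) backup: the bootstrapped value-iteration target
        target = max(rewards[s][a] + gamma * phi[transitions[s][a]].dot(w)
                     for a in range(2))
        td_error = target - phi[s].dot(w)
        w += alpha * td_error * phi[s]      # gradient-style weight update

print("weights after sweeps:", w)
print("approximate values  :", phi.dot(w))
```

Because the bootstrapped target itself depends on the weights being updated, such schemes are not true gradient descent on a fixed objective, which is what leaves room for the divergent behaviour the paper constructs.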

Item Type: Conference or Workshop Item (Paper)
Additional Information: © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords: Adaptive Dynamic Programming, Reinforcement Learning, Greedy Policy, Value Iteration, Divergence
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: School of Informatics > Department of Computing
URI: http://openaccess.city.ac.uk/id/eprint/5203
