The divergence of reinforcement learning algorithms with value-iteration and function approximation
Fairbank, M. & Alonso, E. (2012). The divergence of reinforcement learning algorithms with value-iteration and function approximation. Paper presented at the 2012 International Joint Conference on Neural Networks (IJCNN), 10-06-2012 - 15-06-2012, Brisbane, Australia. doi: 10.1109/IJCNN.2012.6252792
Abstract
This paper gives specific divergence examples of value-iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms, when using a function approximator for the value function. These divergence examples differ from previous divergence examples in the literature in that they apply under a greedy policy, i.e. in a “value iteration” scenario. Perhaps surprisingly, with a greedy policy it is also possible to get divergence for the algorithms TD(1) and Sarsa(1). In addition, we demonstrate divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
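To make the setting concrete, the following is a minimal illustrative sketch of the kind of scheme the abstract refers to: value iteration with a greedy backup where the value function is represented by a linear function approximator and updated by a semi-gradient step toward the bootstrapped target. It is not the paper's counterexample; the MDP, features, and step size below are arbitrary assumptions chosen only to show the structure of the update.

```python
import numpy as np

# Illustrative sketch (not the paper's specific counterexample): value iteration
# with a linear approximator V(s) = phi(s) . w on a small deterministic MDP.
# The transitions, rewards, features, gamma and alpha are all assumed values.

gamma = 0.99          # discount factor
alpha = 0.1           # step size for the weight update
n_states, n_actions = 3, 2

# Hypothetical deterministic dynamics: next_state[s, a] and reward[s, a]
next_state = np.array([[1, 2], [2, 0], [0, 1]])
reward = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])

# Linear features: one 2-dimensional feature vector per state (fewer weights than states)
phi = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
w = np.zeros(2)

def v(s, w):
    """Approximate value of state s under weights w."""
    return phi[s] @ w

for sweep in range(200):
    for s in range(n_states):
        # Greedy (value-iteration) backup: target = max_a [ r(s,a) + gamma * V(s') ]
        target = max(reward[s, a] + gamma * v(next_state[s, a], w)
                     for a in range(n_actions))
        # Semi-gradient step toward the bootstrapped target
        w += alpha * (target - v(s, w)) * phi[s]

print("final weights:", w)
```

The interaction of the greedy max backup, bootstrapping, and the shared weights of the approximator is what can drive the weights away from any fixed point; for particular choices of features and dynamics, such as those constructed in the paper, the weights grow without bound.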
| Publication Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
| Publisher Keywords: | Adaptive Dynamic Programming, Reinforcement Learning, Greedy Policy, Value Iteration, Divergence |
| Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Departments: | School of Science & Technology > Computer Science |