The Mixed Instrumental Controller: Using Value of Information to Combine Habitual Choice and Mental Simulation

Pezzulo, G.; Rigoli, F.; Chersi, F.

Pezzulo, G., Rigoli, F. & Chersi, F. (2013). The Mixed Instrumental Controller: Using Value of Information to Combine Habitual Choice and Mental Simulation. Frontiers in Psychology, 4, article number 92. doi: 10.3389/fpsyg.2013.00092

Abstract

Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-based and model-free methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefit analysis to decide whether to choose an action immediately based on the available “cached” value of actions (linked to model-free mechanisms) or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated “Value of Information” exceeds its costs. The model proposes a method to compute the Value of Information, based on the uncertainty of action values and on the difference between the cached values of alternative actions. Overall, the model by default chooses on the basis of lighter model-free estimates, and integrates them with costly model-based predictions only when useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; in turn, this updated value is used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation to neurobiological evidence on the hippocampus – ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation.
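
The decision rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything specific here is an assumption made for illustration: the ratio used for the Value of Information (uncertainty divided by the gap to the closest competing cached value), the fixed `cost` threshold, the replacement of the cached value by the sample mean, and all function and parameter names. The sketch also assumes at least two candidate actions.

```python
import random
from statistics import mean


def value_of_information(q_values, uncertainties, action, eps=1e-6):
    """Illustrative VoI heuristic (an assumption, not the paper's formula):
    large when the action's cached value is uncertain and a competing
    cached value lies close to it."""
    others = [q for a, q in q_values.items() if a != action]
    gap = min(abs(q_values[action] - q) for q in others)
    return uncertainties[action] / (gap + eps)


def choose(q_values, uncertainties, simulate, cost=1.0, n_samples=20, rng=None):
    """Pick an action from cached values, simulating only when VoI > cost.

    `simulate(action, rng)` is a hypothetical stand-in for mental simulation:
    it returns one sampled reward expectancy for the given action.
    """
    rng = rng or random.Random(0)
    updated = dict(q_values)
    simulated = []
    for action in q_values:
        if value_of_information(q_values, uncertainties, action) > cost:
            samples = [simulate(action, rng) for _ in range(n_samples)]
            # Simplification: overwrite the cached value with the sample mean.
            updated[action] = mean(samples)
            simulated.append(action)
    best = max(updated, key=updated.get)
    return best, updated, simulated
```

With two actions whose cached values are close and uncertain, both are simulated before choosing; when one cached value clearly dominates, the choice is made from the cached values alone, which corresponds to the habitual default described above.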

Publication Type:	Article
Publisher Keywords:	model-based reinforcement learning, hippocampus, ventral striatum, goal-directed decision-making, exploration-exploitation, value of information, forward sweeps
Subjects:	B Philosophy. Psychology. Religion > BF Psychology
Departments:	School of Health & Medical Sciences > Department of Psychology & Neuroscience
SWORD Depositor:	Symplectic Administrator

Text - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.3389/fpsyg.2013.00092

Creators:	Pezzulo, G.; Rigoli, F.; Chersi, F.
Status:	Published
Refereed:	Yes
Journal or Publication Title:	Frontiers in Psychology
Publisher:	Frontiers Media SA
ISSN:	1664-1078
e-ISSN:	1664-1078
URI:	https://openaccess.city.ac.uk/id/eprint/16668
Date available in CRO:	07 Mar 2017 11:28
Date deposited:	7 March 2017
Dates:	Accepted 8 February 2013; Published 4 March 2013