Humans often face sequential decision-making problems in which information about the environmental reward structure is detached from rewards for a subset of actions. For example, a medicated patient may consider taking part in a clinical trial on the effectiveness of a new drug. Taking part in the trial can provide the patient with information about the personal effectiveness of the new drug and the potential reward of a better treatment. Not taking part in the trial does not provide the patient with this information, but is associated with the reward of a treatment that is effective, though potentially less so. In the current study, we introduce a novel information-selective reversal bandit task to model such situations and present choice data obtained from 24 participants on this task. To arbitrate between the different decision-making strategies that participants may use on this task, we developed a set of probabilistic agent-based behavioural models, including exploitative and explorative Bayesian agents as well as heuristic control agents. After validating the model and parameter recovery properties of our model set and descriptively summarizing the participants’ choice data, we used a maximum likelihood approach to evaluate the participants’ choice data from the perspective of our model set. In brief, we provide evidence that participants employ a belief state-based hybrid explorative-exploitative strategy on the information-selective reversal bandit task, lending further support to the finding that humans are guided by their subjective uncertainty when solving exploration-exploitation dilemmas.
bioRxiv Subject Collection: Neuroscience