Recent studies have established that one-trial-back decision policies (Win-Stay/Lose-Shift) and measures of reinforcement learning (RL), e.g. learning rate, can explain how animals perform two-armed bandit tasks. In many published studies, outcomes reverse after one option is selected repeatedly (e.g. 8 selections in a row), and the primary measure of performance is the number of reversals completed. Using recent performance to trigger reversals, however, confounds performance with Win-Stay likelihood. An alternative design reverses outcomes across options over fixed blocks of trials. We used this blocked design and tested rats in a spatial two-armed bandit task, analyzing performance with Win-Stay/Lose-Shift (WSLS) metrics and an RL algorithm. We found that WSLS policies remained stable with increasing reward uncertainty, while choice accuracy decreased. Within test sessions, learning rates increased as rats adapted their strategies over the first few reversals, but inverse temperature remained stable. We found that muscimol inactivation of medial orbital cortex (mOFC) altered task performance and negative feedback sensitivity. Finally, we examined the role of the adrenergic system in bandit performance, and found that yohimbine (2 mg/kg) dramatically decreased sensitivity to positive feedback, leading to decreases in accuracy and inverse temperature. These effects were partially dependent on α2 adrenergic receptors in OFC. Our findings demonstrate a correspondence between reward schedule, WSLS policies, and RL metrics in a task design that is free of the confound between Wins and reversals, and that the noradrenergic influence of mOFC on WSLS policy is dissociable from the region's general role in cognitive flexibility.
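The abstract's two analysis approaches can be illustrated with a minimal sketch: a softmax Q-learner (parameterized by a learning rate and an inverse temperature, the two RL measures named above) choosing in a two-armed bandit, and a one-trial-back Win-Stay/Lose-Shift summary of the resulting choices. The function names, parameter values, and the blocked reward schedule below are illustrative assumptions, not the authors' actual model or task parameters.

```python
import numpy as np

def simulate_q_learner(reward_probs, alpha=0.2, beta=5.0, seed=0):
    """Simulate a softmax Q-learner on a two-armed bandit.

    reward_probs: (n_trials, 2) array of per-trial reward probabilities
    (rows can reverse across fixed blocks, as in the blocked design).
    alpha: learning rate; beta: inverse temperature (assumed values).
    Returns integer arrays of choices (0/1) and outcomes (0/1)."""
    rng = np.random.default_rng(seed)
    n_trials = reward_probs.shape[0]
    q = np.zeros(2)                      # action values for the two arms
    choices = np.zeros(n_trials, dtype=int)
    outcomes = np.zeros(n_trials, dtype=int)
    for t in range(n_trials):
        # Softmax choice rule: beta scales value differences into choice bias.
        p_right = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        c = int(rng.random() < p_right)
        r = int(rng.random() < reward_probs[t, c])
        q[c] += alpha * (r - q[c])       # prediction-error update
        choices[t], outcomes[t] = c, r
    return choices, outcomes

def win_stay_lose_shift(choices, outcomes):
    """One-trial-back WSLS metrics: P(stay | win) and P(shift | loss)."""
    stay = choices[1:] == choices[:-1]
    wins = outcomes[:-1] == 1
    losses = ~wins
    win_stay = stay[wins].mean() if wins.any() else np.nan
    lose_shift = (~stay[losses]).mean() if losses.any() else np.nan
    return win_stay, lose_shift
```

A blocked schedule is built by tiling, e.g., 0.8/0.2 reward probabilities and swapping the columns every fixed block of trials; because reversals never depend on the animal's recent wins, the WSLS metrics stay unconfounded with reversal timing.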
bioRxiv Subject Collection: Neuroscience