Balancing the speed and accuracy of decisions is crucial for survival, but how organisms manage this trade-off during learning is largely unknown. Here, we track this trade-off during perceptual learning in rats and simulated agents. At the start of learning, rats chose long reaction times that did not optimize instantaneous reward rate, but by the end of learning chose near-optimal reaction times. To understand this behavior, we analyzed learning dynamics in a recurrent neural network model of the task. The model reveals a fundamental trade-off between instantaneous reward rate and perceptual learning speed, putting the goals of learning quickly and accruing immediate reward in tension. We find that the rats’ strategy of long initial responses can dramatically expedite learning, yielding higher total reward over task engagement. Our results demonstrate that prioritizing learning can be advantageous from a total reward perspective, and suggest that rats engage in cognitive control of learning.
bioRxiv Subject Collection: Neuroscience