It has been proposed that the activity of dopamine neurons approximates temporal difference (TD) prediction error, a teaching signal developed in reinforcement learning, a field of machine learning. However, whether this similarity holds true during learning remains elusive. In particular, some TD learning models predict that the error signal gradually shifts backward in time from reward delivery to a reward-predictive cue, but previous experiments failed to observe such a gradual shift in dopamine activity. Here we demonstrate conditions in which such a shift can be detected experimentally. These shared dynamics of TD error and dopamine activity narrow the gap between machine learning theory and biological brains, tightening a long-sought link.
bioRxiv Subject Collection: Neuroscience