Feedforward deep neural networks for object recognition are a promising model of visual processing and can accurately predict firing-rate responses along the ventral stream. Yet, these networks have limitations as models of various aspects of cortical processing related to recurrent connectivity, including neuronal synchronization and the integration of sensory inputs with spatio-temporal context. We trained self-supervised, generative neural networks to predict small regions of natural images based on the spatial context (i.e. inpainting). Using these network predictions, we determined the spatial predictability of visual inputs into (macaque) V1 receptive fields (RFs), and distinguished low- from high-level predictability. Spatial predictability strongly modulated V1 activity, with distinct effects on firing rates and synchronization in the gamma (30-80 Hz) and beta (18-30 Hz) bands. Furthermore, firing rates, but not synchronization, were accurately predicted by a deep neural network for object recognition. Neural networks trained to specifically predict V1 gamma-band synchronization developed large, grating-like RFs in the deepest layer. These findings suggest complementary roles for firing rates and synchronization in self-supervised learning of natural-image statistics.
bioRxiv Subject Collection: Neuroscience
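The core operation in the abstract, scoring the spatial predictability of a small image region from its surround, can be illustrated with a minimal sketch. This is not the authors' pipeline (they used deep generative inpainting networks on natural images); instead it fits a simple ridge-regression "inpainter" on synthetic images with 1/f-like spatial correlations, and scores predictability as held-out variance explained. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_images(n, size=16):
    """Synthetic images with smooth, 1/f-like spatial correlations
    (a crude stand-in for natural-image statistics)."""
    freqs = np.fft.fftfreq(size)
    fx, fy = np.meshgrid(freqs, freqs)
    amp = 1.0 / np.maximum(np.hypot(fx, fy), 1.0 / size)
    noise = rng.standard_normal((n, size, size))
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * amp))

def split_patch(imgs, c=8, r=2):
    """Split each image into a central patch (target, the 'RF input')
    and its surround (context), mimicking the inpainting setup."""
    mask = np.zeros(imgs.shape[1:], dtype=bool)
    mask[c - r:c + r, c - r:c + r] = True
    return imgs[:, ~mask], imgs[:, mask]  # context, target

Xtr, ytr = split_patch(make_images(2000))
Xte, yte = split_patch(make_images(500))

# Ridge-regularized least squares: predict patch pixels from context pixels.
lam = 1e-2
W = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]), Xtr.T @ ytr)
pred = Xte @ W

# Predictability score: variance explained on held-out patches.
r2 = 1.0 - (yte - pred).var() / yte.var()
print(f"spatial predictability (R^2): {r2:.2f}")
```

In the study, this scalar predictability (obtained from deep inpainting networks rather than a linear model) is what was related to V1 firing rates and band-limited synchronization.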