The ontogenetic development of human vision and the real-time neural processing of visual input share a striking property: in both, sensitivity to spatial frequencies progresses in a coarse-to-fine manner. During early human development, sensitivity to higher spatial frequencies increases with age. In adulthood, when humans receive new visual input, low spatial frequencies are typically processed first and then guide the processing of higher spatial frequencies. We investigated to what extent this coarse-to-fine progression might impact visual representations in artificial vision, and compared these with adult human representations. We simulated the coarse-to-fine progression of image processing in deep convolutional neural networks (CNNs) by gradually increasing spatial frequency information during training. We compared CNN performance, after standard and coarse-to-fine training, with a wide range of datasets from behavioural and neuroimaging experiments. In contrast to humans, CNNs trained using the standard protocol are largely insensitive to low spatial frequency information and classify object images restricted to such information very poorly. By training CNNs using our coarse-to-fine method, we improved their classification accuracy on low-pass filtered images taken from the ImageNet dataset from 0% to 32%. When comparing differently trained networks on images containing full spatial frequency information, we saw no representational differences. Overall, this integration of computational, neural, and behavioural findings shows that exposure to, and processing of, input that varies in spatial frequency content is relevant for some aspects of high-level object representations.
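As a rough illustration of what gradually increasing spatial frequency information during training could look like, the following minimal sketch low-pass filters a greyscale image in the Fourier domain and raises the cutoff over epochs. The abstract does not specify the filter type, the schedule, or any parameter values, so the ideal circular filter, the linear ramp, and the names `low_pass`, `cutoff_for_epoch`, `start_cutoff`, and `full_cutoff` are all assumptions made for illustration only.

```python
import numpy as np


def low_pass(image, cutoff):
    """Keep spatial frequencies up to `cutoff` cycles per image.

    Assumption: an ideal circular low-pass filter applied to a 2-D
    greyscale array; the filter used in the study may differ.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h) * h  # vertical frequency, cycles per image
    fx = np.fft.fftfreq(w) * w  # horizontal frequency, cycles per image
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = radius <= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


def cutoff_for_epoch(epoch, n_epochs, start_cutoff, full_cutoff):
    """Hypothetical linear ramp of the cutoff from coarse to fine."""
    if n_epochs <= 1:
        return full_cutoff
    frac = epoch / (n_epochs - 1)
    return start_cutoff + frac * (full_cutoff - start_cutoff)


# Example: filter one image with the cutoff assigned to each epoch.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
for epoch in range(5):
    cutoff = cutoff_for_epoch(epoch, 5, start_cutoff=2.0, full_cutoff=24.0)
    blurred = low_pass(image, cutoff)  # would be fed to the network
```

In an actual training loop, the filtered image would replace the original as network input for that epoch, so that early epochs see only coarse structure and later epochs see progressively finer detail.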
bioRxiv Subject Collection: Neuroscience