Categorical perception (CP) of audio is critical to understand how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e., differentiates phonetic prototypes from ambiguous speech sounds). We recorded high density EEGs as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine (SVM) classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event related potentials (ERPs). We found that early (120 ms) whole-brain data decoded speech categories (i.e., prototypical vs. ambiguous speech tokens) with 95.16% accuracy [area under the curve (AUC) 95.14%; F1-score 95.00%]. Separate analyses on left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more robust and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions (including auditory cortex, supramarginal gyrus, and Brocas area) that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, Brocas area, motor cortex) were necessary to describe later decision stages (later 300 ms) of categorization but these areas were highly associated with the strength of listeners categorical hearing (i.e., slope of behavioral identification functions). Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (~120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
bioRxiv Subject Collection: Neuroscience