In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking" in EEG. Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from noisy and naturalistic environments can be generalized to more controlled stimuli. If encoding models for noisy, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations who may not tolerate listening to more controlled, less-engaging stimuli for long periods of time. We recorded non-invasive scalp EEG while participants listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field (mTRF) encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both noise-free and noisy stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled data sets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to noisy speech were more accurate when including both phonological and acoustic features. These findings may inform basic science research on speech-in-noise processing. Ultimately, they may also provide insight into auditory processing in people who are hard of hearing, who use a combination of audio and visual cues to understand speech in the presence of noise.
bioRxiv Subject Collection: Neuroscience