Segmental speech units (e.g. phonemes) are described as multidimensional categories: their perception draws on multiple acoustic input dimensions, and the relative perceptual weights of these dimensions respond dynamically to context. Can prosodic aspects of speech that span multiple phonemes, syllables or words be characterized in the same way? Here we investigated the relative contributions of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase both when fundamental frequency (F0) and duration covaried typically and in the context of an artificial ‘accent’ in which F0 and duration covaried atypically. When categorizing ‘accented’ speech, listeners rapidly down-weighted the secondary dimension (duration) while continuing to rely on the primary dimension (F0). These results address two core theoretical questions, indicating that 1) prosodic categories are signalled by multiple acoustic input dimensions and 2) perceptual cue weights for prosodic categories adapt dynamically to local regularities of the speech input.
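The abstract does not state how the perceptual weights were estimated. A common approach in cue-weighting research is to fit a logistic regression of categorization responses on the acoustic dimensions and to express each dimension's relative weight as the absolute value of its coefficient divided by the sum of the absolute coefficients. The Python sketch below assumes that approach. The function names, the five-step stimulus grid and the hypothetical "listener" coefficients (2.0 for F0, 0.5 for duration) are illustrative assumptions, not the authors' analysis or data.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear predictor to a response probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(stimuli, proportions, lr=0.5, n_iter=5000):
    """Fit p = sigmoid(b0 + b1*f0 + b2*dur) by gradient descent on mean cross-entropy.

    stimuli: list of (f0, dur) cue values (hypothetical standardized steps).
    proportions: observed proportion of one response category per stimulus.
    Returns [intercept, F0 coefficient, duration coefficient].
    """
    b = [0.0, 0.0, 0.0]
    n = len(stimuli)
    for _ in range(n_iter):
        grad = [0.0, 0.0, 0.0]
        for (f0, dur), y in zip(stimuli, proportions):
            # Gradient of cross-entropy w.r.t. the linear predictor is (p - y).
            err = sigmoid(b[0] + b[1] * f0 + b[2] * dur) - y
            grad[0] += err
            grad[1] += err * f0
            grad[2] += err * dur
        b = [bi - lr * g / n for bi, g in zip(b, grad)]
    return b


def relative_weights(coefs, names):
    """Normalize absolute coefficients so the weights sum to 1."""
    total = sum(abs(c) for c in coefs)
    return {name: abs(c) / total for name, c in zip(names, coefs)}


# Hypothetical stimulus grid: 5 F0 steps x 5 duration steps.
levels = [-2, -1, 0, 1, 2]
stimuli = [(f0, dur) for f0 in levels for dur in levels]

# Hypothetical listener who relies mostly on F0 (coefficient 2.0) and
# less on duration (coefficient 0.5); not data from the study.
proportions = [sigmoid(2.0 * f0 + 0.5 * dur) for f0, dur in stimuli]

_, b_f0, b_dur = fit_logistic(stimuli, proportions)
weights = relative_weights([b_f0, b_dur], ["F0", "duration"])
print(weights)  # approximately {'F0': 0.8, 'duration': 0.2}
```

Under this scheme, down-weighting duration in the ‘accented’ context would show up as a smaller duration coefficient, and therefore a smaller relative duration weight, while the relative F0 weight stays high.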
bioRxiv Subject Collection: Neuroscience