Being able to accurately perceive the emotion expressed by the facial or verbal expression from others is critical to successful social interaction. However, only few studies examined the multimodal interactions on speech emotion, and there is no consistence in studies on the speech emotion perception. It remains unclear, how the speech emotion of different valence is perceived on the multimodal stimuli by our human brain. In this paper, we conducted a functional magnetic resonance imaging (fMRI) study with an event-related design, using dynamic facial expressions and emotional speech stimuli to express different emotions, in order to explore the perception mechanism of speech emotion in audio-visual modality. The representational similarity analysis (RSA), whole-brain searchlight analysis, and conjunction analysis of emotion were used to interpret the representation of speech emotion in different aspects. Significantly, a weighted RSA approach was creatively proposed to evaluate the contribution of each candidate model to the best fitted model. The results of weighted RSA indicated that the fitted models were superior to all candidate models and the weights could be used to explain the representation of ROIs. The bilateral amygdala has been shown to be associated with the processing of both positive and negative emotions except neutral emotion. It is indicated that the left posterior insula and the left anterior superior temporal gyrus (STG) play important roles in the perception of multimodal speech emotion.
bioRxiv Subject Collection: Neuroscience