Although models built on word embeddings succeed on many natural language tasks, they often perform significantly worse than humans on several natural language understanding tasks. This gap in performance motivates us to ask (1) whether existing word vector representations have any basis in the brain’s representational structure for individual words, and (2) whether features from the brain can be used to improve the performance of word embedding models, defined as their correlation with human semantic judgements. To answer the first question, we compare the representational spaces of existing word embedding models with that of brain imaging data using representational similarity analysis. To answer the second, we use regression-based learning to constrain word vectors to the features of the brain imaging data, and then test whether the modified word vectors outperform their unmodified counterparts. To collect semantic judgements as a measure of performance, we employed a novel multi-arrangement method. Our results show that there is variance in the representational space of the brain imaging data that remains uncaptured by word embedding models, and that brain imaging data can be used to increase these models' coherence with human performance.
bioRxiv Subject Collection: Neuroscience
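
The comparison of representational spaces mentioned in the abstract can be illustrated with a minimal sketch of representational similarity analysis. This is not the authors' pipeline: the dissimilarity measure (1 minus Pearson correlation), the rank-based comparison of the two dissimilarity matrices, the function names, and the random arrays standing in for embeddings and brain imaging data are all assumptions made for illustration.

```python
import numpy as np


def rdm(vectors):
    """Representational dissimilarity matrix: 1 - Pearson correlation between rows.

    Each row is assumed to be the representation of one word.
    """
    return 1.0 - np.corrcoef(vectors)


def upper_triangle(matrix):
    """Return the entries above the diagonal (each word pair counted once)."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def ordinal_ranks(values):
    """Ordinal ranks; ties are broken by position, which is adequate for continuous data."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def rsa_score(space_a, space_b):
    """Rank correlation between the dissimilarity structures of two spaces.

    Both inputs must describe the same words in the same row order,
    but they may have different numbers of columns.
    """
    ranks_a = ordinal_ranks(upper_triangle(rdm(space_a)))
    ranks_b = ordinal_ranks(upper_triangle(rdm(space_b)))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


# Hypothetical stand-in data: 6 words, 50 embedding dimensions, 200 imaging features.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 50))
brain = rng.normal(size=(6, 200))

print(rsa_score(embeddings, brain))
```

Because the comparison is made between dissimilarity matrices rather than between the raw vectors, the two spaces do not need to share a dimensionality, which is what allows embeddings and imaging data to be compared at all.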