
    Assessing the influence of attractor-verb distance on grammatical agreement in humans and language models

    Subject-verb agreement in the presence of an attractor noun located between the main noun and the verb elicits complex behavior: judgments of grammaticality are modulated by the grammatical features of the attractor. For example, in the sentence "The girl near the boys likes climbing", the attractor (boys) disagrees in grammatical number with the verb (likes), creating a locally implausible transition probability. Here, we parametrically modulate the distance between the attractor and the verb while keeping sentence length constant. We evaluate the performance of humans and of two artificial neural network models: both make more mistakes when the attractor is closer to the verb, but the neural networks drop close to chance level, whereas humans mostly overcome the attractor interference. Additionally, we report a linear effect of attractor distance on reaction times. We hypothesize that a possible cause of this proximity effect is the computation of transition probabilities between adjacent words. Nevertheless, classical models of attraction, such as the cue-based model, might suffice to explain this phenomenon, paving the way for further research. Data and analyses are available at https://osf.io/d4g6k. Comment: 10 pages (5 main, 2 refs, 3 supplementary); 5 figures (3 main, 2 supplementary); accepted at EMNLP 2023 (no DOI yet).
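    As a purely illustrative way to run such an evaluation on a language model, the sketch below compares the probability a pretrained model assigns to the verb form that agrees with the subject versus the one that agrees with the attractor, for prefixes that place the attractor closer to or farther from the verb. The choice of GPT-2, the Hugging Face transformers interface, and the example prefixes are assumptions; the abstract does not specify this exact setup.

```python
# Minimal sketch: probing subject-verb agreement in a pretrained language model.
# Assumptions: GPT-2 via Hugging Face transformers stands in for the paper's
# (unspecified) models, and the prefixes below are illustrative, not the stimuli.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def verb_logprob(prefix: str, verb: str) -> float:
    """Log-probability of `verb` as the next token after `prefix`."""
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]               # next-token logits
    logprobs = torch.log_softmax(logits, dim=-1)
    verb_id = tokenizer.encode(" " + verb)[0]           # " like"/" likes" are single BPE tokens
    return logprobs[verb_id].item()

# Same sentence length, attractor ("boys") either close to or far from the verb slot.
prefixes = {
    "far":   "The girl near the boys from the school",
    "close": "The girl from the school near the boys",
}
for name, prefix in prefixes.items():
    correct = verb_logprob(prefix, "likes")  # agrees with the subject "girl"
    lure = verb_logprob(prefix, "like")      # agrees with the attractor "boys"
    print(f"{name:5s}  log P(correct) - log P(lure) = {correct - lure:+.3f}")
```

    A smaller gap for the "close" prefix than for the "far" one would mirror the proximity effect described in the abstract.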

    À la recherche des bases neurales de la compositionnalité

    Wilhelm von Humboldt famously said that language makes “infinite use of finite means”. Concretely, humans are able to combine words following a fixed set of grammatical rules, yielding a limitless reservoir of meanings. This is what I mean by compositionality, and what this thesis has aimed to characterize. Using a combination of tools from linguistics, natural language processing and neuroimaging, I sought to describe how this impressive feat is implemented in the human brain. In the first part, I present a study that examines the rise of compositional representations. We isolate semantic processes by comparing normal sentences with ones made of meaningless pseudowords, a.k.a. Jabberwocky. Using joint magnetoencephalography and intracranial electroencephalography recordings, we show that the intrinsic dimensionality of the neural signals grows over time, and more so for normal sentences than for Jabberwocky, portraying the progressive recruitment of neurons into the semantic representation. Furthermore, by means of multivariate decoding, we demonstrate that the dynamics of the neural signals follow theoretically driven patterns, especially ramping and sentence-final signatures. In addition, we take advantage of the fine spatial resolution of intracranial recordings to quantify the participation of various brain regions in each of these steps and identify a chief role of the prefrontal cortex in compositional processes. Crucially, these signatures are present in state-of-the-art neural language models, but absent in untrained models, suggesting that learning language is associated with a predictable shaping of the neural vector space. Overall, we show that the neural representations of sentences grow with each additional meaning that can be added to the existing semantic manifold. The second study takes a finer experimental approach by focusing on the cortical representation of phrases composed of a small number of nouns and adjectives, in a working-memory task with distinct sentence encoding, delay, and picture-comparison stages. Using magnetoencephalography recordings and multivariate decoding, we collect brain responses to words in isolation and within increasingly longer phrases, and use these data to investigate the organization and temporal evolution of compositional representations. During the encoding phase, a cascade of activations follows each new word, and crucially, the representation of individual words is partially sustained until it can be coherently integrated into a phrase, at which point it fades away. Later, during the delay period, in which the subjects keep the sentence in mind in order to match it to a subsequent image, neural activity reflects the complexity of the sentence, as quantified by the number of different words it comprises. Finally, when the compositional representation has to be read out, the speed of this mechanism is also modulated by complexity, as well as by the syntactic depth of the query: surface properties are detected faster than syntactically deeper ones. These findings suggest that compositional word representations are compressed in working memory and require task-specific decompression to be accessed. Taken together, these findings shed new light on the nature of compositional representations in the human brain. Both studies point towards the idea that semantic representations are encoded in distributed vector spaces, perhaps similar to those of artificial neural language models and vector-symbolic architectures. We provide the first steps towards the characterization of these neural semantic spaces, their dimensionality, and how they evolve over time.
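    The closing appeal to distributed vector spaces and vector-symbolic architectures can be made concrete with a toy example that is not taken from the thesis: in holographic reduced representations, role-filler pairs are bound by circular convolution and summed into a single compositional vector, from which an approximate filler can later be recovered by circular correlation. All vectors below are random placeholders, not brain data.

```python
# Toy vector-symbolic sketch (holographic reduced representations).
# Purely illustrative of the "distributed vector space" idea alluded to above;
# none of these vectors correspond to measured brain activity.
import numpy as np

rng = np.random.default_rng(0)
D = 2048  # dimensionality of the code

def rand_vec():
    return rng.normal(0.0, 1.0 / np.sqrt(D), D)

def bind(a, b):
    """Bind two vectors by circular convolution."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(s, a):
    """Approximately recover the vector bound to `a` via circular correlation."""
    return np.real(np.fft.ifft(np.fft.fft(s) * np.conj(np.fft.fft(a))))

# Lexical and role vectors for a minimal adjective-noun phrase, "red square".
red, square = rand_vec(), rand_vec()
ADJ, NOUN = rand_vec(), rand_vec()
phrase = bind(ADJ, red) + bind(NOUN, square)    # one compositional vector

# Query the compressed code: which filler occupies the adjective role?
probe = unbind(phrase, ADJ)
for name, v in [("red", red), ("square", square)]:
    cosine = probe @ v / (np.linalg.norm(probe) * np.linalg.norm(v))
    print(f"similarity to {name:6s}: {cosine:+.2f}")    # highest for "red"
```

    The recovery is only approximate and requires an explicit unbinding step, loosely paralleling the compression and task-specific decompression of phrase codes described in the second study.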

    Tracking the neural codes for words and phrases during semantic composition, working-memory storage, and retrieval

    Summary: The ability to compose successive words into a meaningful phrase is a characteristic feature of human cognition, yet its neural mechanisms remain incompletely understood. Here, we analyze the cortical mechanisms of semantic composition using magnetoencephalography (MEG) while participants read one-word, two-word, and five-word noun phrases and compared them with a subsequent image. Decoding of MEG signals revealed three processing stages. During phrase comprehension, the representation of individual words was sustained for a variable duration depending on phrasal context. During the delay period, the word code was replaced by a working-memory code whose activation increased with semantic complexity. Finally, the speed and accuracy of retrieval depended on semantic complexity, and retrieval was faster for surface than for deep semantic properties. In conclusion, we propose that the brain initially encodes phrases using factorized dimensions for successive words but later compresses them in working memory and requires a period of decompression to access them.
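    One standard analysis behind claims that an initial word code is later replaced by a different working-memory code is temporal generalization: a decoder trained at one time point is tested at every other one, so a merely sustained code generalizes broadly while a re-formatted code does not. The sketch below runs that analysis on simulated data; the array shapes, labels, and injected effect are placeholders for the MEG epochs used in the study.

```python
# Temporal-generalization decoding sketch on simulated "MEG" data.
# Placeholder data only: real analyses use epochs (trials x sensors x time)
# labelled e.g. by the identity of a word in the phrase, and cross-validate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_trials, n_sensors, n_times = 200, 64, 50
X = rng.normal(size=(n_trials, n_sensors, n_times))
y = rng.integers(0, 2, n_trials)            # e.g. word A vs word B
X[y == 1, :8, 10:30] += 0.5                 # inject a transient "word code"

# Train at each time point, test at every other one.
# (For brevity the decoder is scored on its own training trials; a real
# analysis would use held-out splits throughout.)
gen = np.zeros((n_times, n_times))
for t_train in range(n_times):
    clf = LogisticRegression(max_iter=1000).fit(X[:, :, t_train], y)
    for t_test in range(n_times):
        gen[t_train, t_test] = clf.score(X[:, :, t_test], y)

# A sustained code yields a broad square of above-chance scores; a code that
# is re-formatted (e.g. into a working-memory code) decodes on the diagonal
# but fails to generalize off it.
print(gen[::10, ::10].round(2))             # coarse view of the matrix
```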

    Dimensionality and ramping: Signatures of sentence integration in the dynamics of brains and deep language models

    A sentence is more than the sum of its words: its meaning depends on how they combine with one another. The brain mechanisms underlying such semantic composition remain poorly understood. To shed light on the neural vector code underlying semantic composition, we introduce two hypotheses: first, the intrinsic dimensionality of the space of neural representations should increase as a sentence unfolds, paralleling the growing complexity of its semantic representation; and second, this progressive integration should be reflected in ramping and sentence-final signals. To test these predictions, we designed a dataset of closely matched normal and Jabberwocky sentences (composed of meaningless pseudowords) and presented them to deep language models and to 11 human participants (5 men and 6 women) monitored with simultaneous magnetoencephalography and intracranial electroencephalography. In both deep language models and electrophysiological data, we found that representational dimensionality was higher for meaningful sentences than for Jabberwocky. Furthermore, multivariate decoding of normal versus Jabberwocky confirmed three dynamic patterns: (i) a phasic pattern following each word, peaking in temporal and parietal areas; (ii) a ramping pattern, characteristic of bilateral inferior and middle frontal gyri; and (iii) a sentence-final pattern in left superior frontal gyrus and right orbitofrontal cortex. These results provide a first glimpse into the neural geometry of semantic integration and constrain the search for a neural code of linguistic composition. Significance statement: Starting from general linguistic concepts, we make two sets of predictions about neural signals evoked by reading multi-word sentences. First, the intrinsic dimensionality of the representation should grow with additional meaningful words. Second, the neural dynamics should exhibit signatures of encoding, maintaining, and resolving semantic composition. We successfully validated these hypotheses in deep neural language models, artificial neural networks trained on text that perform very well on many natural language processing tasks. Then, using a unique combination of magnetoencephalography and intracranial electrodes, we recorded high-resolution brain data from human participants while they read a controlled set of sentences. Time-resolved dimensionality analysis showed increasing dimensionality with meaning, and multivariate decoding allowed us to isolate the three dynamical patterns we had hypothesized.
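    One common way to quantify the first prediction is the participation ratio of the covariance spectrum across trials, computed separately at each word position: it gives an effective number of dimensions occupied by the responses. The sketch below applies it to simulated data in which a "normal" condition engages more latent dimensions as the sentence unfolds than a "Jabberwocky" condition; the estimator is a standard choice but an assumption here, and all data are placeholders for the paper's electrophysiological recordings or language-model hidden states.

```python
# Sketch: participation-ratio dimensionality of simulated responses as a
# sentence unfolds, word by word. Everything below is a placeholder; real
# analyses would use MEG/iEEG epochs or language-model hidden states.
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_channels, n_words = 300, 100, 8
mixing = rng.normal(size=(n_words, n_channels))     # one latent axis per word

def participation_ratio(X):
    """(sum of eigenvalues)^2 / sum of squared eigenvalues of the channel covariance."""
    lam = np.clip(np.linalg.eigvalsh(np.cov(X.T)), 0, None)
    return (lam.sum() ** 2) / (lam ** 2).sum()

def responses(n_active):
    """Trial-by-channel responses spanning `n_active` latent dimensions."""
    latent = rng.normal(size=(n_trials, n_active))
    return latent @ mixing[:n_active] + 0.1 * rng.normal(size=(n_trials, n_channels))

for w in range(1, n_words + 1):
    pr_normal = participation_ratio(responses(n_active=w))          # each word adds meaning
    pr_jabber = participation_ratio(responses(n_active=min(w, 2)))  # little to integrate
    print(f"word {w}: PR(normal)={pr_normal:5.2f}  PR(jabberwocky)={pr_jabber:5.2f}")
```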

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. Comment: 27 pages, 17 figures + references and appendices; repo: https://github.com/google/BIG-bench.
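    Multiple-choice items in benchmarks of this kind are typically scored by comparing the total log-likelihood a model assigns to each candidate answer given the prompt. The sketch below shows that scoring scheme with GPT-2 and a toy item; both are stand-ins, and the benchmark's own evaluation harness in the linked repository is the authoritative reference.

```python
# Sketch: scoring a multiple-choice item by the summed log-likelihood of each
# candidate answer given the prompt. GPT-2 and the toy item are stand-ins.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of log-probabilities of the continuation tokens, conditioned on the prompt."""
    p_ids = tok(prompt, return_tensors="pt").input_ids
    c_ids = tok(continuation, return_tensors="pt").input_ids
    ids = torch.cat([p_ids, c_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits[0]
    logprobs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    for pos in range(p_ids.shape[1], ids.shape[1]):
        total += logprobs[pos - 1, ids[0, pos]].item()  # logits at pos-1 predict token at pos
    return total

# Hypothetical item in the spirit of a multiple-choice benchmark task.
item = {
    "input": "Q: Which is heavier, a kilogram of feathers or a kilogram of lead?\nA:",
    "choices": [" They weigh the same.", " The lead.", " The feathers."],
}
scores = {c: continuation_logprob(item["input"], c) for c in item["choices"]}
print("model picks:", max(scores, key=scores.get).strip())
```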