21 research outputs found
Modeling Conventionalization and Predictability within MWEs at the Brain Level
While expressions have traditionally been binarized as compositional and noncompositional in linguistic theory, Multiword Expressions (MWEs) demonstrate finer-grained distinctions. Using Association Measures like Pointwise Mutual Information and Dice's Coefficient, MWEs can be characterized as having different degrees of conventionalization and predictability. Our goal is to investigate how these gradients could reflect cognitive processes. In this study, fMRI recordings of naturalistic narrative comprehension are used to probe to what extent these computational measures, and the cognitive processes they could operationalize, are observable during on-line sentence processing. Our results show that Dice's Coefficient, representing lexical predictability, is a better predictor of neural activation for processing MWEs. Overall, our experimental approach demonstrates how we can test the cognitive plausibility of computational metrics by comparing them against neuroimaging data.
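For readers unfamiliar with the two Association Measures named in this abstract, here is a minimal sketch of their standard textbook definitions for a two-word expression. It is an illustration only, not the authors' implementation: the function names, the restriction to word pairs, and the toy counts are all assumptions.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise Mutual Information: log2(P(x, y) / (P(x) * P(y))).

    count_xy: how often the two words occur together (hypothetical corpus count)
    count_x, count_y: how often each word occurs on its own
    total: number of observations in the corpus
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def dice(count_xy, count_x, count_y):
    """Dice's coefficient: 2 * f(x, y) / (f(x) + f(y)), ranging from 0 to 1."""
    return 2 * count_xy / (count_x + count_y)


# Toy counts (invented for illustration): the pair occurs 10 times,
# its two words occur 20 and 50 times, in a corpus of 1000 observations.
print(pmi(10, 20, 50, 1000))
print(dice(10, 20, 50))
```

Under these definitions, PMI grows when a pair co-occurs more often than its parts would by chance, while Dice's coefficient approaches 1 when the two words rarely appear without each other.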
Syntactic Parsing versus MWEs: What can fMRI signal tell us
French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus
This paper describes and compares the impact of different types and sizes of training corpora on language models like ELMo. By asking the fundamental question of quality versus quantity, we evaluate four French training corpora on downstream parsing, POS-tagging and named-entity recognition tasks. The paper studies the relevance of a new corpus, CaBeRnet, featuring a representative range of language usage, including a balanced variety of genres (oral transcriptions, newspapers, popular magazines, technical reports, fiction, academic texts), in oral and written styles. We hypothesize that a linguistically representative and balanced corpus will allow the language model to be more efficient and representative of a given language, and therefore yield better evaluation scores on different evaluation sets and tasks.
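La phrase en tant qu'objet cognitif. Bases neurales des structures syntaxiques dans la phrase chinoise et française. [The sentence as a cognitive object. Neural bases of syntactic structures in Chinese and French sentences.]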
Combining recent neuroimaging techniques (fMRI and event-related potentials) with the fine-grained syntactic analyses of typological and formal approaches, this multidisciplinary research addresses the question of how the hierarchical structures that characterize the sentence unit are represented across languages. How the human brain represents and builds the various sentence structures, and how the mind understands them, is one of the most important questions that remain largely unresolved in the cerebral organization of language. Drawing on the diversity of languages in their syntactic organization of the sentence unit, we were able to isolate different dimensions of this complexity thanks to the syntactic properties of French question formation and to the specificities of Topic-Comment articulations in Mandarin Chinese. Following a study of the intonational marking of the hierarchy between Topic and Comment, we recorded the brain responses (ERP) to this type of construction in context, and thereby discovered the influence of its prosodic signature on its real-time processing. Our two fMRI studies, for their part, shed light on the neural bases of two dimensions of the syntactic complexity of the sentence: its hierarchical structure and the structural transformations it exhibits when its elements are dislocated. The first study, on French interrogatives, highlights the brain correlates of different types of syntactic movement; the second, on the various topic phenomena of Chinese, reveals the representations and processes linked to the Topic's activation of the interface between the sentence unit and the discourse level.
Combining fine-grained linguistic analyses (from both typological and formal approaches to syntax) with neuro-imaging techniques (fMRI and ERP), this multidisciplinary research aims to investigate experimentally the hierarchical nature and complexity of the linguistic representation of sentence structure and its processing strategies across languages, focusing specifically on Chinese Topic-Comment articulations and French interrogative constructions. The question of how the brain represents, builds and understands sentence structure is often seen as one of the most important unsolved issues of the neural organization of language. Leveraging cross-linguistic invariance and variability in the organization and building of sentence hierarchical structure, we found in Chinese and French two exceptional testing grounds to isolate different syntactic complexity dimensions of sentence-unit encoding. While the on-line auditory comprehension of sentence hierarchical structure under minimal intonational cues is investigated through ERP recordings of Topic-Comment articulations in Chinese, two fMRI studies isolate two different syntactic complexity dimensions, reflecting the sentence's hierarchy and its syntactic transformations respectively. The first study, on French interrogatives, seeks to isolate the neural correlates of different syntactic movement types. The second study, on the Chinese sentence-discourse interface and Topic types, enables us to distinguish word-order surface complexity factors from syntactic movement transformations.
Variable beam search for generative neural parsing and its relevance for the analysis of neuro-imaging signal
This paper describes a method of variable beam size inference for Recurrent Neural Network Grammars (RNNG), drawing inspiration from sequential Monte Carlo methods such as particle filtering. The paper studies the relevance of such methods for speeding up the computations of direct generative parsing for RNNG. It also studies the potential cognitive interpretation of the underlying representations built by the search method (beam activity) through the analysis of neuro-imaging signal.
Indices phonologiques des sinogrammes : de l'étude de l'acquisition à la modélisation pour l'apprentissage [Phonological cues in sinograms: from the study of acquisition to modeling for learning]
Learning a language such as Mandarin Chinese involves specific challenges. A crucial point consists in grasping the right Orthography-to-Phonology Correspondence (OPC) between the different graphical units in the sinogram and sound. Going beyond traditional vocabulary lists based on a lexical frequency strategy, we propose a computational model that introduces the learner to the rules of the graphic system as a whole, its phonological cues and their reliability. At the crossroads between different disciplines, our NLP approach integrates research results from Language teaching, Psycholinguistics and Neuro-imaging.
Learning a language such as Mandarin poses a challenge whose main difficulty lies in grasping the correspondences between the different components of the sinogram's graphic structure and its phonology. Going beyond the teaching strategy of vocabulary lists built on frequency criteria, our model aims to present the learner with the phonological cues and their consistency within the graphic system as a whole. At the boundary between disciplines, our NLP approach integrates into the presented model proposals from language teaching as well as results from psycholinguistics and neuroimaging.
Variable beam search for generative neural parsing and its fit with neuro-imaging signal
Modeling Conventionalization and Predictability in Multi-Word Expressions at Brain-level
Linguistic expressions have been binarized as compositional and non-compositional given the lack of compositional linguistic analysis, yet Multi-word Expressions (MWEs) demonstrate finer-grained degrees of conventionalization and predictability in psycholinguistics, which can be quantified through computational Association Measures like Point-wise Mutual Information and Dice's Coefficient. In this study, fMRI recordings of naturalistic narrative comprehension are used to investigate to what extent these computational measures, and the underlying cognitive processes they could reflect, are observable during on-line naturalistic sentence processing.