21 research outputs found

    French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus

    This paper describes and compares the impact of different types and sizes of training corpora on language models such as ELMo. Asking the fundamental question of quality versus quantity, we evaluate four French training corpora on three downstream tasks: parsing, POS tagging and named-entity recognition. The paper studies the relevance of a new corpus, CaBeRnet, which features a representative range of language usage, including a balanced variety of genres (oral transcriptions, newspapers, popular magazines, technical reports, fiction, academic texts) in oral and written styles. We hypothesize that a linguistically representative and balanced corpus will allow the language model to be more efficient and more representative of a given language, and will therefore yield better scores on different evaluation sets and tasks.

    La phrase en tant qu'objet cognitif. Bases neurales des structures syntaxiques dans la phrase chinoise et française.

    Combining recent neuro-imaging techniques (fMRI and event-related potentials) with the fine-grained syntactic analyses of typological and formal approaches, this multidisciplinary research addresses the question of how the hierarchical structures that characterize the sentence unit are represented across languages. How the human brain represents and builds the various sentence structures, and how the mind understands them, is one of the most important questions that remain largely unresolved in the cerebral organization of language. Drawing on the diversity of languages in their syntactic organization of the sentence unit, we were able to isolate different dimensions of this complexity thanks to the syntactic properties of French question formation and to the specificities of Topic-Comment articulations in Mandarin Chinese. Following a study of the intonational marking of the hierarchy between Topic and Comment, we recorded brain responses (ERPs) to this type of construction in context, and thereby uncovered the influence of its prosodic signature on its real-time processing. Our two fMRI studies, for their part, shed light on the neural bases of two dimensions of the syntactic complexity of the sentence: its hierarchical structure and the structural transformations it exhibits when its elements are dislocated. The first study, on French interrogatives, highlights the cerebral correlates of different types of syntactic movement; the second, on the various topic phenomena of Chinese, reveals the representations and processes linked to the Topic's activation of the interface between the sentence unit and the discourse level.
    Combining fine-grained linguistic analyses (from both typological and formal approaches to syntax) with neuro-imaging techniques (fMRI and ERP), this multidisciplinary research experimentally investigates the hierarchical nature and complexity of the linguistic representation of sentence structure and its processing strategies across languages, focusing specifically on Chinese Topic-Comment articulations and French interrogative constructions. The question of how the brain represents, builds and understands sentence structure is often seen as one of the most important unsolved issues in the neural organization of language. Leveraging cross-linguistic invariance and variability in the organization and building of hierarchical sentence structure, we found in Chinese and French two exceptional testing grounds for isolating different dimensions of syntactic complexity in the encoding of the sentence unit. The on-line auditory comprehension of hierarchical sentence structure with minimal intonational cues is investigated through ERP recordings of Topic-Comment articulations in Chinese, while two fMRI studies isolate two different dimensions of syntactic complexity, reflecting the sentence's hierarchy and its syntactic transformations respectively. The first study, on French interrogatives, seeks to isolate the neural correlates of different types of syntactic movement. The second study, on the Chinese sentence-discourse interface and Topic types, enables us to distinguish surface word-order complexity factors from syntactic movement transformations.

    Variable beam search for generative neural parsing and its relevance for the analysis of neuro-imaging signal

    This paper describes a method of variable beam size inference for Recurrent Neural Network Grammars (RNNG), drawing inspiration from sequential Monte Carlo methods such as particle filtering. The paper studies the relevance of such methods for speeding up the computations of direct generative parsing with RNNGs. It also studies the potential cognitive interpretation of the underlying representations built by the search method (beam activity) through the analysis of neuro-imaging signal.

    Indices phonologiques des sinogrammes : de l'étude de l'acquisition à la modélisation pour l'apprentissage

    Learning a language such as Mandarin Chinese involves specific challenges. A crucial point is grasping the right Orthography-to-Phonology Correspondence (OPC) between the different graphical units of the sinogram and its sound. Going beyond traditional vocabulary lists based on a lexical frequency strategy, we propose a computational model that introduces the learner to the rules of the graphic system as a whole, its phonological cues and their reliability. At the crossroads between different disciplines, our NLP approach integrates research results from language teaching, psycholinguistics and neuro-imaging.
    Learning a language such as Mandarin poses a challenge whose main difficulty lies in grasping the correspondences between the different components of the sinogram's graphic structure and its phonology. Going beyond the teaching strategy of vocabulary lists built on frequency criteria, our model aims to present the learner with the phonological cues and their consistency within the graphic system as a whole. At the boundary between disciplines, our NLP approach integrates into the model proposals from language teaching as well as results from psycholinguistics and neuro-imaging.

    Modeling Conventionalization and Predictability in Multi-Word Expressions at Brain-level

    Linguistic expressions have been binarized as compositional and non-compositional given the lack of compositional linguistic analysis; Multi-word Expressions (MWEs), however, demonstrate finer-grained degrees of conventionalization and predictability in psycholinguistics, which can be quantified through computational Association Measures such as Point-wise Mutual Information and Dice's Coefficient. In this study, fMRI recordings of naturalistic narrative comprehension are used to investigate to what extent these computational measures, and the underlying cognitive processes they could reflect, are observable during on-line naturalistic sentence processing.
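For readers unfamiliar with the two Association Measures named in this abstract, the following is a minimal sketch of their standard textbook definitions applied to adjacent word pairs. It is not the authors' pipeline: the function name `association_scores`, the simple relative-frequency estimates and the toy token list are all assumptions made for illustration.

```python
import math
from collections import Counter


def association_scores(tokens):
    """Return {(w1, w2): (pmi, dice)} for every adjacent word pair.

    Hypothetical helper: probabilities are plain relative frequencies,
    with no smoothing or frequency cut-off.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        # Point-wise Mutual Information: log2 of observed vs. expected co-occurrence
        pmi = math.log2(p12 / (p1 * p2))
        # Dice's Coefficient: shared occurrences relative to total occurrences
        dice = 2 * c12 / (unigrams[w1] + unigrams[w2])
        scores[(w1, w2)] = (pmi, dice)
    return scores


# Toy input (assumed): a repeated expression scores higher than a chance pairing
toy_tokens = "kick the bucket kick the bucket".split()
toy_scores = association_scores(toy_tokens)
```

In this toy input, the pair `("kick", "the")` always occurs together and so receives a higher PMI and Dice value than `("bucket", "kick")`, which only arises where the two repetitions meet. Real corpora would normally call for smoothing or a minimum-frequency threshold, since PMI is known to overrate rare pairs.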