
    From the Definitions of the "Trésor de la Langue Française" To a Semantic Database of the French Language

    The Definiens project aims at building a database of French lexical semantics that is formal and structured enough to allow fine-grained semantic access to the French lexicon, for tasks such as automatic extraction and computation. To achieve this in a relatively short time, we process the definitions of the Trésor de la Langue Française informatisé (TLFi), enriching them with an XML tagging that makes their internal organization explicit (roughly, genus and differentiae) and enhancing the components with semantic labels that make their role in the definition explicit. To our knowledge, no existing broad-coverage database for the French lexicon offers researchers and NLP developers a structured decomposition of the meaning of lexical units. Definiens is ongoing research that will hopefully fill this gap in the near future.
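To give a concrete idea of what such a tagged definition could look like, here is a minimal sketch; the tag names, role labels, and the example definition are hypothetical illustrations, not the project's actual schema.

```python
# Parse a hypothetical Definiens-style tagged definition with the
# standard-library ElementTree module; tags and roles are invented.
import xml.etree.ElementTree as ET

tagged_definition = """<definition>
  <genus role="Hyperonym">vehicle</genus>
  <differentiae role="Part">with two wheels</differentiae>
  <differentiae role="Use">propelled by pedals</differentiae>
</definition>"""

root = ET.fromstring(tagged_definition)
# Collect the semantic label of each definition component.
labels = {child.get("role"): child.text for child in root}
```

A consumer of the resource could then query `labels["Hyperonym"]` to retrieve the genus of the defined word directly.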

    Enforcing Subcategorization Constraints in a Parser Using Sub-parses Recombining

    Treebanks are not large enough to adequately model the subcategorization frames of predicative lexemes, which are an important source of lexico-syntactic constraints for parsing. As a consequence, parsers trained on such treebanks usually make mistakes when selecting the arguments of predicative lexemes. In this paper, we propose an original way to correct subcategorization errors by combining sub-parses of a sentence S that appear in the list of the n-best parses of S. The subcategorization information comes from three different resources: the first is extracted from a treebank, the second is computed on large corpora, and the third is an existing syntactic lexicon. Experiments on the French Treebank showed a 15.24% reduction in erroneous subcategorization frame (SF) selections for verbs, as well as a 4% relative decrease in the Labeled Accuracy Score error rate over the state-of-the-art parser on this treebank.
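The selection step can be illustrated with a much-simplified sketch: instead of recombining sub-parses as the paper does, the toy function below merely rescores whole parses from the n-best list by how many of their verbs realize a frame attested in a subcategorization lexicon. All names and data structures are invented for illustration.

```python
def select_parse(nbest, subcat_lexicon):
    """Prefer the n-best parse whose verbs realize the most frames
    attested in the subcategorization lexicon; on ties, max() keeps
    the first (i.e. the parser's original) ranking."""
    def score(parse):
        return sum(1 for verb, frame in parse
                   if frame in subcat_lexicon.get(verb, ()))
    return max(nbest, key=score)

# Toy n-best list: each parse is a list of (verb, realized frame) pairs.
nbest = [
    [("give", ("subj", "obj"))],          # parser's first choice
    [("give", ("subj", "obj", "iobj"))],  # frame attested in the lexicon
]
lexicon = {"give": {("subj", "obj", "iobj")}}
best = select_parse(nbest, lexicon)
```

The actual method is finer-grained, since it can splice a sub-parse from one candidate into another rather than choose among whole trees.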

    Dictionary-Ontology Cross-Enrichment Using TLFi and WOLF to enrich one another

    It has been known since Ide and Veronis that it is impossible to automatically extract an ontology structure from a dictionary, because that information is simply not present. We attempt to extract structure elements from a dictionary using clues taken from a formal ontology, and use these elements to match dictionary definitions to ontology synsets; this allows us to enrich the ontology with dictionary definitions, assign ontological structure to the dictionary, and disambiguate elements of definitions and synsets.
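As a rough illustration of the matching step, the sketch below scores synsets by plain lexical overlap with a definition; the real system uses structural clues from the ontology, and the synset names and glosses here are invented.

```python
def match_synset(definition_words, synsets):
    """Pick the synset whose gloss words overlap most with the
    dictionary definition; a crude stand-in for clue-based matching."""
    def overlap(name):
        return len(set(definition_words) & synsets[name])
    return max(synsets, key=overlap)

# Toy synset glosses (invented, WordNet-style identifiers).
synsets = {
    "bank.n.01": {"financial", "institution", "money"},
    "bank.n.02": {"sloping", "land", "river"},
}
definition = ["land", "along", "the", "side", "of", "a", "river"]
best = match_synset(definition, synsets)
```

Once a definition is paired with a synset, the two resources can enrich each other in both directions, as the abstract describes.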

    Semi-supervised Dependency Parsing using Lexical Affinities

    Treebanks are not large enough to reliably model precise lexical phenomena. This deficiency provokes attachment errors in parsers trained on such data. We propose in this paper to compute lexical affinities, on large corpora, for specific lexico-syntactic configurations that are hard to disambiguate, and to introduce this new information into a parser. Experiments on the French Treebank showed a 7.1% relative decrease in the Labeled Accuracy Score error rate, yielding the best parsing results on this treebank.
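One common way to estimate such lexical affinities from a large corpus is pointwise mutual information over head–dependent co-occurrences; the sketch below is a generic illustration of that idea, not the paper's exact scoring function, and the example pairs are invented.

```python
from collections import Counter
from math import log

def affinity(pairs):
    """Pointwise mutual information between heads and dependents,
    estimated from observed (head, dependent) co-occurrences:
    PMI(h, d) = log( P(h, d) / (P(h) * P(d)) )."""
    pair_counts = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(d for _, d in pairs)
    total = len(pairs)
    return {
        (h, d): log((c / total) /
                    ((head_counts[h] / total) * (dep_counts[d] / total)))
        for (h, d), c in pair_counts.items()
    }

# Toy corpus of extracted head-dependent pairs.
pairs = ([("eat", "apple")] * 3 + [("eat", "idea")] +
         [("think", "idea")] * 3 + [("think", "apple")])
scores = affinity(pairs)
```

A pair seen more often than chance predicts gets a positive score, which can then be fed to the parser as a soft attachment preference.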

    The Effects of Factorizing Root and Pattern Mapping in Bidirectional Tunisian - Standard Arabic Machine Translation

    The development of natural language processing tools for dialects faces the severe problem of lack of resources. In cases of diglossia, as in Arabic, one variant, Modern Standard Arabic (MSA), has many resources that can be used to build natural language processing tools, whereas the other variants, the Arabic dialects, are resource-poor. Taking advantage of the closeness of MSA and its dialects, one way to address this lack of resources consists in translating the dialect into MSA in order to use the tools developed for MSA. We describe in this paper an architecture for such a translation and evaluate it on Tunisian Arabic verbs. Our approach relies on modeling the translation process over the deep morphological representations of roots and patterns, commonly used to model Semitic morphology, and we compare different techniques for performing the cross-lingual mapping. Our evaluation demonstrates that using a decent-coverage root+pattern lexicon of Tunisian and MSA, with a backoff that assumes roots and patterns map independently, is optimal in reducing overall ambiguity and increasing recall.
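The backoff scheme described above can be sketched as follows; the lexicon contents (romanized root and pattern strings) and the function name are invented for illustration.

```python
def translate_verb(root, pattern, pair_lexicon, root_map, pattern_map):
    """Map a dialect (root, pattern) analysis to MSA candidates.
    A joint entry in the root+pattern lexicon wins outright;
    otherwise back off and map root and pattern independently,
    keeping the form unchanged when no mapping is known."""
    if (root, pattern) in pair_lexicon:
        return [pair_lexicon[(root, pattern)]]
    return [(r, p)
            for r in root_map.get(root, [root])
            for p in pattern_map.get(pattern, [pattern])]

# Toy data (invented romanizations).
pair_lexicon = {("mSY", "CCA"): ("mSy", "CaCA")}   # joint entry
root_map = {"xdm": ["Eml", "xdm"]}                 # dialect root -> MSA roots
pattern_map = {"CCC": ["CaCaC"]}

exact = translate_verb("mSY", "CCA", pair_lexicon, root_map, pattern_map)
backoff = translate_verb("xdm", "CCC", pair_lexicon, root_map, pattern_map)
```

The independence assumption in the backoff is exactly what the factorization in the title refers to: unseen root+pattern combinations are still translatable as long as the root and the pattern have each been seen separately.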

    Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde

    The development of NLP tools for dialects faces the severe problem of lack of resources for those dialects. In cases of diglossia, as in Arabic, there exists a variant, Modern Standard Arabic (MSA), for which many resources have been developed and can be used to build NLP tools. Taking advantage of the closeness of MSA and its dialects, one way to solve the problem consists in performing a surface translation of the dialect into MSA in order to use the tools developed for MSA. We describe in this paper an architecture for such a translation and we evaluate it on Arabic verbs.

    Création de clusters sémantiques dans des familles morphologiques à partir du TLFi

    Building lexical resources is a time-consuming and expensive task, especially when it comes to morphological lexicons. Such resources describe in depth and explicitly the morphological organization of the lexicon, completed with semantic information to be used in NLP applications. The work we present here follows this direction and, in particular, aims at refining an existing resource with automatically acquired semantic information. Our goal is to semantically characterize morpho-phonological families (words sharing a same base form and semantic continuity). To this end, we have used data from the TLFi that has been morpho-syntactically annotated. The first results of this task are analyzed and discussed.

    Modèles génératif et discriminant en analyse syntaxique : expériences sur le corpus arboré de Paris 7

    We present a two-stage architecture for syntactic parsing. First, a phrase-structure parser builds, for each sentence, a list of analyses, which are converted into dependency trees. These trees are then rescored by a discriminative reranker. This method makes it possible to take into account information the parser has no access to, in particular functional annotations. We validate our approach with an evaluation on the Paris 7 treebank. The second stage significantly improves the quality of the returned analyses, whatever metric is used.