
    From the Definitions of the "Trésor de la Langue Française" To a Semantic Database of the French Language

    The Definiens project aims at building a database of French lexical semantics that is formal and structured enough to allow fine-grained semantic access to the French lexicon, for tasks such as automatic extraction and computation. To achieve this in a relatively short time, we process the definitions of the Trésor de la Langue Française informatisé (TLFi), enriching them with an XML tagging that makes their internal organization explicit (roughly, genus and differentiae) and enhancing the components with semantic labels that make their role in the definition explicit. To our knowledge, no existing broad-coverage database for the French lexicon offers researchers and NLP developers a structured decomposition of the meaning of lexical units. Definiens is ongoing research that will hopefully fill this gap in the near future.
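To give a concrete idea of what such a tagged definition could look like, here is a minimal sketch; the tag names, role labels, and the example definition are hypothetical illustrations, not the project's actual schema.

```python
# Parse a hypothetical Definiens-style tagged definition with the
# standard-library ElementTree module; tags and roles are invented.
import xml.etree.ElementTree as ET

tagged_definition = """<definition>
  <genus role="Hyperonym">vehicle</genus>
  <differentiae role="Part">with two wheels</differentiae>
  <differentiae role="Use">propelled by pedals</differentiae>
</definition>"""

root = ET.fromstring(tagged_definition)
# Collect the semantic label of each definition component.
labels = {child.get("role"): child.text for child in root}
```

A consumer of the resource could then query `labels["Hyperonym"]` to retrieve the genus of the defined word directly.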

    Enforcing Subcategorization Constraints in a Parser Using Sub-parses Recombining

    Treebanks are not large enough to adequately model the subcategorization frames of predicative lexemes, which are an important source of lexico-syntactic constraints for parsing. As a consequence, parsers trained on such treebanks usually make mistakes when selecting the arguments of predicative lexemes. In this paper, we propose an original way to correct subcategorization errors by combining sub-parses of a sentence S that appear in the list of the n-best parses of S. The subcategorization information comes from three different resources: the first is extracted from a treebank, the second is computed on large corpora, and the third is an existing syntactic lexicon. Experiments on the French Treebank showed a 15.24% reduction in erroneous subcategorization frame (SF) selections for verbs, as well as a 4% relative decrease in the Labeled Accuracy Score error rate over the state-of-the-art parser on this treebank.
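The selection step can be illustrated with a much-simplified sketch: instead of recombining sub-parses as the paper does, the toy function below merely rescores whole parses from the n-best list by how many of their verbs realize a frame attested in a subcategorization lexicon. All names and data structures are invented for illustration.

```python
def select_parse(nbest, subcat_lexicon):
    """Prefer the n-best parse whose verbs realize the most frames
    attested in the subcategorization lexicon; on ties, max() keeps
    the first (i.e. the parser's original) ranking."""
    def score(parse):
        return sum(1 for verb, frame in parse
                   if frame in subcat_lexicon.get(verb, ()))
    return max(nbest, key=score)

# Toy n-best list: each parse is a list of (verb, realized frame) pairs.
nbest = [
    [("give", ("subj", "obj"))],          # parser's first choice
    [("give", ("subj", "obj", "iobj"))],  # frame attested in the lexicon
]
lexicon = {"give": {("subj", "obj", "iobj")}}
best = select_parse(nbest, lexicon)
```

The actual method is finer-grained, since it can splice a sub-parse from one candidate into another rather than choose among whole trees.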

    Dictionary-Ontology Cross-Enrichment Using TLFi and WOLF to enrich one another

    It has been known since Ide and Veronis that it is impossible to automatically extract an ontology structure from a dictionary, because that information is simply not present. We attempt to extract structure elements from a dictionary using clues taken from a formal ontology, and use these elements to match dictionary definitions to ontology synsets; this allows us to enrich the ontology with dictionary definitions, assign ontological structure to the dictionary, and disambiguate elements of definitions and synsets.
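As a rough illustration of the matching step, the sketch below scores synsets by plain lexical overlap with a definition; the real system uses structural clues from the ontology, and the synset names and glosses here are invented.

```python
def match_synset(definition_words, synsets):
    """Pick the synset whose gloss words overlap most with the
    dictionary definition; a crude stand-in for clue-based matching."""
    def overlap(name):
        return len(set(definition_words) & synsets[name])
    return max(synsets, key=overlap)

# Toy synset glosses (invented, WordNet-style identifiers).
synsets = {
    "bank.n.01": {"financial", "institution", "money"},
    "bank.n.02": {"sloping", "land", "river"},
}
definition = ["land", "along", "the", "side", "of", "a", "river"]
best = match_synset(definition, synsets)
```

Once a definition is paired with a synset, the two resources can enrich each other in both directions, as the abstract describes.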

    Semi-supervised Dependency Parsing using Lexical Affinities

    Treebanks are not large enough to reliably model precise lexical phenomena. This deficiency provokes attachment errors in parsers trained on such data. We propose in this paper to compute lexical affinities, on large corpora, for specific lexico-syntactic configurations that are hard to disambiguate, and to introduce this new information into a parser. Experiments on the French Treebank showed a 7.1% relative decrease in the Labeled Accuracy Score error rate, yielding the best parsing results on this treebank.
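One common way to estimate such lexical affinities from a large corpus is pointwise mutual information over head–dependent co-occurrences; the sketch below is a generic illustration of that idea, not the paper's exact scoring function, and the example pairs are invented.

```python
from collections import Counter
from math import log

def affinity(pairs):
    """Pointwise mutual information between heads and dependents,
    estimated from observed (head, dependent) co-occurrences:
    PMI(h, d) = log( P(h, d) / (P(h) * P(d)) )."""
    pair_counts = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(d for _, d in pairs)
    total = len(pairs)
    return {
        (h, d): log((c / total) /
                    ((head_counts[h] / total) * (dep_counts[d] / total)))
        for (h, d), c in pair_counts.items()
    }

# Toy corpus of extracted head-dependent pairs.
pairs = ([("eat", "apple")] * 3 + [("eat", "idea")] +
         [("think", "idea")] * 3 + [("think", "apple")])
scores = affinity(pairs)
```

A pair seen more often than chance predicts gets a positive score, which can then be fed to the parser as a soft attachment preference.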

    The Effects of Factorizing Root and Pattern Mapping in Bidirectional Tunisian - Standard Arabic Machine Translation

    The development of natural language processing tools for dialects faces the severe problem of lack of resources. In cases of diglossia, as in Arabic, one variant, Modern Standard Arabic (MSA), has many resources that can be used to build natural language processing tools, whereas the other variants, the Arabic dialects, are resource-poor. Taking advantage of the closeness of MSA and its dialects, one way to address this lack of resources consists in translating the dialect into MSA in order to use the tools developed for MSA. We describe in this paper an architecture for such a translation and evaluate it on Tunisian Arabic verbs. Our approach relies on modeling the translation process over the deep morphological representations of roots and patterns, commonly used to model Semitic morphology, and we compare different techniques for performing the cross-lingual mapping. Our evaluation demonstrates that using a decent-coverage root+pattern lexicon of Tunisian and MSA, with a backoff that assumes roots and patterns map independently, is optimal in reducing overall ambiguity and increasing recall.
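The backoff scheme described above can be sketched as follows; the lexicon contents (romanized root and pattern strings) and the function name are invented for illustration.

```python
def translate_verb(root, pattern, pair_lexicon, root_map, pattern_map):
    """Map a dialect (root, pattern) analysis to MSA candidates.
    A joint entry in the root+pattern lexicon wins outright;
    otherwise back off and map root and pattern independently,
    keeping the form unchanged when no mapping is known."""
    if (root, pattern) in pair_lexicon:
        return [pair_lexicon[(root, pattern)]]
    return [(r, p)
            for r in root_map.get(root, [root])
            for p in pattern_map.get(pattern, [pattern])]

# Toy data (invented romanizations).
pair_lexicon = {("mSY", "CCA"): ("mSy", "CaCA")}   # joint entry
root_map = {"xdm": ["Eml", "xdm"]}                 # dialect root -> MSA roots
pattern_map = {"CCC": ["CaCaC"]}

exact = translate_verb("mSY", "CCA", pair_lexicon, root_map, pattern_map)
backoff = translate_verb("xdm", "CCC", pair_lexicon, root_map, pattern_map)
```

The independence assumption in the backoff is exactly what the factorization in the title refers to: unseen root+pattern combinations are still translatable as long as the root and the pattern have each been seen separately.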

    Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde

    The development of NLP tools for dialects faces the severe problem of lack of resources for those dialects. In cases of diglossia, as in Arabic, there exists a variant, Modern Standard Arabic (MSA), for which many resources have been developed and can be used to build NLP tools. Taking advantage of the closeness of MSA and its dialects, one way to solve the problem consists in performing a surface translation of the dialect into MSA in order to use the tools developed for MSA. We describe in this paper an architecture for such a translation and we evaluate it on Arabic verbs.

    Création de clusters sémantiques dans des familles morphologiques à partir du TLFi

    Building lexical resources is a time-consuming and expensive task, especially when it comes to morphological lexicons. Such resources describe in depth and explicitly the morphological organization of the lexicon, completed with semantic information to be used in NLP applications. The work we present here follows this direction and, in particular, aims at refining an existing resource with automatically acquired semantic information. Our goal is to semantically characterize morpho-phonological families (words sharing a same base form and semantic continuity). To this end, we have used data from the TLFi that has been morpho-syntactically annotated. The first results of this task are analyzed and discussed.

    Modèles génératif et discriminant en analyse syntaxique : expériences sur le corpus arboré de Paris 7

    We present a two-stage architecture for syntactic parsing. First, a phrase-structure parser builds, for each sentence, a list of analyses, which are converted into dependency trees. These trees are then rescored by a discriminative reranker. This method makes it possible to take into account information the parser has no access to, in particular functional annotations. We validate our approach with an evaluation on the Paris 7 treebank. The second stage significantly improves the quality of the returned analyses, whatever metric is used.