24 research outputs found

A Generation Framework for Grammar Development

Author: Gardent Claire
Narayan Shashi
Publication date: 06/03/2015

Towards a meta-lexicon for French: architecture, acquisition, use (Vers un méta-lexique pour le français : architecture, acquisition, utilisation)

Author: Boullier Pierre
Clément Lionel
Sagot Benoît
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 01/01/2005

In this article we present a new lexical resource for French, soon to be freely available as the second version of the Lefff (Lexique des Formes Fléchies du Français). It is a wide-coverage morphological and syntactic lexicon whose architecture relies on a property-inheritance structure, which makes it more compact and easier to maintain. It also allows lexical entries to be described independently of the formalisms in which the lexicon is used. For these two reasons, we use the term meta-lexicon. We describe its architecture, several automatic or semi-automatic approaches for acquiring, correcting and/or completing such a lexicon, and the way it has been used with an LFG and a TAG to build two wide-coverage parsers for French.

INRIA a CCSD electronic archive server

Oskar Bordeaux

The Lefff 2 syntactic lexicon for French: architecture, acquisition, use

Author: Boullier Pierre
Clément Lionel
Sagot Benoît
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 01/01/2006

International audience
In this paper, we introduce a new lexical resource for French which is freely available as the second version of the Lefff (Lexique des formes fléchies du français – Lexicon of French inflected forms). It is a wide-coverage morphosyntactic and syntactic lexicon whose architecture relies on property inheritance, which makes it more compact and easier to maintain, and allows lexical entries to be described independently of the formalisms in which it is used. For these two reasons, we define it as a meta-lexicon. We describe its architecture, several automatic or semi-automatic approaches we use to acquire, correct and/or enrich such a lexicon, and the way it is used both with an LFG parser and with a TAG parser based on a meta-grammar, so as to build two large-coverage parsers for French.

INRIA a CCSD electronic archive server
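
The Lefff abstracts above say the lexicon's architecture relies on property inheritance, which keeps it compact and formalism-independent. The following is a minimal sketch of that general idea only, assuming a hypothetical class hierarchy and property names (`LexClass`, `cat`, `subcat`, `aux`) that are not the Lefff's actual format: each entry names a class, and a compilation step flattens the inherited properties into a full entry.

```python
from dataclasses import dataclass, field


@dataclass
class LexClass:
    """Hypothetical lexical class: a named bundle of properties with parent classes."""
    name: str
    props: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)

    def resolve(self) -> dict:
        """Merge ancestor properties first so that more specific classes override them."""
        merged: dict = {}
        for parent in self.parents:
            merged.update(parent.resolve())
        merged.update(self.props)
        return merged


# Illustrative classes only; NOT the real Lefff class hierarchy.
verb = LexClass("verb", {"cat": "v"})
transitive = LexClass("transitive", {"subcat": ["subj", "obj"]}, [verb])
avoir_aux = LexClass("avoir_aux", {"aux": "avoir"}, [verb])
v_trans_avoir = LexClass("v-trans-avoir", {}, [transitive, avoir_aux])


def expand(lemma: str, cls: LexClass) -> dict:
    """Compile a compact entry (lemma + class) into a flat, fully specified entry."""
    return {"lemma": lemma, **cls.resolve()}


entry = expand("manger", v_trans_avoir)
print(entry)  # {'lemma': 'manger', 'cat': 'v', 'subcat': ['subj', 'obj'], 'aux': 'avoir'}
```

In this sketch, a formalism-specific back end (for an LFG or a TAG parser) would then translate each flat entry into its own format; shared properties are stated once, in the class, rather than repeated in every entry.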

Mining Parsing Results for Lexical Corrections

Author: Farré Jacques
Nicolas Lionel
Villemonte de La Clergerie Éric
Publication venue: Wydawnictwo Poznańskie Sp. z o. o.
Publication date: 05/10/2007

International audience
Successful parsing depends on the quality of the underlying grammar, but also on the correctness of the lexicon that feeds the parser. Developing a lexicon that is both complete and accurate is an intricate and demanding task. A first step towards improving a lexicon consists in identifying potentially erroneous lexical entries, for instance by applying error mining techniques to large corpora (Sagot and de La Clergerie, ACL/COLING 2006). This paper explores the next logical step, namely suggesting corrections for those entries. This is achieved by re-parsing the sentences rejected at the previous step after modifying the information carried by the identified lexical entries. A statistical computation on the new parsing results then singles out the most relevant corrections.

INRIA a CCSD electronic archive server
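
The correction step described in this abstract can be illustrated with a small sketch. It assumes a hypothetical `parses_with(sentence, hypothesis)` callable standing in for a real parser run with the modified lexical entry, and it scores each hypothesis simply by the share of rejected sentences it makes parsable. The abstract does not specify the actual statistical computation, which may well differ; the frames and sentences below are toy data.

```python
from collections.abc import Callable, Iterable


def rank_corrections(
    sentences: list[str],
    hypotheses: Iterable[str],
    parses_with: Callable[[str, str], bool],
) -> list[tuple[str, float]]:
    """Rank correction hypotheses by the share of rejected sentences they make parsable."""
    if not sentences:
        return []
    scores = []
    for hyp in hypotheses:
        # Re-parse every previously rejected sentence with the entry modified by `hyp`.
        recovered = sum(1 for s in sentences if parses_with(s, hyp))
        scores.append((hyp, recovered / len(sentences)))
    # Best hypothesis first; sorted() is stable, so ties keep their input order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a real parser (hypothetical): each rejected sentence parses
# only when the suspicious verb gets the subcategorization frame listed here.
NEEDED_FRAME = {
    "il rêve de partir": "subj,de-obj",
    "elle rêve d'un voyage": "subj,de-obj",
    "il rêve": "subj",
}


def toy_parser(sentence: str, hypothesis: str) -> bool:
    return NEEDED_FRAME[sentence] == hypothesis


ranking = rank_corrections(list(NEEDED_FRAME), ["subj", "subj,obj", "subj,de-obj"], toy_parser)
print(ranking)
```

Here the `subj,de-obj` frame recovers two of the three rejected sentences and comes first. Scoring by recovery rate alone is the simplest possible choice; a real system would also need to guard against hypotheses that make the grammar overgenerate.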

Error mining in parsing results

Author: Sagot Benoît
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 17/07/2006

International audience
We introduce an error mining technique for automatically detecting errors in resources used by parsing systems. We applied this technique to parsing results produced on several million words by two distinct parsing systems that share the same syntactic lexicon and pre-parsing processing chain. We were thus able to identify missing and erroneous information in these resources.

INRIA a CCSD electronic archive server
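
The abstract does not spell out how errors are scored. The following sketch assumes one simple fixed-point scheme of this kind: each sentence that fails to parse spreads one unit of blame over its forms in proportion to their current suspicion, and a form's suspicion rate is its average blame per occurrence across the whole corpus, including sentences that parse. The function name `suspicion_rates`, the uniform starting value and the toy corpus are illustrative assumptions, not the authors' exact formulation.

```python
from collections import defaultdict


def suspicion_rates(corpus, iterations=50):
    """corpus: list of (tokens, parsed_ok) pairs. Returns a suspicion rate in [0, 1] per form."""
    # Count every occurrence of each form, in parsed and failed sentences alike.
    occurrences = defaultdict(int)
    for tokens, _ in corpus:
        for form in tokens:
            occurrences[form] += 1

    # Assumed starting point: every form is equally suspicious.
    rate = {form: 1.0 for form in occurrences}
    for _ in range(iterations):
        blame = defaultdict(float)
        for tokens, parsed_ok in corpus:
            if parsed_ok:
                continue  # sentences that parse assign no blame
            total = sum(rate[f] for f in tokens)
            if total == 0:
                continue  # nothing to distribute (e.g. an empty sentence)
            for form in tokens:
                # One unit of blame per failed sentence, shared in proportion to current rates.
                blame[form] += rate[form] / total
        # Average blame per occurrence, over all occurrences of the form.
        rate = {form: blame[form] / occurrences[form] for form in occurrences}
    return rate


corpus = [
    (["le", "chat", "dort"], True),
    (["le", "chien", "dort"], True),
    (["le", "chat", "rêvasse"], False),
    (["un", "chien", "rêvasse"], False),
]
rates = suspicion_rates(corpus)
print(max(rates, key=rates.get))  # rêvasse
```

Because blame is redistributed at each iteration, forms that also appear in parsable sentences (`le`, `chat`, `chien`) gradually hand their share to the form that only ever occurs in failures.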

Finding and unmasking the culprits: a sophisticated lexicon correction process (Trouver et confondre les coupables : un processus sophistiqué de correction de lexique)

Author: Farré Jacques
Molinero Miguel
Nicolas Lionel
Sagot Benoît
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 24/06/2009

International audience
The coverage of a parser depends mostly on the quality of the underlying grammar and lexicon. Developing a lexicon that is both complete and accurate is an intricate and demanding task, especially once a certain level of quality and coverage has been reached. We introduce an automatic process able to detect missing or incomplete entries in a lexicon and to suggest correction hypotheses for these entries. Dubious lexical entries are detected by two techniques, relying either on a specific statistical model or on the information provided by a part-of-speech tagger. Correction hypotheses for the detected entries are generated by studying which modifications could improve the parse rate of the sentences in which the entries occur. This process brings together various techniques based on different tools such as taggers, parsers and entropy classifiers. Applying it to the Lefff, a large-coverage morphological and syntactic French lexicon, has already allowed us to make noticeable improvements.

HAL-UNICE

INRIA a CCSD electronic archive server

Hal-Diderot
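
The tagger-based detection technique mentioned in this abstract could look roughly like the sketch below: compare the category a part-of-speech tagger assigns to each form with the categories the lexicon lists for it, and flag recurring disagreements. The dictionary-of-sets lexicon, the tag names and the `min_count` threshold are assumptions for illustration, not the Lefff format or the authors' exact procedure.

```python
from collections import Counter


def tagger_mismatches(tagged_corpus, lexicon, min_count=2):
    """Flag (form, tag) pairs where the tagger's category is missing from the lexicon entry."""
    counts = Counter()
    for form, tag in tagged_corpus:
        # Forms absent from the lexicon get an empty category set, so they are flagged too.
        if tag not in lexicon.get(form, set()):
            counts[(form, tag)] += 1
    # Keep only disagreements frequent enough to suggest a real gap in the lexicon.
    return [pair for pair, n in counts.most_common() if n >= min_count]


# Hypothetical lexicon in which the verbal reading of "ferme" (from "fermer") is missing.
lexicon = {"le": {"det"}, "ferme": {"adj", "nc"}, "porte": {"nc", "v"}}
tagged = [
    ("le", "det"), ("ferme", "v"),
    ("le", "det"), ("ferme", "v"),
    ("porte", "v"), ("ferme", "adj"),
    ("rêvasse", "v"),
]
flagged = tagger_mismatches(tagged, lexicon)
print(flagged)  # [('ferme', 'v')]
```

The threshold trades recall for precision: isolated tagger errors, such as the single unknown `rêvasse` here, are ignored unless `min_count` is lowered.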

Error Mining on Dependency Trees

Author: Gardent Claire
Narayan Shashi
Publication venue: HAL CCSD
Publication date: 01/01/2012

International audience
In recent years, error mining approaches have been developed to help identify the most likely sources of parsing failures in parsing systems that use handcrafted grammars and lexicons. However, the techniques they use to enumerate and count n-grams build on the sequential nature of a text corpus and do not easily extend to structured data. In this paper, we propose an algorithm for mining trees and apply it to detect the most likely sources of generation failure. We show that this tree mining algorithm identifies not only errors in the generation system (grammar, lexicon) but also mismatches between the structures contained in the input and the input structures expected by our generator, as well as a few idiosyncrasies or errors in the input data.

CiteSeerX

INRIA a CCSD electronic archive server
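
As this abstract notes, n-gram counting relies on the sequential nature of text. The sketch below is a deliberately simplified illustration of moving from sequences to trees, not the paper's algorithm. Instead of n-grams, it counts small subtree patterns (single nodes and parent-child edges) in each input tree and ranks them by the share of inputs containing them for which generation failed. The size limit of two nodes, the `(label, head_index)` tree encoding and the `min_count` threshold are all assumptions.

```python
from collections import Counter


def subtree_patterns(tree):
    """Yield size-1 and size-2 patterns from a tree given as (label, head_index) pairs; -1 marks the root."""
    for label, head in tree:
        yield (label,)
        if head >= 0:
            # A parent-child edge, written as (parent label, child label).
            yield (tree[head][0], label)


def suspicious_patterns(inputs, min_count=2):
    """inputs: list of (tree, generated_ok). Return (pattern, failure ratio), highest first."""
    seen, failed = Counter(), Counter()
    for tree, ok in inputs:
        # Count each pattern at most once per input tree.
        for pattern in set(subtree_patterns(tree)):
            seen[pattern] += 1
            if not ok:
                failed[pattern] += 1
    scored = [(p, failed[p] / seen[p]) for p in seen if seen[p] >= min_count]
    return sorted(scored, key=lambda item: item[1], reverse=True)


inputs = [
    ([("sleep", -1), ("cat", 0)], True),
    ([("sleep", -1), ("dog", 0)], True),
    ([("give", -1), ("cat", 0), ("bone", 0)], False),
    ([("give", -1), ("dog", 0), ("bone", 0)], False),
]
ranking = suspicious_patterns(inputs)
print(ranking[:3])
```

In this toy data, every input containing `give` with a `bone` dependent fails, so `("give",)`, `("bone",)` and the edge `("give", "bone")` all score 1.0, while `cat` and `dog`, which also occur in successful inputs, score 0.5.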

FRMG: evolution of a TAG parser for French (FRMG: évolutions d'un analyseur syntaxique TAG du français)

Author: Guénot Marie-Laure
Nicolas Lionel
Sagot Benoît
Villemonte de La Clergerie Éric
Publication venue: HAL CCSD
Publication date: 10/10/2009

ATALA study day held jointly with the IWPT 2009 conference
National audience
We present FRMG, a wide-coverage syntactic parser for French. We highlight the methods that have improved its performance since its inception in 2004, as part of the first EASy parser evaluation campaign.

HAL-UNICE

INRIA a CCSD electronic archive server

Hal-Diderot

Error mining on parser outputs (Fouille d'erreurs sur des sorties d'analyseurs syntaxiques)

Author: Sagot Benoît
Villemonte de La Clergerie Éric
Publication venue: ATALA (Association pour le Traitement Automatique des Langues)
Publication date: 01/01/2008

International audience
We present an error mining method for automatically detecting errors in the resources used by parsing systems. We applied this method to the results of parsing several million words with two different parsing systems, which nevertheless share the same syntactic lexicon and pre-syntactic processing chain. We are thus able to identify inaccuracies and gaps in the resources used. In particular, comparing the results obtained on the outputs of the two parsers on the same corpus allows us to separate problems arising from the shared resources from those arising from the grammars.

INRIA a CCSD electronic archive server

Hal-Diderot
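
The comparison described in the last sentence of this abstract can be sketched as a simple decision rule. The sketch assumes suspicion rates per form have already been computed separately from each parser's output on the same corpus. A form suspicious for both parsers points to the shared lexicon or pre-syntactic chain, while a form suspicious for only one points to that parser's grammar. The function name `attribute_errors`, the fixed `threshold` and the rates below are hypothetical.

```python
def attribute_errors(rates_a, rates_b, threshold=0.5):
    """Split suspicious forms into (shared, only_a, only_b) lists using a fixed threshold."""
    shared, only_a, only_b = [], [], []
    for form in sorted(set(rates_a) | set(rates_b)):
        a = rates_a.get(form, 0.0) >= threshold
        b = rates_b.get(form, 0.0) >= threshold
        if a and b:
            shared.append(form)   # likely a problem in the shared resources
        elif a:
            only_a.append(form)   # likely specific to parser A's grammar
        elif b:
            only_b.append(form)   # likely specific to parser B's grammar
    return shared, only_a, only_b


# Hypothetical suspicion rates obtained from two parsers on the same corpus.
rates_a = {"rêvasse": 0.9, "dont": 0.7, "le": 0.1}
rates_b = {"rêvasse": 0.8, "dont": 0.2, "le": 0.05}
shared, only_a, only_b = attribute_errors(rates_a, rates_b)
print(shared, only_a, only_b)  # ['rêvasse'] ['dont'] []
```

A single hard threshold keeps the rule easy to read; in practice one might instead compare the two rates directly, since a form can be suspicious for both parsers to quite different degrees.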