288 research outputs found

Modular resource development and diagnostic evaluation framework for fast NLP system improvement

Author: de Chalendar Gaël
Nouvel Damien
Publication venue: HAL CCSD
Publication date: 31/05/2009

Natural Language Processing systems are large-scale software systems whose development involves many man-years of work, in terms of both coding and resource development. Given a dictionary of 110k lemmas, a few hundred syntactic analysis rules, 20k n-gram matrices and other resources, what will be the impact on a syntactic analyzer of adding a new possible category to a given verb? What will be the consequences of adding a new syntactic rule? Any modification may cause, besides the expected effect, unforeseeable side effects, and the complexity of the system makes it difficult to guess the overall impact of even small changes. We present here a framework designed to improve the accuracy of our linguistic analyzer LIMA through iterative refinement of its linguistic resources. These improvements are continuously assessed by evaluating the analyzer's performance against a reference corpus. Our first results show that this framework helps achieve this goal.

HAL Université de Tours

HAL-CEA

Constraint-Based Parsing as an Efficient Solution: Results from the Parsing Evaluation Campaign EASy

Author: Balfourier Jean-Marie
Blache Philippe
Vanrullen Tristan
Publication venue: LREC
Publication date: 01/01/2006

This paper describes the unfolding of the EASy evaluation campaign for French parsers as well as the techniques employed by the LPL laboratory for its participation in this campaign. Three symbolic parsers based on the same resource and the same formalism (Property Grammars) are described and evaluated. The first results of this evaluation are analyzed and lead to the conclusion that symbolic parsing in a constraint-based formalism is efficient and robust.

HAL AMU

D4.1. Technologies and tools for corpus creation, normalization and annotation

Author: Aleksic Vera
Bel Núria
Bartolini Roberto
Caselli Tommaso
Frontini Francesca
Hamon Olivier
Papavassiliou Vassilis
Pecina Pavel
Poch Riera Marc
Poibeau Thierry
Prokopidis Prokopis
Rimell Laura
Thurmair Gregor

The objectives of the Corpus Acquisition and Annotation (CAA) subsystem are the acquisition and processing of the monolingual and bilingual language resources (LRs) required in the PANACEA context. The CAA subsystem therefore includes: i) a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web, ii) a component for cleanup and normalization (CNC) of these data and iii) a text processing component (TPC), which consists of NLP tools including modules for sentence splitting, POS tagging, lemmatization, parsing and named entity recognition.

PUblication MAnagement

Automatic rich annotation of large corpus of conversational transcribed speech

Author: Antoine Jean-Yves
Friburger Nathalie
Mokrane Abdenour
Publication venue: HAL CCSD
Publication date: 28/05/2008

This paper describes the use of the CasSys platform to chunk conversational speech transcripts by means of cascades of Unitex transducers. Our system is part of the EPAC project of the French National Research Agency (ANR). The aim of this project is to develop robust methods for the annotation of audio/multimedia document collections that contain conversational speech sequences, such as TV or radio programs. First, this paper presents the EPAC project and the adaptation of a former chunking system (Romus), which was developed in the restricted framework of dedicated spoken man-machine dialogue. Then, it describes the problems arising from 1) spontaneous speech disfluencies and 2) errors from the previous processing stages (automatic speech recognition and POS tagging).

HAL Université de Tours

Comparing the Influence of Different Treebank Annotations on Dependency Parsing

Author: Attardi G.
Bosco Cristina
dell'
Hall J.
Lavelli A.
Lenci A.
Lesmo Leonardo
Lombardo Vincenzo
Mazzei Alessandro
Montemagni S.
Nilsson J.
Nivre J.
Simi M.
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2010

Institutional Research Information System University of Turin

The EVALITA Dependency Parsing Task: from 2007 to 2011

Author: C. Bosco
F.M. Zanzotto
G. Attardi
J. Nivre
R. Hudson
S. Montemagni
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Crossref

Institutional Research Information System University of Turin

Des relations d'alignement pour décrire l'interaction des domaines linguistiques : vers des Grammaires Multimodales

Author: Blache Philippe
Publication venue: HAL CCSD
Publication date: 01/01/2009

One of the major problems in linguistics today lies in accounting for phenomena that belong to different domains and modalities. In the literature, the usual answer is to represent the relations that may hold between these domains externally, as structure-to-structure relations, and therefore to rely on a separate description of each domain or modality. In this article we propose a different approach that represents these phenomena within a single formal framework, so that a single grammar can account for all the phenomena concerned. This precise representation of the interaction between domains and modalities relies on the definition of alignment relations.

HAL AMU

Proceedings

Author: Dickinson Markus
Müürisep Kaili
Passarotti Marco
Publication date: 01/12/2010

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library

Evaluation of a Grammar of French Determiners

Author: Eric Laporte
Éric Laporte
Publication date: 01/01/2007

Existing syntactic grammars of natural languages, even with a far from complete coverage, are complex objects. Assessments of the quality of parts of such grammars are useful for the validation of their construction. We evaluated the quality of a grammar of French determiners that takes the form of a recursive transition network. The result of the application of this local grammar gives deeper syntactic information than chunking or information available in treebanks. We performed the evaluation by comparison with a corpus independently annotated with information on determiners. We obtained 86% precision and 92% recall on text not tagged for parts of speech.

arXiv.org e-Print Archive

CiteSeerX

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM