Search CORE

11 research outputs found

Evaluating evaluation measures

Author: Rehbein Ines
van Genabith Josef
Publication venue
Publication date: 01/01/2007
Field of study

This paper presents a thorough examination of the validity of three evaluation measures on parser output. We assess parser performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate parsing results using the PARSEVAL metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the T¨uBa-D/Z annotation scheme is more adequate then the TIGER scheme for PCFG parsing and show that PARSEVAL should not be used to compare parser performance for parsers trained on treebanks with different annotation schemes. An analysis of specific error types indicates that the dependency-based evaluation is most appropriate to reflect parse quality

CiteSeerX

Irish Universities

DCU Online Research Access Service

DSpace at Tartu University Library

Treebank annotation schemes and parser evaluation for German

Author: Rehbein Ines
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2007
Field of study

Recent studies focussed on the question whether less-congurational languages like German are harder to parse than English, or whether the lower parsing scores are an artefact of treebank encoding schemes and data structures, as claimed by K¨ubler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive past-parsing crosstreebank conversion. The results of the experiments show that, contrary to K¨ubler et al. (2006), the question whether or not German is harder to parse than English remains undecided

Irish Universities

DCU Online Research Access Service

Statistical parsing of morphologically rich languages (SPMRL): what, how and whither

Author: Candito Marie
Foster Jennifer
Goldberg Yoav
Kübler Sandra
Rehbein Ines
Seddah Djamé
Tounsi Lamia
Tsarfaty Reut
Versley Yannick
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2010
Field of study

The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state-of-affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations

CiteSeerX

INRIA a CCSD electronic archive server

Irish Universities

DCU Online Research Access Service

Hal-Diderot

Comparing Conversions of Discontinuity in PCFG Parsing

Author: Hsu Yu-Yin
Publication venue
Publication date: 01/12/2010
Field of study

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 103-113. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library

Annotation Schema Oriented Validation for Dependency Parsing Evaluation

Author: Alberto Lavelli
Bosco Cristina
Publication venue: NEALT - Northern European Association for Language Technology
Publication date: 01/01/2010
Field of study

Institutional Research Information System University of Turin

Annotation Schema Oriented Validation for Dependency Parsing Evaluation

Author: Bosco Cristina
Lavelli Alberto
Publication venue
Publication date: 29/11/2010
Field of study

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 19-30. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library

D4.1. Technologies and tools for corpus creation, normalization and annotation

Author: Aleksic Vera
B?l Nuria
Bartolini Roberto
Caselli Tommaso
Frontini Francesca
Hamon Olivier
Papavassiliou Vassilis
Pecina Pavel
Poch Riera Marc
Poibeau Thierry
Prokopis Prokopidis
Rimell Laura
Thurmair Gregor
Publication venue
Publication date
Field of study

The objectives of the Corpus Acquisition and Annotation (CAA) subsystem are the acquisition and processing of monolingual and bilingual language resources (LRs) required in the PANACEA context. Therefore, the CAA subsystem includes: i) a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web, ii) a component for cleanup and normalization (CNC) of these data and iii) a text processing component (TPC) which consists of NLP tools including modules for sentence splitting, POS tagging, lemmatization, parsing and named entity recognition

PUblication MAnagement

Overview of the SPMRL 2013 shared task: cross-framework evaluation of parsing morphologically rich languages

Author: Candito Marie
Choi Jinho
Farkas Richard
Foster Jennifer
Goenaga Iakes
Gojenola Koldo
Goldberg Yoav
Green Spence
Habash Nizar
Kuhlmann Marco
Kübler Sandra
Maier Wolfgang
Nivre Joakim
Przepiórkowski Adam
Roth Ryan
Seddah Djamé
Seeker Wolfgang
Tsarfaty Reut
Versley Yannick
Villemonte de la Clérgerie Eric
Vincze Veronika
Wolinski Marcin
Wróblewska Alina
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 18/10/2013
Field of study

This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios

Irish Universities

DCU Online Research Access Service

Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages

Author: Candito Marie
Choi Jinho D.
Farkas Richárd
Foster Jennifer
Goenaga Iakes
Gojenola Galletebeitia Koldo
Goldberg Yoav
Green Spence
Habash Nizar
Kuhlmann Marco
Kübler Sandra
Maier Wolfgang
Nivre Joakim
PrzepiÓrkowski Adam
Roth Ryan
Seddah Djamé
Seeker Wolfgang
Tsarfaty Reut
Versley Yannick
Villemonte de La Clergerie Éric
Vincze Veronika
Wolińsk Marcin
WrÓblewska Alina
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 18/10/2013
Field of study

International audienceThis paper reports on the first shared task on statistical parsing of morphologically rich lan- guages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the eval- uation metrics for parsing MRLs given dif- ferent representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios

INRIA a CCSD electronic archive server

Hal-Diderot

Treebank-Based Deep Grammar Acquisition for French Probabilistic Parsing Resources

Author: Schluter Natalie
Publication venue: Dublin City University. School of Computing
Publication date: 19/01/2011
Field of study

Motivated by the expense in time and other resources to produce hand-crafted grammars, there has been increased interest in wide-coverage grammars automatically obtained from treebanks. In particular, recent years have seen a move towards acquiring deep (LFG, HPSG and CCG) resources that can represent information absent from simple CFG-type structured treebanks and which are considered to produce more language-neutral linguistic representations, such as syntactic dependency trees. As is often the case in early pioneering work in natural language processing, English has been the focus of attention in the first efforts towards acquiring treebank-based deep-grammar resources, followed by treatments of, for example, German, Japanese, Chinese and Spanish. However, to date no comparable large-scale automatically acquired deep-grammar resources have been obtained for French. The goal of the research presented in this thesis is to develop, implement, and evaluate treebank-based deep-grammar acquisition techniques for French. Along the way towards achieving this goal, this thesis presents the derivation of a new treebank for French from the Paris 7 Treebank, the Modified French Treebank, a cleaner, more coherent treebank with several transformed structures and new linguistic analyses. Statistical parsers trained on this data outperform those trained on the original Paris 7 Treebank, which has five times the amount of data. The Modified French Treebank is the data source used for the development of treebank-based automatic deep-grammar acquisition for LFG parsing resources for French, based on an f-structure annotation algorithm for this treebank. LFG CFG-based parsing architectures are then extended and tested, achieving a competitive best f-score of 86.73% for all features. The CFG-based parsing architectures are then complemented with an alternative dependency-based statistical parsing approach, obviating the CFG-based parsing step, and instead directly parsing strings into f-structures

CiteSeerX

Irish Universities

DCU Online Research Access Service