Search CORE

5,292 research outputs found

Parser evaluation across text types

Author: Versley Yannick
Publication venue
Publication date: 01/01/2005
Field of study

When a statistical parser is trained on one treebank, one usually tests it on another portion of the same treebank, partly due to the fact that a comparable annotation format is needed for testing. But the user of a parser may not be interested in parsing sentences from the same newspaper all over, or even wants syntactic annotations for a slightly different text type. Gildea (2001) for instance found that a parser trained on the WSJ portion of the Penn Treebank performs less well on the Brown corpus (the subset that is available in the PTB bracketing format) than a parser that has been trained only on the Brown corpus, although the latter one has only half as many sentences as the former. Additionally, a parser trained on both the WSJ and Brown corpora performs less well on the Brown corpus than on the WSJ one. This leads us to the following questions that we would like to address in this paper: - Is there a difference in usefulness of techniques that are used to improve parser performance between the same-corpus and the different-corpus case? - Are different types of parsers (rule-based and statistical) equally sensitive to corpus variation? To achieve this, we compared the quality of the parses of a hand-crafted constraint-based parser and a statistical PCFG-based parser that was trained on a treebank of German newspaper text

Hochschulschriftenserver - Universität Frankfurt am Main

How to compare treebanks

Author: Kübler Sandra
Maier Wolfgang
Rehbein Ines
Versley Yannick
Publication venue
Publication date: 01/01/2008
Field of study

Recent years have seen an increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of the resources. This effort, however, requires a profound knowledge of the advantages and disadvantages of linguistic annotation schemes in order to avoid importing the flaws and weaknesses of existing encoding schemes into the new standards. This paper addresses the question how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a well thought-out error classification, which enables us to compare parser output for parsers trained on treebanks with different encoding schemes and provides interesting insights into the impact of treebank annotation schemes on specific constructions like PP attachment or non-constituent coordination

Hochschulschriftenserver - Universität Frankfurt am Main

Treebank annotation schemes and parser evaluation for German

Author: Rehbein Ines
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2007
Field of study

Recent studies focussed on the question whether less-congurational languages like German are harder to parse than English, or whether the lower parsing scores are an artefact of treebank encoding schemes and data structures, as claimed by K¨ubler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive past-parsing crosstreebank conversion. The results of the experiments show that, contrary to K¨ubler et al. (2006), the question whether or not German is harder to parse than English remains undecided

Irish Universities

DCU Online Research Access Service

Automatic acquisition of LFG resources for German - as good as it gets

Author: Rehbein Ines
van Genabith Josef
Publication venue: CSLI Publications
Publication date: 01/01/2009
Field of study

We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising fromthe data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar

CiteSeerX

Irish Universities

DCU Online Research Access Service

Self-adaptive exploration in evolutionary search

Author: Toussaint Marc
Publication venue
Publication date: 01/01/2001
Field of study

We address a primary question of computational as well as biological research on evolution: How can an exploration strategy adapt in such a way as to exploit the information gained about the problem at hand? We first introduce an integrated formalism of evolutionary search which provides a unified view on different specific approaches. On this basis we discuss the implications of indirect modeling (via a ``genotype-phenotype mapping'') on the exploration strategy. Notions such as modularity, pleiotropy and functional phenotypic complex are discussed as implications. Then, rigorously reflecting the notion of self-adaptability, we introduce a new definition that captures self-adaptability of exploration: different genotypes that map to the same phenotype may represent (also topologically) different exploration strategies; self-adaptability requires a variation of exploration strategies along such a ``neutral space''. By this definition, the concept of neutrality becomes a central concern of this paper. Finally, we present examples of these concepts: For a specific grammar-type encoding, we observe a large variability of exploration strategies for a fixed phenotype, and a self-adaptive drift towards short representations with highly structured exploration strategy that matches the ``problem's structure''.Comment: 24 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

How do treebank annotation schemes influence parsing results? : or how not to compare apples and oranges

Author: Kübler Sandra
Publication venue
Publication date: 01/01/2005
Field of study

In the last decade, the Penn treebank has become the standard data set for evaluating parsers. The fact that most parsers are solely evaluated on this specific data set leaves the question unanswered how much these results depend on the annotation scheme of the treebank. In this paper, we will investigate the influence which different decisions in the annotation schemes of treebanks have on parsing. The investigation uses the comparison of similar treebanks of German, NEGRA and TüBa-D/Z, which are subsequently modified to allow a comparison of the differences. The results show that deleted unary nodes and a flat phrase structure have a negative influence on parsing quality while a flat clause structure has a positive influence

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main