
2,880 research outputs found

A hybrid architecture for robust parsing of German

Author: Erhard W. Hinrichs
Frank H. Müller
Sandra Kübler
Tylman Ule
Universität Tübingen
Publication date: 01/01/2002

This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather, what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation.

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

Treebank annotation schemes and parser evaluation for German

Author: Rehbein Ines
van Genabith Josef
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2007

Recent studies focussed on the question whether less-configurational languages like German are harder to parse than English, or whether the lower parsing scores are an artefact of treebank encoding schemes and data structures, as claimed by Kübler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive post-parsing cross-treebank conversion. The results of the experiments show that, contrary to Kübler et al. (2006), the question whether or not German is harder to parse than English remains undecided.

Irish Universities

DCU Online Research Access Service

Towards case-based parsing: are chunks reliable indicators for syntax trees?

Author: Kübler Sandra
Publication date: 01/01/2006

This paper presents an approach to the question whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would employ a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. The experiments performed on German data from the TüBa-D/Z treebank and the KaRoPars partial parser show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between input sentence and training sentences. The results are such that it is possible to construct a case-based parser. The optimal setting out of those presented here needs to be determined empirically.

Hochschulschriftenserver - Universität Frankfurt am Main

Evaluating evaluation measures

Author: Rehbein Ines
van Genabith Josef
Publication date: 01/01/2007

This paper presents a thorough examination of the validity of three evaluation measures on parser output. We assess parser performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate parsing results using the PARSEVAL metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the TüBa-D/Z annotation scheme is more adequate than the TIGER scheme for PCFG parsing and show that PARSEVAL should not be used to compare parser performance for parsers trained on treebanks with different annotation schemes. An analysis of specific error types indicates that the dependency-based evaluation is most appropriate to reflect parse quality.

CiteSeerX

Irish Universities

DCU Online Research Access Service

DSpace at Tartu University Library

Syntactic annotation of non-canonical linguistic structures

Author: Doolittle Seanna
Hirschmann Hagen
Lüdeling Anke
Publication date: 27/10/2009

This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences.

Hochschulschriftenserver - Universität Frankfurt am Main

Automatic acquisition of LFG resources for German - as good as it gets

Author: Rehbein Ines
van Genabith Josef
Publication venue: CSLI Publications
Publication date: 01/01/2009

We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.

CiteSeerX

Irish Universities

DCU Online Research Access Service

What linguists always wanted to know about German and did not know how to estimate

Author: Hinrichs Erhard
Kübler Sandra
Publication date: 01/01/2006

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

Hochschulschriftenserver - Universität Frankfurt am Main