2,880 research outputs found
A hybrid architecture for robust parsing of german
This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation
Treebank annotation schemes and parser evaluation for German
Recent studies focussed on the question whether less-congurational languages like German are harder to parse than English, or whether the lower parsing scores are an
artefact of treebank encoding schemes and data structures, as claimed by K¨ubler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the
PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive past-parsing crosstreebank conversion. The results of the experiments show that, contrary to K¨ubler et al. (2006), the question whether or not German is harder to parse than English remains undecided
Towards case-based parsing : are chunks reliable indicators for syntax trees?
This paper presents an approach to the question whether it is possible to construct a parser based on ideas from case-based reasoning. Such a parser would employ a partial analysis of the input sentence to select a (nearly) complete syntax tree and then adapt this tree to the input sentence. The experiments performed on German data from the Tüba-D/Z treebank and the KaRoPars partial parser show that a wide range of levels of generality can be reached, depending on which types of information are used to determine the similarity between input sentence and training sentences. The results are such that it is possible to construct a case-based parser. The optimal setting out of those presented here need to be determined empirically
Evaluating evaluation measures
This paper presents a thorough examination of the validity of three evaluation measures on parser output. We assess parser performance of an unlexicalised probabilistic parser trained on two German treebanks with different annotation schemes and evaluate parsing results using the PARSEVAL
metric, the Leaf-Ancestor metric and a dependency-based evaluation. We reject the claim that the T¨uBa-D/Z annotation scheme is more adequate then the TIGER scheme
for PCFG parsing and show that PARSEVAL should not be used to compare parser performance for parsers trained on treebanks with different annotation schemes. An analysis
of specific error types indicates that the dependency-based evaluation is most appropriate to reflect parse quality
Syntactic annotation of non-canonical linguistic structures
This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences
Automatic acquisition of LFG resources for German - as good as it gets
We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising fromthe data structures determined
by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG
resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar
What linguists always wanted to know about german and did not know how to estimate
This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung´(taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres
- …