
    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing “deep” hand-crafted wide-coverage parsers with “shallow” treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
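    For concreteness, here is a minimal Python sketch of the Approximate Randomization Test (Noreen 1989) used for significance testing above; the per-sentence score arrays and the mean-difference test statistic are simplifying assumptions, since the paper's f-scores aggregate counts over the whole test set:

        import random

        def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
            """Approximate Randomization Test (Noreen 1989) for paired outputs.

            scores_a, scores_b: per-sentence scores for two systems on the
            same test set (a simplification of the paper's setup).
            Returns an estimated p-value for the observed difference.
            """
            rng = random.Random(seed)
            n = len(scores_a)
            observed = abs(sum(scores_a) - sum(scores_b)) / n
            at_least_as_extreme = 0
            for _ in range(trials):
                diff = 0.0
                # Under the null hypothesis the two systems are exchangeable,
                # so we randomly swap their scores sentence by sentence.
                for a, b in zip(scores_a, scores_b):
                    diff += (a - b) if rng.random() < 0.5 else (b - a)
                if abs(diff) / n >= observed:
                    at_least_as_extreme += 1
            # Add-one smoothing keeps the p-value estimate conservative.
            return (at_least_as_extreme + 1) / (trials + 1)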

    Automatic acquisition of LFG resources for German - as good as it gets

    We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising from the data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar.

    Improving dependency label accuracy using statistical post-editing: A cross-framework study

    We present a statistical post-editing method for modifying the dependency labels in a dependency analysis. We test the method using two English datasets, three parsing systems and three labelled dependency schemes. We demonstrate how it can be used both to improve dependency label accuracy in parser output and to highlight problems with, and differences between, constituency-to-dependency conversions.
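    As an illustration, a minimal sketch of what such statistical post-editing could look like; the feature set and data format below are hypothetical, not the paper's:

        from collections import Counter, defaultdict

        def train_label_posteditor(parsed, gold):
            """Learn a relabelling table from parallel parsed/gold data.

            parsed, gold: lists of sentences, each a token-aligned list of
            (head_pos, dep_pos, label) triples. The feature set (predicted
            label plus head/dependent POS) is a hypothetical choice.
            """
            counts = defaultdict(Counter)
            for p_sent, g_sent in zip(parsed, gold):
                for (h, d, p_lab), (_, _, g_lab) in zip(p_sent, g_sent):
                    counts[(p_lab, h, d)][g_lab] += 1
            # Map each observed context to its most frequent gold label.
            return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

        def postedit(sentence, table):
            """Rewrite labels wherever the table has seen the context."""
            return [(h, d, table.get((lab, h, d), lab)) for h, d, lab in sentence]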

    Treebank-based acquisition of LFG parsing resources for French

    Motivated by the expense in time and other resources of producing hand-crafted grammars, there has been increased interest in acquiring wide-coverage grammars automatically from treebanks for natural language processing. In particular, recent years have seen growing interest in automatically acquired deep resources that can represent information absent from simple CFG-type structured treebanks and that are considered to produce more language-neutral linguistic representations, such as dependency syntactic trees. As is often the case in early pioneering work on natural language processing, English provided the focus of the first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based grammar acquisition to the case of French. We show that with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%.

    Treebank-based acquisition of LFG resources for Chinese

    This paper presents a method to automatically acquire wide-coverage, robust, probabilistic Lexical-Functional Grammar resources for Chinese from the Penn Chinese Treebank (CTB). Our starting point is the earlier, proof-of-concept work of Burke et al. (2004) on automatic f-structure annotation, LFG grammar acquisition and parsing for Chinese using CTB version 2 (CTB2). We substantially extend and improve on this earlier research as regards coverage, robustness, quality and fine-grainedness of the resulting LFG resources. We achieve this through (i) improved LFG analyses for a number of core Chinese phenomena; (ii) a new automatic f-structure annotation architecture which involves an intermediate dependency representation; (iii) scaling the approach from 4.1K trees in CTB2 to 18.8K trees in CTB version 5.1 (CTB5.1); and (iv) developing a novel treebank-based approach to recovering non-local dependencies (NLDs) for Chinese parser output. Against a new 200-sentence gold standard of manually constructed f-structures, the method achieves 96.00% f-score for f-structures automatically generated for the original CTB trees and 80.01% for NLD-recovered f-structures generated for the trees output by Bikel’s parser.
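    The f-scores cited are computed by flattening f-structures into sets of dependency triples and comparing against the gold standard; a minimal sketch of such triple-based evaluation (the triple format is illustrative):

        def triple_fscore(test_triples, gold_triples):
            """Precision, recall and f-score over f-structure dependency
            triples such as ('subj', 'buy', 'company') (format illustrative)."""
            test, gold = set(test_triples), set(gold_triples)
            correct = len(test & gold)
            precision = correct / len(test) if test else 0.0
            recall = correct / len(gold) if gold else 0.0
            f = 2 * precision * recall / (precision + recall) if correct else 0.0
            return precision, recall, f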

    From surface dependencies towards deeper semantic representations [Semantic representations]

    In the past, a divide could be seen between “deep” parsers on the one hand, which construct a semantic representation out of their input but usually have significant coverage problems, and more robust parsers on the other hand, which are usually based on a (statistical) model derived from a treebank and have larger coverage, but leave the problem of semantic interpretation to the user. More recently, approaches have emerged that combine the robustness of data-driven (statistical) models with more detailed linguistic interpretation such that the output can be used for deeper semantic analysis. Cahill et al. (2002) use a PCFG-based parsing model in combination with a set of principles and heuristics to derive functional (f-)structures of Lexical-Functional Grammar (LFG). They show that the derived functional structures have a better quality than those generated by a parser based on a state-of-the-art hand-crafted LFG grammar. Advocates of Dependency Grammar usually point out that dependencies already are a semantically meaningful representation (cf. Menzel, 2003). However, parsers based on dependency grammar normally create underspecified representations with respect to certain phenomena such as coordination, apposition and control structures. In these areas they are too “shallow” to be directly used for semantic interpretation.

    In this paper, we adopt an approach similar to that of Cahill et al. (2002), using a dependency-based analysis to derive functional structure, and demonstrate the feasibility of this approach using German data. A major focus of our discussion is on the treatment of coordination and other potentially underspecified structures of the dependency data input.

    F-structure is one of the two core levels of syntactic representation in LFG (Bresnan, 2001). Independently of surface order, it encodes abstract syntactic functions that constitute predicate-argument structure and other dependency relations such as subject, predicate, adjunct, but also further semantic information such as the semantic type of an adjunct (e.g. directional). Normally f-structure is captured as a recursive attribute-value matrix, which is isomorphic to a directed graph representation. Figure 5 depicts an example target f-structure. As mentioned earlier, these deeper-level dependency relations can be used to construct logical forms, as in the approaches of van Genabith and Crouch (1996), who construct underspecified discourse representations (UDRSs), and Spreyer and Frank (2005), who have robust minimal recursion semantics (RMRS) as their target representation. We therefore think that f-structures are a suitable target representation for automatic syntactic analysis in a larger pipeline of mapping text to interpretation.

    In this paper, we report on the conversion from dependency structures to f-structures. Firstly, we evaluate the f-structure conversion in isolation, starting from hand-corrected dependencies based on the TüBa-D/Z treebank and the conversion of Versley (2005). Secondly, we start from tokenized text to evaluate the combined process of automatic parsing (using the parser of Foth and Menzel (2006)) and f-structure conversion. As a test set, we randomly selected 100 sentences from TüBa-D/Z, which we annotated using a scheme very close to that of the TiGer Dependency Bank (Forst et al., 2004).

    In the next section, we sketch dependency analysis, the underlying theory of our input representations, and introduce four different representations of coordination. We also describe Weighted Constraint Dependency Grammar (WCDG), the dependency parsing formalism that we use in our experiments. Section 3 characterises the conversion of dependencies to f-structures. Our evaluation is presented in section 4, and finally, section 5 summarises our results and gives an overview of problems remaining to be solved.
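    As an illustration of the attribute-value-matrix/graph isomorphism mentioned above, a minimal Python sketch with an invented example f-structure:

        def fstructure_edges(fs, node=0, counter=None, edges=None):
            """Flatten a nested attribute-value matrix into directed-graph
            edges, reflecting the AVM/graph isomorphism described above."""
            if counter is None:
                counter, edges = [0], []
            for attr, value in fs.items():
                if isinstance(value, dict):
                    counter[0] += 1          # allocate a fresh graph node
                    child = counter[0]
                    edges.append((node, attr, child))
                    fstructure_edges(value, child, counter, edges)
                else:
                    edges.append((node, attr, value))
            return edges

        # Invented example f-structure for a simple German sentence.
        example = {"pred": "sehen", "subj": {"pred": "Mann", "case": "nom"},
                   "obj": {"pred": "Film", "case": "acc"},
                   "adjunct": {"pred": "gestern", "type": "temporal"}}
        print(fstructure_edges(example))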

    Memory-Based Shallow Parsing

    We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results: on the Wall Street Journal (WSJ) treebank, F-values are 93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection and 79.0% for object detection. (8 pages; to appear in Proceedings of the EACL'99 workshop on Computational Natural Language Learning (CoNLL-99), Bergen, Norway, June 1999.)
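    Memory-based learning is in essence k-nearest-neighbour classification over stored training instances; below is a minimal sketch of the chunk-tagging step, using plain feature overlap rather than the information-gain feature weighting of typical MBL implementations, with a toy memory:

        from collections import Counter

        def knn_chunk_tag(memory, features, k=1):
            """Assign a chunk tag by plain feature overlap against stored
            training instances (no information-gain feature weighting,
            unlike typical MBL setups).

            memory: list of (feature_tuple, chunk_tag) instances.
            features: the token's feature tuple, e.g. a POS window.
            """
            scored = sorted(
                memory, reverse=True,
                key=lambda inst: sum(f == g for f, g in zip(inst[0], features)))
            votes = Counter(tag for _, tag in scored[:k])
            return votes.most_common(1)[0][0]

        # Toy memory: (previous POS, token POS, next POS) -> IOB chunk tag.
        memory = [(("DT", "NN", "VBD"), "I-NP"), (("NN", "VBD", "DT"), "I-VP"),
                  (("VBD", "DT", "NN"), "B-NP")]
        print(knn_chunk_tag(memory, ("DT", "NN", "VBZ")))   # -> 'I-NP'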

    DepAnn - An Annotation Tool for Dependency Treebanks

    DepAnn is an interactive annotation tool for dependency treebanks, providing both graphical and text-based annotation interfaces. The tool is aimed at semi-automatic creation of treebanks. It aids the manual inspection and correction of automatically created parses, making the annotation process faster and less error-prone. A novel feature of the tool is that it enables the user to view outputs from several parsers as the basis for creating the final tree to be saved to the treebank. DepAnn uses TIGER-XML, an XML-based general encoding format, both for representing the parser outputs and for saving the annotated treebank. The tool includes an automatic consistency checker for sentence structures. In addition, the tool enables users to build structures manually, add comments on the annotations, modify the tagsets, and mark sentences for further revision.
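    For orientation, a schematic Python sketch of a TIGER-XML-style sentence encoding; the element and attribute inventory is simplified and the edge labels are hypothetical, as DepAnn's exact schema is not given here:

        import xml.etree.ElementTree as ET

        # Schematic TIGER-XML-style encoding of one sentence (simplified;
        # DepAnn's actual schema may differ).
        s = ET.Element("s", id="s1")
        graph = ET.SubElement(s, "graph", root="s1_ROOT")
        terminals = ET.SubElement(graph, "terminals")
        for i, (word, pos) in enumerate([("Dogs", "NNS"), ("bark", "VBP")],
                                        start=1):
            ET.SubElement(terminals, "t", id="s1_%d" % i, word=word, pos=pos)
        nonterminals = ET.SubElement(graph, "nonterminals")
        root = ET.SubElement(nonterminals, "nt", id="s1_ROOT", cat="S")
        ET.SubElement(root, "edge", idref="s1_2", label="HD")    # head verb
        ET.SubElement(root, "edge", idref="s1_1", label="SUBJ")  # hypothetical

        print(ET.tostring(s, encoding="unicode"))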