
    Factoring Predicate Argument and Scope Semantics: Underspecified Semantics with LTAG

    In this paper we propose a compositional semantics for lexicalized tree-adjoining grammar (LTAG). Tree-local multicomponent derivations allow separation of the semantic contribution of a lexical item into one component contributing to the predicate-argument structure and a second component contributing to scope semantics. Based on this idea, a syntax-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure (and indirectly the locality of derivations) allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scope ambiguities and related phenomena such as adjunct scope and island constraints.
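
    As a rough illustration of the underspecification idea (this is not the paper's LTAG machinery; the example sentence, the HOLE placeholder and the function names are invented here), the sketch below uses one underspecified representation whose resolutions are the two scope readings of "every student read some book":

        # Minimal sketch: one underspecified representation, two resolved readings.
        from itertools import permutations

        # Scope components contributed by the quantifiers; HOLE marks the open scope.
        quantifiers = {
            "q_every": "every(x, student(x), HOLE)",
            "q_some":  "some(y, book(y), HOLE)",
        }
        core = "read(x, y)"  # predicate-argument component contributed by the verb

        def enumerate_readings():
            """Plug the core into every total ordering of the scope components."""
            readings = []
            for order in permutations(quantifiers):
                formula = core
                for label in reversed(order):   # fill the innermost scope first
                    formula = quantifiers[label].replace("HOLE", formula)
                readings.append(" > ".join(order) + " : " + formula)
            return readings

        for reading in enumerate_readings():
            print(reading)
        # q_every > q_some : every(x, student(x), some(y, book(y), read(x, y)))
        # q_some > q_every : some(y, book(y), every(x, student(x), read(x, y)))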

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing "deep" hand-crafted wide-coverage parsers with "shallow" treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert the tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
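
    The dependency-based evaluation behind these scores boils down to precision, recall and f-score over sets of (relation, head, dependent) triples. A minimal sketch of that computation is given below; it is illustrative only (not the DCU or PARC evaluation software), and the toy triples are invented:

        # Dependency-triple evaluation: f-score of parser output against a gold dependency bank.
        def dep_fscore(gold, test):
            """Precision, recall and f-score over sets of (relation, head, dependent) triples."""
            gold, test = set(gold), set(test)
            correct = len(gold & test)
            precision = correct / len(test) if test else 0.0
            recall = correct / len(gold) if gold else 0.0
            f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
            return precision, recall, f

        gold = {("subj", "saw", "John"), ("obj", "saw", "Mary"), ("det", "Mary", "the")}
        test = {("subj", "saw", "John"), ("obj", "saw", "the")}
        print(dep_fscore(gold, test))   # (0.5, 0.333..., 0.4)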

    The Computational Analysis of the Syntax and Interpretation of Free Word Order in Turkish

    In this dissertation, I examine a language with "free" word order, specifically Turkish, in order to develop a formalism that can capture the syntax and the context-dependent interpretation of "free" word order within a computational framework. In "free" word order languages, word order is used to convey distinctions in meaning that are not captured by traditional truth-conditional semantics. The word order indicates the "information structure", e.g. what is the "topic" and the "focus" of the sentence. The context-appropriate use of "free" word order is of considerable importance in developing practical applications in natural language interpretation, generation, and machine translation. I develop a formalism called Multiset-CCG, an extension of Combinatory Categorial Grammars, CCGs, (Ades/Steedman 1982, Steedman 1985), and demonstrate its advantages in an implementation of a database query system that interprets Turkish questions and generates answers with contextually appropriate word orders. Multiset-CCG is a context-sensitive and polynomially parsable grammar that captures the formal and descriptive properties of "free" word order and restrictions on word order in simple and complex sentences (with discontinuous constituents and long distance dependencies). Multiset-CCG captures the context-dependent meaning of word order in Turkish by compositionally deriving the predicate-argument structure and the information structure of a sentence in parallel. The advantages of using such a formalism are that it is computationally attractive and that it provides a compositional and flexible surface structure that allows syntactic constituents to correspond to information structure constituents. A formalism that integrates information structure and syntax such as Multiset-CCG is essential to the computational tasks of interpreting and generating sentences with contextually appropriate word orders in "free" word order languages.
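
    As a very rough sketch of the multiset idea (this is not Hoffman's actual Multiset-CCG, it ignores the information-structure side entirely, and the lexicon and function names are invented), a verb category that records its arguments as a multiset accepts them in any surface order:

        # Toy lexicon: NPs are plain categories; the verb pairs a result category with
        # a multiset of argument categories it still needs, in any surface order.
        from collections import Counter

        LEXICON = {
            "Ayşe":   ("NP-nom", None),
            "kitabı": ("NP-acc", None),
            "okudu":  ("S", Counter({"NP-nom": 1, "NP-acc": 1})),   # 'read'
        }

        def parse(words):
            """Succeeds iff the arguments seen (in any order) match the verb's multiset."""
            result_cat, wanted, seen = None, None, Counter()
            for w in words:
                cat, args = LEXICON[w]
                if args is not None:          # the functor (verb)
                    result_cat, wanted = cat, args
                else:
                    seen[cat] += 1
            return result_cat if wanted is not None and seen == wanted else None

        print(parse(["Ayşe", "kitabı", "okudu"]))    # S   (SOV order)
        print(parse(["kitabı", "okudu", "Ayşe"]))    # S   (scrambled, still parses)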

    Exploiting multi-word units in statistical parsing and generation

    Syntactic parsing is an important prerequisite for many natural language processing (NLP) applications. The task refers to the process of generating the tree of syntactic nodes with associated phrase category labels corresponding to a sentence. Our objective is to improve upon statistical models for syntactic parsing by leveraging multi-word units (MWUs) such as named entities and other classes of multi-word expressions. Multi-word units are phrases that are lexically, syntactically and/or semantically idiosyncratic in that they are to at least some degree non-compositional. If such units are identified prior to, or as part of, the parsing process, their boundaries can be exploited as islands of certainty within the very large (and often highly ambiguous) search space. Luckily, certain types of MWUs can be readily identified in an automatic fashion (using a variety of techniques) to a near-human level of accuracy. We carry out a number of experiments which integrate knowledge about different classes of MWUs in several commonly deployed parsing architectures. In a supplementary set of experiments, we attempt to exploit these units in the converse operation to statistical parsing: statistical generation (in our case, surface realisation from Lexical-Functional Grammar f-structures). We show that, by exploiting knowledge about MWUs, certain classes of parsing and generation decisions are more accurately resolved. This translates to improvements in overall parsing and generation results which, although modest, are demonstrably significant.
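
    A minimal sketch of the "islands of certainty" idea (the MWU list and function name are invented; real systems use named-entity recognisers and MWE lexicons): a recognised multi-word unit is collapsed into a single token before parsing, so the parser can no longer split it:

        # Group known multi-word units into single underscore-joined tokens before parsing.
        MWUS = {("New", "York", "Stock", "Exchange"), ("in", "spite", "of")}  # illustrative list

        def group_mwus(tokens):
            """Greedy left-to-right grouping, preferring the longest known MWU at each position."""
            out, i = [], 0
            while i < len(tokens):
                for length in range(len(tokens) - i, 1, -1):   # longest match first
                    if tuple(tokens[i:i + length]) in MWUS:
                        out.append("_".join(tokens[i:i + length]))
                        i += length
                        break
                else:
                    out.append(tokens[i])
                    i += 1
            return out

        print(group_mwus("traders on the New York Stock Exchange panicked".split()))
        # ['traders', 'on', 'the', 'New_York_Stock_Exchange', 'panicked']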

    On Internal Merge


    Quasi-logical forms from f-structures for the Penn treebank

    In this paper we show how the trees in the Penn treebank can be associated automatically with simple quasi-logical forms. Our approach is based on combining two independent strands of work: the first is the observation that there is a close correspondence between quasi-logical forms and LFG f-structures [van Genabith and Crouch, 1996]; the second is the development of an automatic f-structure annotation algorithm for the Penn treebank [Cahill et al., 2002a; Cahill et al., 2002b]. We compare our approach with that of [Liakata and Pulman, 2002].
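
    To make the f-structure/QLF correspondence concrete, a toy translation from a simplified f-structure (a nested dict) to a flat predicate-argument term is sketched below. It is not the paper's annotation algorithm, and real quasi-logical forms also carry unscoped determiner/quantifier information that this sketch drops:

        # Toy mapping from a simplified LFG f-structure to a predicate-argument term.
        def fstruct_to_qlf(f):
            """Render PRED together with its governable grammatical functions."""
            args = [fstruct_to_qlf(f[gf]) for gf in ("SUBJ", "OBJ", "COMP") if gf in f]
            if not args:
                return f["PRED"]
            return f["PRED"] + "(" + ", ".join(args) + ")"

        fs = {"PRED": "see",
              "SUBJ": {"PRED": "John"},
              "OBJ":  {"PRED": "girl", "SPEC": "a"}}   # SPEC (quantifier) info ignored here
        print(fstruct_to_qlf(fs))   # see(John, girl)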

    Surface Structure

    Combinatory Categorial Grammar (CCG) was originally advanced as a theory relating coordination and relativization. The claim was that these constructions can be analysed at the level of surface grammar, without rules of movement, deletion, passing of slash-features, or the syntactic empty category Wh-trace. Instead, CCG generalizes the notion of grammatical constituency to cover everything that can coordinate or result from extraction, via the use of a small number of operations which apply to adjacent lexically realised grammatical categories interpreted as functions.
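
    A small sketch of the kind of operations meant here (an illustrative toy, not a CCG implementation; the class and helper names are invented): forward application (X/Y Y => X) and forward composition (X/Y Y/Z => X/Z), together with a type-raised subject, yield the non-standard surface constituent S/NP that can coordinate or host extraction:

        # Toy CCG categories with forward application and forward composition.
        class Cat:
            def __init__(self, result=None, slash=None, arg=None, atom=None):
                self.result, self.slash, self.arg, self.atom = result, slash, arg, atom
            def __repr__(self):
                return self.atom if self.atom else f"({self.result}{self.slash}{self.arg})"
            def __eq__(self, other):
                return repr(self) == repr(other)

        NP, S = Cat(atom="NP"), Cat(atom="S")
        def fwd(result, arg): return Cat(result, "/", arg)     # X/Y seeks Y to its right
        def bwd(result, arg): return Cat(result, "\\", arg)    # X\Y seeks Y to its left

        def apply_fwd(x, y):
            """Forward application:  X/Y  Y  =>  X"""
            if x.slash == "/" and x.arg == y:
                return x.result

        def compose_fwd(x, y):
            """Forward composition:  X/Y  Y/Z  =>  X/Z"""
            if x.slash == "/" and y.slash == "/" and x.arg == y.result:
                return Cat(x.result, "/", y.arg)

        tv   = fwd(bwd(S, NP), NP)      # transitive verb: (S\NP)/NP
        subj = fwd(S, bwd(S, NP))       # type-raised subject: S/(S\NP)
        print(apply_fwd(tv, NP))        # (S\NP): verb plus object
        print(compose_fwd(subj, tv))    # (S/NP): e.g. "John saw", the unit that coordinates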

    Tree Description Grammars and Underspecified Representations

    In this thesis, a new grammar formalism called (local) Tree Description Grammar (TDG) is presented that generates tree descriptions. This grammar formalism brings together some of the central ideas of Tree Adjoining Grammars (TAG) on the one hand, and approaches to underspecified semantics for scope ambiguities on the other hand. First a general definition of TDGs is presented, and afterwards a restricted variant called local TDGs is proposed. Since the elements of a local TDG are tree descriptions, an extended domain of locality as in TAGs is provided by this formalism. Consequently, local TDGs can be lexicalized, and local dependencies such as filler-gap dependencies can be expressed in the descriptions occurring in the grammar. The tree descriptions generated by local TDGs are such that the dominance relation (i.e. the reflexive and transitive closure of the parent relation) need not be fully specified. Therefore the generation of suitable underspecified representations for scope ambiguities is possible. The generative capacity of local TDGs is greater than that of TAGs; local TDGs are even more powerful than set-local multicomponent TAGs (MC-TAGs). However, the generative capacity of local TDGs is restricted in such a way that only semilinear languages are generated. Therefore these languages are of constant growth, a property generally ascribed to natural languages. Local TDGs of different rank can be distinguished depending on the form of derivation steps that are possible in these grammars. This leads to a hierarchy of local TDGs. For the string languages generated by local TDGs of a certain rank, a pumping lemma is proven which can be used to show that local TDGs of rank n can generate a language L_i := {a_1^k ··· a_i^k | k ≄ 0} iff i ≤ 2n holds. In order to describe the relation between two languages, synchronous local TDGs are introduced. The synchronization with a second local TDG does not increase the generative power of the grammar in the sense that each language generated by a local TDG that is part of a synchronous pair of local TDGs can also be generated by a single local TDG. This formalism of synchronous local TDGs is used to describe a syntax-semantics interface for a fragment of French which illustrates the derivation of underspecified representations for scope ambiguities with local TDGs.
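
    To make the generative-capacity claim concrete, the count language L_i = {a_1^k ··· a_i^k | k ≄ 0} mentioned above can be recognised as follows (an illustrative membership test over symbol strings; it says nothing about TDG derivations themselves, and the symbol encoding "a1".."ai" is invented here):

        # Membership test for L_i = { a1^k a2^k ... ai^k | k >= 0 }.
        def in_count_language(symbols, i):
            """True iff symbols = a1^k a2^k ... ai^k for some k >= 0."""
            if not symbols:
                return True                      # k = 0
            k, rest = 0, list(symbols)
            while rest and rest[0] == "a1":      # count the leading block of a1
                rest.pop(0); k += 1
            for j in range(2, i + 1):            # every later block must have the same length k
                block, rest = rest[:k], rest[k:]
                if block != [f"a{j}"] * k:
                    return False
            return not rest and k > 0

        print(in_count_language(["a1", "a1", "a2", "a2", "a3", "a3"], 3))   # True  (k = 2)
        print(in_count_language(["a1", "a2", "a2"], 2))                     # False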