
    Factoring Predicate Argument and Scope Semantics: Underspecified Semantics with LTAG

    In this paper we propose a compositional semantics for lexicalized tree-adjoining grammar (LTAG). Tree-local multicomponent derivations allow separation of the semantic contribution of a lexical item into one component contributing to the predicate-argument structure and a second component contributing to scope semantics. Based on this idea, a syntax-semantics interface is presented where the compositional semantics depends only on the derivation structure. It is shown that the derivation structure (and indirectly the locality of derivations) allows an appropriate amount of underspecification. This is illustrated by investigating underspecified representations for quantifier scope ambiguities and related phenomena such as adjunct scope and island constraints.
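
    As a rough illustration of the underspecification idea (this is not the paper's LTAG machinery; the example sentence, the HOLE placeholder and the function names are invented here), the sketch below uses one underspecified representation whose resolutions are the two scope readings of "every student read some book":

        # Minimal sketch: one underspecified representation, two resolved readings.
        from itertools import permutations

        # Scope components contributed by the quantifiers; HOLE marks the open scope.
        quantifiers = {
            "q_every": "every(x, student(x), HOLE)",
            "q_some":  "some(y, book(y), HOLE)",
        }
        core = "read(x, y)"  # predicate-argument component contributed by the verb

        def enumerate_readings():
            """Plug the core into every total ordering of the scope components."""
            readings = []
            for order in permutations(quantifiers):
                formula = core
                for label in reversed(order):   # fill the innermost scope first
                    formula = quantifiers[label].replace("HOLE", formula)
                readings.append(" > ".join(order) + " : " + formula)
            return readings

        for reading in enumerate_readings():
            print(reading)
        # q_every > q_some : every(x, student(x), some(y, book(y), read(x, y)))
        # q_some > q_every : some(y, book(y), every(x, student(x), read(x, y)))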

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing "deep" hand-crafted wide-coverage parsers with "shallow" treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert the tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
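
    The dependency-based evaluation behind these scores boils down to precision, recall and f-score over sets of (relation, head, dependent) triples. A minimal sketch of that computation is given below; it is illustrative only (not the DCU or PARC evaluation software), and the toy triples are invented:

        # Dependency-triple evaluation: f-score of parser output against a gold dependency bank.
        def dep_fscore(gold, test):
            """Precision, recall and f-score over sets of (relation, head, dependent) triples."""
            gold, test = set(gold), set(test)
            correct = len(gold & test)
            precision = correct / len(test) if test else 0.0
            recall = correct / len(gold) if gold else 0.0
            f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
            return precision, recall, f

        gold = {("subj", "saw", "John"), ("obj", "saw", "Mary"), ("det", "Mary", "the")}
        test = {("subj", "saw", "John"), ("obj", "saw", "the")}
        print(dep_fscore(gold, test))   # (0.5, 0.333..., 0.4)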

    The Computational Analysis of the Syntax and Interpretation of Free Word Order in Turkish

    In this dissertation, I examine a language with "free" word order, specifically Turkish, in order to develop a formalism that can capture the syntax and the context-dependent interpretation of "free" word order within a computational framework. In "free" word order languages, word order is used to convey distinctions in meaning that are not captured by traditional truth-conditional semantics. The word order indicates the "information structure", e.g. what is the "topic" and the "focus" of the sentence. The context-appropriate use of "free" word order is of considerable importance in developing practical applications in natural language interpretation, generation, and machine translation. I develop a formalism called Multiset-CCG, an extension of Combinatory Categorial Grammars, CCGs, (Ades/Steedman 1982, Steedman 1985), and demonstrate its advantages in an implementation of a database query system that interprets Turkish questions and generates answers with contextually appropriate word orders. Multiset-CCG is a context-sensitive and polynomially parsable grammar that captures the formal and descriptive properties of "free" word order and restrictions on word order in simple and complex sentences (with discontinuous constituents and long distance dependencies). Multiset-CCG captures the context-dependent meaning of word order in Turkish by compositionally deriving the predicate-argument structure and the information structure of a sentence in parallel. The advantages of using such a formalism are that it is computationally attractive and that it provides a compositional and flexible surface structure that allows syntactic constituents to correspond to information structure constituents. A formalism that integrates information structure and syntax such as Multiset-CCG is essential to the computational tasks of interpreting and generating sentences with contextually appropriate word orders in "free" word order languages.
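
    As a very rough sketch of the multiset idea (this is not Hoffman's actual Multiset-CCG, it ignores the information-structure side entirely, and the lexicon and function names are invented), a verb category that records its arguments as a multiset accepts them in any surface order:

        # Toy lexicon: NPs are plain categories; the verb pairs a result category with
        # a multiset of argument categories it still needs, in any surface order.
        from collections import Counter

        LEXICON = {
            "Ayşe":   ("NP-nom", None),
            "kitabı": ("NP-acc", None),
            "okudu":  ("S", Counter({"NP-nom": 1, "NP-acc": 1})),   # 'read'
        }

        def parse(words):
            """Succeeds iff the arguments seen (in any order) match the verb's multiset."""
            result_cat, wanted, seen = None, None, Counter()
            for w in words:
                cat, args = LEXICON[w]
                if args is not None:          # the functor (verb)
                    result_cat, wanted = cat, args
                else:
                    seen[cat] += 1
            return result_cat if wanted is not None and seen == wanted else None

        print(parse(["Ayşe", "kitabı", "okudu"]))    # S   (SOV order)
        print(parse(["kitabı", "okudu", "Ayşe"]))    # S   (scrambled, still parses)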

    Exploiting multi-word units in statistical parsing and generation

    Syntactic parsing is an important prerequisite for many natural language processing (NLP) applications. The task refers to the process of generating the tree of syntactic nodes with associated phrase category labels corresponding to a sentence. Our objective is to improve upon statistical models for syntactic parsing by leveraging multi-word units (MWUs) such as named entities and other classes of multi-word expressions. Multi-word units are phrases that are lexically, syntactically and/or semantically idiosyncratic in that they are to at least some degree non-compositional. If such units are identified prior to, or as part of, the parsing process, their boundaries can be exploited as islands of certainty within the very large (and often highly ambiguous) search space. Luckily, certain types of MWUs can be readily identified in an automatic fashion (using a variety of techniques) to a near-human level of accuracy. We carry out a number of experiments which integrate knowledge about different classes of MWUs in several commonly deployed parsing architectures. In a supplementary set of experiments, we attempt to exploit these units in the converse operation to statistical parsing: statistical generation (in our case, surface realisation from Lexical-Functional Grammar f-structures). We show that, by exploiting knowledge about MWUs, certain classes of parsing and generation decisions are more accurately resolved. This translates to improvements in overall parsing and generation results which, although modest, are demonstrably significant.
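
    A minimal sketch of the "islands of certainty" idea (the MWU list and function name are invented; real systems use named-entity recognisers and MWE lexicons): a recognised multi-word unit is collapsed into a single token before parsing, so the parser can no longer split it:

        # Group known multi-word units into single underscore-joined tokens before parsing.
        MWUS = {("New", "York", "Stock", "Exchange"), ("in", "spite", "of")}  # illustrative list

        def group_mwus(tokens):
            """Greedy left-to-right grouping, preferring the longest known MWU at each position."""
            out, i = [], 0
            while i < len(tokens):
                for length in range(len(tokens) - i, 1, -1):   # longest match first
                    if tuple(tokens[i:i + length]) in MWUS:
                        out.append("_".join(tokens[i:i + length]))
                        i += length
                        break
                else:
                    out.append(tokens[i])
                    i += 1
            return out

        print(group_mwus("traders on the New York Stock Exchange panicked".split()))
        # ['traders', 'on', 'the', 'New_York_Stock_Exchange', 'panicked']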

    On Internal Merge


    Quasi-logical forms from f-structures for the Penn treebank

    In this paper we show how the trees in the Penn treebank can be associated automatically with simple quasi-logical forms. Our approach is based on combining two independent strands of work: the first is the observation that there is a close correspondence between quasi-logical forms and LFG f-structures [van Genabith and Crouch, 1996]; the second is the development of an automatic f-structure annotation algorithm for the Penn treebank [Cahill et al., 2002a; Cahill et al., 2002b]. We compare our approach with that of [Liakata and Pulman, 2002].
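
    To make the f-structure/QLF correspondence concrete, a toy translation from a simplified f-structure (a nested dict) to a flat predicate-argument term is sketched below. It is not the paper's annotation algorithm, and real quasi-logical forms also carry unscoped determiner/quantifier information that this sketch drops:

        # Toy mapping from a simplified LFG f-structure to a predicate-argument term.
        def fstruct_to_qlf(f):
            """Render PRED together with its governable grammatical functions."""
            args = [fstruct_to_qlf(f[gf]) for gf in ("SUBJ", "OBJ", "COMP") if gf in f]
            if not args:
                return f["PRED"]
            return f["PRED"] + "(" + ", ".join(args) + ")"

        fs = {"PRED": "see",
              "SUBJ": {"PRED": "John"},
              "OBJ":  {"PRED": "girl", "SPEC": "a"}}   # SPEC (quantifier) info ignored here
        print(fstruct_to_qlf(fs))   # see(John, girl)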

    Surface Structure

    Combinatory Categorial Grammar (CCG) was originally advanced as a theory relating coordination and relativization. The claim was that these constructions can be analysed at the level of surface grammar, without rules of movement, deletion, passing of slash-features, or the syntactic empty category Wh-trace. Instead, CCG generalizes the notion of grammatical constituency to cover everything that can coordinate or result from extraction, via the use of a small number of operations which apply to adjacent lexically realised grammatical categories interpreted as functions.
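
    A small sketch of the kind of operations meant here (an illustrative toy, not a CCG implementation; the class and helper names are invented): forward application (X/Y Y => X) and forward composition (X/Y Y/Z => X/Z), together with a type-raised subject, yield the non-standard surface constituent S/NP that can coordinate or host extraction:

        # Toy CCG categories with forward application and forward composition.
        class Cat:
            def __init__(self, result=None, slash=None, arg=None, atom=None):
                self.result, self.slash, self.arg, self.atom = result, slash, arg, atom
            def __repr__(self):
                return self.atom if self.atom else f"({self.result}{self.slash}{self.arg})"
            def __eq__(self, other):
                return repr(self) == repr(other)

        NP, S = Cat(atom="NP"), Cat(atom="S")
        def fwd(result, arg): return Cat(result, "/", arg)     # X/Y seeks Y to its right
        def bwd(result, arg): return Cat(result, "\\", arg)    # X\Y seeks Y to its left

        def apply_fwd(x, y):
            """Forward application:  X/Y  Y  =>  X"""
            if x.slash == "/" and x.arg == y:
                return x.result

        def compose_fwd(x, y):
            """Forward composition:  X/Y  Y/Z  =>  X/Z"""
            if x.slash == "/" and y.slash == "/" and x.arg == y.result:
                return Cat(x.result, "/", y.arg)

        tv   = fwd(bwd(S, NP), NP)      # transitive verb: (S\NP)/NP
        subj = fwd(S, bwd(S, NP))       # type-raised subject: S/(S\NP)
        print(apply_fwd(tv, NP))        # (S\NP): verb plus object
        print(compose_fwd(subj, tv))    # (S/NP): e.g. "John saw", the unit that coordinates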

    Tree Description Grammars and Underspecified Representations

    In this thesis, a new grammar formalism called (local) Tree Description Grammar (TDG) is presented that generates tree descriptions. This grammar formalism brings together some of the central ideas of Tree Adjoining Grammars (TAG) on the one hand, and approaches to underspecified semantics for scope ambiguities on the other hand. First a general definition of TDGs is presented, and afterwards a restricted variant called local TDGs is proposed. Since the elements of a local TDG are tree descriptions, an extended domain of locality as in TAGs is provided by this formalism. Consequently, local TDGs can be lexicalized, and local dependencies such as filler-gap dependencies can be expressed in the descriptions occurring in the grammar. The tree descriptions generated by local TDGs are such that the dominance relation (i.e. the reflexive and transitive closure of the parent relation) need not be fully specified. Therefore the generation of suitable underspecified representations for scope ambiguities is possible. The generative capacity of local TDGs is greater than that of TAGs; local TDGs are even more powerful than set-local multicomponent TAGs (MC-TAGs). However, the generative capacity of local TDGs is restricted in such a way that only semilinear languages are generated. Therefore these languages are of constant growth, a property generally ascribed to natural languages. Local TDGs of different rank can be distinguished depending on the form of derivation steps that are possible in these grammars. This leads to a hierarchy of local TDGs. For the string languages generated by local TDGs of a certain rank, a pumping lemma is proven which can be used to show that local TDGs of rank n can generate a language L_i := {a_1^k ··· a_i^k | k ≄ 0} iff i ≤ 2n holds. In order to describe the relation between two languages, synchronous local TDGs are introduced. The synchronization with a second local TDG does not increase the generative power of the grammar in the sense that each language generated by a local TDG that is part of a synchronous pair of local TDGs can also be generated by a single local TDG. This formalism of synchronous local TDGs is used to describe a syntax-semantics interface for a fragment of French which illustrates the derivation of underspecified representations for scope ambiguities with local TDGs.
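
    To make the generative-capacity claim concrete, the count language L_i = {a_1^k ··· a_i^k | k ≄ 0} mentioned above can be recognised as follows (an illustrative membership test over symbol strings; it says nothing about TDG derivations themselves, and the symbol encoding "a1".."ai" is invented here):

        # Membership test for L_i = { a1^k a2^k ... ai^k | k >= 0 }.
        def in_count_language(symbols, i):
            """True iff symbols = a1^k a2^k ... ai^k for some k >= 0."""
            if not symbols:
                return True                      # k = 0
            k, rest = 0, list(symbols)
            while rest and rest[0] == "a1":      # count the leading block of a1
                rest.pop(0); k += 1
            for j in range(2, i + 1):            # every later block must have the same length k
                block, rest = rest[:k], rest[k:]
                if block != [f"a{j}"] * k:
                    return False
            return not rest and k > 0

        print(in_count_language(["a1", "a1", "a2", "a2", "a3", "a3"], 3))   # True  (k = 2)
        print(in_count_language(["a1", "a2", "a2"], 2))                     # False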