140 research outputs found

    Textual Economy through Close Coupling of Syntax and Semantics

    We focus on the production of efficient descriptions of objects, actions and events. We define a type of efficiency, textual economy, that exploits the hearer's recognition of inferential links to material elsewhere within a sentence. Textual economy leads to efficient descriptions because the material that supports such inferences has been included to satisfy independent communicative goals, and is therefore overloaded in Pollack's sense. We argue that achieving textual economy imposes strong requirements on the representation and reasoning used in generating sentences. The representation must support the generator's simultaneous consideration of syntax and semantics. Reasoning must enable the generator to assess quickly and reliably at any stage how the hearer will interpret the current sentence, with its (incomplete) syntax and semantics. We show that these representational and reasoning requirements are met in the SPUD system for sentence planning and realization. Comment: 10 pages, uses QobiTree.te

    Korean Grammar Using TAGs

    This paper addresses various issues related to representing the Korean language using Tree Adjoining Grammars. Topics covered include Korean grammar using TAGs, Machine Translation between Korean and English using Synchronous Tree Adjoining Grammars (STAGs), handling scrambling using Multi Component TAGs (MC-TAGs), and recovering empty arguments. The data for the parsing is from US military communication messages.

    A psycholinguistically motivated version of TAG

    We propose a psycholinguistically motivated version of TAG which is designed to model key properties of human sentence processing, viz., incrementality, connectedness, and prediction. We use findings from human experiments to motivate an incremental grammar formalism that makes it possible to build fully connected structures on a word-by-word basis. A key idea of the approach is to explicitly model the prediction of upcoming material and the subsequent verification and integration processes. We also propose a linking theory that links the predictions of our formalism to experimental data such as reading times, and illustrate how it can capture psycholinguistic results on the processing of either ... or structures and relative clauses.


    CCG Parsing and Multiword Expressions

    This thesis presents a study of the integration of information about Multiword Expressions (MWEs) into parsing with Combinatory Categorial Grammar (CCG). We build on previous work which has shown the benefit of adding information about MWEs to syntactic parsing by implementing a similar pipeline with CCG parsing. More specifically, we collapse MWEs to one token in training and test data in CCGbank, a corpus which contains sentences annotated with CCG derivations. Our collapsing algorithm, however, can only deal with MWEs when they form a constituent in the data, which is one of the limitations of our approach. We study the effect of collapsing training and test data. A parsing effect can be obtained if collapsed data help the parser in its decisions, and a training effect can be obtained if training on the collapsed data improves results. We also collapse the gold standard and show that our model significantly outperforms the baseline model on our gold standard, which indicates that there is a training effect. We show that the baseline model performs significantly better on our gold standard when the data are collapsed before parsing than when the data are collapsed after parsing, which indicates that there is a parsing effect. We show that these results can lead to improved performance on the non-collapsed standard benchmark, although we fail to show that it does so significantly. We conclude that, despite the limited settings, there are noticeable improvements from using MWEs in parsing. We discuss ways in which the incorporation of MWEs into parsing can be improved and hypothesize that this will lead to more substantial results. Finally, we show that turning the MWE recognition part of the pipeline into an experimental part is useful, as we obtain different results with different recognizers. Comment: MSc thesis, The University of Edinburgh, 2014, School of Informatics, MSc Artificial Intelligenc
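
The idea of collapsing an MWE to one token can be illustrated on a flat token sequence. The following is a minimal sketch only, not the thesis's algorithm: the thesis works on CCGbank derivations and requires each MWE to form a constituent, whereas this example ignores syntactic structure entirely. The function name `collapse_mwes`, the lowercase matching, the underscore joiner, and the toy lexicon are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def collapse_mwes(
    tokens: Sequence[str],
    mwes: Set[Tuple[str, ...]],
    joiner: str = "_",
) -> List[str]:
    """Merge each known multiword expression into a single token.

    Hypothetical helper: scans left to right and, at each position,
    prefers the longest matching expression. Matching is done on
    lowercased tokens; the original casing is kept in the output.
    """
    max_len = max((len(m) for m in mwes), default=0)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, down to length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in mwes:
                out.append(joiner.join(tokens[i:i + n]))
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


# Toy lexicon, invented for this example.
lexicon = {("kicked", "the", "bucket"), ("new", "york")}
sentence = ["He", "kicked", "the", "bucket", "in", "New", "York"]
collapsed = collapse_mwes(sentence, lexicon)
```

Here `collapsed` is `["He", "kicked_the_bucket", "in", "New_York"]`. A version faithful to the abstract would additionally check that each matched span corresponds to a constituent in the CCG derivation before merging it.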

    A Metagrammatical Approach to Periphrasis in Gwadloupéyen

    In this paper, I show that verbal and nominal functional elements of Gwadloupéyen can be described in the Tree-Adjoining Grammar as pertaining to morphological periphrasis. This challenges the claim that Creoles have fully analytical morpholog

    Broad-coverage model of prediction in human sentence processing

    The aim of this thesis is to design and implement a cognitively plausible theory of sentence processing which incorporates a mechanism for modeling a prediction and verification process in human language understanding, and to evaluate the validity of this model on specific psycholinguistic phenomena as well as on broad-coverage, naturally occurring text. Modeling prediction is a timely and relevant contribution to the field because recent experimental evidence suggests that humans predict upcoming structure or lexemes during sentence processing. However, none of the current sentence processing theories capture prediction explicitly. This thesis proposes a novel model of incremental sentence processing that offers an explicit prediction and verification mechanism. In evaluating the proposed model, this thesis also makes a methodological contribution. The design and evaluation of current sentence processing theories are usually based exclusively on experimental results from individual psycholinguistic experiments on specific linguistic structures. However, a theory of language processing in humans should not only work in an experimentally designed environment, but should also have explanatory power for naturally occurring language. This thesis first shows that the Dundee corpus, an eye-tracking corpus of newspaper text, constitutes a valuable additional resource for testing sentence processing theories. I demonstrate that a benchmark processing effect (the subject/object relative clause asymmetry) can be detected in this data set (Chapter 4). I then evaluate two existing theories of sentence processing, Surprisal and Dependency Locality Theory (DLT), on the full Dundee corpus. This constitutes the first broad-coverage comparison of sentence processing theories on naturalistic text. I find that both theories can explain some of the variance in the eye-movement data, and that they capture different aspects of sentence processing (Chapter 5). 
    In Chapter 6, I propose a new theory of sentence processing, which explicitly models prediction and verification processes, and aims to unify the complementary aspects of Surprisal and DLT. The proposed theory implements key cognitive concepts such as incrementality, full connectedness, and memory decay. The underlying grammar formalism is a strictly incremental version of Tree-adjoining Grammar (TAG), Psycholinguistically motivated TAG (PLTAG), which is introduced in Chapter 7. I then describe how the Penn Treebank can be converted into PLTAG format and define an incremental, fully connected broad-coverage parsing algorithm with an associated probability model for PLTAG. Evaluation of the PLTAG model shows that it achieves the broad coverage required for testing a psycholinguistic theory on naturalistic data. On the standardized Penn Treebank test set, it approaches the performance of incremental TAG parsers without prediction (Chapter 8). Chapter 9 evaluates the psycholinguistic aspects of the proposed theory by testing it both on a selection of established sentence processing phenomena and on the Dundee eye-tracking corpus. The proposed theory can account for a larger range of psycholinguistic case studies than previous theories, and is a significant positive predictor of reading times on broad-coverage text. I show that it can explain a larger proportion of the variance in reading times than either DLT integration cost or Surprisal.
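
Surprisal, one of the two baseline theories named in this abstract, is standardly defined as the negative log probability of a word given its preceding context, so less predictable words receive higher values. The sketch below computes it from a toy bigram model. This is an assumption made purely for illustration: the thesis does not use a bigram model, and the function name `bigram_surprisals`, the `<s>` start symbol, and the two-sentence corpus are all invented here.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def bigram_surprisals(
    corpus: Sequence[Sequence[str]],
    sentence: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (word, surprisal in bits) for each word of `sentence`.

    Hypothetical helper: probabilities are unsmoothed maximum-likelihood
    bigram estimates, so a bigram never seen in `corpus` has probability
    zero and is reported with infinite surprisal.
    """
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sent in corpus:
        padded = ["<s>", *sent]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    result: List[Tuple[str, float]] = []
    padded = ["<s>", *sentence]
    for prev, cur in zip(padded, padded[1:]):
        count = bigram_counts[(prev, cur)]
        if count == 0:
            result.append((cur, math.inf))
        else:
            prob = count / context_counts[prev]
            # Surprisal: negative base-2 log of the conditional probability.
            result.append((cur, -math.log2(prob)))
    return result


# Toy corpus, invented for this example.
toy_corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
scores = bigram_surprisals(toy_corpus, ["the", "dog"])
```

In this toy corpus every sentence starts with "the", so its surprisal is 0 bits, while "dog" follows "the" in one of two cases and so carries 1 bit. Evaluations like the one the abstract describes relate such per-word values to reading times; that step is not shown here.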
