405 research outputs found

    Coordination in Tree Adjoining Grammars: Formalization and Implementation

    In this paper we show that an account of coordination can be constructed using the derivation structures in a lexicalized Tree Adjoining Grammar (LTAG). We present a notion of derivation in LTAGs that preserves the notion of fixed constituency in the LTAG lexicon while providing the flexibility needed for coordination phenomena. We also discuss the construction of a practical parser for LTAGs that can handle coordination, including cases of non-constituent coordination. Comment: 6 pages, 16 Postscript figures, uses colap.sty. To appear in the proceedings of COLING 199

    Sentence Simplification for Text Processing

    A thesis submitted in partial fulfilment of the requirement of the University of Wolverhampton for the degree of Doctor of Philosophy. Propositional density and syntactic complexity are two features of sentences which affect the ability of humans and machines to process them effectively. In this thesis, I present a new approach to automatic sentence simplification which processes sentences containing compound clauses and complex noun phrases (NPs) and converts them into sequences of simple sentences which contain fewer of these constituents and have reduced per-sentence propositional density and syntactic complexity. My overall approach is iterative and relies on both machine learning and handcrafted rules. It implements a small set of sentence transformation schemes, each of which takes one sentence containing compound clauses or complex NPs and converts it into one or two simplified sentences containing fewer of these constituents (Chapter 5). The iterative algorithm applies the schemes repeatedly and is able to simplify sentences which contain arbitrary numbers of compound clauses and complex NPs. The transformation schemes rely on automatic detection of these constituents, which may take a variety of forms in input sentences. In the thesis, I present two new shallow syntactic analysis methods which facilitate the detection process. The first of these identifies various explicit signs of syntactic complexity in input sentences and classifies them according to their specific syntactic linking and bounding functions. I present the annotated resources used to train and evaluate this sign tagger (Chapter 2) and the machine learning method used to implement it (Chapter 3). The second syntactic analysis method exploits the sign tagger and identifies the spans of compound clauses and complex NPs in input sentences. In Chapter 4 of the thesis, I describe the development and evaluation of a machine learning approach performing this task.
This chapter also presents a new annotated dataset supporting this activity. In the thesis, I present two implementations of my approach to sentence simplification. One of these exploits handcrafted rule activation patterns to detect different parts of input sentences which are relevant to the simplification process. The other implementation uses my machine learning method to identify compound clauses and complex NPs for this purpose. Intrinsic evaluation of the two implementations is presented in Chapter 6 together with a comparison of their performance with several baseline systems. The evaluation includes comparisons of system output with human-produced simplifications, automated estimations of the readability of system output, and surveys of human opinions on the grammaticality, accessibility, and meaning of automatically produced simplifications. Chapter 7 presents extrinsic evaluation of the sentence simplification method exploiting handcrafted rule activation patterns. The extrinsic evaluation involves three NLP tasks: multidocument summarisation, semantic role labelling, and information extraction. Finally, in Chapter 8, conclusions are drawn and directions for future research considered

    Identifying Signs of Syntactic Complexity for Rule-Based Sentence Simplification

    This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output

    A Lexicalized Tree Adjoining Grammar for English

    This document describes a sizable grammar of English written in the TAG formalism and implemented for use with the XTAG system. This report and the grammar described herein supersede the TAG grammar described in an earlier 1995 XTAG technical report. The English grammar described in this report is based on the TAG formalism, which has been extended to include lexicalization and unification-based feature structures. The range of syntactic phenomena that can be handled is large and includes auxiliaries (including inversion), copula, raising and small clause constructions, topicalization, relative clauses, infinitives, gerunds, passives, adjuncts, it-clefts, wh-clefts, PRO constructions, noun-noun modifications, extraposition, determiner sequences, genitives, negation, noun-verb contractions, sentential adjuncts and imperatives. This technical report corresponds to the XTAG Release 8/31/98. The XTAG grammar is continuously updated with the addition of new analyses and modification of old ones, and an online version of this report can be found at the XTAG web page at http://www.cis.upenn.edu/~xtag/ Comment: 310 pages, 181 Postscript figures, uses 11pt, psfig.te

    Classification-based phrase structure grammar: an extended revised version of HPSG

    This thesis is concerned with a presentation of Classification-based Phrase Structure Grammar (or cPSG), a grammatical theory that has grown out of extensive revisions of, and extensions to, HPSG. The fundamental difference between this theory and HPSG concerns the central role that classification plays in the grammar: the grammar classifies strings, according to their feature structure descriptions, as being of various types. Apart from the role of classification, the theory bears a close resemblance to HPSG, though it is by no means a direct translation, including numerous revisions and extensions. A central goal in the development of the theory has been its computational implementation, which is included in the thesis. The presentation may be divided into four parts. In the first, chapters 1 and 2, we present the grammatical formalism within which the theory is stated. This consists of a development of the notion of a classificatory system (chapter 1), and the incorporation of hierarchality into that notion (chapter 2). The second part concerns syntactic issues. Chapter 3 revises the HPSG treatment of specifiers, complements and adjuncts, incorporating ideas that specifiers and complements should be distinguished and presenting a treatment of adjuncts whereby the head is selected for by the adjunct. Chapter 4 presents several options for an account of unbounded dependencies. The accounts are based loosely on that of GPSG, and a reconstruction of GPSG's Foot Feature Principle is presented which does not involve a notion of default. Chapter 5 discusses coordination, employing an extension of Rounds-Kasper logic to allow a treatment of cross-categorial coordination. In the third part, chapters 6, 7 and 8, we turn to semantic issues. We begin (Chapter 6) with a discussion of Situation Theory, the background semantic theory, attempting to establish a precise and coherent version of the theory within which to work.
Chapter 7 presents the bulk of the treatment of semantics, and can be seen as an extensive revision of the HPSG treatment of semantics. The aim is to provide a semantic treatment which is faithful to the version of Situation Theory presented in Chapter 6. Chapter 8 deals with quantification, discussing the nature of quantification in Situation Theory before presenting a treatment of quantification in cPSG. Some residual questions about the semantics of coordinated noun phrases are also addressed in this chapter. The final part, Chapter 9, concerns the actual computational implementation of the theory. A parsing algorithm based on hierarchical classification is presented, along with four strategies that might be adopted given that algorithm. Also discussed are some implementation details. A concluding chapter summarises the arguments of the thesis and outlines some avenues for future research

    Categorial Grammar

    The paper is a review article comparing a number of approaches to natural language syntax and semantics that have been developed using categorial frameworks. It distinguishes two related but distinct varieties of categorial theory, one related to Natural Deduction systems and the axiomatic calculi of Lambek, and another which involves more specialized combinatory operations

    Syntax und Valenz: Zur Modellierung kohärenter und elliptischer Strukturen mit Baumadjunktionsgrammatiken (Syntax and Valency: On Modelling Coherent and Elliptical Structures with Tree Adjoining Grammars)

    This work investigates the relationship between the syntactic model and lexical valency properties, using the family of Tree Adjoining Grammars (TAG) and the phenomena of coherence and ellipsis. Like most prominent syntactic models, TAG amalgamates syntax and valency, which often leads to realization idealizations. It is shown, however, that TAG avoids certain realization idealizations and can directly represent discontinuity in coherent constructions; that TAG can nevertheless be used for a linguistically sound analysis of coherent constructions, despite its substantially restricted expressive power compared with GB, LFG and HPSG; and that the TAG derivation tree is a sufficiently informative reference structure for the indirect modelling of gapping. Finally, for the direct representation of gapping structures, a tree-based syntactic model, STUG, is proposed, in which syntax and valency are separated but linked. German law requires that we state the prices in Germany for this publication. The hardcover price is 35.00 EUR; the softcover price is 25.00 EUR

    A Lexicalized Tree Adjoining Grammar for English

    This document describes a sizable grammar of English written in the TAG formalism and implemented for use with the XTAG system. This report and the grammar described herein supersede the TAG grammar described in [Abeillé et al., 1990]. The English grammar described in this report is based on the TAG formalism developed in [Joshi et al., 1975], which has been extended to include lexicalization ([Schabes et al., 1988]) and unification-based feature structures ([Vijay-Shanker and Joshi, 1991]). The grammar discussed in this report extends the grammar presented in [Abeillé et al., in at least two ways. First, this grammar has more detailed linguistic analyses, and second, the grammar presented in this paper is fully implemented. The range of syntactic phenomena that can be handled is large and includes auxiliaries (including inversion), copula, raising and small clause constructions, topicalization, relative clauses, infinitives, gerunds, passives, adjuncts, it-clefts, wh-clefts, PRO constructions, noun-noun modifications, extraposition, determiner phrases, genitives, negation, noun-verb contractions, sentential adjuncts and imperatives. The XTAG grammar has been relatively stable since November 1993, although new analyses are still being added periodically

    Improving Syntactic Parsing of Clinical Text Using Domain Knowledge

    Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP). However, few studies have explored syntactic parsing in the medical domain. This dissertation systematically investigated different methods to improve the performance of syntactic parsing of clinical text, including (1) constructing two clinical treebanks of discharge summaries and progress notes by developing annotation guidelines that handle missing elements in clinical sentences; (2) retraining four state-of-the-art parsers, including the Stanford parser, Berkeley parser, Charniak parser, and Bikel parser, using clinical treebanks, and comparing their performance to identify better parsing approaches; and (3) developing new methods to reduce syntactic ambiguity caused by Prepositional Phrase (PP) attachment and coordination using semantic information. Our evaluation showed that clinical treebanks greatly improved the performance of existing parsers. The Berkeley parser achieved the best F-1 score of 86.39% on the MiPACQ treebank. For PP attachment, our proposed methods improved the accuracy of PP attachment by 2.35% on the MiPACQ corpus and 1.77% on the i2b2 corpus. For coordination, our method achieved a precision of 94.9% on the MiPACQ corpus and 90.3% on the i2b2 corpus. To further demonstrate the effectiveness of the improved parsing approaches, we applied outputs of our parsers to two external NLP tasks: semantic role labeling and temporal relation extraction. The experimental results showed that performance of both tasks was improved by using the parse tree information from our optimized parsers, with an improvement of 3.26% in F-measure for semantic role labeling and an improvement of 1.5% in F-measure for temporal relation extraction

    Training of Crisis Mappers and Map Production from Multi-sensor Data: Vernazza Case Study (Cinque Terre National Park, Italy)

    The aim of this paper is to present the development of a multidisciplinary project carried out through cooperation between Politecnico di Torino and ITHACA (Information Technology for Humanitarian Assistance, Cooperation and Action). The goal of the project was to train students attending Architecture and Engineering courses in geospatial data acquisition and processing, in order to start up a team of "volunteer mappers". The project aims to document environmental and built heritage affected by disasters; the purpose is to improve the capabilities of the actors involved in geospatial data collection, integration and sharing. The area proposed for testing the training activities is the Cinque Terre National Park, on the World Heritage List since 1997. The area was affected by a flood on 25 October 2011. In line with other international experiences, the group is expected to be active after emergencies in order to update maps, using data acquired by typical geomatic methods and techniques such as terrestrial and aerial Lidar, close-range and aerial photogrammetry, topographic and GNSS instruments etc., or by non-conventional systems and instruments such as UAV, mobile mapping etc. The ultimate goal is to implement a WebGIS platform to share all the data collected with local authorities and the Civil Protectio