Coordination in Tree Adjoining Grammars: Formalization and Implementation
In this paper we show that an account for coordination can be constructed
using the derivation structures in a lexicalized Tree Adjoining Grammar (LTAG).
We present a notion of derivation in LTAGs that preserves the notion of fixed
constituency in the LTAG lexicon while providing the flexibility needed for
coordination phenomena. We also discuss the construction of a practical parser
for LTAGs that can handle coordination, including cases of non-constituent
coordination.
Comment: 6 pages, 16 Postscript figures, uses colap.sty. To appear in the
proceedings of COLING 199
Sentence Simplification for Text Processing
A thesis submitted in partial fulfilment of the requirement of the University of Wolverhampton for the degree of Doctor of Philosophy.
Propositional density and syntactic complexity are two features of sentences which
affect the ability of humans and machines to process them effectively. In this
thesis, I present a new approach to automatic sentence simplification which processes
sentences containing compound clauses and complex noun phrases (NPs)
and converts them into sequences of simple sentences which contain fewer of these
constituents and have reduced per-sentence propositional density and syntactic
complexity.
My overall approach is iterative and relies on both machine learning and handcrafted
rules. It implements a small set of sentence transformation schemes, each
of which takes one sentence containing compound clauses or complex NPs and
converts it into one or two simplified sentences containing fewer of these constituents
(Chapter 5). The iterative algorithm applies the schemes repeatedly and is able
to simplify sentences which contain arbitrary numbers of compound clauses and
complex NPs. The transformation schemes rely on automatic detection of these
constituents, which may take a variety of forms in input sentences. In the thesis, I
present two new shallow syntactic analysis methods which facilitate the detection
process.
The first of these identifies various explicit signs of syntactic complexity in
input sentences and classifies them according to their specific syntactic linking and bounding functions. I present the annotated resources used to train and
evaluate this sign tagger (Chapter 2) and the machine learning method used to
implement it (Chapter 3). The second syntactic analysis method exploits the sign
tagger and identifies the spans of compound clauses and complex NPs in input
sentences. In Chapter 4 of the thesis, I describe the development and evaluation
of a machine learning approach performing this task. This chapter also presents
a new annotated dataset supporting this activity.
In the thesis, I present two implementations of my approach to sentence simplification.
One of these exploits handcrafted rule activation patterns to detect
different parts of input sentences which are relevant to the simplification process.
The other implementation uses my machine learning method to identify
compound clauses and complex NPs for this purpose.
Intrinsic evaluation of the two implementations is presented in Chapter 6 together
with a comparison of their performance with several baseline systems. The
evaluation includes comparisons of system output with human-produced simplifications,
automated estimations of the readability of system output, and surveys
of human opinions on the grammaticality, accessibility, and meaning of automatically
produced simplifications.
Chapter 7 presents extrinsic evaluation of the sentence simplification method
exploiting handcrafted rule activation patterns. The extrinsic evaluation involves
three NLP tasks: multidocument summarisation, semantic role labelling, and information
extraction. Finally, in Chapter 8, conclusions are drawn and directions
for future research considered.
Identifying Signs of Syntactic Complexity for Rule-Based Sentence Simplification
This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers' opinions about the accuracy, accessibility, and meaning of this output.
A Lexicalized Tree Adjoining Grammar for English
This document describes a sizable grammar of English written in the TAG
formalism and implemented for use with the XTAG system. This report and the
grammar described herein supersede the TAG grammar described in an earlier
1995 XTAG technical report. The English grammar described in this report is
based on the TAG formalism which has been extended to include lexicalization,
and unification-based feature structures. The range of syntactic phenomena that
can be handled is large and includes auxiliaries (including inversion), copula,
raising and small clause constructions, topicalization, relative clauses,
infinitives, gerunds, passives, adjuncts, it-clefts, wh-clefts, PRO
constructions, noun-noun modifications, extraposition, determiner sequences,
genitives, negation, noun-verb contractions, sentential adjuncts and
imperatives. This technical report corresponds to the XTAG Release 8/31/98. The
XTAG grammar is continuously updated with the addition of new analyses and
modification of old ones, and an online version of this report can be found at
the XTAG web page at http://www.cis.upenn.edu/~xtag/
Comment: 310 pages, 181 Postscript figures, uses 11pt, psfig.te
Classification-based phrase structure grammar: an extended revised version of HPSG
This thesis is concerned with a presentation of Classification-based Phrase Structure
Grammar (or cPSG), a grammatical theory that has grown out of extensive revisions
of, and extensions to, HPSG. The fundamental difference between this theory and HPSG
concerns the central role that classification plays in the grammar: the grammar classifies
strings, according to their feature structure descriptions, as being of various types.
Apart from the role of classification, the theory bears a close resemblance to HPSG,
though it is by no means a direct translation, including numerous revisions and extensions.
A central goal in the development of the theory has been its computational
implementation, which is included in the thesis.
The presentation may be divided into four parts. In the first, chapters 1 and 2, we
present the grammatical formalism within which the theory is stated. This consists of a
development of the notion of a classificatory system (chapter 1), and the incorporation
of hierarchality into that notion (chapter 2).
The second part concerns syntactic issues. Chapter 3 revises the HPSG treatment of
specifiers, complements and adjuncts, incorporating ideas that specifiers and complements
should be distinguished and presenting a treatment of adjuncts whereby the
head is selected for by the adjunct. Chapter 4 presents several options for an account of
unbounded dependencies. The accounts are based loosely on that of GPSG, and a reconstruction
of GPSG's Foot Feature Principle is presented which does not involve a notion
of default. Chapter 5 discusses coordination, employing an extension of Rounds-Kasper
logic to allow a treatment of cross-categorial coordination.
In the third part, chapters 6, 7 and 8, we turn to semantic issues. We begin (Chapter 6)
with a discussion of Situation Theory, the background semantic theory, attempting to
establish a precise and coherent version of the theory within which to work. Chapter 7
presents the bulk of the treatment of semantics, and can be seen as an extensive revision
of the HPSG treatment of semantics. The aim is to provide a semantic treatment which
is faithful to the version of Situation Theory presented in Chapter 6. Chapter 8 deals
with quantification, discussing the nature of quantification in Situation Theory before
presenting a treatment of quantification in CPSG. Some residual questions about the
semantics of coordinated noun phrases are also addressed in this chapter.
The final part, Chapter 9, concerns the actual computational implementation of the
theory. A parsing algorithm based on hierarchical classification is presented, along with
four strategies that might be adopted given that algorithm. Also discussed are some
implementation details. A concluding chapter summarises the arguments of the thesis
and outlines some avenues for future research.
Categorial Grammar
The paper is a review article comparing a number of approaches to natural language syntax and semantics that have been developed using categorial frameworks.
It distinguishes two related but distinct varieties of categorial theory, one related to Natural Deduction systems and the axiomatic calculi of Lambek, and another which involves more specialized combinatory operations.
Syntax und Valenz: Zur Modellierung kohärenter und elliptischer Strukturen mit Baumadjunktionsgrammatiken [Syntax and Valency: On Modelling Coherent and Elliptical Structures with Tree Adjoining Grammars]
This work investigates the relationship between the syntactic model and lexical valency properties, using the family of Tree Adjoining Grammars (TAG) and the phenomena of coherence and ellipsis. Like most prominent syntactic models, TAG amalgamates syntax and valency, which often leads to realization idealizations. It is shown, however,
that TAG avoids certain realization idealizations in doing so and can directly represent discontinuity in coherent constructions;
that TAG can nevertheless, and despite its considerably restricted expressive power compared with GB, LFG and HPSG, be used for a linguistically meaningful analysis of coherent constructions;
that the TAG derivation tree is a sufficiently informative point of reference for the indirect modelling of gapping.
Finally, for the direct representation of gapping structures, a tree-based syntactic model, STUG, is proposed in which syntax and valency are separated but linked.
German law requires we state the prices in Germany for this publication. The hardcover price is 35.00 EUR; the softcover price is 25.00 EUR.
A Lexicalized Tree Adjoining Grammar for English
This document describes a sizable grammar of English written in the TAG formalism and implemented for use with the XTAG system. This report and the grammar described herein supersede the TAG grammar described in [Abeillé et al., 1990]. The English grammar described in this report is based on the TAG formalism developed in [Joshi et al., 1975], which has been extended to include lexicalization ([Schabes et al., 1988]) and unification-based feature structures ([Vijay-Shanker and Joshi, 1991]). The grammar discussed in this report extends the grammar presented in [Abeillé et al.] in at least two ways. First, this grammar has more detailed linguistic analyses, and second, the grammar presented in this paper is fully implemented. The range of syntactic phenomena that can be handled is large and includes auxiliaries (including inversion), copula, raising and small clause constructions, topicalization, relative clauses, infinitives, gerunds, passives, adjuncts, it-clefts, wh-clefts, PRO constructions, noun-noun modifications, extraposition, determiner phrases, genitives, negation, noun-verb contractions, sentential adjuncts and imperatives. The XTAG grammar has been relatively stable since November 1993, although new analyses are still being added periodically.
Improving Syntactic Parsing of Clinical Text Using Domain Knowledge
Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP). However, few studies have explored syntactic parsing in the medical domain. This dissertation systematically investigated different methods to improve the performance of syntactic parsing of clinical text, including (1) Constructing two clinical treebanks of discharge summaries and progress notes by developing annotation guidelines that handle missing elements in clinical sentences; (2) Retraining four state-of-the-art parsers, including the Stanford parser, Berkeley parser, Charniak parser, and Bikel parser, using clinical treebanks, and comparing their performance to identify better parsing approaches; and (3) Developing new methods to reduce syntactic ambiguity caused by Prepositional Phrase (PP) attachment and coordination using semantic information.
Our evaluation showed that clinical treebanks greatly improved the performance of existing parsers. The Berkeley parser achieved the best F-1 score of 86.39% on the MiPACQ treebank. For PP attachment, our proposed methods improved the accuracies of PP attachment by 2.35% on the MiPACQ corpus and 1.77% on the i2b2 corpus. For coordination, our method achieved precisions of 94.9% and 90.3% on the MiPACQ and i2b2 corpora, respectively. To further demonstrate the effectiveness of the improved parsing approaches, we applied outputs of our parsers to two external NLP tasks: semantic role labeling and temporal relation extraction. The experimental results showed that performance of both tasks was improved by using the parse tree information from our optimized parsers, with an improvement of 3.26% in F-measure for semantic role labeling and an improvement of 1.5% in F-measure for temporal relation extraction.
Training of Crisis Mappers and Map Production from Multi-sensor Data: Vernazza Case Study (Cinque Terre National Park, Italy)
The aim of this paper is to present the development of a multidisciplinary project carried out in cooperation between Politecnico di Torino and ITHACA (Information Technology for Humanitarian Assistance, Cooperation and Action). The goal of the project was to train students attending Architecture and Engineering courses in geospatial data acquisition and processing, in order to start up a team of "volunteer mappers". The project aims to document environmental and built heritage exposed to disaster; the purpose is to improve the capabilities of the actors involved in activities connected with geospatial data collection, integration and sharing. The proposed area for testing the training activities is the Cinque Terre National Park, registered on the World Heritage List since 1997. The area was affected by a flood on 25 October 2011. In line with other international experiences, the group is expected to be active after emergencies in order to update maps, using data acquired by typical geomatic methods and techniques such as terrestrial and aerial Lidar, close-range and aerial photogrammetry, topographic and GNSS instruments, etc., or by non-conventional systems and instruments such as UAVs, mobile mapping, etc. The ultimate goal is to implement a WebGIS platform to share all the data collected with local authorities and the Civil Protectio