Implicit reference to citations: a study of astronomy
This paper presents results on automatically classifying pronouns in articles into those that refer to cited research and those that do not. It also discusses automatically linking the pronouns that do refer to citations to their corresponding citations. The study focuses on the pronoun "they" as used in papers in astronomy journals. The paper describes a maximum entropy classifier whose features are defined by the distance to preceding citations and the category of the verbs associated with the pronoun under consideration.
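To make the feature design concrete, here is a minimal sketch of a binary maximum entropy (logistic regression) classifier over the two feature types named above. It assumes that distance is counted in sentences and uses an invented verb-category inventory, toy training data and hyperparameters; none of these come from the paper.

```python
import math
from dataclasses import dataclass

# Hypothetical verb categories; the paper's actual category inventory is not reproduced here.
VERB_CATEGORIES = ("report", "propose", "observe", "other")


@dataclass
class PronounInstance:
    sentences_since_citation: int  # distance from "they" back to the nearest preceding citation
    verb_category: str             # category of the verb associated with "they"


def features(x: PronounInstance) -> list[float]:
    """Bias term, a distance feature that shrinks with distance, and a one-hot verb category."""
    vec = [1.0, 1.0 / (1 + x.sentences_since_citation)]
    vec.extend(1.0 if x.verb_category == c else 0.0 for c in VERB_CATEGORIES)
    return vec


def prob_citation(weights: list[float], x: PronounInstance) -> float:
    """P(they refers to cited work | x) under a binary maximum entropy model."""
    z = sum(w * f for w, f in zip(weights, features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs: int = 500, lr: float = 0.5, l2: float = 0.01) -> list[float]:
    """Batch gradient ascent on the L2-regularised conditional log-likelihood."""
    w = [0.0] * len(features(data[0][0]))
    for _ in range(epochs):
        grad = [-l2 * wi for wi in w]
        for x, y in data:
            err = y - prob_citation(w, x)
            for i, f in enumerate(features(x)):
                grad[i] += err * f
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return w


# Invented toy examples: 1 = "they" refers to cited work, 0 = it does not.
TOY = [
    (PronounInstance(0, "report"), 1), (PronounInstance(1, "propose"), 1),
    (PronounInstance(0, "propose"), 1), (PronounInstance(1, "report"), 1),
    (PronounInstance(6, "observe"), 0), (PronounInstance(9, "other"), 0),
    (PronounInstance(5, "other"), 0), (PronounInstance(8, "observe"), 0),
]

weights = train(TOY)
print(round(prob_citation(weights, PronounInstance(0, "report")), 2))
```

Maximum entropy models with binary outcomes are equivalent to logistic regression, which is why the sketch trains one directly; the linking step (choosing which citation a "they" refers to) is not covered.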
Discourse relations and conjoined VPs: automated sense recognition
Sense classification of discourse relations is a sub-task of shallow discourse parsing. Discourse relations can occur both across sentences (inter-sentential) and within sentences (intra-sentential), and more than one discourse relation can hold between the same units. Using a newly available corpus of discourse-annotated intra-sentential conjoined verb phrases, we demonstrate a sequential classification system for their multi-label sense classification. We assess the importance of each feature used in the classification, the feature scope, and what is lost in moving from gold-standard manual parses to the output of an off-the-shelf parser.
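One common way to realise sequential multi-label classification is a classifier chain, in which senses are decided in a fixed order and each decision can see the senses already assigned. The sketch below assumes that design; the PDTB-style sense labels, feature names and hand-written rules are hypothetical stand-ins for trained classifiers, not the system described in the paper.

```python
from typing import Callable

Features = dict[str, str]
# A per-sense binary decision that may look at senses already predicted earlier in the chain.
SenseClassifier = Callable[[Features, frozenset[str]], bool]


def predict_senses(x: Features, chain: list[tuple[str, SenseClassifier]]) -> list[str]:
    """Decide each sense in order; earlier decisions are visible to later classifiers."""
    predicted: list[str] = []
    for sense, clf in chain:
        if clf(x, frozenset(predicted)):
            predicted.append(sense)
    return predicted


# Hypothetical hand-written stand-ins for trained per-sense classifiers.
CHAIN: list[tuple[str, SenseClassifier]] = [
    ("Expansion.Conjunction", lambda x, prev: x.get("conjunction") == "and"),
    ("Temporal.Precedence", lambda x, prev: x.get("adverb") == "then"),
    ("Contingency.Result",
     lambda x, prev: x.get("adverb") == "so"
     or ("Temporal.Precedence" in prev and x.get("verb2_class") == "change_of_state")),
]

print(predict_senses(
    {"conjunction": "and", "adverb": "then", "verb2_class": "change_of_state"}, CHAIN))
```

The ordering matters: here a Result reading is only licensed by the second verb's class once a Precedence sense has already been assigned, which is the kind of dependency between labels a chain can capture and independent per-sense classifiers cannot.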
Expectations in Incremental Discourse Processing
The way in which discourse features express connections back to the previous discourse has been described in the literature in terms of adjoining at the right frontier of discourse structure. But this does not allow for discourse features that express expectations about what is to come in the subsequent discourse. After characterizing these expectations and their distribution in text, we show how an approach that makes use of substitution as well as adjoining on a suitably defined right frontier can be used both to process expectations and to constrain discourse processing in general.
Comment: 9 pages, uses aclap.sty, psfig.te
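As a toy illustration of the two operations, the sketch below computes the right frontier of a discourse tree (the root plus every node reached by repeatedly taking the rightmost child) and attaches a new unit either by substituting it into an open slot on that frontier, modelling a pending expectation, or otherwise by adjoining it at the lowest frontier node. The `Node` and `attach` names and the substitute-before-adjoin policy are assumptions for illustration, not the paper's formalism.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    open_slot: bool = False  # substitution site: an expectation not yet satisfied


def right_frontier(root: Node) -> list[Node]:
    """Root plus each node reached by repeatedly taking the rightmost child."""
    frontier = [root]
    node = root
    while node.children:
        node = node.children[-1]
        frontier.append(node)
    return frontier


def attach(root: Node, unit: Node, relation: str = "elab") -> Node:
    """Substitute into an open slot on the right frontier if there is one
    (satisfying an expectation); otherwise adjoin at the lowest frontier node."""
    frontier = right_frontier(root)
    for parent, child in zip(frontier, frontier[1:]):
        if child.open_slot:
            parent.children[-1] = unit  # substitution fills the pending expectation
            return root
    # Adjoining: the lowest frontier node becomes a relation node over (old node, new unit).
    target = frontier[-1]
    old = Node(target.label, target.children, target.open_slot)
    target.label, target.children, target.open_slot = relation, [old, unit], False
    return root


# "On the one hand A" raises an expectation, modelled as an open slot.
tree = Node("contrast", [Node("A"), Node("?", open_slot=True)])
attach(tree, Node("B"))  # "on the other hand B" substitutes into the slot
attach(tree, Node("C"))  # a later unit C adjoins at B
print([n.label for n in right_frontier(tree)])
```

While the slot is open, it stays on the right frontier and blocks ordinary adjoining below it, which is one simple way an outstanding expectation can constrain where later material attaches.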
Textual Economy through Close Coupling of Syntax and Semantics
We focus on the production of efficient descriptions of objects, actions and events. We define a type of efficiency, textual economy, that exploits the hearer's recognition of inferential links to material elsewhere within a sentence. Textual economy leads to efficient descriptions because the material that supports such inferences has been included to satisfy independent communicative goals, and is therefore overloaded in Pollack's sense. We argue that achieving textual economy imposes strong requirements on the representation and reasoning used in generating sentences. The representation must support the generator's simultaneous consideration of syntax and semantics. Reasoning must enable the generator to assess quickly and reliably, at any stage, how the hearer will interpret the current sentence with its (incomplete) syntax and semantics. We show that these representational and reasoning requirements are met in the SPUD system for sentence planning and realization.
Comment: 10 pages, uses QobiTree.te
Anaphora and Discourse Structure
We argue in this paper that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffolding for compositional semantics, and reveals multiple ways in which the relational meaning conveyed by adverbial connectives can interact with that associated with discourse structure. We conclude by sketching out a lexicalised grammar for discourse that facilitates discourse interpretation as a product of compositional rules, anaphor resolution and inference.
Comment: 45 pages, 17 figures. Revised resubmission to Computational Linguistics
Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse
We explore here a "fully" lexicalized Tree-Adjoining Grammar for discourse that takes the basic elements of a (monologic) discourse to be not simply clauses, but larger structures that are anchored on variously realized discourse cues. This link with intra-sentential grammar suggests an account for different patterns of discourse cues, while the different structures and operations suggest three separate sources for elements of discourse meaning: (1) a compositional semantics tied to the basic trees and operations; (2) a presuppositional semantics carried by cue phrases that freely adjoin to trees; and (3) general inference that draws additional, defeasible conclusions that flesh out what is conveyed compositionally.
Comment: 7 pages, uses aclcol.st
Structured and Unstructured Cache Models for SMT Domain Adaptation
We present a French-to-English translation system for Wikipedia biography articles. We use training data from out-of-domain corpora and adapt the system for biographies. We propose two forms of domain adaptation. The first biases the system towards words likely in biographies and encourages repetition of words across the document. Since biographies in Wikipedia follow a regular structure, our second model exploits this structure as a sequence of topic segments, where each segment discusses a narrower subtopic of the biography domain. In this structured model, the system is encouraged to use words likely in the current segment’s topic rather than in biographies as a whole. We implement both systems using cache-based translation techniques. We show that a system trained on Europarl and news can be adapted for biographies with an improvement of 0.5 BLEU using our models. Further, the structure-aware model outperforms the system that treats the entire document as a single segment.
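The general idea behind cache-based adaptation can be sketched as a bonus for target words already held in a cache, decaying with their age. This is a minimal sketch under stated assumptions, not the paper's decoder: the base scores, decay rate, bonus weight and segment topic words are all hypothetical, and in a real system the bonus would be one feature in a log-linear model rather than a standalone reranker.

```python
class WordCache:
    """Cache of target-side words; a word's bonus decays exponentially with its age."""

    def __init__(self, decay: float = 0.9) -> None:
        self.decay = decay
        self.history: list[str] = []

    def add(self, words: list[str]) -> None:
        self.history.extend(words)

    def bonus(self, word: str) -> float:
        # Sum decay**age over earlier occurrences; age 0 is the most recent word.
        return sum(self.decay ** age
                   for age, w in enumerate(reversed(self.history)) if w == word)


def choose(candidates: dict[str, float], cache: WordCache, weight: float = 1.0) -> str:
    """Pick the candidate with the best base score plus weighted cache bonus."""
    return max(candidates, key=lambda w: candidates[w] + weight * cache.bonus(w))


# Hypothetical base log-scores for translating French "carrière".
CANDIDATES = {"quarry": -1.0, "career": -1.2}

# Hypothetical topic words for two biography segments (structured model).
SEGMENT_WORDS = {
    "early_life": ["born", "family", "school"],
    "career": ["career", "appointed", "awarded"],
}


def segment_cache(segment: str) -> WordCache:
    """Structured variant: a fresh cache per segment, primed with that segment's topic words."""
    cache = WordCache()
    cache.add(SEGMENT_WORDS[segment])
    return cache


print(choose(CANDIDATES, WordCache()))              # prints "quarry": no cache evidence
print(choose(CANDIDATES, segment_cache("career")))  # prints "career": bonus 0.9**2 = 0.81
```

In the unstructured variant, a single `WordCache` would instead span the whole article, primed with biography-wide words, with `add` called on each translated sentence so that earlier word choices are repeated later in the document.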
