Search CORE

149 research outputs found

Building a Generation Knowledge Source using Internet-Accessible Newswire

Author: McKeown Kathleen R.
Radev Dragomir R.
Publication venue
Publication date: 01/01/1997
Field of study

In this paper, we describe a method for automatic creation of a knowledge source for text generation using information extraction over the Internet. We present a prototype system called PROFILE which uses a client-server architecture to extract noun-phrase descriptions of entities such as people, places, and organizations. The system serves two purposes: as an information extraction tool, it allows users to search for textual descriptions of entities; as a utility to generate functional descriptions (FD), it is used in a functional-unification based generation system. We present an evaluation of the approach and its applications to natural language generation and summarization.Comment: 8 pages, uses eps

arXiv.org e-Print Archive

CiteSeerX

Crossref

Columbia University Academic Commons

Gathering Statistics to Aspectually Classify Sentences with a Genetic Algorithm

Author: McKeown Kathleen R.
Siegel Eric V.
Publication venue
Publication date: 01/01/1996
Field of study

This paper presents a method for large corpus analysis to semantically classify an entire clause. In particular, we use cooccurrence statistics among similar clauses to determine the aspectual class of an input clause. The process examines linguistic features of clauses that are relevant to aspectual classification. A genetic algorithm determines what combinations of linguistic features to use for this task.Comment: postscript, 9 pages, Proceedings of the Second International Conference on New Methods in Language Processing, Oflazer and Somers ed

arXiv.org e-Print Archive

CiteSeerX

Columbia University Academic Commons

Paraphrasing Using Given and New Information in a Question-Answer System

Author: McKeown Kathleen R
Publication venue: ScholarlyCommons
Publication date: 01/01/1979
Field of study

The design and implementation of a paraphrase component for a natural language question-answer system (CO-OP) is presented. A major point made is the role of given and new information in formulating a paraphrase that differs in a meaningful way from the user\u27s question. A description is also given of the transformational grammar used by the paraphraser to generate questions

Crossref

ScholarlyCommons@Penn

Using the Annotated Bibliography as a Resource for Indicative Summarization

Author: Kan Min-Yen
Klavans Judith L.
McKeown Kathleen R.
Publication venue
Publication date: 01/01/2002
Field of study

We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated bibliographies cover certain aspects of summarization that have not been well-covered by other summary corpora, and motivate why they constitute an important form to study for information retrieval. We detail our methodology for collecting the corpus, and overview our document feature markup that we introduced to facilitate summary analysis. We present the characteristics of the corpus, methods of collection, and show its use in finding the distribution of types of information included in indicative summaries and their relative ordering within the summaries.Comment: 8 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

Columbia University Academic Commons

Coordinating Text and Graphics in Explanation Generation

Author: Kathleen R. Mckeown
Steven K. Feiner
Publication venue
Publication date: 01/01/1989
Field of study

To generate multimedia explanations, a system must be able to coordinate the use of different media in a single explanation. In this paper, we present an architecture that we have developed for COMET (COordinated Multimedia Explanation Testbed), a system that generates directions for equipment maintenance and repair, and we show how it addresses the coordination problem. In particular, we focus on the use of a single content planner that produces a common content description used by multiple media-specific generators, a media coordinator that makes a f'me-grained division of information between media, and bidirectional interaction between media-specific generators to allow influence across media.

CiteSeerX

Crossref

Resources for Evaluation of Summarization Techniques

Author: Kan Min-Yen
Klavans Judith L.
Lee Susan
McKeown Kathleen R.
Publication venue
Publication date: 01/01/1998
Field of study

We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of the corpora, methods used in the collection of user judgments, and an overview of the application of the corpora to evaluating the component system. Finally, we discuss the problems and issues with construction of the test set which apply broadly to the construction of evaluation resources for language technologies.Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st

arXiv.org e-Print Archive

CiteSeerX

Prosody Modelling in Concept-to-Speech Generation: Methodological Issues

Author: Grosz B.
Kathleen R. McKeown
Shimei Pan
Silverman K.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2000
Field of study

We explore three issues for the development of concept-to-speech (CTS) systems. We identify information available in a language-generation system that has the potential to impact prosody; investigate the role played by different corpora in CTS prosody modelling; and explore different methodologies for learning how linguistic features impact prosody. Our major focus is on the comparison of two machine learning methodologies: generalized rule induction and memory-based learning. We describe this work in the context of multimedia abstract generation of intensive care (MAGIC) data, a system that produces multimedia brings of the status of patients who have just undergone a bypass operation

CiteSeerX

Crossref

Columbia University Academic Commons

From Text to Speech Summarization

Author: Galley Michel
Hirschberg Julia Bell
Maskey Sameer R.
McKeown Kathleen
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2005
Field of study

In this paper, we present approaches used in text summarization, showing how they can be adapted for speech summarization and where they fall short. Informal style and apparent lack of structure in speech mean that the typical approaches used for text summarization must be extended for use with speech. We illustrate how features derived from speech can help determine summary content within two ongoing summarization projects at Columbia University

CiteSeerX

Columbia University Academic Commons

Generating EDU Extracts for Plan-Guided Summary Re-Ranking

Author: Adams Griffin
Elhadad Noémie
Fabbri Alexander R.
Ladhak Faisal
McKeown Kathleen
Publication venue
Publication date: 28/05/2023
Field of study

Two-step approaches, in which summary candidates are generated-then-reranked to return a single summary, can improve ROUGE scores over the standard single-step approach. Yet, standard decoding methods (i.e., beam search, nucleus sampling, and diverse beam search) produce candidates with redundant, and often low quality, content. In this paper, we design a novel method to generate candidates for re-ranking that addresses these issues. We ground each candidate abstract on its own unique content plan and generate distinct plan-guided abstracts using a model's top beam. More concretely, a standard language model (a BART LM) auto-regressively generates elemental discourse unit (EDU) content plans with an extractive copy mechanism. The top K beams from the content plan generator are then used to guide a separate LM, which produces a single abstractive candidate for each distinct plan. We apply an existing re-ranker (BRIO) to abstractive candidates generated from our method, as well as baseline decoding methods. We show large relevance improvements over previously published methods on widely used single document news article corpora, with ROUGE-2 F1 gains of 0.88, 2.01, and 0.38 on CNN / Dailymail, NYT, and Xsum, respectively. A human evaluation on CNN / DM validates these results. Similarly, on 1k samples from CNN / DM, we show that prompting GPT-3 to follow EDU plans outperforms sampling-based methods by 1.05 ROUGE-2 F1 points. Code to generate and realize plans is available at https://github.com/griff4692/edu-sum.Comment: ACL 202

arXiv.org e-Print Archive