Abstractive Multi-Document Summarization via Phrase Selection and Merging
We propose an abstraction-based multi-document summarization framework that
can construct new sentences by exploring more fine-grained syntactic units than
sentences, namely, noun/verb phrases. Different from existing abstraction-based
approaches, our method first constructs a pool of concepts and facts
represented by phrases from the input documents. Then new sentences are
generated by selecting and merging informative phrases to maximize the salience
of phrases and meanwhile satisfy the sentence construction constraints. We
employ integer linear optimization for conducting phrase selection and merging
simultaneously in order to achieve the global optimal solution for a summary.
Experimental results on the benchmark data set TAC 2011 show that our framework
outperforms the state-of-the-art models under the automated pyramid evaluation
metric, and achieves reasonably good results on manual linguistic quality
evaluation.
Comment: 11 pages, 1 figure, accepted as a full paper at ACL 201
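The selection step described in this abstract can be illustrated with a toy example. The sketch below is not the authors' integer linear program: it swaps in an exhaustive search over subsets, ignores phrase merging and the sentence construction constraints, and uses hypothetical phrases, salience scores, and a hypothetical length budget. It only shows the idea of choosing phrases to maximize total salience under a constraint.

```python
from itertools import combinations

def select_phrases(phrases, budget):
    """Pick the subset of (text, salience, length) phrases with the highest
    total salience whose total length fits within the budget.

    Exhaustive search stands in for an ILP solver here; it is only
    practical for a handful of phrases.
    """
    best_score, best_subset = 0.0, ()
    for r in range(1, len(phrases) + 1):
        for subset in combinations(phrases, r):
            length = sum(p[2] for p in subset)
            score = sum(p[1] for p in subset)
            if length <= budget and score > best_score:
                best_score, best_subset = score, subset
    return [p[0] for p in best_subset], best_score

# Hypothetical phrase pool: (text, salience, length in words).
toy_phrases = [
    ("the storm", 3.0, 2),
    ("hit the coast", 4.0, 3),
    ("on Monday", 1.0, 2),
    ("forced evacuations", 5.0, 2),
]

chosen, score = select_phrases(toy_phrases, budget=5)
print(chosen, score)
```

An ILP formulation would express the same objective with one binary variable per phrase and would scale to realistic phrase pools, which exhaustive search does not.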
A Theme-Rewriting Approach for Generating Algebra Word Problems
Texts present coherent stories that have a particular theme or overall
setting, for example science fiction or western. In this paper, we present a
text generation method called rewriting that edits existing
human-authored narratives to change their theme without changing the underlying
story. We apply the approach to math word problems, where it might help
students stay more engaged by quickly transforming all of their homework
assignments to the theme of their favorite movie without changing the math
concepts that are being taught. Our rewriting method uses a two-stage decoding
process, which proposes new words from the target theme and scores the
resulting stories according to a number of factors defining aspects of
syntactic, semantic, and thematic coherence. Experiments demonstrate that the
final stories typically represent the new theme well while still testing the
original math concepts, outperforming a number of baselines. We also release a
new dataset of human-authored rewrites of math word problems in several themes.
Comment: To appear EMNLP 201
Entity Summarisation with Limited Edge Budget on Undirected and Directed Knowledge Graphs
The paper addresses the novel problem of summarising entities with a limited presentation budget on entity-relationship knowledge graphs and proposes an efficient algorithm for solving it. The algorithm has been implemented in two variants, undirected and directed, together with a visualisation tool. An experimental user evaluation of the algorithm was conducted on large, real-world semantic knowledge graphs extracted from the web. The reported results are promising and encourage further work on improving the algorithm.
Automatic Summarization
It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain- and genre-specific summarization, and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field.
Columbia University at DUC 2004
We describe our participation in tasks 2, 4 and 5 of the DUC 2004 evaluation. For each task, we present the system(s) used, focusing on novel and newly developed aspects. We also analyze the results of the human and automatic evaluations.
Exploring events and distributed representations of text in multi-document summarization
In this article, we explore an event detection framework to improve multi-document summarization. Our approach is based on a two-stage single-document method that extracts a collection of key phrases, which are then used in a centrality-as-relevance passage retrieval model. We explore how to adapt this single-document method for multi-document summarization methods that are able to use event information. The event detection method is based on Fuzzy Fingerprint, which is a supervised method trained on documents with annotated event tags. To cope with the possible use of different terms to describe the same event, we explore distributed representations of text in the form of word embeddings, which helped improve the summarization results. The proposed summarization methods are based on the hierarchical combination of single-document summaries. The automatic evaluation and the human study show that these methods improve upon current state-of-the-art multi-document summarization systems on two mainstream evaluation datasets, DUC 2007 and TAC 2009. We show a relative improvement in ROUGE-1 scores of 16% for TAC 2009 and of 17% for DUC 2007.
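To make the reported metric concrete, here is a minimal sketch of ROUGE-1 recall (clipped unigram overlap divided by the number of reference unigrams) and of a relative improvement between two scores. It assumes lowercased whitespace tokenization and a single reference summary, which is simpler than the official ROUGE toolkit; the function names and example strings are hypothetical and are not taken from the paper.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams.

    Assumption: tokens are lowercased and split on whitespace, with no
    stemming or stopword removal.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

def relative_improvement(new_score, old_score):
    """Relative gain of new_score over old_score, e.g. 0.16 means 16%."""
    return (new_score - old_score) / old_score

recall = rouge1_recall("the cat sat", "the cat sat on the mat")
gain = relative_improvement(0.5, 0.4)
print(recall, gain)
```

Note that a relative improvement of 16% means the new score is 1.16 times the baseline score, not that the score rose by 16 points.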