1,510 research outputs found
Inferring Strategies for Sentence Ordering in Multidocument News Summarization
The problem of organizing information for multidocument summarization so that
the generated summary is coherent has received relatively little attention.
While sentence ordering for single document summarization can be determined
from the ordering of sentences in the input article, this is not the case for
multidocument summarization where summary sentences may be drawn from different
input articles. In this paper, we propose a methodology for studying the
properties of ordering information in the news genre and describe experiments
done on a corpus of multiple acceptable orderings we developed for the task.
Based on these experiments, we implemented a strategy for ordering information
that combines constraints from chronological order of events and topical
relatedness. Evaluation of our augmented algorithm shows a significant
improvement of the ordering over two baseline strategies
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
The development of summarization research has been significantly hampered by
the costly acquisition of reference summaries. This paper proposes an effective
way to automatically collect large scales of news-related multi-document
summaries with reference to social media's reactions. We utilize two types of
social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to
cluster documents into different topic sets. Also, a tweet with a hyper-link
often highlights certain key points of the corresponding document. We
synthesize a linked document cluster to form a reference summary which can
cover most key points. To this aim, we adopt the ROUGE metrics to measure the
coverage ratio, and develop an Integer Linear Programming solution to discover
the sentence set reaching the upper bound of ROUGE. Since we allow summary
sentences to be selected from both documents and high-quality tweets, the
generated reference summaries could be abstractive. Both informativeness and
readability of the collected summaries are verified by manual judgment. In
addition, we train a Support Vector Regression summarizer on DUC generic
multi-document summarization benchmarks. With the collected data as extra
training resource, the performance of the summarizer improves a lot on all the
test sets. We release this dataset for further research.Comment: 7 pages, 1 figure in AAAI 201
NATSUM: Narrative abstractive summarization through cross-document timeline generation
A new approach to narrative abstractive summarization (NATSUM) is presented in this paper. NATSUM is centered on generating a narrative chronologically ordered summary about a target entity from several news documents related to the same topic. To achieve this, first, our system creates a cross-document timeline where a time point contains all the event mentions that refer to the same event. This timeline is enriched with all the arguments of the events that are extracted from different documents. Secondly, using natural language generation techniques, one sentence for each event is produced using the arguments involved in the event. Specifically, a hybrid surface realization approach is used, based on over-generation and ranking techniques. The evaluation demonstrates that NATSUM performed better than extractive summarization approaches and competitive abstractive baselines, improving the F1-measure at least by 50%, when a real scenario is simulated.This research work has been partially funded by the Ministerio de Economía y Competitividad. España through projects TIN2015-65100-R, TIN2015-65136-C2-2-R, as well as by the project “Analisis de Sentimientos Aplicado a la Prevencion del Suicidio en las Redes Sociales (ASAP)” funded by Ayudas Fundación BBVA a equipos de investigacion cientifica. Moreover, it has been also funded by Generalitat Valenciana through project “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” with grant reference PROMETEU/2018/089
- …