Multi-Document Summarization via Discriminative Summary Reranking
Existing multi-document summarization systems usually rely on a specific
summarization model (i.e., a summarization method with a specific parameter
setting) to extract summaries for different document sets with different
topics. However, according to our quantitative analysis, none of the existing
summarization models can always produce high-quality summaries for different
document sets, and even a summarization model with good overall performance may
produce low-quality summaries for some document sets. Conversely, a
baseline summarization model may produce high-quality summaries for some
document sets. Based on the above observations, we treat the summaries produced
by different summarization models as candidate summaries, and then explore
discriminative reranking techniques to identify high-quality summaries from the
candidates for different document sets. We propose to extract a set of
candidate summaries for each document set based on an ILP framework, and then
leverage Ranking SVM for summary reranking. Various useful features have been
developed for the reranking process, including word-level features,
sentence-level features and summary-level features. Evaluation results on the
benchmark DUC datasets validate the efficacy and robustness of our proposed
approach.
Multi-Document Summarization using Distributed Bag-of-Words Model
As the number of documents on the web is growing exponentially,
multi-document summarization is becoming more and more important since it can
provide the main ideas in a document set in a short time. In this paper, we
present an unsupervised centroid-based document-level reconstruction framework
using a distributed bag-of-words model. Specifically, our approach selects
summary sentences in order to minimize the reconstruction error between the
summary and the documents. We apply sentence selection and beam search to
further improve the performance of our model. Experimental results on two
different datasets show significant performance gains compared with the
state-of-the-art baselines.
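The selection step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sentence embeddings are already available as plain lists of floats, it takes the reconstruction error to be the squared Euclidean distance between the summary centroid and the document-set centroid, and the names `beam_search_summary`, `summary_size`, and `beam_width` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def reconstruction_error(summary_vecs: List[Vector], target: Vector) -> float:
    """Assumed error: squared distance between summary centroid and target."""
    c = centroid(summary_vecs)
    return sum((a - b) ** 2 for a, b in zip(c, target))


def beam_search_summary(
    sentence_vecs: List[Vector], summary_size: int, beam_width: int = 3
) -> Tuple[List[int], float]:
    """Pick `summary_size` sentence indices whose centroid best matches
    the centroid of all sentences, keeping `beam_width` partial summaries
    alive at each step (beam_width=1 is plain greedy selection)."""
    if not 1 <= summary_size <= len(sentence_vecs):
        raise ValueError("summary_size must be between 1 and the sentence count")
    if beam_width < 1:
        raise ValueError("beam_width must be at least 1")

    target = centroid(sentence_vecs)
    beams: List[Tuple[int, ...]] = [()]
    scored: Dict[Tuple[int, ...], float] = {}

    for _ in range(summary_size):
        scored = {}
        for chosen in beams:
            for i in range(len(sentence_vecs)):
                if i in chosen:
                    continue
                # Store index sets in sorted order so duplicates collapse.
                candidate = tuple(sorted(chosen + (i,)))
                if candidate not in scored:
                    vecs = [sentence_vecs[j] for j in candidate]
                    scored[candidate] = reconstruction_error(vecs, target)
        # Lowest error first; ties are broken by the index tuple itself.
        ranked = sorted(scored.items(), key=lambda kv: (kv[1], kv[0]))
        beams = [indices for indices, _ in ranked[:beam_width]]

    best = beams[0]
    return list(best), scored[best]


if __name__ == "__main__":
    # Toy 2-D "embeddings", purely illustrative.
    toy = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [1.0, 1.0]]
    print(beam_search_summary(toy, summary_size=3, beam_width=3))
    print(beam_search_summary(toy, summary_size=3, beam_width=1))
```

On these toy vectors, a beam width of 3 selects sentences 0, 1, and 2, while greedy selection (beam width 1) commits early to sentence 3 and ends with the higher-error set 0, 2, and 3. This shows why a wider beam can help when the objective is defined on the whole summary.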
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Summarizing content contributed by individuals can be challenging, because
people make different lexical choices even when describing the same events.
However, there remains a significant need to summarize such content. Examples
include the student responses to post-class reflective questions, product
reviews, and news articles published by different news agencies related to the
same events. High lexical diversity of these documents hinders the system's
ability to effectively identify salient content and reduce summary redundancy.
In this paper, we overcome this issue by introducing an integer linear
programming-based summarization framework. It incorporates a low-rank
approximation to the sentence-word co-occurrence matrix to intrinsically group
semantically-similar lexical items. We conduct extensive experiments on
datasets of student responses, product reviews, and news documents. Our
approach compares favorably to a number of extractive baselines as well as a
neural abstractive summarization system. The paper finally sheds light on when
and why the proposed framework is effective at summarizing content with high
lexical variety.
Comment: Accepted for publication in the journal of Natural Language
Engineering, 201
Focused multi-document summarization: Human summarization activity vs. automated systems techniques
Focused Multi-Document Summarization (MDS) is concerned with summarizing the documents in a collection with respect to a particular external request (i.e., a query, question, or topic), or focus. Although the current state of the art provides reasonable performance on DUC/TAC-like evaluations (i.e., government and news concerns), other considerations need to be explored. This paper briefly explores the state of the art in automatic systems techniques and compares it with human summarization activity.
Document Based Clustering For Detecting Events in Microblogging Websites
Social media has a great influence on our daily lives. People use it to share opinions, stories, and news and to broadcast events, which produces a large amount of information. With such massive volumes of data, it is cumbersome to identify and organize interesting events, and browsing, searching, and monitoring events become increasingly challenging. A lot of work has been done in the area of topic detection and tracking (TDT). Most of these methods are based on single-modality information (e.g., text or images) or multi-modality information. In single-modality analysis, many existing methods adopt visual information (e.g., images and videos) or textual information (e.g., names, time references, locations, titles, tags, and descriptions) in isolation to model event data for event detection and tracking. This limitation can be addressed by a novel multi-modal social event tracking and evolutionary framework that not only captures events effectively but also generates summaries of these events over time. We propose a novel method based on mmETM that can effectively model social documents containing long text along with images. It learns the similarities between the textual and visual modalities to separate the visual and non-visual representative topics. To apply our method to social event tracking, we adopt an incremental learning technique, represented as mmETM, which yields informative textual and visual topics of events in social media over time. To validate our work, we conducted various experiments on a sample data set. Both subjective and quantitative assessments show that the proposed mmETM technique performs favorably against several state-of-the-art techniques.