418 research outputs found

    Multi-Document Summarization via Discriminative Summary Reranking

    Existing multi-document summarization systems usually rely on a specific summarization model (i.e., a summarization method with a specific parameter setting) to extract summaries for different document sets with different topics. However, according to our quantitative analysis, none of the existing summarization models can always produce high-quality summaries for different document sets, and even a summarization model with good overall performance may produce low-quality summaries for some document sets. Conversely, a baseline summarization model may produce high-quality summaries for some document sets. Based on these observations, we treat the summaries produced by different summarization models as candidate summaries, and then explore discriminative reranking techniques to identify high-quality summaries from the candidates for different document sets. We propose to extract a set of candidate summaries for each document set based on an ILP framework, and then leverage Ranking SVM for summary reranking. Various useful features have been developed for the reranking process, including word-level features, sentence-level features and summary-level features. Evaluation results on the benchmark DUC datasets validate the efficacy and robustness of our proposed approach.

    Multi-Document Summarization using Distributed Bag-of-Words Model

    As the number of documents on the web grows exponentially, multi-document summarization is becoming more and more important, since it can convey the main ideas of a document set in a short time. In this paper, we present an unsupervised centroid-based document-level reconstruction framework using the distributed bag-of-words model. Specifically, our approach selects summary sentences in order to minimize the reconstruction error between the summary and the documents. We apply sentence selection and beam search to further improve the performance of our model. Experimental results on two different datasets show significant performance gains compared with the state-of-the-art baselines.
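
To make the selection idea in this abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes sentence embeddings are already available as plain lists of floats (the abstract's distributed bag-of-words model is not reproduced), takes the document representation to be the centroid of all sentence vectors, and measures reconstruction error as the Euclidean distance between that centroid and the mean of the selected sentence vectors. The function names, the error definition, and the toy vectors are all illustrative assumptions.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def reconstruction_error(selected, sentence_vecs, doc_centroid):
    """Euclidean distance between the summary centroid and the document centroid.

    This error definition is an assumption made for the sketch.
    """
    summary_vec = centroid([sentence_vecs[i] for i in selected])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(summary_vec, doc_centroid)))


def beam_search_summary(sentence_vecs, summary_size, beam_width=2):
    """Pick sentence indices whose centroid best matches the document centroid.

    At each step every partial summary in the beam is extended by one unused
    sentence, and only the `beam_width` lowest-error extensions are kept.
    """
    if not sentence_vecs:
        raise ValueError("sentence_vecs must not be empty")
    summary_size = min(summary_size, len(sentence_vecs))
    doc_centroid = centroid(sentence_vecs)

    beam = [()]  # start from the empty summary
    for _ in range(summary_size):
        candidates = set()
        for partial in beam:
            for i in range(len(sentence_vecs)):
                if i not in partial:
                    # Sort indices so the same set is not scored twice.
                    candidates.add(tuple(sorted(partial + (i,))))
        # Ties are broken by the index tuple to keep the result deterministic.
        beam = sorted(
            candidates,
            key=lambda s: (reconstruction_error(s, sentence_vecs, doc_centroid), s),
        )[:beam_width]
    return list(beam[0])


# Toy 2-dimensional "embeddings" for five sentences (made-up values).
toy_vecs = [[0.0, 0.0], [4.0, 0.0], [2.0, 1.0], [10.0, 4.0], [4.0, 5.0]]
print(beam_search_summary(toy_vecs, summary_size=2, beam_width=2))  # [1, 4]
```

With `beam_width=1` the procedure reduces to greedy selection; a wider beam trades extra computation for a chance to recover from a poor early choice.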

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include the student responses to post-class reflective questions, product reviews, and news articles published by different news agencies related to the same events. High lexical diversity of these documents hinders the system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety. Comment: Accepted for publication in the journal of Natural Language Engineering, 201

    Focused multi-document summarization: Human summarization activity vs. automated systems techniques

    Focused Multi-Document Summarization (MDS) is concerned with summarizing documents in a collection with a concentration toward a particular external request (i.e., query, question, topic, etc.), or focus. Although the current state of the art provides somewhat decent performance for DUC/TAC-like evaluations (i.e., government and news concerns), other considerations need to be explored. This paper briefly explores the state of the art in automatic system techniques and compares it with human summarization activity.

    Document Based Clustering For Detecting Events in Microblogging Websites

    Social media has a great influence on our daily lives. People share their opinions, stories, and news, and broadcast events, using social media, which results in large amounts of information. With such massive volumes of data, it is cumbersome to identify and organize the interesting events; browsing, searching, and monitoring events become more and more challenging. A lot of work has been done in the area of topic detection and tracking (TDT). Most of these methods are based on single-modality (e.g., text, images) information or multi-modality information. In single-modality analysis, many existing methods adopt visual information (e.g., images and videos) or textual information (e.g., names, time references, locations, titles, tags, and descriptions) in isolation to model event data for event detection and tracking. This problem can be resolved by a novel multi-modal social event tracking and evolutionary framework that not only effectively captures the events but also generates a summary of these events over time. We propose a novel method that works with mmETM, which can effectively model social documents that include long text along with images. It learns the similarities between the textual and visual modalities to separate the visual and non-visual representative topics. To apply our method to social event tracking, we adopt an incremental learning technique, represented as mmETM, which yields informative textual and visual topics of events in social media over time. To validate our work, we used a sample data set and conducted various experiments on it. Both subjective and quantitative assessments show that the proposed mmETM technique performs favorably against several state-of-the-art techniques.