70 research outputs found

    Detecting (Un)Important Content for Single-Document News Summarization

    We present a robust approach for detecting intrinsic sentence importance in news, by training on two corpora of document-summary pairs. When used for single-document summarization, our approach, combined with the "beginning of document" heuristic, outperforms a state-of-the-art summarizer and the beginning-of-article baseline in both automatic and manual evaluations. These results represent an important advance because, in the absence of cross-document repetition, single-document summarizers for news have not been able to consistently outperform the strong beginning-of-article baseline. Comment: Accepted by EACL 201
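
The beginning-of-article baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `lead_k_summary`, the regex-based sentence splitter, and the sample article are all assumptions introduced here.

```python
import re


def lead_k_summary(document: str, k: int = 3) -> list[str]:
    """Return the first k sentences of a document (the lead baseline)."""
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    sentences = [s.strip() for s in parts if s.strip()]
    return sentences[:k]


# Hypothetical sample text, made up for illustration only.
article = (
    "The council approved the budget on Monday. The vote was 7 to 2. "
    "Critics said the plan cuts too much. A final review is due in June."
)
print(lead_k_summary(article, k=2))
```

The baseline uses no model of importance at all, which is why outperforming it consistently is treated as a meaningful result for news.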

    Assessing the Quality of Automatic Summarization for Peer Review in Education

    Technology-supported peer review has drawn much interest from educators and researchers. It encourages active learning and gives students timely feedback and multiple perspectives on their work. Currently, online peer review systems allow a student's work to be reviewed by a handful of peers. While this is a good way to obtain a high degree of confidence, reading a large amount of feedback can be overwhelming. Our observations show that students even ignore some feedback when there is too much of it. In this work, we try to automatically summarize the feedback by extracting similar content mentioned by the reviewers, which captures the strengths and weaknesses of the work. We evaluate different automatic summarization algorithms and summary lengths on an educational peer review dataset, which was rated by a human. In general, the students found that medium-size generated summaries (5-10 sentences) encapsulate the context of the reviews, convey the intent of the reviews, and help them judge the quality of the work.

    Peringkasan Otomatis dengan Ekstraksi Informasi untuk Dokumen Berita Ter-cluster (Automatic Summarization with Information Extraction for Clustered News Documents)

    Openness and ease of access have made the amount of available information very large. An abundance of information on a single topic causes information overload. This problem arises in various fields, such as news, scientific documents, and social media. A system is needed that helps users obtain complete news, which can be done by building an automatic summarization system. This study proposes a set of standards for the stages of news summarization, with dynamic configuration for each task (clustering, information extraction, and summarization). By building a summarization system that spans clustering, information extraction, and summarization, the resulting summaries are expected to be coherent, complete, and highly readable.

    Automatic Multiple Document Text Summarization using Wordnet and Agility Tool

    The number of web pages on the World Wide Web is increasing very rapidly. Consequently, search engines like Google, AltaVista, Bing, etc. provide a long list of URLs to the end user, so it becomes very difficult to review and analyze each web page manually. That's why automatic text summarization is used to condense the source text into a shorter version while preserving its information content and overall meaning. This paper proposes an automatic multiple-document text summarization technique called AMDTSWA, which allows the end user to select multiple URLs and generate their summarized results in parallel. AMDTSWA makes use of concept-based segmentation, the HTML DOM tree, and concept block formation. Content similarity is determined by calculating sentence scores, and useful information is extracted to generate a comparative summary. The proposed approach is implemented using ASP.NET and gives good results.

    Automatic Text Summarization Using Latent Drichlet Allocation (LDA) for Document Clustering

    In this paper, we present Latent Dirichlet Allocation (LDA) in automatic text summarization to improve accuracy in document clustering. The experiments involve a data set of 398 public blog articles obtained using a Python Scrapy crawler and scraper. The clustering steps in this research are preprocessing, automatic document compression using the feature method, automatic document compression using LDA, word weighting, and the clustering algorithm. The results show that automatic document summarization with LDA reaches 72% in LDA 40%, compared to the traditional k-means method, which only reaches 66%.

    Utilizing microblogs for improving automatic news highlights extraction

    Extractive multi document summarization using harmony search algorithm

    The exponential growth of information on the internet makes it troublesome for users to find valuable information. Text summarization is a process for overcoming this problem. An adequate summary must have wide coverage, high diversity, and high readability. In this article, a new method for multi-document summarization is proposed, based on a harmony search algorithm that optimizes coverage, diversity, and readability. The ROUGE package is used to measure the effectiveness of the proposed model on the Text Analysis Conference (TAC-2011) benchmark dataset. The calculated results support the effectiveness of the proposed approach.
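
To give a rough sense of what a ROUGE score measures, the sketch below computes a simplified ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate summary, with clipped counts. It is not the official ROUGE package used in the article. The function name `rouge_n_recall`, the whitespace tokenizer, and the example sentences are assumptions, and stemming, stopword removal, and multi-reference handling are omitted.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N recall between one candidate and one reference."""

    def ngrams(text: str) -> Counter:
        # Lowercased whitespace tokenization (an assumption, not the official tokenizer).
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's match count at its count in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical sentences, made up for illustration only.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

Recall is emphasized here because ROUGE was designed around how much of the reference content a summary recovers; the official package also reports precision and F-measure.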