4,874 research outputs found

    Multi-document Summarization Based on Sentence Clustering Improved Using Topic Words

    Information in the form of news text has become one of the most important commodities of the information age. A great deal of news is produced every day, but these stories often present the same contextual content with different narratives. A method is therefore needed to gather this information into a simple summary. The subtasks involved in multi-document summarization include sentence extraction, topic detection, representative sentence extraction, and representative sentences. In this paper, we propose a new method for representing sentences based on the keywords of text topics using Latent Dirichlet Allocation (LDA). The method consists of three basic steps. First, we cluster the sentences in the document set using similarity histogram clustering (SHC). Next, the clusters are ranked by cluster importance. Finally, representative sentences are selected using the topics identified by LDA. The proposed method was tested on the DUC2004 dataset. The results show average scores of 0.3419 and 0.0766 for ROUGE-1 and ROUGE-2, respectively. In addition, from the reader's perspective, the proposed method presents the representative sentences in a coherent, well-ordered arrangement, which can ease reading comprehension and reduce the time needed to read the summary.
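
    Editorial note: the ROUGE-1 and ROUGE-2 figures above are n-gram overlap scores. The following is a minimal sketch of ROUGE-N recall against a single reference, assuming whitespace tokenization and lowercasing. The function names `ngrams` and `rouge_n_recall` are illustrative only; the abstract does not say which ROUGE variant or toolkit options (stemming, stop-word removal, multiple references) were used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 reference unigrams match
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 reference bigrams match
```

    ROUGE-2 is typically much lower than ROUGE-1 for the same summary, because a bigram matches only when two adjacent words both agree.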

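    Analisis dan Implementasi Metode Clustering dan Non-Negative Matrix Factorization pada Peringkasan Multi Dokumen (Analysis and Implementation of Clustering and Non-Negative Matrix Factorization Methods for Multi-Document Summarization)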

    Text summarization is the process of extracting the most important information from one or more text documents to produce a more concise version, using a computer-based application. This final project implements clustering and non-negative matrix factorization (NMF) to summarize multiple news documents. The output is an extractive summary containing the important sentences from the news documents. NMF decomposes a sentence into a combination of semantic features. This can improve summary quality, because the similarity between a sentence's semantic features and the news topic identifies the sentences that are important and truly relevant to that topic. The method also includes a clustering step that removes noise sentences and minimizes redundancy in the summary. The summaries are evaluated with the ROUGE evaluation toolkit. The test results show that NMF with clustering produces summaries with higher precision and less redundancy than NMF without clustering. Keywords: text summarization, nonnegative matrix factorization, clustering
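
    Editorial note: the decomposition step described above can be illustrated with a small sketch. It factors a toy term-by-sentence matrix `a` into non-negative factors `w` (terms by semantic features) and `h` (semantic features by sentences). The abstract does not name the factorization algorithm, so the multiplicative update rule used here, the toy matrix, and every function name are assumptions made for illustration, not the authors' implementation.

```python
import random


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def frobenius_error(a, w, h):
    """Squared reconstruction error ||a - w h||^2."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))


def nmf(a, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a (m x n) by w (m x rank) times h (rank x n), all entries >= 0."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h with w fixed: h <- h * (w^T a) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        # Update w with h fixed: w <- w * (a h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return w, h


# Toy matrix: rows are terms, columns are sentences, with two groups of sentences.
a = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 1, 3]]
w, h = nmf(a, rank=2)
print(frobenius_error(a, w, h))
```

    Each column of `h` gives one sentence's weights over the semantic features. How those weights are compared with the news topic to score sentences is not specified in the abstract, so that step is left out here.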

    Improvement of Cluster Importance Algorithm with Sentence Position for News Summarization

    Text summarization is one way to reduce a large document to its important information. News usually covers several sub-topics within a topic. Multi-document summarization is a way to get the main information on a topic as quickly as possible, but it can sometimes create redundancy. In this study, we used the cluster importance algorithm with sentence position taken into account to overcome the redundancy. The stages of the cluster importance algorithm are sentence clustering, cluster ordering, and representative sentence selection. The contribution of this research is to add sentence position to the representative sentence selection phase. For evaluation, we used 30 Indonesian news topics scored with ROUGE-1. Only 2 news topics had different ROUGE-1 scores between the cluster importance algorithm with sentence position and the plain cluster importance algorithm, and on those 2 topics the variant with sentence position scored higher. Sentence position affected the order of sentences on every topic, but it changed the resulting summary for only 2 news topics.

    Generating Aspect-oriented Multi-document Summarization with Event-Aspect Model

    In this paper, we propose a novel approach to the automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects. We then use an extended LexRank algorithm to rank the sentences in each cluster, and Integer Linear Programming for sentence selection. Key features of our method include automatic grouping of semantically related sentences and sentence ranking based on an extension of the random walk model. We also implement a new sentence compression algorithm that uses dependency trees instead of parse trees. We compare our method with four baseline methods. Quantitative evaluation based on the ROUGE metric demonstrates the effectiveness and advantages of our method.

    LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

    We introduce a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
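
    Editorial note: the thresholded variant described above can be sketched roughly as follows. Sentences become graph nodes, an edge links two sentences whose cosine similarity exceeds a threshold, and a damped power iteration approximates eigenvector centrality. This is a simplified illustration under stated assumptions: it uses raw term counts rather than any idf weighting, and the threshold of 0.1, the link-following probability of 0.85, and the function names are choices made here, not values from the abstract.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=100):
    """Return one centrality score per sentence; the scores sum to 1."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(vectors)
    if n == 0:
        return []
    # Binary adjacency: keep an edge only where similarity exceeds the threshold.
    adjacency = [[1.0 if cosine(u, v) > threshold else 0.0 for v in vectors]
                 for u in vectors]
    # Row-normalize into a transition matrix; rows without edges become uniform.
    transition = []
    for row in adjacency:
        degree = sum(row)
        transition.append([x / degree for x in row] if degree else [1.0 / n] * n)
    # Power iteration with a uniform random jump, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


sentences = [
    "the cat sat on the mat",
    "a cat sat quietly",
    "birds flew over the mat",
]
scores = lexrank(sentences)
print(max(range(len(sentences)), key=scores.__getitem__))  # index of the most central sentence
```

    In this toy input the first sentence shares words with both of the others, while those two share nothing with each other, so the first sentence receives the highest score.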

    Abstract Meaning Representation for Multi-Document Summarization

    Generating an abstract from a collection of documents is a desirable capability for many real-world applications. However, abstractive approaches to multi-document summarization have not been thoroughly investigated. This paper studies the feasibility of using Abstract Meaning Representation (AMR), a semantic representation of natural language grounded in linguistic theory, as a form of content representation. Our approach condenses source documents to a set of summary graphs following the AMR formalism. The summary graphs are then transformed to a set of summary sentences in a surface realization step. The framework is fully data-driven and flexible. Each component can be optimized independently using small-scale, in-domain training data. We perform experiments on benchmark summarization datasets and report promising results. We also describe opportunities and challenges for advancing this line of research. Comment: 13 pages