    Timestamped Graph Model in an Automatic Summarizer for Multi-Document News

    ABSTRACT: Automatic text summarization is the process of extracting the most important information from one or more text documents to produce a shorter version that users can apply to a particular task, using a computer-based application. This final project implements an automatic text summarization technique based on a graph approach, the Timestamped Graph Model, for multi-document news. The application takes as input multiple news documents on the same topic, written in Indonesian or English. The method applies the concept of Topic-sensitive PageRank to score every sentence across all documents, which yields a ranking of all sentences. The highest-ranked sentences are extracted as the summary, according to the compression rate set by the user. Sentence scores are computed from the similarity between each sentence and the user query and from the Timestamped Graph, a directed graph built step by step whose nodes are sentences and whose edges are similarities between sentences. The similarity measure is cosine-based similarity. Ordering and reranking steps are applied so that the summary sentences are ordered and free of redundant or repeated information. The summaries are evaluated with the ROUGE evaluation toolkit by comparing the output of this application with that of another summarizer, MEAD. The test results show that the Timestamped Graph Model as applied in this final project achieves reasonably good accuracy.
    Keywords: automatic text summarization, Timestamped Graph Model, topic-sensitive PageRank, user query, cosine-based similarity, ordering, reranking
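
    The scoring step described in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenisation, raw term counts, and a complete similarity graph between sentences; the incremental, timestamped construction of the directed graph is not specified in the abstract and is not reproduced here. The function names (cosine_similarity, topic_sensitive_pagerank, rank_sentences) and the damping value of 0.85 are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_sensitive_pagerank(weights, bias, damping=0.85, iterations=100):
    """Power iteration for PageRank with a query-biased teleport vector.

    weights[i][j] is the weight of the directed edge i -> j.
    bias[i] is the (non-negative) query relevance of sentence i.
    """
    n = len(weights)
    total_bias = sum(bias)
    # Fall back to a uniform teleport vector if no sentence matches the query.
    teleport = [b / total_bias for b in bias] if total_bias > 0 else [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is redistributed via teleport.
        dangling = sum(scores[i] for i in range(n) if out_sums[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            new_scores.append(
                (1 - damping) * teleport[j]
                + damping * (incoming + dangling * teleport[j])
            )
        scores = new_scores
    return scores


def rank_sentences(sentences, query, compression_rate=0.5):
    """Return the top-scoring sentences, kept in their original order."""
    tokens = [s.lower().split() for s in sentences]
    query_tokens = query.lower().split()
    n = len(sentences)
    # Assumption: every pair of distinct sentences is linked by its similarity.
    weights = [
        [cosine_similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    bias = [cosine_similarity(t, query_tokens) for t in tokens]
    scores = topic_sensitive_pagerank(weights, bias)
    keep = max(1, round(compression_rate * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

    Calling rank_sentences(sentences, "flood city", compression_rate=0.5) on a list of four sentences returns two of them. The reranking and ordering steps mentioned in the abstract would run after this selection.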

    Graph-Ranking-Based Automatic Summarizer for Multi-Document News

    ABSTRACT: Automatic text summarization is the process of extracting the most important information from one or more texts to produce a concise version, using a computer-based application. This final project implements an automatic text summarization technique based on a graph approach for multi-document news. The process outputs an extractive summary made up of sentences. The graph-based ranking method applied is LexRank [6], which ranks the sentences of the input documents by computing a centrality value for each sentence; the highest-ranked sentences are then extracted as the summary. Two centrality methods are used: degree centrality, a direct adaptation from LexRank, and modified LexRank centrality, a modification of LexRank based on the idea of PageRank. Both methods depend on the similarity computed between sentences. Two similarity methods are used: idf modified cosine similarity, taken directly from LexRank, and Long Common Subsequences similarity, a modification intended to capture similarity between sentences based on meaning. The extracted summary sentences are not yet ordered, so an ordering step is needed; the method used is chronological ordering. A further problem in multi-document summarization is redundant or repeated information, which is removed by a Reranker step. Evaluation is carried out with the ROUGE evaluation toolkit. The test results show that degree centrality with idf modified cosine similarity is more accurate than the other three method combinations implemented in this final project.
    Keywords: automatic text summarization, LexRank, degree centrality, modified LexRank centrality, idf modified cosine similarity, Long Common Subsequences similarity, Reranker, chronological ordering
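
    The two similarity measures and the degree centrality named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: sentences are already tokenised, idf is computed as log(N / df) over the sentence collection, the common-subsequence score is normalised by the longer sentence, and the threshold of 0.1 is arbitrary. None of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(tokenized):
    """idf(w) = log(N / df(w)), with N sentences and df counted per sentence."""
    n = len(tokenized)
    df = Counter(w for sent in tokenized for w in set(sent))
    return {w: math.log(n / c) for w, c in df.items()}


def idf_modified_cosine(x, y, idf):
    """Cosine similarity in which each term frequency is weighted by idf.

    Both sentences are assumed to come from the collection used for idf.
    """
    tx, ty = Counter(x), Counter(y)
    num = sum(tx[w] * ty[w] * idf[w] ** 2 for w in tx.keys() & ty.keys())
    nx = math.sqrt(sum((tx[w] * idf[w]) ** 2 for w in tx))
    ny = math.sqrt(sum((ty[w] * idf[w]) ** 2 for w in ty))
    return num / (nx * ny) if nx > 0 and ny > 0 else 0.0


def lcs_similarity(x, y):
    """Longest-common-subsequence length over tokens.

    Assumption: the length is normalised by the longer of the two sentences.
    """
    if not x or not y:
        return 0.0
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(x), len(y))


def degree_centrality(tokenized, similarity, threshold=0.1):
    """Count, for each sentence, the other sentences more similar than threshold."""
    n = len(tokenized)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(tokenized[i], tokenized[j]) > threshold:
                degrees[i] += 1
                degrees[j] += 1
    return degrees
```

    Either similarity can be passed to degree_centrality, for example degree_centrality(sents, lambda a, b: idf_modified_cosine(a, b, idf)) or degree_centrality(sents, lcs_similarity), which mirrors the method combinations the abstract compares.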

    Machine translation evaluation resources and methods: a survey

    We present a survey of Machine Translation (MT) evaluation that covers both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify the automatic evaluation methods into two categories: lexical similarity and the application of linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features divide into syntactic features and semantic features. The syntactic features include part-of-speech tags, phrase types, and sentence structures; the semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only very recently. We also introduce methods for evaluating the MT evaluation methods themselves, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works \cite{GALEprogram2009, EuroMatrixProject2007} in several respects: it introduces recent developments in MT evaluation measures, different classifications from manual to automatic evaluation measures, the recent QE tasks of MT, and a concise organization of the content.
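
    The lexical similarity measures listed in this abstract can be illustrated with a short sketch. It assumes whitespace tokenisation and a single reference translation, and it computes unigram precision, recall, and F-measure plus a word-level edit distance. The function names are hypothetical, and these are generic textbook forms rather than any specific metric from the survey.

```python
from collections import Counter


def unigram_prf(hypothesis, reference):
    """Unigram precision, recall and F1 with clipped (multiset) overlap."""
    hyp, ref = hypothesis.split(), reference.split()
    # Counter & Counter keeps the minimum count of each shared word.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions."""
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a hypothesis word
                cur[j - 1] + 1,          # insert a reference word
                prev[j - 1] + (h != r),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

    For the hypothesis "the cat sat" and the reference "the cat sat down", unigram_prf gives a precision of 1.0 and a recall of 0.75, and word_edit_distance gives 1 (one missing word).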

    Evaluation of Automatic Video Captioning Using Direct Assessment

    We present Direct Assessment, a method for manually assessing the quality of automatically generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Automatic metrics such as BLEU and METEOR, drawn from techniques used in evaluating machine translation, compare automatic video captions against a manual caption; they were used in the TRECVid video captioning task in 2016 but are shown to have weaknesses. The work presented here brings human assessment into the evaluation by crowdsourcing how well a caption describes a video. We automatically degrade the quality of some sample captions which are assessed manually, and from this we are able to rate the quality of the human assessors, a factor we take into account in the evaluation. Using data from the TRECVid video-to-text task in 2016, we show how our direct assessment method is replicable and robust and should scale to settings where there are many caption-generation techniques to be evaluated. Comment: 26 pages, 8 figures

    Graph-based Neural Multi-Document Summarization

    We propose a neural multi-document summarization (MDS) system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences while avoiding redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems. Comment: In CoNLL 201
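
    The layer-wise propagation mentioned in this abstract can be sketched in a few lines. The abstract does not state the propagation rule, so this sketch assumes the commonly used form ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is a non-negative sentence relation matrix, H holds one feature row per sentence, and W is a weight matrix. Plain Python lists stand in for tensors, and the names matmul and gcn_layer are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumption: adjacency entries are non-negative, so every degree is >= 1
    once self-loops are added.
    """
    n = len(adjacency)
    # Add self-loops so each sentence keeps part of its own features.
    a_tilde = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    # Symmetric normalisation by node degree.
    a_hat = [
        [inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    propagated = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in propagated]
```

    Stacking several calls to gcn_layer, each with its own weight matrix, mixes information from sentences that are further apart in the relation graph; in the system described, the resulting hidden features feed the salience estimate.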
