247 research outputs found

    Graph-based Neural Multi-Document Summarization

    We propose a neural multi-document summarization (MDS) system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences while avoiding redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.
    Comment: In CoNLL 201
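The pipeline this abstract describes — graph propagation over sentence features to estimate salience, then greedy redundancy-aware extraction — can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's model: the single layer, the symmetric normalization, and the similarity threshold are all assumptions for the sketch.

```python
import numpy as np

def gcn_layer(A, H, W):
    # One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 · H · W),
    # where A is the sentence relation graph and H the sentence features.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

def greedy_extract(scores, sim, k, max_sim=0.7):
    # Pick the k most salient sentences, skipping any whose similarity
    # to an already-picked sentence exceeds max_sim (redundancy check).
    picked = []
    for i in np.argsort(-scores):
        if all(sim[i, j] < max_sim for j in picked):
            picked.append(i)
        if len(picked) == k:
            break
    return picked
```

In the paper the salience scores come from the final GCN layer and a regression head; here they would simply be any per-sentence score vector fed to `greedy_extract`.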

    Myanmar news summarization using different word representations

    There is an enormous amount of information available across different sources and genres. Extracting useful information from such massive data requires an automatic mechanism. Text summarization systems assist with content reduction by keeping the important information and filtering out the unimportant parts of the text. Good document representation is important in text summarization for retrieving relevant information. Bag-of-words representations cannot capture word similarity at the syntactic or semantic level, whereas word embeddings provide a document representation that captures and encodes the semantic relations between words. Therefore, a centroid method based on word-embedding representations is employed in this paper, and Myanmar news summarization based on different word embeddings is proposed. Myanmar local and international news are summarized using a centroid-based word-embedding summarizer. Experiments were conducted on a Myanmar local and international news dataset using different word embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embeddings performs consistently better than centroid summarization using bag-of-words.
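The centroid approach the abstract describes can be sketched as follows: represent each sentence as the mean of its word vectors, compute the centroid of the document, and rank sentences by cosine similarity to that centroid. The toy two-dimensional embeddings are made up for illustration; the paper uses trained word embedding models, and this sketch assumes every scored sentence contains at least one in-vocabulary word.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def centroid_summarize(sentences, emb, k=1):
    # Sentence vector = mean of its known word vectors (assumes at least one
    # in-vocabulary word per sentence); centroid = mean of sentence vectors.
    sent_vecs = [np.mean([emb[w] for w in s.split() if w in emb], axis=0)
                 for s in sentences]
    centroid = np.mean(np.vstack(sent_vecs), axis=0)
    scores = [cosine(v, centroid) for v in sent_vecs]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # keep original order
```

A bag-of-words variant would replace `emb[w]` with one-hot vectors, which is exactly where the semantic similarity the abstract mentions is lost.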

    Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews

    Manually extracting relevant aspects and opinions from large volumes of user-generated text is a time-consuming process. Summaries, on the other hand, help readers with limited time budgets to quickly consume the key ideas from the data. State-of-the-art approaches for multi-document summarization, however, do not consider user preferences while generating summaries. In this work, we argue the need for, and propose a solution to, generating personalized aspect-based opinion summaries from large collections of online tourist reviews. We let our readers decide and control several attributes of the summary, such as its length and the specific aspects of interest, among others. Specifically, we take an unsupervised approach to extract coherent aspects from tourist reviews posted on TripAdvisor. We then propose an Integer Linear Programming (ILP) based extractive technique to select an informative subset of opinions around the identified aspects while respecting the user-specified values for various control parameters. Finally, we evaluate and compare our summaries using crowdsourcing and ROUGE-based metrics and obtain competitive results.
    Comment: 4 pages, accepted in the Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 202
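The constrained selection problem the abstract formulates as an ILP — maximize the informativeness of the chosen opinions subject to a length budget and user-requested aspect coverage — can be illustrated with a tiny exhaustive solver. The brute-force search below stands in for a real ILP solver (e.g. one driven through a library like PuLP), and the `(score, length, aspect)` item layout is an assumption of this sketch, not the paper's formulation.

```python
from itertools import combinations

def select_opinions(items, budget, required_aspects):
    # items: list of (score, length, aspect) tuples.
    # Objective: maximize total score, subject to
    #   (1) total length <= budget, and
    #   (2) every user-requested aspect is covered by some picked item.
    # Exhaustive search over subsets stands in for an ILP solver.
    best, best_score = [], -1.0
    for r in range(1, len(items) + 1):
        for combo in combinations(range(len(items)), r):
            length = sum(items[i][1] for i in combo)
            aspects = {items[i][2] for i in combo}
            if length <= budget and required_aspects <= aspects:
                score = sum(items[i][0] for i in combo)
                if score > best_score:
                    best, best_score = list(combo), score
    return best
```

This makes the "controllable" part concrete: the user's length and aspect preferences enter as constraints, not as post-hoc filtering of a fixed summary.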

    Using Embeddings to Improve Text Segmentation

    Textual data is often an unstructured collection of sentences and thus difficult to use for many purposes. Creating structure in the text according to topics or concepts can aid text summarization, neural machine translation, and other applications where a single sentence provides too little context. Existing methods of text segmentation are either unsupervised and based on word co-occurrences, or supervised and based on vector representations of words and sentences. The purpose of this Master's Thesis is to develop a general unsupervised method of text segmentation using word vectors and cosine distance. The created approach is implemented and compared to a naïve probabilistic baseline to assess its viability. An implemented model is also used as part of an extractive text summarization algorithm to assess the practical benefit of the proposed approach. The results show that while the approach outperforms the baseline, further research can greatly improve its efficacy.
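The core idea — unsupervised segmentation by placing boundaries where the cosine similarity between adjacent sentence vectors drops — can be sketched as follows. The fixed threshold and the adjacent-pair comparison are simplifying assumptions of this sketch; the thesis's actual method may use windows or an adaptive cutoff.

```python
import numpy as np

def segment(sent_vecs, threshold=0.5):
    # Place a boundary after sentence i when the cosine similarity between
    # sentence i and sentence i+1 falls below the threshold.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    boundaries = [i + 1 for i in range(len(sent_vecs) - 1)
                  if cos(sent_vecs[i], sent_vecs[i + 1]) < threshold]
    # Split sentence indices into contiguous segments at the boundaries.
    segments, start = [], 0
    for b in boundaries + [len(sent_vecs)]:
        segments.append(list(range(start, b)))
        start = b
    return segments
```

Each `sent_vecs[i]` would in practice be built from word embeddings (e.g. a mean of word vectors), tying this directly to the word-vector representation the thesis develops.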

    Summarizing Dialogic Arguments from Social Media

    Online argumentative dialog is a rich source of information on popular beliefs and opinions that could be useful to companies as well as governmental or public policy agencies. Compact, easy-to-read summaries of these dialogues would thus be highly valuable. A priori, it is not even clear what form such a summary should take. Previous work on summarization has primarily focused on summarizing written texts, where the notion of an abstract of the text is well defined. We collect gold standard training data consisting of five human summaries for each of 161 dialogues on the topics of Gay Marriage, Gun Control and Abortion. We present several different computational models aimed at identifying segments of the dialogues whose content should be used for the summary, using linguistic features and Word2vec features with both SVMs and Bidirectional LSTMs. We show that we can identify the most important arguments by using the dialog context with a best F-measure of 0.74 for gun control, 0.71 for gay marriage, and 0.67 for abortion.
    Comment: Proceedings of the 21st Workshop on the Semantics and Pragmatics of Dialogue (SemDial 2017)
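The classification setup here — represent a dialog segment with Word2vec features and train a classifier to decide whether it belongs in the summary — can be illustrated with a minimal sketch. A perceptron stands in for the paper's SVM/BiLSTM models, and the toy two-dimensional "Word2vec" vectors are invented for the example.

```python
import numpy as np

def featurize(segment, emb, dim=2):
    # Mean Word2vec vector of a dialog segment (zeros if no known words).
    vecs = [emb[w] for w in segment.split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def train_perceptron(X, y, epochs=20, lr=0.1):
    # Simple linear classifier standing in for the paper's SVM;
    # labels y are in {-1, +1} (summary-worthy vs. not).
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified: update
                w, b = w + lr * yi * xi, b + lr * yi
    return w, b
```

The paper's stronger results come from adding linguistic features and dialog context on top of such embedding features, and from sequence models (BiLSTMs) rather than a per-segment linear classifier.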