Search CORE

1,305 research outputs found

Extractive Multi-document Summarization Using Multilayer Networks

Author: Amancio Diego R.
Tohalino Jorge V.
Publication venue: 'Elsevier BV'
Publication date: 07/11/2017

Huge volumes of textual information are produced every single day. To organize and understand such large datasets, summarization techniques have become popular in recent years. These techniques aim at finding relevant, concise and non-redundant content in such large data. While network methods have been adopted to model texts in some scenarios, a systematic evaluation of multilayer network models in the multi-document summarization task has been limited to a few studies. Here, we evaluate the performance of a multilayer-based method to select the most relevant sentences in the context of an extractive multi-document summarization (MDS) task. In the adopted model, nodes represent sentences and edges are created based on the number of shared words between sentences. Unlike previous studies in multi-document summarization, we make a distinction between edges linking sentences from different documents (inter-layer) and those connecting sentences from the same document (intra-layer). As a proof of principle, our results reveal that such a discrimination between intra- and inter-layer edges in a multilayered representation is able to improve the quality of the generated summaries. This piece of information could be used to improve current statistical methods and related textual models.
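
The network model described in this abstract can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, and it ranks sentences by a weighted degree in which inter-layer edges are scaled by a hypothetical `inter_weight` factor. The function names, the scaling factor and the ranking rule are assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def build_multilayer_edges(documents):
    """Build a sentence network from a list of documents.

    documents: list of documents, each a list of sentence strings.
    Returns a dict mapping (node_a, node_b) -> (shared_word_count, kind),
    where a node is (document_index, sentence_index) and kind is
    "intra" (same document) or "inter" (different documents).
    """
    nodes = [
        (d, s, set(sentence.lower().split()))  # assumption: whitespace tokens
        for d, doc in enumerate(documents)
        for s, sentence in enumerate(doc)
    ]
    edges = {}
    for (d1, s1, w1), (d2, s2, w2) in combinations(nodes, 2):
        shared = len(w1 & w2)
        if shared > 0:
            kind = "intra" if d1 == d2 else "inter"
            edges[((d1, s1), (d2, s2))] = (shared, kind)
    return edges


def rank_sentences(edges, inter_weight=1.5):
    """Rank nodes by weighted degree (hypothetical scoring rule).

    Inter-layer edges are multiplied by inter_weight; intra-layer edges
    keep their shared-word count. Ties are broken by node index.
    """
    scores = {}
    for (a, b), (shared, kind) in edges.items():
        weight = shared * (inter_weight if kind == "inter" else 1.0)
        scores[a] = scores.get(a, 0.0) + weight
        scores[b] = scores.get(b, 0.0) + weight
    return sorted(scores, key=lambda node: (-scores[node], node))


if __name__ == "__main__":
    docs = [["a b c", "a b d"], ["a x y"]]
    print(rank_sentences(build_multilayer_edges(docs)))
```

Setting `inter_weight` to 1.0 collapses the two edge types into a single-layer network, which gives a simple baseline to compare against.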

arXiv.org e-Print Archive

PerSum: Novel Systems for Document Summarization in Persian

Author: Boroumand Fahimeh
Lahiri Shibamouli
Parvandeh Saeid
Publication date: 09/06/2016

In this paper we explore the problem of document summarization in the Persian language from two distinct angles. In our first approach, we modify a popular and widely cited Persian document summarization framework to see how it works on a realistic corpus of news articles. Human evaluation of the generated summaries shows that graph-based methods perform better than the modified systems. We carry this intuition forward in our second approach, and probe deeper into the nature of graph-based systems by designing several summarizers based on centrality measures. Ad hoc evaluation using ROUGE scores on these summarizers suggests that there is a small class of centrality measures that perform better than three strong unsupervised baselines. Comment: 42 pages, 9 figures

arXiv.org e-Print Archive

Multi-layered graph-based multi-document summarization model

Author: Canhasi Ercan
Publication date: 17/05/2014

Multi-document summarization is the process of automatically generating a compressed version of a given collection of documents. Recently, graph-based models and ranking algorithms have been actively investigated by the extractive document summarization community. While most work to date focuses on homogeneous connectedness of sentences and heterogeneous connectedness of documents and sentences (e.g. sentence similarity weighted by document importance), in this paper we present a novel 3-layered graph model that emphasizes not only sentence- and document-level relations but also the influence of under-sentence-level relations (e.g. a part of sentence similarity).

arXiv.org e-Print Archive

Extractive Multi Document Summarization using Dynamical Measurements of Complex Networks

Author: Amancio Diego R.
Tohalino Jorge V.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/08/2017

Due to the large amount of textual information available on the Internet, it is of paramount relevance to use techniques that find relevant and concise content. A typical task devoted to the identification of informative sentences in documents is the so-called extractive document summarization task. In this paper, we use complex network concepts to devise an extractive Multi-Document Summarization (MDS) method, which extracts the most central sentences from several textual sources. In the proposed model, texts are represented as networks, where nodes represent sentences and the edges are established based on the number of shared words. Unlike previous work, the identification of relevant terms is guided by the characterization of nodes via dynamical measurements of complex networks, including symmetry, accessibility and absorption time. The evaluation of the proposed system revealed that excellent results were obtained with particular dynamical measurements, including those based on the exploration of networks via random walks. Comment: Accepted for publication in BRACIS 2017 (Brazilian Conference on Intelligent Systems)

arXiv.org e-Print Archive

Abstractive Summarization Using Attentive Neural Techniques

Author: Kalita Jugal
Krantz Jacob
Publication date: 20/10/2018

In a world of proliferating data, the ability to rapidly summarize text is growing in importance. Automatic summarization of text can be thought of as a sequence-to-sequence problem. Another area of natural language processing that solves a sequence-to-sequence problem is machine translation, which is rapidly evolving due to the development of attention-based encoder-decoder networks. This work applies these modern techniques to abstractive summarization. We analyze various attention mechanisms for summarization with the goal of developing an approach and architecture aimed at improving the state of the art. In particular, we modify and optimize a translation model with self-attention for generating abstractive sentence summaries. The effectiveness of this base model along with attention variants is compared and analyzed in the context of standardized evaluation sets and test metrics. However, we show that these metrics are limited in their ability to effectively score abstractive summaries, and propose a new approach based on the intuition that an abstractive model requires an abstractive evaluation. Comment: Accepted for oral presentation at the 15th International Conference on Natural Language Processing (ICON 2018)

arXiv.org e-Print Archive

Query-Focused Opinion Summarization for User-Generated Content

Author: Cardie Claire
Castelli Vittorio
Raghavan Hema
Wang Lu
Publication date: 17/06/2016

We present a submodular function-based framework for query-focused opinion summarization. Within our framework, relevance ordering produced by a statistical ranker and information coverage with respect to topic distribution and diverse viewpoints are both encoded as submodular functions. Dispersion functions are utilized to minimize redundancy. We are the first to evaluate different metrics of text similarity for submodularity-based summarization methods. By experimenting on community QA and blog summarization, we show that our system outperforms state-of-the-art approaches in both automatic evaluation and human evaluation. A human evaluation task is conducted on Amazon Mechanical Turk with scale, and shows that our systems are able to generate summaries of high overall quality and information diversity. Comment: COLING 201
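
To make the idea of summarization by submodular maximization concrete, here is a minimal Python sketch. It swaps in a plain word-coverage function (a standard monotone submodular function) and greedy selection under a sentence budget. It does not reproduce the relevance, topic-coverage or dispersion functions described in the abstract; the names `coverage` and `greedy_summary` are hypothetical.

```python
def coverage(selected, sentences):
    """Number of distinct words covered by the selected sentence indices.

    Word coverage is monotone and submodular: adding a sentence never
    lowers the value, and its marginal gain shrinks as the set grows.
    """
    covered = set()
    for index in selected:
        covered |= set(sentences[index].lower().split())
    return len(covered)


def greedy_summary(sentences, budget):
    """Greedily pick up to `budget` sentences by marginal coverage gain.

    Stops early when no remaining sentence adds a new word. Ties go to
    the sentence with the lowest index.
    """
    selected = []
    while len(selected) < budget:
        current = coverage(selected, sentences)
        best, best_gain = None, 0
        for index in range(len(sentences)):
            if index in selected:
                continue
            gain = coverage(selected + [index], sentences) - current
            if gain > best_gain:
                best, best_gain = index, gain
        if best is None:  # nothing left that adds new words
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    print(greedy_summary(["a b c", "a b", "d e", "c d e f"], budget=2))
```

Because each step rewards only words not yet covered, a sentence that repeats earlier content earns little or no gain, so redundancy is penalized implicitly in this simplified setting.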

arXiv.org e-Print Archive

MUDOS-NG: Multi-document Summaries Using N-gram Graphs (Tech Report)

Author: Giannakopoulos George
Karkaletsis Vangelis
Vouros George
Publication date: 01/01/2010

This report describes the MUDOS-NG summarization system, which applies a set of language-independent and generic methods for generating extractive summaries. The proposed methods are mostly combinations of simple operators on a generic character n-gram graph representation of texts. This work defines the set of used operators upon n-gram graphs and proposes using these operators within the multi-document summarization process in such subtasks as document analysis, salient sentence selection, query expansion and redundancy control. Furthermore, a novel chunking methodology is used, together with a novel way to assign concepts to sentences for query expansion. The experimental results of the summarization system, performed upon widely used corpora from the Document Understanding and the Text Analysis Conferences, are promising and provide evidence for the potential of the generic methods introduced. This work aims to designate core methods exploiting the n-gram graph representation, providing the basis for more advanced summarization systems. Comment: Technical Report
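
A character n-gram graph of the kind this abstract refers to can be sketched in a few lines of Python. The sketch assumes that nodes are character n-grams and that a directed edge links each n-gram to the n-grams that follow it within a fixed window, weighted by co-occurrence count. The window rule, the default parameters and the `edge_overlap` comparison are illustrative assumptions, not the report's exact operators.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph as a Counter of weighted edges.

    Each key is an ordered pair (gram_a, gram_b) where gram_b starts
    within `window` positions after gram_a; the value is the number of
    times that pair co-occurs in the text.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            edges[(gram, grams[j])] += 1
    return edges


def edge_overlap(graph_a, graph_b):
    """Fraction of graph_a's edges that also appear in graph_b.

    A simple illustrative comparison that ignores edge weights.
    """
    if not graph_a:
        return 0.0
    return sum(1 for edge in graph_a if edge in graph_b) / len(graph_a)


if __name__ == "__main__":
    first = ngram_graph("summarization", n=3, window=2)
    second = ngram_graph("summarizing", n=3, window=2)
    print(edge_overlap(first, second))
```

Since the graph is built from raw characters, no tokenizer or stemmer is needed, which is consistent with the language-independent design goal stated in the abstract.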

arXiv.org e-Print Archive

CiteSeerX

Large-Margin Learning of Submodular Summarization Methods

Author: Joachims Thorsten
Shivaswamy Pannaga
Sipos Ruben
Publication date: 13/10/2011

In this paper, we present a supervised learning approach to training submodular scoring functions for extractive multi-document summarization. By taking a structured prediction approach, we provide a large-margin method that directly optimizes a convex relaxation of the desired performance measure. The learning method applies to all submodular summarization methods, and we demonstrate its effectiveness for both pairwise and coverage-based scoring functions on multiple datasets. Compared to state-of-the-art functions that were tuned manually, our method significantly improves performance and enables high-fidelity models with numbers of parameters well beyond what could reasonably be tuned by hand. Comment: update: improved formatting (figure placement) and algorithm pseudocode clarity (Fig. 3)

arXiv.org e-Print Archive

Diversity in Machine Learning

Author: Gong Zhiqiang
Hu Weidong
Zhong Ping
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/05/2019

Machine learning methods have achieved good performance and have been widely applied in various real-world applications. They can learn the model adaptively and fit the special requirements of different tasks better. Generally, a good machine learning system is composed of plentiful training data, a good model training process, and accurate inference. Many factors can affect the performance of the machine learning process, among which diversity is an important one. Diversity can help each procedure to guarantee good machine learning overall: diversity of the training data ensures that the training data can provide more discriminative information for the model; diversity of the learned model (diversity in parameters of each model or diversity among different base models) makes each parameter/model capture unique or complementary information; and diversity in inference can provide multiple choices, each of which corresponds to a specific plausible locally optimal result. Even though diversity plays an important role in the machine learning process, there is no systematic analysis of diversification in machine learning systems. In this paper, we systematically summarize the methods for data diversification, model diversification, and inference diversification in the machine learning process, respectively. In addition, typical applications where diversity technology improved machine learning performance are surveyed, including remote sensing imaging tasks, machine translation, camera relocalization, image segmentation, object detection, topic modeling, and others. Finally, we discuss some challenges of diversity technology in machine learning and point out some directions for future work. Comment: Accepted by IEEE Access

arXiv.org e-Print Archive

Scientific Article Summarization Using Citation-Context and Article's Discourse Structure

Author: Cohan Arman
Goharian Nazli
Publication date: 21/04/2017

We propose a summarization approach for scientific articles which takes advantage of citation-context and the document discourse model. While citations have been previously used in generating scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article's content. Our method overcomes the problem of inconsistency between the citation summary and the article's content by providing context for each citation. We also leverage the inherent discourse structure of scientific articles to produce better summaries. We show that our proposed method effectively improves over existing summarization approaches (greater than 30% improvement over the best performing baseline) in terms of ROUGE scores on the TAC2014 scientific summarization dataset. While the dataset we use for evaluation is in the biomedical domain, most of our approaches are general and therefore adaptable to other domains. Comment: EMNLP 201

arXiv.org e-Print Archive