Extractive Multi-document Summarization Using Multilayer Networks
Huge volumes of textual information are produced every single day. In
order to organize and understand such large datasets, in recent years,
summarization techniques have become popular. These techniques aim at finding
relevant, concise and non-redundant content from such big data. While network
methods have been adopted to model texts in some scenarios, a systematic
evaluation of multilayer network models in the multi-document summarization
task has been limited to a few studies. Here, we evaluate the performance of a
multilayer-based method to select the most relevant sentences in the context of
an extractive multi-document summarization (MDS) task. In the adopted model,
nodes represent sentences and edges are created based on the number of shared
words between sentences. Differently from previous studies in multi-document
summarization, we make a distinction between edges linking sentences from
different documents (inter-layer) and those connecting sentences from the same
document (intra-layer). As a proof of principle, our results reveal that such a
discrimination between intra- and inter-layer in a multilayered representation
is able to improve the quality of the generated summaries. This piece of
information could be used to improve current statistical methods and related
textual models.
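The network construction described in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization with no stopword removal, and it ranks sentences by weighted degree with a hypothetical `inter_weight` factor applied to inter-layer edges. The function names, the weighting scheme and the toy documents are assumptions made for illustration, not the authors' actual method.

```python
from itertools import combinations


def build_multilayer_edges(documents):
    """Build sentence-network edges, split into intra- and inter-layer sets.

    documents: list of documents, each a list of sentence strings.
    Nodes are (document_index, sentence_index) pairs; the weight of an edge
    is the number of words the two sentences share.
    """
    nodes = [
        (d, s, set(sentence.lower().split()))
        for d, doc in enumerate(documents)
        for s, sentence in enumerate(doc)
    ]
    intra, inter = {}, {}
    for (d1, s1, w1), (d2, s2, w2) in combinations(nodes, 2):
        shared = len(w1 & w2)
        if shared == 0:
            continue  # no shared words, no edge
        # Same document -> intra-layer edge, different documents -> inter-layer.
        target = intra if d1 == d2 else inter
        target[((d1, s1), (d2, s2))] = shared
    return intra, inter


def rank_sentences(documents, inter_weight=2.0):
    """Rank nodes by weighted degree; inter_weight is a hypothetical knob."""
    intra, inter = build_multilayer_edges(documents)
    strength = {}
    for edges, factor in ((intra, 1.0), (inter, inter_weight)):
        for (u, v), w in edges.items():
            strength[u] = strength.get(u, 0.0) + factor * w
            strength[v] = strength.get(v, 0.0) + factor * w
    # Highest strength first; ties broken by node id for determinism.
    return sorted(strength, key=lambda n: (-strength[n], n))


docs = [
    ["the flood damaged the bridge", "officials closed the road"],
    ["the bridge was damaged by the flood", "schools stayed open"],
]
intra, inter = build_multilayer_edges(docs)
ranking = rank_sentences(docs)
```

Keeping the two edge sets separate is what lets a ranking step treat them differently; setting `inter_weight=1.0` in this sketch collapses it back to a single-layer network.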
PerSum: Novel Systems for Document Summarization in Persian
In this paper we explore the problem of document summarization in the Persian
language from two distinct angles. In our first approach, we modify a popular
and widely cited Persian document summarization framework to see how it works
on a realistic corpus of news articles. Human evaluation on generated summaries
shows that graph-based methods perform better than the modified systems. We
carry this intuition forward in our second approach, and probe deeper into the
nature of graph-based systems by designing several summarizers based on
centrality measures. Ad hoc evaluation using ROUGE score on these summarizers
suggests that there is a small class of centrality measures that perform better
than three strong unsupervised baselines.
Comment: 42 pages, 9 figures
Multi-layered graph-based multi-document summarization model
Multi-document summarization is a process of automatic generation of a
compressed version of the given collection of documents. Recently,
graph-based models and ranking algorithms have been actively investigated by
the extractive document summarization community. While most work to date
focuses on homogeneous connectedness of sentences and heterogeneous connectedness
of documents and sentences (e.g. sentence similarity weighted by document
importance), in this paper we present a novel 3-layered graph model that
emphasizes not only sentence- and document-level relations but also the
influence of sub-sentence-level relations (e.g. similarity between parts of
sentences).
Extractive Multi Document Summarization using Dynamical Measurements of Complex Networks
Due to the large amount of textual information available on the Internet, it is
of paramount importance to use techniques that find relevant and concise
content. A typical task devoted to the identification of informative sentences
in documents is the so-called extractive document summarization task. In this
paper, we use complex network concepts to devise an extractive Multi Document
Summarization (MDS) method, which extracts the most central sentences from
several textual sources. In the proposed model, texts are represented as
networks, where nodes represent sentences and the edges are established based
on the number of shared words. Differently from previous works, the
identification of relevant terms is guided by the characterization of nodes via
dynamical measurements of complex networks, including symmetry, accessibility
and absorption time. The evaluation of the proposed system revealed that
excellent results were obtained with particular dynamical measurements,
including those based on the exploration of networks via random walks.
Comment: Accepted for publication in BRACIS 2017 (Brazilian Conference on
Intelligent Systems)
Abstractive Summarization Using Attentive Neural Techniques
In a world of proliferating data, the ability to rapidly summarize text is
growing in importance. Automatic summarization of text can be thought of as a
sequence-to-sequence problem. Another area of natural language processing that
solves a sequence-to-sequence problem is machine translation, which is rapidly
evolving due to the development of attention-based encoder-decoder networks.
This work applies these modern techniques to abstractive summarization. We
perform analysis on various attention mechanisms for summarization with the
goal of developing an approach and architecture aimed at improving the state of
the art. In particular, we modify and optimize a translation model with
self-attention for generating abstractive sentence summaries. The effectiveness
of this base model along with attention variants is compared and analyzed in
the context of standardized evaluation sets and test metrics. However, we show
that these metrics are limited in their ability to effectively score
abstractive summaries, and propose a new approach based on the intuition that
an abstractive model requires an abstractive evaluation.
Comment: Accepted for oral presentation at the 15th International Conference
on Natural Language Processing (ICON 2018)
Query-Focused Opinion Summarization for User-Generated Content
We present a submodular function-based framework for query-focused opinion
summarization. Within our framework, relevance ordering produced by a
statistical ranker, and information coverage with respect to topic distribution
and diverse viewpoints are both encoded as submodular functions. Dispersion
functions are utilized to minimize the redundancy. We are the first to evaluate
different metrics of text similarity for submodularity-based summarization
methods. By experimenting on community QA and blog summarization, we show that
our system outperforms state-of-the-art approaches in both automatic evaluation
and human evaluation. A human evaluation task is conducted on Amazon Mechanical
Turk at scale, and shows that our systems are able to generate summaries of
high overall quality and information diversity.
Comment: COLING 201
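The general idea behind submodularity-based selection can be sketched with a greedy loop over a simple word-coverage function. The coverage function, the `budget` parameter and the toy sentences are assumptions chosen for illustration. They stand in for, and are much simpler than, the relevance, coverage and dispersion functions named in this abstract.

```python
def coverage(selected, sentences):
    """Number of distinct words covered: a monotone submodular set function."""
    covered = set()
    for i in selected:
        covered |= sentences[i]
    return len(covered)


def greedy_summary(sentences, budget):
    """Pick up to `budget` sentences, each time adding the largest marginal gain."""
    selected = []
    while len(selected) < budget:
        base = coverage(selected, sentences)
        best, best_gain = None, 0
        for i in range(len(sentences)):
            if i in selected:
                continue
            gain = coverage(selected + [i], sentences) - base
            if gain > best_gain:  # strict '>' keeps the earliest index on ties
                best, best_gain = i, gain
        if best is None:
            break  # nothing left adds new words
        selected.append(best)
    return selected


# Toy "opinions", each reduced to a set of words (hypothetical data).
sentences = [set(s.split()) for s in [
    "battery life is great",
    "battery life is great overall",
    "screen is too dim",
]]
summary = greedy_summary(sentences, budget=2)
```

Because the first sentence adds no new words once the second is chosen, the greedy step skips it and picks the sentence about the screen, which shows how diminishing marginal gain discourages redundancy.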
MUDOS-NG: Multi-document Summaries Using N-gram Graphs (Tech Report)
This report describes the MUDOS-NG summarization system, which applies a set
of language-independent and generic methods for generating extractive
summaries. The proposed methods are mostly combinations of simple operators on
a generic character n-gram graph representation of texts. This work defines the
set of operators used on n-gram graphs and proposes using these operators
within the multi-document summarization process in such subtasks as document
analysis, salient sentence selection, query expansion and redundancy control.
Furthermore, a novel chunking methodology is used, together with a novel way to
assign concepts to sentences for query expansion. The experimental results of
the summarization system, performed upon widely used corpora from the Document
Understanding and the Text Analysis Conferences, are promising and provide
evidence for the potential of the generic methods introduced. This work aims to
designate core methods exploiting the n-gram graph representation, providing
the basis for more advanced summarization systems.
Comment: Technical Report
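As a rough illustration of a character n-gram graph, the sketch below links each n-gram to the n-grams that follow it within a small window and counts co-occurrences. The directed edges, the `window` parameter and the `containment` operator are assumptions made for this example. The abstract does not specify the report's operators, so this is not a reproduction of MUDOS-NG.

```python
from collections import Counter


def ngram_graph(text, n=3, window=2):
    """Map (ngram, following_ngram) pairs to co-occurrence counts.

    An edge links each character n-gram to the n-grams that start within
    `window` positions after it.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            edges[(gram, grams[j])] += 1
    return edges


def containment(g1, g2):
    """Fraction of g1's edges that also appear in g2 (a simple similarity)."""
    if not g1:
        return 0.0
    return sum(1 for edge in g1 if edge in g2) / len(g1)


g = ngram_graph("abcab", n=2, window=1)
same = containment(g, g)
disjoint = containment(g, ngram_graph("xyzxy", n=2, window=1))
```

Since the representation works on raw characters, it needs no tokenizer or language-specific resources, which fits the language-independent aim stated in the abstract.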
Large-Margin Learning of Submodular Summarization Methods
In this paper, we present a supervised learning approach to training
submodular scoring functions for extractive multi-document summarization. By
taking a structured prediction approach, we provide a large-margin method that
directly optimizes a convex relaxation of the desired performance measure. The
learning method applies to all submodular summarization methods, and we
demonstrate its effectiveness for both pairwise and coverage-based
scoring functions on multiple datasets. Compared to state-of-the-art functions
that were tuned manually, our method significantly improves performance and
enables high-fidelity models with numbers of parameters well beyond what could
reasonably be tuned by hand.
Comment: update: improved formatting (figure placement) and algorithm
pseudocode clarity (Fig. 3)
Diversity in Machine Learning
Machine learning methods have achieved good performance and been widely
applied in various real-world applications. They can learn models adaptively
and thus better fit the special requirements of different tasks. Generally, a
good machine learning system is composed of plentiful training data, a good
model training process, and accurate inference. Many factors can affect the
performance of the machine learning process, among which diversity is an
important one. Diversity helps each stage contribute to a good overall
system: diversity of the training data ensures that the data provide more
discriminative information for the model; diversity of the learned model
(diversity in the parameters of each model or diversity among different base
models) makes each parameter/model capture unique or complementary
information; and diversity in inference provides multiple choices, each of
which corresponds to a specific plausible locally optimal result. Even though
diversity plays an important role in the machine learning process, there is no
systematic analysis of diversification in machine learning systems. In this
paper, we systematically summarize the methods for data diversification, model
diversification, and inference diversification in the machine learning
process, respectively. In
addition, typical applications in which diversity techniques have improved
machine learning performance are surveyed, including remote sensing
imaging tasks, machine translation, camera relocalization, image segmentation,
object detection, topic modeling, and others. Finally, we discuss some
challenges of diversity techniques in machine learning and point out some
directions for future work.
Comment: Accepted by IEEE Access
Scientific Article Summarization Using Citation-Context and Article's Discourse Structure
We propose a summarization approach for scientific articles which takes
advantage of citation-context and the document discourse model. While citations
have been previously used in generating scientific summaries, they lack the
related context from the referenced article and therefore do not accurately
reflect the article's content. Our method overcomes the problem of
inconsistency between the citation summary and the article's content by
providing context for each citation. We also leverage the inherent discourse
structure of scientific articles to produce better summaries. We show that our proposed
method effectively improves over existing summarization approaches (greater
than 30% improvement over the best-performing baseline) in terms of
ROUGE scores on the TAC2014 scientific summarization dataset. While the
dataset we use for evaluation is in the biomedical domain, most of our
approaches are general and therefore adaptable to other domains.
Comment: EMNLP 201