21,951 research outputs found
Improving Abstraction in Text Summarization
Abstractive text summarization aims to shorten long text documents into a
human readable form that contains the most important facts from the original
document. However, the level of actual abstraction as measured by novel phrases
that do not appear in the source document remains low in existing approaches.
We propose two techniques to improve the level of abstraction of generated
summaries. First, we decompose the decoder into a contextual network that
retrieves relevant parts of the source document, and a pretrained language
model that incorporates prior knowledge about language generation. Second, we
propose a novelty metric that is optimized directly through policy learning to
encourage the generation of novel phrases. Our model achieves results
comparable to state-of-the-art models, as determined by ROUGE scores and human
evaluations, while achieving a significantly higher level of abstraction as
measured by n-gram overlap with the source document.
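As a rough illustration of measuring abstraction through n-gram overlap with the source, the sketch below computes the fraction of summary n-grams that never occur in the source document. It is a minimal sketch, not the paper's metric: the whitespace tokenization, the lowercasing, the default n of 2, and the function names are all assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novel_ngram_fraction(source, summary, n=2):
    """Fraction of summary n-grams that do not appear in the source.

    Assumption: simple lowercased whitespace tokenization. A real
    evaluation would use a proper tokenizer.
    """
    source_ngrams = set(ngrams(source.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        # A summary shorter than n tokens has no n-grams to score.
        return 0.0
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)


source_text = "the cat sat on the mat"
summary_text = "the cat slept on the mat"
print(novel_ngram_fraction(source_text, summary_text, n=2))
```

A higher value means the summary reuses fewer phrases from the source. A summary copied verbatim from the source scores 0.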
Time Aware Knowledge Extraction for Microblog Summarization on Twitter
Microblogging services like Twitter and Facebook collect millions of pieces of
user-generated content every moment about trending news, ongoing events, and so
on. Nevertheless, it is very difficult to find information of interest
among the huge number of available posts, which are often noisy and redundant.
In general, social media analytics services have attracted increasing attention
from both research and industry. Specifically, the dynamic context of
microblogging requires managing not only the meaning of information but also the
evolution of knowledge over the timeline. This work defines the Time Aware
Knowledge Extraction (TAKE) methodology, which relies on a temporal
extension of Fuzzy Formal Concept Analysis. In particular, a microblog
summarization algorithm has been defined that filters the concepts organized by
TAKE in a time-dependent hierarchy. The algorithm addresses topic-based
summarization on Twitter. Besides considering the timing of the concepts,
another distinguishing feature of the proposed microblog summarization framework
is the possibility of obtaining a more or less detailed summary, according to the
user's needs, with good levels of quality and completeness as highlighted in
the experimental results.
Comment: 33 pages, 10 figures
Better Summarization Evaluation with Word Embeddings for ROUGE
ROUGE is a widely adopted, automatic evaluation measure for text
summarization. While it has been shown to correlate well with human judgements,
it is biased towards surface lexical similarities. This makes it unsuitable for
the evaluation of abstractive summarization, or summaries with substantial
paraphrasing. We study the effectiveness of word embeddings to overcome this
disadvantage of ROUGE. Specifically, instead of measuring lexical overlaps,
word embeddings are used to compute the semantic similarity of the words used
in summaries. Our experimental results show that our proposal is able
to achieve better correlations with human judgements when measured with the
Spearman and Kendall rank coefficients.
Comment: Pre-print - To appear in proceedings of the Conference on Empirical
Methods in Natural Language Processing (EMNLP)
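To make the idea of replacing lexical overlap with embedding similarity concrete, the sketch below computes a "soft" unigram recall: each reference word is matched to its most similar candidate word by cosine similarity rather than by exact string match. This is an illustrative simplification under stated assumptions, not the formulation used in the paper. The function names and the two-dimensional toy vectors are hypothetical, and a real setup would load pretrained word embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_unigram_recall(reference, candidate, embeddings):
    """Average, over reference words, of the best similarity to any candidate word.

    Identical words score 1.0. Words missing from `embeddings` only
    match exactly. Negative similarities are clamped to 0.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    total = 0.0
    for ref_word in ref_tokens:
        best = 0.0
        for cand_word in cand_tokens:
            if ref_word == cand_word:
                sim = 1.0
            elif ref_word in embeddings and cand_word in embeddings:
                sim = cosine(embeddings[ref_word], embeddings[cand_word])
            else:
                sim = 0.0
            best = max(best, sim)
        total += best
    return total / len(ref_tokens)


# Hypothetical toy vectors, chosen so that "big" and "large" are close.
TOY_EMBEDDINGS = {
    "big": (1.0, 0.0),
    "large": (0.9, 0.1),
    "dog": (0.0, 1.0),
}

print(soft_unigram_recall("big dog", "large dog", TOY_EMBEDDINGS))  # soft matching
print(soft_unigram_recall("big dog", "large dog", {}))  # exact matching only
```

With the toy vectors, the paraphrase "large dog" scores close to 1 against "big dog", whereas exact matching alone credits only the shared word "dog".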
Text Summarization Techniques: A Brief Survey
In recent years, there has been an explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.
Comment: Some of the reference formats have been updated