SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
We study unsupervised multi-document summarization evaluation metrics, which
require neither human-written reference summaries nor human annotations (e.g.
preferences, ratings, etc.). We propose SUPERT, which rates the quality of a
summary by measuring its semantic similarity with a pseudo reference summary,
i.e. selected salient sentences from the source documents, using contextualized
embeddings and soft token alignment techniques. Compared to the
state-of-the-art unsupervised evaluation metrics, SUPERT correlates better with
human ratings by 18-39%. Furthermore, we use SUPERT as rewards to guide a
neural-based reinforcement learning summarizer, yielding favorable performance
compared to the state-of-the-art unsupervised summarizers. All source code is
available at https://github.com/yg211/acl20-ref-free-eval. Comment: ACL 202
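The idea of scoring a summary by softly aligning its tokens with those of a pseudo reference can be sketched as follows. This is a minimal illustration, not the SUPERT implementation: the toy 2-d vectors stand in for contextualized embeddings, and the function names `cosine` and `soft_alignment_f1` are hypothetical. The greedy max-cosine matching is one common way to do soft token alignment and is assumed here rather than taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_alignment_f1(summary_vecs, reference_vecs):
    """Greedy soft alignment (assumed scheme): each token is matched to its
    most similar counterpart on the other side, and the averaged matches are
    combined into an F1-style score."""
    if not summary_vecs or not reference_vecs:
        return 0.0
    precision = sum(
        max(cosine(s, r) for r in reference_vecs) for s in summary_vecs
    ) / len(summary_vecs)
    recall = sum(
        max(cosine(r, s) for s in summary_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


# Toy token embeddings (hypothetical values, for illustration only).
pseudo_reference = [[1.0, 0.0], [0.0, 1.0]]
close_summary = [[1.0, 0.0], [0.0, 1.0]]   # covers both reference tokens
off_summary = [[1.0, 0.0], [1.0, 0.0]]     # covers only one reference token

close_score = soft_alignment_f1(close_summary, pseudo_reference)
off_score = soft_alignment_f1(off_summary, pseudo_reference)
```

In this toy setting the summary that covers both pseudo-reference tokens scores higher than the one that repeats a single token, which is the behavior a reference-free metric of this kind is meant to capture.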
Automatic Text Summarization Approaches to Speed up Topic Model Learning Process
The number of documents available on the Internet grows every day. For this
reason, processing this amount of information effectively and expressively
becomes a major concern for companies and scientists. Methods that represent a
textual document by a topic representation are widely used in Information
Retrieval (IR) to process big data such as Wikipedia articles. One of the main
difficulties in using topic models on huge data collections is related to the
material resources (CPU time and memory) required for model estimation. To deal
with this issue, we propose to build topic spaces from summarized documents. In
this paper, we present a study of topic space representation in the context of
big data. The topic space representation behavior is analyzed on different
languages. Experiments show that topic spaces estimated from text summaries are
as relevant as those estimated from the complete documents. The real advantage
of such an approach is the processing time gain: we show that the processing
time can be drastically reduced using summarized documents (by more than 60% in
general). Finally, this study points out the differences between thematic
representations of documents depending on the targeted languages, such as
English or Latin languages. Comment: 16 pages, 4 tables, 8 figures
An approach to graph-based analysis of textual documents
In this paper, a new graph-based model is proposed for the representation of textual documents. Graph structures are obtained from textual documents by making use of the well-known Part-Of-Speech (POS) tagging technique. More specifically, a simple rule-based (re)classifier is used to map each tag onto graph vertices and edges. As a result, a decomposition of textual documents is obtained where tokens are automatically parsed and attached to either a vertex or an edge. It is shown how textual documents can be aggregated through their graph structures and, finally, how vertex-ranking methods can be used to find relevant tokens.
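The mapping from POS tags to vertices and edges described in this abstract can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the rule that nouns become vertices and all other tokens become edge labels, the Universal-POS-style tag names, the pre-tagged input, and the use of vertex degree as a stand-in for a vertex-ranking method. The abstract does not specify the paper's actual rules or ranking method.

```python
from collections import defaultdict

# Assumed rule: tokens with these tags become vertices; all others label edges.
VERTEX_TAGS = {"NOUN", "PROPN"}


def build_graph(tagged_tokens):
    """Turn (token, tag) pairs into edges (source, target, labels).

    Non-vertex tokens between two consecutive vertex tokens become the label
    of the edge joining them. Tokens before the first vertex or after the
    last one are dropped in this simplified sketch.
    """
    edges = []
    prev_vertex = None
    pending = []
    for token, tag in tagged_tokens:
        if tag in VERTEX_TAGS:
            if prev_vertex is not None:
                edges.append((prev_vertex, token, tuple(pending)))
            prev_vertex = token
            pending = []
        else:
            pending.append(token)
    return edges


def rank_by_degree(edges):
    """Rank vertices by degree, highest first (ties broken alphabetically)."""
    degree = defaultdict(int)
    for source, target, _labels in edges:
        degree[source] += 1
        degree[target] += 1
    return sorted(degree, key=lambda v: (-degree[v], v))


# Hypothetical pre-tagged sentence: "graphs represent documents with tokens".
tagged = [
    ("graphs", "NOUN"),
    ("represent", "VERB"),
    ("documents", "NOUN"),
    ("with", "ADP"),
    ("tokens", "NOUN"),
]
edges = build_graph(tagged)
ranking = rank_by_degree(edges)
```

Because edges are keyed by token strings, graphs built from several documents could be aggregated by concatenating their edge lists before ranking, which loosely mirrors the aggregation step mentioned in the abstract.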
- …