402 research outputs found
Text Summarization Techniques: A Brief Survey
In recent years, there has been a explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.Comment: Some of references format have update
Improving Sequential Determinantal Point Processes for Supervised Video Summarization
It is now much easier than ever before to produce videos. While the
ubiquitous video data is a great source for information discovery and
extraction, the computational challenges are unparalleled. Automatically
summarizing the videos has become a substantial need for browsing, searching,
and indexing visual content. This paper is in the vein of supervised video
summarization using sequential determinantal point process (SeqDPP), which
models diversity by a probabilistic distribution. We improve this model in two
folds. In terms of learning, we propose a large-margin algorithm to address the
exposure bias problem in SeqDPP. In terms of modeling, we design a new
probabilistic distribution such that, when it is integrated into SeqDPP, the
resulting model accepts user input about the expected length of the summary.
Moreover, we also significantly extend a popular video summarization dataset by
1) more egocentric videos, 2) dense user annotations, and 3) a refined
evaluation scheme. We conduct extensive experiments on this dataset (about 60
hours of videos in total) and compare our approach to several competitive
baselines
A Supervised Approach to Extractive Summarisation of Scientific Papers
Automatic summarisation is a popular approach to reduce a document to its
main arguments. Recent research in the area has focused on neural approaches to
summarisation, which can be very data-hungry. However, few large datasets exist
and none for the traditionally popular domain of scientific publications, which
opens up challenging research avenues centered on encoding large, complex
documents. In this paper, we introduce a new dataset for summarisation of
computer science publications by exploiting a large resource of author provided
summaries and show straightforward ways of extending it further. We develop
models on the dataset making use of both neural sentence encoding and
traditionally used summarisation features and show that models which encode
sentences as well as their local and global context perform best, significantly
outperforming well-established baseline methods.Comment: 11 pages, 6 figure
Abstract Meaning Representation for Multi-Document Summarization
Generating an abstract from a collection of documents is a desirable
capability for many real-world applications. However, abstractive approaches to
multi-document summarization have not been thoroughly investigated. This paper
studies the feasibility of using Abstract Meaning Representation (AMR), a
semantic representation of natural language grounded in linguistic theory, as a
form of content representation. Our approach condenses source documents to a
set of summary graphs following the AMR formalism. The summary graphs are then
transformed to a set of summary sentences in a surface realization step. The
framework is fully data-driven and flexible. Each component can be optimized
independently using small-scale, in-domain training data. We perform
experiments on benchmark summarization datasets and report promising results.
We also describe opportunities and challenges for advancing this line of
research.Comment: 13 page
- …