A literature review of abstractive summarization methods
The paper presents a literature review of automatic abstractive text summarization and considers a classification of abstractive summarization methods. Since the emergence of text summarization in the 1950s, techniques for generating summaries have improved constantly, but because abstractive summarization requires extensive language processing, the greatest progress has been achieved only recently. Given the current fast pace of development in both Natural Language Processing in general and text summarization in particular, it is essential to analyze the progress in these areas. The paper aims to give a general perspective on both state-of-the-art and older approaches while explaining the methods involved. Additionally, evaluation results from the research papers are presented.
Abstract Meaning Representation for Multi-Document Summarization
Generating an abstract from a collection of documents is a desirable
capability for many real-world applications. However, abstractive approaches to
multi-document summarization have not been thoroughly investigated. This paper
studies the feasibility of using Abstract Meaning Representation (AMR), a
semantic representation of natural language grounded in linguistic theory, as a
form of content representation. Our approach condenses source documents to a
set of summary graphs following the AMR formalism. The summary graphs are then
transformed to a set of summary sentences in a surface realization step. The
framework is fully data-driven and flexible. Each component can be optimized
independently using small-scale, in-domain training data. We perform
experiments on benchmark summarization datasets and report promising results.
We also describe opportunities and challenges for advancing this line of
research.
Comment: 13 pages
Dual encoding for abstractive text summarization
Recurrent Neural Network (RNN) based sequence-to-sequence attentional models have proven effective in abstractive text summarization. In this paper, we model abstractive text summarization using a dual encoding model. Unlike previous work that uses only a single encoder, the proposed method employs a dual encoder consisting of a primary and a secondary encoder. Specifically, the primary encoder performs coarse encoding in the regular way, while the secondary encoder models the importance of words and generates a finer encoding based on the raw input text and the previously generated summary text. The two levels of encoding are combined and fed into the decoder to generate a more diverse summary, which reduces repetition in long-sequence generation. Experimental results on two challenging datasets (CNN/DailyMail and DUC 2004) demonstrate that our dual encoding model performs competitively against existing methods.
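The idea of combining a coarse encoding with an importance-aware encoding can be illustrated with a toy sketch. It is not the paper's model: the running mean standing in for an RNN, the softmax-of-dot-products importance score, the concatenation used to combine the two encodings, and all function names (`primary_encode`, `secondary_weights`, `dual_encode`) are assumptions made purely for illustration.

```python
import math

def primary_encode(vectors):
    """Coarse encoding: running mean of the input vectors (assumed stand-in for an RNN)."""
    states, total = [], [0.0] * len(vectors[0])
    for i, vec in enumerate(vectors, start=1):
        total = [t + v for t, v in zip(total, vec)]
        states.append([t / i for t in total])
    return states

def secondary_weights(vectors, generated):
    """Importance of each input word given the summary generated so far.

    Assumption: importance is a softmax over dot products between each input
    vector and the mean of the already generated output vectors.
    """
    dim = len(vectors[0])
    if generated:
        query = [sum(g[d] for g in generated) / len(generated) for d in range(dim)]
    else:
        query = [0.0] * dim  # nothing generated yet: all words score equally
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def dual_encode(vectors, generated):
    """Combine both encodings by concatenating each coarse state with its
    importance-scaled input vector (an assumed combination rule)."""
    coarse = primary_encode(vectors)
    weights = secondary_weights(vectors, generated)
    return [c + [w * v for v in vec] for c, vec, w in zip(coarse, vectors, weights)]

# Two toy word vectors; one output word has been generated so far.
inputs = [[1.0, 0.0], [0.0, 1.0]]
combined = dual_encode(inputs, generated=[[1.0, 0.0]])
```

In this sketch the second half of each combined vector changes as more output is generated, which is the property the abstract attributes to the secondary encoder; a real system would learn both encoders jointly with the decoder.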
Generating Abstractive Summaries from Meeting Transcripts
Summaries of meetings are very important as they convey the essential content
of discussions in a concise form. Generally, it is time-consuming to read and
understand an entire document. Therefore, summaries play an important role as
the readers are interested in only the important context of discussions. In
this work, we address the task of meeting document summarization. Automatic
summarization systems on meeting conversations developed so far have been
primarily extractive, resulting in unacceptable summaries that are hard to
read. The extracted utterances contain disfluencies that affect the quality of
the extractive summaries. To make summaries much more readable, we propose an
approach to generating abstractive summaries by fusing important content from
several utterances. We first separate meeting transcripts into various topic
segments, and then identify the important utterances in each segment using a
supervised learning approach. The important utterances are then combined
together to generate a one-sentence summary. In the text generation step, the
dependency parses of the utterances in each segment are combined together to
create a directed graph. The most informative and well-formed sub-graph
obtained by integer linear programming (ILP) is selected to generate a
one-sentence summary for each topic segment. The ILP formulation reduces
disfluencies by leveraging grammatical relations that are more prominent in
non-conversational style of text, and therefore generates summaries that are
comparable to human-written abstractive summaries. Experimental results show
that our method can generate more informative summaries than the baselines. In
addition, readability assessments by human judges as well as log-likelihood
estimates obtained from the dependency parser show that our generated summaries
are significantly readable and well-formed.
Comment: 10 pages, Proceedings of the 2015 ACM Symposium on Document
Engineering, DocEng' 201
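The graph-merging and sub-graph selection steps described in the abstract above can be sketched on a toy example. This is a minimal illustration, not the authors' formulation: dependency parses are assumed to be given as (head, dependent) word pairs, informativeness is assumed to be simple edge frequency, and a brute-force search over small edge subsets replaces the integer linear program. All function names are hypothetical.

```python
from itertools import combinations

def merge_dependency_graphs(parses):
    """Merge per-utterance dependency edges (head, dependent) into one directed graph.

    Identical words collapse into a single node; each edge weight counts how
    many times that edge occurs across utterances.
    """
    weights = {}
    for edges in parses:
        for head, dep in edges:
            weights[(head, dep)] = weights.get((head, dep), 0) + 1
    return weights

def is_rooted_tree(edges, root):
    """True if every dependent has exactly one head and a path up to the root."""
    heads = {}
    for head, dep in edges:
        if dep in heads or dep == root:
            return False
        heads[dep] = head
    for dep in heads:
        seen, node = set(), dep
        while node != root:
            if node in seen or node not in heads:
                return False  # cycle, or disconnected from the root
            seen.add(node)
            node = heads[node]
    return True

def best_subgraph(weights, root, max_edges):
    """Exhaustive stand-in for the ILP: pick the highest-weight rooted tree
    with at most max_edges edges."""
    best, best_score = (), 0
    edges = sorted(weights)
    for k in range(1, max_edges + 1):
        for subset in combinations(edges, k):
            if is_rooted_tree(subset, root):
                score = sum(weights[e] for e in subset)
                if score > best_score:
                    best, best_score = subset, score
    return set(best), best_score

# Three toy utterances from one topic segment; "um" is a disfluency.
parses = [
    [("decided", "team"), ("decided", "budget")],
    [("decided", "team"), ("decided", "um")],
    [("decided", "budget"), ("budget", "new")],
]
graph = merge_dependency_graphs(parses)
selected, score = best_subgraph(graph, root="decided", max_edges=2)
```

Here the disfluent edge to "um" occurs only once and is left out of the selected sub-graph. Exhaustive search is only feasible for tiny graphs, which is one reason to formulate the selection as an ILP and hand it to a solver.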