Diversity driven Attention Model for Query-based Abstractive Summarization
Abstractive summarization aims to generate a shorter version of the document
covering all the salient points in a compact and coherent fashion. On the other
hand, query-based summarization highlights those points that are relevant in
the context of a given query. The encode-attend-decode paradigm has achieved
notable success in machine translation, extractive summarization, dialog
systems, etc. However, it suffers from the drawback of generating repeated
phrases. In this work, we propose a model for the query-based summarization task
based on the encode-attend-decode paradigm with two key additions: (i) a query
attention model (in addition to the document attention model), which learns to focus
on different portions of the query at different time steps (instead of using a
static representation of the query), and (ii) a new diversity-based attention
model, which aims to alleviate the problem of repeated phrases in the summary.
To enable testing of this model, we introduce a new query-based
summarization dataset built on Debatepedia. Our experiments show that with
these two additions the proposed model clearly outperforms vanilla
encode-attend-decode models, with a gain of 28% (absolute) in ROUGE-L scores. Comment: Accepted at ACL 201
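The two additions can be illustrated with a short pure-Python sketch of one decoding step. It assumes dot-product attention, conditions document attention on the sum of the decoder state and the query context, and implements diversity by removing from the new document context its projection onto the previous step's context (a single Gram-Schmidt step). These are illustrative choices with hypothetical function names, not a claim about the paper's exact equations.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query_state, keys):
    """Dot-product attention: weights over keys and the weighted context vector."""
    weights = softmax([dot(query_state, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
    return weights, context

def orthogonalize(context, previous):
    """Remove the component of `context` along `previous` (one Gram-Schmidt step)."""
    denom = dot(previous, previous)
    if denom == 0.0:
        return list(context)
    scale = dot(context, previous) / denom
    return [c - scale * p for c, p in zip(context, previous)]

def decode_step(state, query_states, doc_states, prev_context):
    """Hypothetical single decoder step with query attention and diversity."""
    # 1) Attend over the query with the current decoder state (dynamic query representation).
    _, q_ctx = attend(state, query_states)
    # 2) Attend over the document, conditioned on state + query context (the sum is an assumption).
    doc_query = [s + q for s, q in zip(state, q_ctx)]
    _, d_ctx = attend(doc_query, doc_states)
    # 3) Diversity: keep only the part of the new context orthogonal to the previous one.
    return d_ctx if prev_context is None else orthogonalize(d_ctx, prev_context)

doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [[1.0, 0.0]]
c1 = decode_step([1.0, 0.0], query, doc, None)
c2 = decode_step([1.0, 0.0], query, doc, c1)
print(c2)  # → [0.0, 0.0]
```

In the usage lines, the second step sees the same state and attention, so its raw context is identical to the first one and orthogonalization removes it entirely. This is the repetition signal the diversity model is meant to suppress.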
A Multi-task Learning Approach for Improving Product Title Compression with User Search Log Data
It is a challenging and practical research problem to obtain effective
compression of lengthy product titles for E-commerce. This is particularly
important as more and more users browse mobile E-commerce apps and more
merchants make the original product titles redundant and lengthy for Search
Engine Optimization. Traditional text summarization approaches often incur
large preprocessing costs and do not capture the important issue of
conversion rate in E-commerce. This paper proposes a novel multi-task learning
approach for improving product title compression with user search log data. In
particular, a pointer network-based sequence-to-sequence approach is utilized
for title compression with an attentive mechanism as an extractive method and
an attentive encoder-decoder approach is utilized for generating user search
queries. The encoding parameters (i.e., semantic embedding of original titles)
are shared among the two tasks and the attention distributions are jointly
optimized. An extensive set of experiments with both human-annotated data and
online deployment demonstrates the advantages of the proposed approach in both
compression quality and online business value. Comment: 8 pages, accepted at AAAI 201
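The abstract says the two tasks share encoder parameters and that their attention distributions are jointly optimized, but it does not say how. The Python sketch below assumes a weighted sum of the two cross-entropy losses plus a KL-divergence penalty that pulls the two attention distributions together. The function names and the weights `lam` and `gamma` are hypothetical, not taken from the paper. For the pointer-network compression task, each step's probabilities range over source-token positions rather than over a vocabulary.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the gold index under a probability vector."""
    return -math.log(probs[target_index])

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two attention distributions over the same source tokens."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def joint_loss(compress_steps, query_steps, attn_compress, attn_query, lam=0.5, gamma=0.1):
    """Hypothetical multi-task objective.

    compress_steps: (probs over source positions, gold position) per pointer step.
    query_steps: (probs over vocabulary, gold word id) per query-generation step.
    attn_compress / attn_query: aggregated attention of each task over the shared encoder.
    """
    l_compress = sum(cross_entropy(p, g) for p, g in compress_steps)
    l_query = sum(cross_entropy(p, g) for p, g in query_steps)
    l_agree = kl_divergence(attn_compress, attn_query)  # assumed agreement term
    return lam * l_compress + (1.0 - lam) * l_query + gamma * l_agree

steps = [([0.5, 0.5], 0)]
same = [0.3, 0.7]
# Both tasks cost ln 2 here, and identical attentions add no agreement penalty.
print(joint_loss(steps, steps, same, same))
```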
Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
Generating a text abstract from a set of documents remains a challenging
task. The neural encoder-decoder framework has recently been exploited to
summarize single documents, but its success can in part be attributed to the
availability of large parallel data automatically acquired from the Web. In
contrast, parallel data for multi-document summarization are scarce and costly
to obtain. There is a pressing need to adapt an encoder-decoder model trained
on single-document summarization data to work with multiple-document input. In
this paper, we present an initial investigation into a novel adaptation method.
It exploits the maximal marginal relevance method to select representative
sentences from multi-document input, and leverages an abstractive
encoder-decoder model to fuse disparate sentences into an abstractive summary.
The adaptation method is robust and itself requires no training data. Our
system compares favorably to state-of-the-art extractive and abstractive
approaches judged by automatic metrics and human assessors. Comment: 11 page
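A minimal Python sketch of greedy maximal marginal relevance (MMR) selection follows: each step picks the sentence maximizing lam * relevance - (1 - lam) * (maximum similarity to sentences already chosen). It assumes bag-of-words cosine similarity and scores relevance against the centroid of all input sentences. Both choices, the helper names, and the trade-off weight `lam` are illustrative assumptions rather than the paper's exact configuration, and the abstractive fusion step is not shown.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_select(sentences, k, lam=0.5):
    """Greedily pick up to k sentences that are relevant but not redundant."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    centroid = Counter()
    for v in vecs:
        centroid.update(v)  # assumed relevance target: centroid of the whole input
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], centroid)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]

docs = ["the cat sat on the mat", "the cat sat on the mat", "dogs bark loudly at night"]
print(mmr_select(docs, 2))
```

With `lam=0.5`, the exact duplicate of the first pick is penalized enough that the unrelated third sentence is chosen second; raising `lam` shifts the balance toward relevance and away from diversity.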
Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
Most recent approaches use the sequence-to-sequence model for paraphrase
generation. The existing sequence-to-sequence model tends to memorize the words
and the patterns in the training dataset instead of learning the meaning of the
words. Therefore, the generated sentences are often grammatically correct but
semantically improper. In this work, we introduce a novel model based on the
encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our
proposed model generates the words by querying distributed word representations
(i.e., neural word embeddings), aiming to capture the meaning of the corresponding
words. Following previous work, we evaluate our model on two
paraphrase-oriented tasks, namely text simplification and short text
abstractive summarization. Experimental results show that our model outperforms
the sequence-to-sequence baseline by 6.3 and 5.5 BLEU points on two
English text simplification datasets, and by 5.7 ROUGE-2 F1 points on a
Chinese summarization dataset. Moreover, our model achieves state-of-the-art
performance on these three benchmark datasets. Comment: arXiv admin note: text overlap with arXiv:1710.0231
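As the abstract describes it, WEAN generates each word by querying the word-embedding table. The Python sketch below assumes plain dot-product scoring between a decoder query vector and each word embedding, followed by a softmax. The toy two-dimensional embeddings and the function name are hypothetical, and the paper's actual scoring function may differ.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def query_embeddings(query, embeddings):
    """Score every vocabulary word by dot(query, embedding) and normalize to a distribution."""
    words = list(embeddings)
    scores = [sum(q * e for q, e in zip(query, embeddings[w])) for w in words]
    return dict(zip(words, softmax(scores)))

# Hypothetical toy embeddings: "big" and "large" are close, "small" points the other way.
embeddings = {"big": [0.9, 0.1], "large": [0.85, 0.15], "small": [-0.9, 0.1]}
dist = query_embeddings([1.0, 0.0], embeddings)
best = max(dist, key=dist.get)
print(best)  # → big
```

Because near-synonyms such as "big" and "large" have nearby embeddings, they receive similar probabilities. This ties the output layer to word meaning, which is the motivation the abstract gives.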
A survey on opinion summarization techniques for social media
The volume of data on social media is huge and keeps increasing. The need for efficient processing of this extensive information has led to increasing research interest in knowledge engineering tasks such as opinion summarization. This survey presents the current challenges of opinion summarization for social media, followed by the necessary pre-summarization steps such as preprocessing, feature extraction, noise elimination, and handling of synonym features. Next, it covers the various approaches used in opinion summarization, such as Visualization, Abstractive, Aspect-based, Query-focused, Real-Time, and Update Summarization, and highlights other opinion summarization approaches such as Contrastive, Concept-based, Community Detection, Domain-Specific, Bilingual, Social Bookmarking, and Social Media Sampling. It covers the different datasets used in opinion summarization and the future work suggested for each technique. Finally, it provides different ways of evaluating opinion summarization.
Survey of Query-based Text Summarization
Query-based text summarization is an important real-world problem that
requires condensing prolix text into a summary under the guidance of
query information provided by users. The topic has been studied for a long
time, and there is a large body of interesting research related to query-based
text summarization. Yet much of this work has not been systematically surveyed. This
survey aims to summarize notable work on query-based text
summarization methods as well as related generic text summarization methods.
To the best of our knowledge, not all of the taxonomies in this paper appear in
related work, and some analysis is also presented.