Text Summarization
With the overwhelming amount of textual information available in electronic formats on the web, there is a need for an efficient text summarizer capable of condensing large bodies of text into shorter versions while keeping the relevant information intact. Such a technology would allow users to get their information in a shortened form, saving valuable time. Since 1997, Microsoft Word has included a document summarizer, and there are now companies that summarize breaking news and deliver it by SMS to mobile phones. I wish to create a text summarizer that provides condensed versions of original documents. My focus is on blogs, because people increasingly use this mode of communication to express their opinions on a variety of topics. Consequently, it will be very useful for a reader to employ a concise summary, tailored to his or her own interests, to quickly browse through volumes of opinions on any number of topics. Although many summarization methods exist, my approach employs the Lanczos algorithm to compute eigenvalues and eigenvectors of a large sparse matrix, and Singular Value Decomposition (SVD) to identify latent topics hidden in context; the next phase of the process reduces this high-dimensional data set to a lower-dimensional one. This procedure makes it possible to identify the best low-rank approximation of the original text. Because SQL allows data sets to be analyzed while taking advantage of the parallel processing available in most database management systems today, SQL is employed in my project. Implementing the SVD and the Lanczos algorithm in SQL without external math libraries, however, adds to the challenge.
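For readers unfamiliar with the SVD-based approach this abstract describes, the sketch below shows the core idea in Python rather than the author's SQL-only setting; every name in it is illustrative. Sentences become columns of a term-sentence matrix, the SVD exposes latent topics, and one sentence is selected per dominant topic.

```python
# Minimal sketch of LSA-style extractive summarization (illustrative, not
# the author's SQL implementation): term-sentence matrix -> SVD -> pick,
# for each of the top latent topics, the sentence with the largest weight.
import numpy as np

def lsa_summarize(sentences, k=2):
    # Build a binary term-sentence matrix A (terms x sentences).
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] = 1.0
    # Rows of Vt give each sentence's weight per latent topic. On large
    # sparse matrices one would use a Lanczos-based truncated SVD
    # (e.g. scipy.sparse.linalg.svds) instead of a dense decomposition.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = set()
    for topic in Vt[:k]:  # top-k singular vectors = main topics
        chosen.add(int(np.argmax(np.abs(topic))))
    return [sentences[j] for j in sorted(chosen)]
```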
Faithful to the Original: Fact Aware Neural Abstractive Summarization
Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which makes it prone to creating fake facts. Our preliminary study reveals that nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on improving informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parsing technologies to extract actual fact descriptions from the source text. We then propose a dual-attention sequence-to-sequence framework that forces generation to be conditioned on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can reduce fake summaries by 80%. Notably, the fact descriptions also bring a significant improvement in informativeness, since they often condense the meaning of the source text.
Comment: 8 pages, 3 figures, AAAI 2018
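The dual-attention idea can be pictured with a small sketch. The plain-NumPy illustration below is written under our own assumptions: the dimensions, function names, and the scalar gate used to merge the two context vectors are all invented for clarity and are not taken from the paper.

```python
# Hypothetical sketch of dual attention: at each decoding step the decoder
# state attends separately over the source encoding and the fact encoding,
# and the two context vectors are merged before predicting the next token.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    # memory: (timesteps, d); query: (d,) -> context vector (d,)
    scores = memory @ query          # dot-product attention scores
    return softmax(scores) @ memory  # attention-weighted sum

def dual_attention_step(dec_state, src_memory, fact_memory, w_gate):
    c_src = attend(dec_state, src_memory)    # context from source text
    c_fact = attend(dec_state, fact_memory)  # context from fact descriptions
    # Scalar gate deciding how much to trust each context; this merge rule
    # is an assumption made here for brevity, not the paper's exact one.
    g = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([c_src, c_fact]))))
    return g * c_src + (1.0 - g) * c_fact
```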
Generating indicative-informative summaries with SumUM
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step toward exploring dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated the indicativeness, informativeness, and text acceptability of the automatic summaries. The results so far indicate good performance compared with other summarization technologies.
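The two-stage indicative/informative split can be made concrete with a toy sketch; the one below is our own crude stand-in (frequency-based topic terms and sentence lookup), not SumUM's shallow syntactic and semantic analysis, and all names are illustrative.

```python
# Illustrative two-stage sketch: an indicative pass lists the document's
# topic terms; an informative pass elaborates on a topic the reader picks.
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "we"}

def indicative(text, n_topics=5):
    # Indicative summary: the most frequent content words as topic terms.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
    return [w for w, _ in Counter(words).most_common(n_topics)]

def informative(text, topic):
    # Informative elaboration: return the sentences mentioning the topic.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if topic in s.lower()]
```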
LCSTS: A Large Scale Chinese Short Text Summarization Dataset
Automatic text summarization is widely regarded as a highly difficult problem, partly because of the lack of large text summarization datasets. Given the great challenge of constructing large-scale summaries for full texts, in this paper we introduce a large Chinese short text summarization dataset constructed from the Chinese microblogging website Sina Weibo, which is released to the public at http://icrc.hitsz.edu.cn/Article/show/139.html. This corpus consists of over 2 million real Chinese short texts with short summaries given by the author of each text. We also manually tagged the relevance of 10,666 short summaries to their corresponding short texts. Based on the corpus, we introduce recurrent neural networks for summary generation and achieve promising results, which not only show the usefulness of the proposed corpus for short text summarization research but also provide a baseline for further research on this topic.
Comment: Recently, we received feedback from Yuya Taguchi of NAIST in Japan and Qian Chen of USTC in China that the results in the EMNLP 2015 version seemed to be underrated. We carefully checked our results and found that we had made a mistake when using the standard ROUGE. We then re-evaluated all methods in the paper; the corrected results are listed in Table 2 of this version.
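The correction note above turns on ROUGE scoring. As a reminder of what the metric computes, here is a minimal ROUGE-1 recall sketch; the official ROUGE toolkit additionally supports stemming, stopword removal, and bootstrap confidence intervals, all omitted here.

```python
# Minimal ROUGE-1 recall: clipped unigram overlap between a candidate
# summary and a reference, divided by the reference length.
from collections import Counter

def rouge1_recall(candidate_tokens, reference_tokens):
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum(min(cand[w], ref[w]) for w in ref)  # clipped matches
    return overlap / max(len(reference_tokens), 1)

# Example: 2 of the 5 reference unigrams are matched -> 0.4
print(rouge1_recall("the cat sat".split(), "the cat on the mat".split()))
```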
