Cross-Language Text Summarization using Sentence and Multi-Sentence Compression
Cross-Language Automatic Text Summarization produces a summary in a language different from that of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. To generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods, at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves on state-of-the-art approaches according to ROUGE metrics.
Multiple Alternative Sentence Compressions as a Tool for Automatic Summarization Tasks
Automatic summarization is the distillation of important information from a source into an abridged form for a particular user or task.
Many current systems summarize texts by selecting sentences with important content. The limitation of extraction at the sentence level
is that highly relevant sentences may also contain non-relevant and
redundant content.
This thesis presents a novel framework for text summarization that
addresses the limitations of sentence-level extraction. Under this
framework, text summarization is performed by generating Multiple
Alternative Sentence Compressions (MASC) as candidate summary
components and using weighted features of the candidates to construct
summaries from them. Sentence compression is the rewriting of a
sentence in a shorter form. This framework provides an environment in
which hypotheses about summarization techniques can be tested.
Three approaches to sentence compression were developed under this
framework. The first approach, HMM Hedge, uses the Noisy Channel
Model to calculate the most likely compressions of a sentence. The
second approach, Trimmer, uses syntactic trimming rules that are
linguistically motivated by Headlinese, a form of compressed English
associated with newspaper headlines. The third approach, Topiary, is
a combination of fluent text with topic terms.
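As an aside on how Headlinese-style trimming might look in practice, here is a toy, hypothetical sketch. Trimmer itself applies linguistically motivated rules to parse trees; the regex rules and example sentence below are invented string-level stand-ins for illustration only:

```python
import re

def trim_headlinese(sentence: str) -> str:
    """Toy illustration of rule-based trimming toward Headlinese.

    These regex rules are simplified stand-ins for Trimmer's syntactic
    rules, which operate on parse trees rather than raw strings.
    """
    s = sentence
    # Drop parenthetical asides, e.g. "(a former lawyer)".
    s = re.sub(r"\s*\([^)]*\)", "", s)
    # Drop sentence-initial time expressions like "On Monday, ".
    s = re.sub(r"^(On|In|At|Last|This) \w+(day)?, ", "", s)
    # Drop copulas/auxiliaries, which Headlinese typically omits.
    s = re.sub(r"\b(is|are|was|were|has been|have been)\b\s*", "", s)
    return s.strip()

print(trim_headlinese(
    "On Monday, the mayor (a former lawyer) was elected to a second term."
))
```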
The MASC framework for automatic text summarization has been applied
to the tasks of headline generation and multi-document summarization,
and has been used for initial work in summarization of novel genres
and applications, including broadcast news, email threads,
cross-language, and structured queries. The framework supports
combinations of component techniques, fostering collaboration between
development teams.
Three results will be demonstrated under the MASC framework. The first is
that an extractive summarization system can produce better summaries
by automatically selecting from a pool of compressed sentence
candidates than by automatically selecting from unaltered source
sentences. The second result is that sentence selectors can construct
better summaries from pools of compressed candidates when they make
use of larger candidate feature sets. The third result is that for
the task of Headline Generation, a combination of topic terms and
compressed sentences performs better than either approach alone.
Experimental evidence supports all three results.
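To make the selection stage concrete, the following is a minimal, hypothetical sketch of MASC-style summary construction: each source sentence contributes a pool of compressed candidates, a weighted linear combination of features scores every candidate, and a greedy selector takes at most one candidate per source sentence under a word budget. The feature names, weights, and candidate texts are invented; the thesis's actual feature sets and selectors are richer:

```python
# Hypothetical MASC-style selection sketch; features and weights are
# invented for illustration, not taken from the thesis.

def score(candidate, weights):
    # Weighted linear combination of candidate features.
    return sum(weights[f] * v for f, v in candidate["features"].items())

def build_summary(candidate_pools, weights, budget):
    # Flatten all candidates, remembering their source-sentence id.
    scored = [(score(c, weights), sid, c)
              for sid, pool in enumerate(candidate_pools)
              for c in pool]
    scored.sort(key=lambda t: t[0], reverse=True)
    summary, used, length = [], set(), 0
    for _, sid, cand in scored:
        n = len(cand["text"].split())
        # At most one candidate per source sentence, within the budget.
        if sid in used or length + n > budget:
            continue
        summary.append(cand["text"])
        used.add(sid)
        length += n
    return summary

pools = [
    [{"text": "Quake hits city",
      "features": {"relevance": 0.9, "fluency": 0.7}},
     {"text": "A strong quake hits the coastal city overnight",
      "features": {"relevance": 0.95, "fluency": 0.9}}],
    [{"text": "Rescue teams arrive",
      "features": {"relevance": 0.8, "fluency": 0.8}}],
]
weights = {"relevance": 1.0, "fluency": 0.5}
print(build_summary(pools, weights, budget=11))
```

Note how a tighter budget changes which compression of the same sentence is chosen, which is the point of keeping multiple alternatives per sentence.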
A Multilingual Study of Compressive Cross-Language Text Summarization
Cross-Language Text Summarization (CLTS) generates summaries in a language
different from the language of the source documents. Recent methods use
information from both languages to generate summaries with the most informative
sentences. However, the performance of these methods can vary across
languages, which can reduce summary quality. In this paper, we propose
a compressive framework to generate cross-language summaries. In order to
analyze performance and especially stability, we tested our system and
extractive baselines on a dataset available in four languages (English, French,
Portuguese, and Spanish) to generate English and French summaries. An automatic
evaluation showed that our method outperformed extractive state-of-the-art CLTS
methods, with better and more stable ROUGE scores across all languages.
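Since several of the abstracts in this listing report ROUGE scores, a minimal sketch of ROUGE-N recall may be useful. This is a simplification: the official ROUGE toolkit additionally supports stemming, stopword handling, and multi-reference jackknifing:

```python
from collections import Counter

def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N recall: clipped n-gram overlap between candidate
    and reference, divided by the number of n-grams in the reference."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge_n_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```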
Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
Generating a text abstract from a set of documents remains a challenging
task. The neural encoder-decoder framework has recently been exploited to
summarize single documents, but its success can in part be attributed to the
availability of large parallel data automatically acquired from the Web. In
contrast, parallel data for multi-document summarization are scarce and costly
to obtain. There is a pressing need to adapt an encoder-decoder model trained
on single-document summarization data to work with multiple-document input. In
this paper, we present an initial investigation into a novel adaptation method.
It exploits the maximal marginal relevance method to select representative
sentences from multi-document input, and leverages an abstractive
encoder-decoder model to fuse disparate sentences into an abstractive summary.
The adaptation method is robust and itself requires no training data. Our
system compares favorably to state-of-the-art extractive and abstractive
approaches as judged by automatic metrics and human assessors.
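The maximal marginal relevance step described above can be sketched as follows. The Jaccard word-overlap similarity here is a simplified, hypothetical stand-in for whatever similarity function the paper actually uses, and the documents and query are invented:

```python
def jaccard(a, b):
    """Word-overlap similarity; a simplified stand-in similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mmr_select(sentences, query, k, lam=0.7):
    """Maximal marginal relevance: repeatedly take the sentence that best
    trades off relevance to the query against redundancy with prior picks."""
    selected, remaining = [], list(sentences)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda s: lam * jaccard(s, query)
            - (1 - lam) * max((jaccard(s, t) for t in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

docs = [
    "storm hits the coast",
    "storm hits coastal towns",
    "schools closed by storm",
]
print(mmr_select(docs, "storm damage on the coast", k=2))
```

The redundancy penalty is what makes the second pick skip the near-duplicate of the first sentence in favor of a less similar one.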
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Summarizing content contributed by individuals can be challenging, because
people make different lexical choices even when describing the same events.
However, there remains a significant need to summarize such content. Examples
include the student responses to post-class reflective questions, product
reviews, and news articles published by different news agencies related to the
same events. High lexical diversity of these documents hinders the system's
ability to effectively identify salient content and reduce summary redundancy.
In this paper, we overcome this issue by introducing an integer linear
programming-based summarization framework. It incorporates a low-rank
approximation to the sentence-word co-occurrence matrix to intrinsically group
semantically-similar lexical items. We conduct extensive experiments on
datasets of student responses, product reviews, and news documents. Our
approach compares favorably to a number of extractive baselines as well as a
neural abstractive summarization system. The paper finally sheds light on when
and why the proposed framework is effective at summarizing content with high
lexical variety.
Comment: Accepted for publication in the journal Natural Language
Engineering, 201
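The low-rank idea behind this ILP framework can be illustrated with a toy example: a truncated SVD of the sentence-word co-occurrence matrix pulls together words that never co-occur directly but appear in interchangeable sentence contexts. The sentences, vocabulary, and rank below are illustrative choices only, not the paper's data or setup:

```python
import numpy as np

# Toy sentence-word co-occurrence matrix (rows: sentences, cols: words).
sentences = [
    "the lecture was very clear",
    "the talk was very clear",
    "the lecture was confusing",
    "the talk was confusing",
]
vocab = sorted({w for s in sentences for w in s.split()})
A = np.array([[1.0 if w in s.split() else 0.0 for w in vocab]
              for s in sentences])

# Best rank-k approximation via truncated SVD.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = (U[:, :k] * S[:k]) @ Vt[:k, :]

def col_sim(B, w1, w2):
    """Cosine similarity between two words' column vectors in matrix B."""
    u, v = B[:, vocab.index(w1)], B[:, vocab.index(w2)]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "lecture" and "talk" never co-occur, so their raw columns are orthogonal,
# but the rank-2 approximation makes their context columns nearly identical.
print(col_sim(A, "lecture", "talk"), col_sim(A_k, "lecture", "talk"))
```

This grouping of semantically similar lexical items is what lets the ILP objective credit overlap between differently worded sentences.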