Beyond SumBasic: Task-Focused Summarization with Sentence Simplification and Lexical Expansion
In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.
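The abstract above does not describe its generic extractive component in detail, but the title points to SumBasic. As a rough illustration only, the sketch below follows the commonly described SumBasic idea: score sentences by the average probability of their words, pick the best one, then square the probabilities of the words just used so later picks favor new content. The function name `sumbasic`, the token-list input format, and the omission of the topic-focusing, simplification, and lexical expansion components are assumptions made for this example; it is not the authors' system.

```python
from collections import Counter


def sumbasic(sentences, max_sentences=2):
    """Return indices of selected sentences, in selection order.

    `sentences` is a list of token lists (a hypothetical input format).
    Simplified sketch: the usual extra rule of requiring the currently
    most probable word in the chosen sentence is left out.
    """
    tokens = [w for s in sentences for w in s]
    total = len(tokens)
    # Unigram probability of each word over the whole input.
    prob = {w: c / total for w, c in Counter(tokens).items()}

    # Skip empty sentences so the average below is always defined.
    remaining = [i for i, s in enumerate(sentences) if s]
    chosen = []
    while remaining and len(chosen) < max_sentences:
        # Score = average word probability of the sentence.
        best = max(
            remaining,
            key=lambda i: sum(prob[w] for w in sentences[i]) / len(sentences[i]),
        )
        chosen.append(best)
        remaining.remove(best)
        # Down-weight words already covered to discourage redundancy.
        for w in set(sentences[best]):
            prob[w] = prob[w] ** 2
    return chosen
```

For the toy input `[["cat", "sat"], ["cat", "cat", "ran"], ["dog", "barked"]]`, the second sentence is picked first because "cat" is frequent; after its probability is squared, the first sentence still narrowly beats the third.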
MuLER: Detailed and Scalable Reference-based Evaluation
We propose a novel methodology (namely, MuLER) that transforms any
reference-based evaluation metric for text generation, such as machine
translation (MT), into a fine-grained analysis tool.
Given a system and a metric, MuLER quantifies how much the chosen metric
penalizes specific error types (e.g., errors in translating names of
locations). MuLER thus enables a detailed error analysis which can lead to
targeted improvement efforts for specific phenomena.
We perform experiments in both synthetic and naturalistic settings to support
MuLER's validity and showcase its usability in MT evaluation, and other tasks,
such as summarization. Analyzing all submissions to WMT in 2014-2020, we find
consistent trends. For example, nouns and verbs are among the most frequent POS
tags. However, they are among the hardest to translate. Performance on most POS
tags improves with overall system performance, but a few are not thus
correlated (their identity changes from language to language). Preliminary
experiments with summarization reveal similar trends.
Multi-Document Summarization via Discriminative Summary Reranking
Existing multi-document summarization systems usually rely on a specific
summarization model (i.e., a summarization method with a specific parameter
setting) to extract summaries for different document sets with different
topics. However, according to our quantitative analysis, none of the existing
summarization models can always produce high-quality summaries for different
document sets, and even a summarization model with good overall performance may
produce low-quality summaries for some document sets. Conversely, a
baseline summarization model may produce high-quality summaries for some
document sets. Based on the above observations, we treat the summaries produced
by different summarization models as candidate summaries, and then explore
discriminative reranking techniques to identify high-quality summaries from the
candidates for different document sets. We propose to extract a set of
candidate summaries for each document set based on an ILP framework, and then
leverage Ranking SVM for summary reranking. Various useful features have been
developed for the reranking process, including word-level features,
sentence-level features and summary-level features. Evaluation results on the
benchmark DUC datasets validate the efficacy and robustness of our proposed
approach.
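To make the reranking step more concrete, here is a minimal sketch of the pairwise transformation that Ranking SVM is generally built on: within one document set, each pair of candidate summaries with different quality scores becomes a feature-difference vector labeled +1 or -1, and a linear model trained on those pairs is then used to score and pick a candidate. The function names, the feature vectors, the quality scores, and the weight vector are all hypothetical; the abstract does not specify them, and no actual SVM training is shown.

```python
def pairwise_examples(candidates):
    """Build pairwise training examples for one document set.

    `candidates` is a list of (feature_vector, quality_score) pairs
    (an assumed format). Ties carry no preference and are skipped.
    """
    examples = []
    for i, (xi, yi) in enumerate(candidates):
        for xj, yj in candidates[i + 1:]:
            if yi == yj:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            examples.append((diff, 1 if yi > yj else -1))
    return examples


def rerank(weights, feature_vectors):
    """Return the index of the candidate with the highest linear score."""
    def score(x):
        return sum(w * a for w, a in zip(weights, x))

    return max(range(len(feature_vectors)), key=lambda i: score(feature_vectors[i]))
```

The difference vectors can be fed to any linear classifier; `rerank` then only needs the learned weights, so candidates from different summarization models are compared on the same scale.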
A novel user-centered design for personalized video summarization
In the past, several automatic video summarization systems have been proposed to generate video summaries. However, a generic video summary generated only from audio, visual, and textual saliencies will not satisfy every user. This paper proposes a novel system for generating semantically meaningful personalized video summaries, which are tailored to the individual user's preferences over video semantics. Each video shot is represented using a semantic multinomial, which is a vector of posterior semantic concept probabilities. The proposed system stitches together a video summary based on the summary time span and the top-ranked shots that are semantically relevant to the user's preferences. The proposed summarization system is evaluated using both quantitative and subjective evaluation metrics. The experimental results on the performance of the proposed video summarization system are encouraging.
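One possible reading of the selection step described above, sketched under stated assumptions: each shot carries a vector of concept probabilities, the user's preferences are a weight per concept, relevance is their dot product, and shots are added greedily in relevance order while they fit the time budget. The tuple layout, the dot-product relevance, and the greedy fill are assumptions for illustration; the abstract does not say how ranking and the time span are combined.

```python
def personalized_summary(shots, preferences, time_budget):
    """Pick shots for a summary and return their indices in temporal order.

    `shots` is a list of (start_time, duration, concept_probs) tuples and
    `preferences` is a list of per-concept weights (both hypothetical formats).
    """
    def relevance(shot):
        # Dot product of concept probabilities and user weights.
        return sum(p * w for p, w in zip(shot[2], preferences))

    ranked = sorted(range(len(shots)), key=lambda i: relevance(shots[i]), reverse=True)

    selected, used = [], 0.0
    for i in ranked:
        duration = shots[i][1]
        if used + duration <= time_budget:
            selected.append(i)
            used += duration

    # Play the chosen shots back in their original order.
    return sorted(selected, key=lambda i: shots[i][0])
```

With four shots of durations 5, 4, 3, and 6 seconds, a 10-second budget, and a user who only cares about the second concept, the two shots most dominated by that concept (4 s and 6 s) fill the budget exactly.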
Inferring Strategies for Sentence Ordering in Multidocument News Summarization
The problem of organizing information for multidocument summarization so that
the generated summary is coherent has received relatively little attention.
While sentence ordering for single document summarization can be determined
from the ordering of sentences in the input article, this is not the case for
multidocument summarization where summary sentences may be drawn from different
input articles. In this paper, we propose a methodology for studying the
properties of ordering information in the news genre and describe experiments
done on a corpus of multiple acceptable orderings we developed for the task.
Based on these experiments, we implemented a strategy for ordering information
that combines constraints from chronological order of events and topical
relatedness. Evaluation of our augmented algorithm shows a significant
improvement of the ordering over two baseline strategies.
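A minimal sketch of how chronological and topical constraints might be combined, assuming each summary sentence already has a timestamp and a topic label: sentences are grouped into topical blocks, blocks are ordered by their earliest timestamp, and sentences inside a block are ordered chronologically. The input format, the precomputed topic labels, and this particular blocking rule are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def order_sentences(items):
    """Order summary sentences by topical block, then by time.

    `items` is a list of (text, timestamp, topic_label) tuples
    (a hypothetical format; topic labels are assumed to be given).
    """
    blocks = defaultdict(list)
    for text, timestamp, topic in items:
        blocks[topic].append((timestamp, text))

    # Blocks are placed by the time of their earliest sentence.
    ordered_blocks = sorted(blocks.values(), key=lambda b: min(ts for ts, _ in b))

    # Within a block, sentences follow chronological order.
    return [text for block in ordered_blocks for _, text in sorted(block)]
```

A purely chronological baseline would interleave the two topics in the example `[("B2", 3, "b"), ("A1", 1, "a"), ("B1", 2, "b"), ("A2", 4, "a")]`; the blocked ordering keeps each topic together and yields `A1, A2, B1, B2`.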
A Similarity Measure for Material Appearance
We present a model to measure the similarity in appearance between different
materials, which correlates with human similarity judgments. We first create a
database of 9,000 rendered images depicting objects with varying materials,
shape and illumination. We then gather data on perceived similarity from
crowdsourced experiments; our analysis of over 114,840 answers suggests that
indeed a shared perception of appearance similarity exists. We feed this data
to a deep learning architecture with a novel loss function, which learns a
feature space for materials that correlates with such perceived appearance
similarity. Our evaluation shows that our model outperforms existing metrics.
Last, we demonstrate several applications enabled by our metric, including
appearance-based search for material suggestions, database visualization,
clustering and summarization, and gamut mapping. Comment: 12 pages, 17 figures