1,797 research outputs found
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Summarizing content contributed by individuals can be challenging, because
people make different lexical choices even when describing the same events.
However, there remains a significant need to summarize such content. Examples
include the student responses to post-class reflective questions, product
reviews, and news articles published by different news agencies related to the
same events. High lexical diversity of these documents hinders the system's
ability to effectively identify salient content and reduce summary redundancy.
In this paper, we overcome this issue by introducing an integer linear
programming-based summarization framework. It incorporates a low-rank
approximation to the sentence-word co-occurrence matrix to intrinsically group
semantically-similar lexical items. We conduct extensive experiments on
datasets of student responses, product reviews, and news documents. Our
approach compares favorably to a number of extractive baselines as well as a
neural abstractive summarization system. The paper finally sheds light on when
and why the proposed framework is effective at summarizing content with high
lexical variety.Comment: Accepted for publication in the journal of Natural Language
Engineering, 201
The TRECVID 2007 BBC rushes summarization evaluation pilot
This paper provides an overview of a pilot evaluation of
video summaries using rushes from several BBC dramatic series. It was carried out under the auspices of TRECVID.
Twenty-two research teams submitted video summaries of
up to 4% duration, of 42 individual rushes video files aimed
at compressing out redundant and insignificant material.
The output of two baseline systems built on straightforward
content reduction techniques was contributed by Carnegie
Mellon University as a control. Procedures for developing
ground truth lists of important segments from each video
were developed at Dublin City University and applied to
the BBC video. At NIST each summary was judged by
three humans with respect to how much of the ground truth
was included, how easy the summary was to understand,
and how much repeated material the summary contained.
Additional objective measures included: how long it took
the system to create the summary, how long it took the assessor to judge it against the ground truth, and what the
summary's duration was. Assessor agreement on finding desired segments averaged 78% and results indicate that while it is difficult to exceed the performance of baselines, a few systems did
Can we predict a riot? Disruptive event detection using Twitter
In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events,” smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases
The Structure and Dynamics of Co-Citation Clusters: A Multiple-Perspective Co-Citation Analysis
A multiple-perspective co-citation analysis method is introduced for
characterizing and interpreting the structure and dynamics of co-citation
clusters. The method facilitates analytic and sense making tasks by integrating
network visualization, spectral clustering, automatic cluster labeling, and
text summarization. Co-citation networks are decomposed into co-citation
clusters. The interpretation of these clusters is augmented by automatic
cluster labeling and summarization. The method focuses on the interrelations
between a co-citation cluster's members and their citers. The generic method is
applied to a three-part analysis of the field of Information Science as defined
by 12 journals published between 1996 and 2008: 1) a comparative author
co-citation analysis (ACA), 2) a progressive ACA of a time series of
co-citation networks, and 3) a progressive document co-citation analysis (DCA).
Results show that the multiple-perspective method increases the
interpretability and accountability of both ACA and DCA networks.Comment: 33 pages, 11 figures, 10 tables. To appear in the Journal of the
American Society for Information Science and Technolog
- …