Fuzzy Sets Across the Natural Language Generation Pipeline
We explore the implications of using fuzzy techniques (mainly those commonly
used in the linguistic description/summarization of data discipline) from a
natural language generation perspective. For this, we provide an extensive
discussion of some general convergence points and an exploration of the
relationship between the different tasks involved in the standard NLG system
pipeline architecture and the most common fuzzy approaches used in linguistic
summarization/description of data, such as fuzzy quantified statements,
evaluation criteria or aggregation operators. Each individual discussion is
illustrated with a related use case. Recent work on the cross-fertilization
of both research fields is also referenced. This paper
encompasses general ideas that emerged as part of the PhD thesis "Application
of fuzzy sets in data-to-text systems". It does not present a specific
application or a formal approach, but rather discusses current high-level
issues and potential usages of fuzzy sets (focused on linguistic summarization
of data) in natural language generation.

Comment: Paper features: 16 pages, 2 tables, 13 figures
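The fuzzy quantified statements mentioned above can be made concrete with a small sketch. The following is only an illustration of the general technique (evaluating a summary like "most values are high" via Zadeh's sigma-count); the membership functions and thresholds are invented for the example, not taken from the paper.

```python
# Illustrative evaluation of a fuzzy quantified statement of the kind
# used in linguistic summarization of data: "most temperatures are high".
# Both membership functions below are toy ramps chosen for illustration.

def mu_high(t):
    """Membership of a temperature in the fuzzy set 'high' (toy ramp)."""
    return max(0.0, min(1.0, (t - 20.0) / 10.0))

def mu_most(p):
    """Fuzzy quantifier 'most' applied to a proportion p (toy ramp)."""
    return max(0.0, min(1.0, (p - 0.3) / 0.5))

def truth_of_summary(values):
    """Truth degree of 'most values are high' via Zadeh's sigma-count."""
    proportion = sum(mu_high(v) for v in values) / len(values)
    return mu_most(proportion)

temps = [18.0, 25.0, 28.0, 31.0]
print(round(truth_of_summary(temps), 3))  # → 0.55
```

An NLG system can compare such truth degrees across candidate statements and verbalize only those exceeding a quality threshold.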
Model interpretation through lower-dimensional posterior summarization
Nonparametric regression models have recently surged in their power and
popularity, accompanying the trend of increasing dataset size and complexity.
While these models have proven their predictive ability in empirical settings,
they are often difficult to interpret and do not address the underlying
inferential goals of the analyst or decision maker. In this paper, we propose a
modular two-stage approach for creating parsimonious, interpretable summaries
of complex models which allow freedom in the choice of modeling technique and
the inferential target. In the first stage a flexible model is fit which is
believed to be as accurate as possible. In the second stage, lower-dimensional
summaries are constructed by projecting draws from the distribution onto
simpler structures. These summaries naturally come with valid Bayesian
uncertainty estimates. Further, since we use the data only once to move from
prior to posterior, these uncertainty estimates remain valid across multiple
summaries and after iteratively refining a summary. We apply our method and
demonstrate its strengths across a range of simulated and real datasets. Code
to reproduce the examples shown is available at github.com/spencerwoody/ghost

Comment: 40 pages, 16 figures
A Unified Multi-Faceted Video Summarization System
This paper addresses automatic summarization and search in visual data
comprising videos, live streams and image collections in a unified manner.
In particular, we propose a framework for multi-faceted summarization which
extracts key-frames (image summaries), skims (video summaries) and entity
summaries (summarization at the level of entities like objects, scenes, humans
and faces in the video). The user can view these either as extractive
summaries or as query-focused summaries. Our approach first pre-processes
the video or image collection once, to extract all important visual features,
following which we provide an interactive mechanism to the user to summarize
the video based on their choice. We investigate several diversity, coverage and
representation models for all these problems, and argue the utility of these
different models depending on the application. While most prior work on
submodular summarization has focused on combining several models and learning
weighted mixtures, we focus on the explainability of the different diversity,
coverage and representation models and on their scalability. Most
importantly, we also show that we can summarize hours of video data in a few
seconds, and our system allows the user to generate summaries of various
lengths and types interactively on the fly.

Comment: 18 pages, 11 figures
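The coverage models mentioned above are typically optimized with a greedy algorithm. The sketch below shows lazy-free greedy maximization of a facility-location function, a standard submodular coverage model for selecting representative frames; it is a minimal illustration, not the paper's actual system, and the similarity matrix is invented.

```python
# Illustrative greedy maximization of a facility-location coverage
# function F(S) = sum_i max_{j in S} sim[i][j], a standard submodular
# model for picking representative key-frames.

def greedy_facility_location(sim, k):
    """Pick k items greedily; sim[i][j] = similarity of item i to j."""
    n = len(sim)
    selected, best = [], [0.0] * n   # best[i]: coverage of item i so far
    for _ in range(k):
        gains = []
        for j in range(n):
            if j in selected:
                gains.append(-1.0)
                continue
            gains.append(sum(max(sim[i][j] - best[i], 0.0) for i in range(n)))
        j = max(range(n), key=lambda t: gains[t])
        selected.append(j)
        best = [max(best[i], sim[i][j]) for i in range(n)]
    return selected

# Toy similarity matrix over four frames (two near-duplicate pairs).
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(greedy_facility_location(sim, 2))  # → [0, 2], one per pair
```

Greedy selection enjoys the classical (1 - 1/e) approximation guarantee for monotone submodular functions, which is what makes these models attractive for interactive, on-the-fly summarization.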
Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization
With increasing amounts of visual data being created in the form of videos
and images, visual data selection and summarization are becoming increasingly
important problems. We present Vis-DSS, an open-source toolkit for Visual Data
Selection and Summarization. Vis-DSS implements a framework of models for
summarization and data subset selection using submodular functions, which are
becoming increasingly popular today for these problems. We present several
classes of models, capturing notions of diversity, coverage, representation and
importance, along with optimization/inference and learning algorithms. Vis-DSS
is the first open source toolkit for several Data selection and summarization
tasks including Image Collection Summarization, Video Summarization, Training
Data selection for Classification and Diversified Active Learning. We
demonstrate state-of-the-art performance on all these tasks, and also show how
we can scale to large problems. Vis-DSS allows applications to be easily built
on top of it, and can also serve as a general skeleton that can
be extended to several use cases, including video and image sharing platforms
for creating GIFs, image montage creation, or as a component to surveillance
systems and we demonstrate this by providing a graphical user-interface (GUI)
desktop app built over the Qt framework. Vis-DSS is available at
https://github.com/rishabhk108/vis-dss
Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach
The increasing amount of online content motivated the development of
multi-document summarization methods. In this work, we explore straightforward
approaches to extend single-document summarization methods to multi-document
summarization. The proposed methods are based on the hierarchical combination
of single-document summaries, and achieves state of the art results.Comment: 6 pages, Please cite: Proceedings of *SEM: the 4th Joint Conference
on Lexical and Computational Semantics (bibtex:
http://aclweb.org/anthology/S/S15/S15-1020.bib
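The hierarchical combination can be sketched in a few lines. This is an illustration of the general scheme only; the stand-in scorer (sentence length) is a placeholder, where the paper would use a real single-document summarizer.

```python
# Sketch of hierarchical multi-document summarization: summarize each
# document separately, then run the same single-document summarizer over
# the pooled intermediate summaries. The scorer here is a toy stand-in.

def summarize_single(sentences, k):
    """Stand-in single-document summarizer: keep the k longest
    sentences, in original order (a real system scores informativeness)."""
    ranked = sorted(range(len(sentences)), key=lambda i: -len(sentences[i]))
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]

def summarize_multi(documents, k_per_doc, k_final):
    """Hierarchical combination of single-document summaries."""
    pooled = []
    for doc in documents:
        pooled.extend(summarize_single(doc, k_per_doc))
    return summarize_single(pooled, k_final)

docs = [
    ["Short note.", "A much longer and more detailed first sentence here."],
    ["Another brief line.", "The second document's most informative sentence overall."],
]
print(summarize_multi(docs, k_per_doc=1, k_final=1))
```

The appeal of the scheme is that any existing single-document summarizer can be reused unchanged at both levels of the hierarchy.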
Network Modeling and Pathway Inference from Incomplete Data ("PathInf")
In this work, we developed a network inference method from incomplete data
("PathInf"), as massive and non-uniformly distributed missing values are a
common challenge in practical problems. PathInf is a two-stage inference
model. In the first stage, it applies a data summarization model based on
maximum likelihood to deal with the massive, distributed missing values by
transforming the observation-wise items in the data into a state matrix. In the
second stage, the transition pattern (i.e., pathway) among variables is inferred
as a graph inference problem solved by a greedy algorithm with constraints. The
proposed method was validated and compared with the state-of-the-art Bayesian
network method on simulation data, and showed consistently superior
performance. By applying the PathInf on the lymph vascular metastasis data, we
obtained the holistic pathways of the lymph node metastasis with novel
discoveries on the jumping metastasis among nodes that are physically apart.
The discovery indicates the possible presence of sentinel node groups in the
lung lymph nodes which have been previously speculated yet never found. The
pathway map can also improve the current dissection examination protocol for
better individualized treatment planning, higher diagnostic accuracy, and
reduced patient trauma.

Comment: Xiang Li, Qitian Che and Xing Wang contribute equally to this work
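The second-stage idea, greedy constrained graph inference, can be sketched as follows. This is not the authors' implementation: the transition scores are invented, and the only constraint enforced here is acyclicity, as a representative example of "greedy algorithm with constraints".

```python
# Sketch of greedy pathway inference: add the strongest observed
# transition patterns between variables one by one, skipping any edge
# that would violate the constraint (here: no directed cycles).

def greedy_pathway(weights):
    """weights: dict {(u, v): score}; returns an acyclic edge list."""
    def reaches(adj, src, dst):
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u in seen:
                continue
            seen.add(u)
            stack.extend(adj.get(u, []))
        return False

    adj, edges = {}, []
    for (u, v), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if reaches(adj, v, u):      # adding u->v would close a cycle
            continue
        adj.setdefault(u, []).append(v)
        edges.append((u, v))
    return edges

# Toy transition scores among three lymph-node groups 0, 1, 2.
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 0): 0.7, (0, 2): 0.4}
print(greedy_pathway(scores))  # → [(0, 1), (1, 2), (0, 2)]
```

Note how the high-scoring edge (2, 0) is rejected because it would close the cycle 0 → 1 → 2 → 0, while the weaker edge (0, 2) is still admitted.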
What comes next? Extractive summarization by next-sentence prediction
Existing approaches to automatic summarization assume that a length limit for
the summary is given, and view content selection as an optimization problem to
maximize informativeness and minimize redundancy within this budget. This
framework ignores the fact that human-written summaries have rich internal
structure which can be exploited to train a summarization system. We present
NEXTSUM, a novel approach to summarization based on a model that predicts the
next sentence to include in the summary using not only the source article, but
also the summary produced so far. We show that such a model successfully
captures summary-specific discourse moves, and leads to better content
selection performance, in addition to automatically predicting how long the
target summary should be. We perform experiments on the New York Times
Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model
summarization baselines by significant margins. We also show that the lengths
of summaries produced by our system correlate with the lengths of the
human-written gold standards.
Analyzing Evolving Stories in News Articles
There is an overwhelming number of news articles published every day around
the globe. Following the evolution of a news story is a difficult task, given
that no mechanism is available to track back in time and study the
diffusion of the relevant events in digital news feeds. The techniques
developed so far to extract meaningful information from a massive corpus rely
on similarity search, which results in a myopic loopback to the same topic
without providing the insights needed to hypothesize the origin of a story that
may be completely different from the news today. In this paper, we present an
algorithm that mines historical data to detect the origin of an event, segments
the timeline into disjoint groups of coherent news articles, and outlines the
most important documents in a timeline with a soft probability to provide a
better understanding of the evolution of a story. Qualitative and quantitative
approaches to evaluate our framework demonstrate that our algorithm discovers
statistically significant and meaningful stories in reasonable time.
Additionally, a relevant case study on a set of news articles demonstrates that
the generated output of the algorithm holds the promise to aid prediction of
future entities in a story.

Comment: This is a pre-print of an article published in the International
Journal of Data Science and Analytics. The final authenticated version is
available online at: https://doi.org/10.1007/s41060-017-0091-
Plan-Recognition-Driven Attention Modeling for Visual Recognition
Human visual recognition of activities or external agents involves an
interplay between high-level plan recognition and low-level perception. Given
that, a natural question to ask is: can low-level perception be improved by
high-level plan recognition? We formulate the problem of leveraging recognized
plans to generate better top-down attention maps
\cite{gazzaniga2009,baluch2011} to improve perception performance. We refer to
these top-down attention maps as plan-recognition-driven attention maps. To
address this problem, we introduce the Pixel Dynamics Network. The Pixel
Dynamics Network serves as an observation model, which predicts the next states
of object points at each pixel location given observations of pixels and
pixel-level action features; in effect, it internally learns a pixel-level
dynamics model. The Pixel Dynamics Network is a Convolutional Neural Network
(ConvNet) with a specially designed architecture, so it can exploit the
parallel computation of ConvNets while learning the pixel-level dynamics
model. We further prove the equivalence
between Pixel Dynamics Network as an observation model, and the belief update
in partially observable Markov decision process (POMDP) framework. We evaluate
our Pixel Dynamics Network in event recognition tasks. We build an event
recognition system, ER-PRN, which takes Pixel Dynamics Network as a subroutine,
to recognize events based on observations augmented by plan-recognition-driven
attention maps.
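The claimed equivalence rests on the standard POMDP belief update, which is easy to state concretely. The numbers below are made up purely to show the recursion b'(s') ∝ O(o | s') Σ_s T(s' | s, a) b(s); this is textbook POMDP filtering, not the paper's network.

```python
# The standard POMDP belief update the paper relates its network to:
# b'(s') ∝ O(o | s') * sum_s T(s' | s, a) * b(s),
# shown for a toy two-state problem with invented probabilities.

def belief_update(b, T, O, o):
    """b: prior belief over states; T[s][s2]: transition probability
    under the chosen action; O[s2][o]: observation probability."""
    n = len(b)
    unnorm = [O[s2][o] * sum(T[s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    z = sum(unnorm)                 # normalizing constant P(o)
    return [u / z for u in unnorm]

b = [0.5, 0.5]                      # uniform prior belief
T = [[0.9, 0.1], [0.2, 0.8]]        # T[s][s'] for the chosen action
O = [[0.8, 0.2], [0.3, 0.7]]        # O[s'][o]
print(belief_update(b, T, O, o=0))
```

Each network forward pass in the paper's framing corresponds to one such predict-then-correct step, with the convolution playing the role of the transition sum.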
Conceptual Text Summarizer: A new model in continuous vector space
Traditional manual summarization is neither cost-effective nor feasible today.
Extractive summarization automatically extracts the most important sentences
from a text and generates a short, informative
summary. In this work, we propose an unsupervised method to summarize Persian
texts. This method is a novel hybrid approach that clusters the concepts of the
text using deep learning and traditional statistical methods. First, we produce
word embeddings based on the Hamshahri2 corpus and a dictionary of word
frequencies. Then the proposed algorithm extracts the keywords of the document,
clusters its concepts, and finally ranks the sentences to produce the summary.
We evaluated the proposed method on Pasokh single-document corpus using the
ROUGE evaluation measure. Without using any hand-crafted features, our proposed
method achieves state-of-the-art results. We compared our unsupervised method
with the best supervised Persian methods and achieved an overall improvement
of 7.5% in ROUGE-2 recall.

Comment: The experimental results complete
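The sentence-ranking step can be sketched in simplified form. This illustration uses bag-of-words vectors and a document centroid instead of the paper's actual pipeline (embeddings trained on Hamshahri2 plus concept clustering); the example sentences are invented.

```python
# Simplified sketch of centroid-based sentence ranking: represent each
# sentence as a bag-of-words vector, compute the document centroid, and
# rank sentences by cosine similarity to the centroid. The real method
# clusters learned word embeddings instead of raw counts.

import math

def bow(sentence, vocab):
    return [sentence.split().count(w) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_sentences(sentences):
    vocab = sorted({w for s in sentences for w in s.split()})
    vecs = [bow(s, vocab) for s in sentences]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    order = sorted(range(len(sentences)),
                   key=lambda i: -cosine(vecs[i], centroid))
    return [sentences[i] for i in order]

sents = ["rain fell on the city", "markets rose", "rain soaked the city"]
print(rank_sentences(sents)[0])  # sentence closest to the centroid first
```

The top-ranked sentences then form the extractive summary, with length controlled by how many ranked sentences are kept.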