Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an
underlying complex data distribution communicate meaningful insights to humans
in domains where simple explanations are hard to extract. In this paper we
present algorithms with strong theoretical guarantees to mine such data sets
and select prototypes, a.k.a. representatives, that optimally describe them.
Our work notably generalizes the recent work by Kim et al. (2016): in addition
to selecting prototypes, we also associate non-negative weights that indicate
their importance. This extension provides a single coherent framework under
which both prototypes and criticisms (i.e. outliers) can be found. Furthermore,
our framework works for any symmetric positive definite kernel, thus addressing
one of the key open questions laid out in Kim et al. (2016). By establishing
that our objective function enjoys the key property of weak submodularity, we
present a fast ProtoDash algorithm and derive approximation guarantees for it.
We demonstrate the efficacy of our method on diverse domains such as retail,
digit recognition (MNIST), and 40 publicly available health questionnaires
obtained from the Center for Disease Control (CDC) website maintained by the
US Dept. of Health. We validate the results quantitatively as well as
qualitatively based on expert feedback and recently published scientific
studies on public health, thus showcasing the power of our technique in
providing actionability (for retail), utility (for MNIST), and insight (on CDC
datasets), which arguably are the hallmarks of an effective data mining method.
Comment: Accepted for publication in the International Conference on Data
Mining (ICDM) 201
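The greedy, weighted selection described in the abstract can be illustrated with a short sketch. The kernel choice, the set function f(w) = mu^T w - 0.5 w^T K w, and the use of L-BFGS-B for the non-negative weight refit are illustrative assumptions in the spirit of the description above, not the authors' released implementation.

```python
# Minimal sketch: greedily grow a prototype set and fit non-negative
# importance weights by maximizing a kernel-based set function.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    """Symmetric positive definite RBF kernel between rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def select_prototypes(X, m, gamma=1.0):
    """Greedily pick m prototypes from X and return (indices, weights)."""
    K = rbf_kernel(X, X, gamma)          # kernel among candidates
    mu = K.mean(axis=0)                  # mean similarity of each candidate to the data
    S, w = [], np.zeros(0)
    for _ in range(m):
        # gradient of f(w) = mu^T w - 0.5 w^T K w at the current sparse w
        grad = mu - (K[:, S] @ w if S else 0.0)
        grad[S] = -np.inf                # never re-select a chosen prototype
        S.append(int(np.argmax(grad)))
        K_SS, mu_S = K[np.ix_(S, S)], mu[S]
        # refit non-negative weights on the current support
        obj = lambda v: 0.5 * v @ K_SS @ v - mu_S @ v
        res = minimize(obj, x0=np.append(w, 0.0),
                       jac=lambda v: K_SS @ v - mu_S,
                       bounds=[(0, None)] * len(S), method="L-BFGS-B")
        w = res.x
    return np.array(S), w

# Example: summarize a toy 2-D dataset with 5 weighted prototypes.
X = np.random.default_rng(0).normal(size=(500, 2))
idx, weights = select_prototypes(X, m=5, gamma=0.5)
```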
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based
on sparse local features and hypergraph matching. We exploit special
properties of the temporal domain of the data to derive a sequential and fast
graph matching algorithm for GPUs.
Graphs and hypergraphs are frequently used to recognize complex and often
non-rigid patterns in computer vision, either through graph matching or
point-set matching with graphs. Most formulations resort to the minimization
of a difficult discrete energy function mixing geometric or structural terms
with data-attached terms involving appearance features. Traditional methods
solve this minimization problem approximately, for instance with spectral
techniques.
In this work, instead of solving the problem approximately, the exact
solution for the optimal assignment is calculated in parallel on GPUs. The
graphical structure is simplified and regularized, which allows us to derive
an efficient recursive minimization algorithm. The algorithm distributes
subproblems over the calculation units of a GPU, which solves them in
parallel, allowing the system to run faster than real time on medium-end GPUs.
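As an illustration of the recursive exact minimization idea, the sketch below solves a deliberately simplified, chain-structured matching exactly with a Viterbi-style dynamic program; the independent candidate evaluations inside each step are the kind of subproblems a GPU implementation would distribute over its calculation units. The cost terms, shapes, and names are assumptions for illustration, not the paper's hypergraph energy.

```python
# Exact assignment of model nodes to scene candidates on a chain structure,
# minimizing appearance (unary) plus pairwise geometric costs.
import numpy as np

def chain_matching(model_feat, scene_feat, model_pts, scene_pts, lam=1.0):
    """model_feat: (M, d) features of model nodes; scene_feat: (N, d) candidates.
    model_pts, scene_pts: (M, 2) / (N, 2) coordinates.
    Returns the exact cost-minimizing scene index for each model node."""
    M, N = len(model_feat), len(scene_feat)
    # unary (data-attached) costs: appearance distance, shape (M, N)
    unary = np.linalg.norm(model_feat[:, None] - scene_feat[None], axis=-1)

    def pair(i):
        # geometric cost between consecutive model nodes: preserve edge length
        d_model = np.linalg.norm(model_pts[i] - model_pts[i - 1])
        d_scene = np.linalg.norm(scene_pts[:, None] - scene_pts[None], axis=-1)
        return lam * np.abs(d_scene - d_model)      # (N_prev, N_cur)

    cost = unary[0].copy()                          # best cost ending at each candidate
    back = np.zeros((M, N), dtype=int)
    for i in range(1, M):
        # every (prev, cur) candidate pair is evaluated independently here;
        # this is the parallelizable subproblem
        total = cost[:, None] + pair(i) + unary[i][None]
        back[i] = np.argmin(total, axis=0)
        cost = np.min(total, axis=0)
    # backtrack the globally optimal assignment
    assign = np.zeros(M, dtype=int)
    assign[-1] = int(np.argmin(cost))
    for i in range(M - 1, 0, -1):
        assign[i - 1] = back[i, assign[i]]
    return assign
```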
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification
We present the Bayesian Case Model (BCM), a general framework for Bayesian
case-based reasoning (CBR) and prototype classification and clustering. BCM
brings the intuitive power of CBR to a Bayesian generative framework. The BCM
learns prototypes, the "quintessential" observations that best represent
clusters in a dataset, by performing joint inference on cluster labels,
prototypes and important features. Simultaneously, BCM pursues sparsity by
learning subspaces, the sets of features that play important roles in the
characterization of the prototypes. The prototype and subspace representation
provides quantitative benefits in interpretability while preserving
classification accuracy. Human subject experiments verify statistically
significant improvements to participants' understanding when using explanations
produced by BCM, compared to those given by prior art.
Comment: Published in Neural Information Processing Systems (NIPS) 2014
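A minimal forward sketch of this kind of generative story is given below: each cluster is summarized by a prototype and a binary subspace of important features, and observations copy the prototype's values on that subspace while drawing the remaining features from a background distribution. The discrete-feature setup, the stand-in prototype pool, and the hyperparameters are illustrative assumptions; BCM's actual contribution is joint posterior inference over prototypes, subspaces, and cluster labels, which is not shown here.

```python
# Sample toy data from a prototype-and-subspace generative story.
import numpy as np

rng = np.random.default_rng(0)
S, D, V, n = 3, 8, 5, 200        # clusters, features, vocabulary size, observations
q, alpha, copy_prob = 0.4, 0.5, 0.9

pool = rng.integers(0, V, size=(S, D))       # stand-in "prototype" observations
omega = rng.random((S, D)) < q               # per-cluster subspaces of important features

X = np.zeros((n, D), dtype=int)
for i in range(n):
    pi = rng.dirichlet(alpha * np.ones(S))   # per-observation mixture over clusters
    for j in range(D):
        s = rng.choice(S, p=pi)              # cluster responsible for feature j
        if omega[s, j] and rng.random() < copy_prob:
            X[i, j] = pool[s, j]             # copy the prototype's value on the subspace
        else:
            X[i, j] = rng.integers(0, V)     # background / unimportant feature
```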
Prototype selection for parameter estimation in complex models
Parameter estimation in astrophysics often requires the use of complex
physical models. In this paper we study the problem of estimating the
parameters that describe star formation history (SFH) in galaxies. Here,
high-dimensional spectral data from galaxies are appropriately modeled as
linear combinations of physical components, called simple stellar populations
(SSPs), plus some nonlinear distortions. Theoretical data for each SSP is
produced for a fixed parameter vector via computer modeling. Though the
parameters that define each SSP are continuous, optimizing the signal model
over a large set of SSPs on a fine parameter grid is computationally infeasible
and inefficient. The goal of this study is to estimate the set of parameters
that describes the SFH of each galaxy. These target parameters, such as the
average ages and chemical compositions of the galaxy's stellar populations, are
derived from the SSP parameters and the component weights in the signal model.
Here, we introduce a principled approach of choosing a small basis of SSP
prototypes for SFH parameter estimation. The basic idea is to quantize the
vector space and effective support of the model components. In addition to
greater computational efficiency, we achieve better estimates of the SFH target
parameters. In simulations, our proposed quantization method obtains a
substantial improvement in estimating the target parameters over the common
method of employing a parameter grid. Sparse coding techniques are not
appropriate for this problem without proper constraints, while constrained
sparse coding methods perform poorly for parameter estimation because their
objective is signal reconstruction, not estimation of the target parameters.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS500 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
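A toy sketch of the quantization idea follows: pick a small, well-spread set of SSP prototype spectra from a large template library, fit a galaxy spectrum as a non-negative combination of those prototypes, and derive a target parameter from the fitted weights. The farthest-point selection rule, the synthetic library, and the weight-averaged age are illustrative assumptions, not the estimator studied in the paper.

```python
# Quantize an SSP template library to a few prototypes, then fit a galaxy
# spectrum on the reduced basis with non-negative least squares.
import numpy as np
from scipy.optimize import nnls

def farthest_point_prototypes(templates, m):
    """Pick m well-spread prototype rows from the SSP template library."""
    idx = [0]
    d = np.linalg.norm(templates - templates[0], axis=1)
    for _ in range(m - 1):
        idx.append(int(np.argmax(d)))
        d = np.minimum(d, np.linalg.norm(templates - templates[idx[-1]], axis=1))
    return np.array(idx)

# toy library: 1000 SSP spectra on a fine age grid, 300 wavelength bins
rng = np.random.default_rng(1)
ages = np.linspace(0.1, 13.0, 1000)                      # Gyr, fine parameter grid
library = np.abs(rng.normal(size=(1000, 300))) + ages[:, None] * 0.01

proto_idx = farthest_point_prototypes(library, m=20)     # small quantized basis
basis = library[proto_idx]

# fit one galaxy spectrum as a non-negative combination of the prototypes
galaxy = 0.6 * library[120] + 0.4 * library[870] + 0.01 * rng.normal(size=300)
weights, _ = nnls(basis.T, galaxy)                       # min ||B^T w - y||, w >= 0
mean_age = np.sum(weights * ages[proto_idx]) / np.sum(weights)   # derived target parameter
```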
Sub-Nyquist Sampling: Bridging Theory and Practice
Sampling theory encompasses all aspects related to the conversion of
continuous-time signals to discrete streams of numbers. The famous
Shannon-Nyquist theorem has become a landmark in the development of digital
signal processing. In modern applications, an increasing number of functions
is being pushed forward to sophisticated software algorithms, leaving only
delicate, finely tuned tasks for the circuit level.
In this paper, we review sampling strategies that target reduction of the
ADC rate below Nyquist. Our survey covers classic works from the early 1950s
through recent publications from the past several years.
The prime focus is on bridging theory and practice, that is, pinpointing the
potential of sub-Nyquist strategies to emerge from the math to the hardware. In
that spirit, we integrate contemporary theoretical viewpoints, which study
signal modeling in a union of subspaces, together with a taste of practical
aspects, namely how the avant-garde modalities boil down to concrete signal
processing systems. Our hope is that this presentation style will attract the
interest of both researchers and engineers, promoting the sub-Nyquist premise
toward practical applications and encouraging further research into this
exciting new frontier.
Comment: 48 pages, 18 figures, to appear in IEEE Signal Processing Magazine
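To make the sub-Nyquist premise concrete for one simple signal model, the sketch below measures a frequency-sparse signal with far fewer samples than its Nyquist grid and recovers it with a greedy sparse solver. Random time sampling and orthogonal matching pursuit are stand-ins chosen for brevity, not the hardware-oriented modalities the survey actually covers.

```python
# Sub-Nyquist measurement of a frequency-sparse signal and greedy recovery.
import numpy as np

N, K, M = 1024, 3, 96                          # Nyquist grid, active tones, measurements
rng = np.random.default_rng(2)
freqs = rng.choice(N // 2, size=K, replace=False)
t = np.arange(N) / N
x = sum(np.cos(2 * np.pi * f * t) for f in freqs)     # K-tone signal on the Nyquist grid

keep = np.sort(rng.choice(N, size=M, replace=False))  # sub-Nyquist: M << N random samples
y = x[keep]

# dictionary of cosine atoms restricted to the sampled instants
A = np.cos(2 * np.pi * np.outer(t[keep], np.arange(N // 2)))
A /= np.linalg.norm(A, axis=0) + 1e-12

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse coefficient vector."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef

support, coef = omp(A, y, K)
print(sorted(support), sorted(freqs))          # recovered tone indices vs. ground truth
```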