Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an
underlying complex data distribution communicate meaningful insights to humans
in domains where simple explanations are hard to extract. In this paper we
present algorithms with strong theoretical guarantees to mine these data sets
and select prototypes, a.k.a. representatives, that optimally describe them. Our
work notably generalizes the recent work by Kim et al. (2016): in addition
to selecting prototypes, we also associate with them non-negative weights that
indicate their importance. This extension provides a single coherent
framework under which both prototypes and criticisms (i.e., outliers) can be
found. Furthermore, our framework works for any symmetric positive definite
kernel, thus addressing one of the key open questions laid out in Kim et al.
(2016). By establishing that our objective function enjoys the key property
of weak submodularity, we present a fast algorithm, ProtoDash, and also
derive approximation guarantees for it. We demonstrate the efficacy of
our method on diverse domains such as retail, digit recognition (MNIST), and
40 publicly available health questionnaires obtained from the Center for
Disease Control (CDC) website maintained by the US Dept. of Health. We validate
the results quantitatively as well as qualitatively, based on expert feedback
and recently published scientific studies on public health, thus showcasing the
power of our technique in providing actionability (for retail), utility (for
MNIST), and insight (on CDC datasets), which arguably are the hallmarks of an
effective data mining method.
Comment: Accepted for publication in International Conference on Data Mining
(ICDM) 201
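The abstract does not spell out the objective or the update rules, so the following is only a minimal sketch of greedy weighted prototype selection, assuming a kernel mean-matching objective l(w) = w^T mu - 0.5 * w^T K w with w >= 0 (in the spirit of Kim et al. (2016)) and a Gaussian (RBF) kernel as the symmetric positive definite kernel. At each step the candidate with the largest gradient of l is added, and the non-negative weights are re-fitted on the selected set. The function names (`rbf_kernel`, `nonneg_qp`, `protodash_sketch`) and the parameter `gamma` are hypothetical, and the inner weight solver is plain projected gradient ascent, not necessarily what the authors use.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def nonneg_qp(K, mu, iters=500):
    """Maximize w.mu - 0.5 * w.K.w subject to w >= 0 by projected gradient ascent."""
    step = 1.0 / max(np.linalg.eigvalsh(K).max(), 1e-12)  # 1 / Lipschitz constant of the gradient
    w = np.zeros_like(mu)
    for _ in range(iters):
        w = np.maximum(0.0, w + step * (mu - K @ w))  # ascent step, then project onto w >= 0
    return w

def protodash_sketch(X, m, gamma=1.0):
    """Greedily select m prototype indices from the rows of X, with non-negative weights."""
    K = rbf_kernel(X, X, gamma)
    mu = K.mean(axis=1)  # mean kernel similarity of each candidate to the whole data set
    selected, w = [], np.zeros(0)
    for _ in range(m):
        # Gradient of the objective at the current weights (weights are zero off the selected set).
        grad = mu.copy()
        if selected:
            grad -= K[:, selected] @ w
            grad[selected] = -np.inf  # never re-pick an already selected prototype
        selected.append(int(np.argmax(grad)))
        idx = np.array(selected)
        w = nonneg_qp(K[np.ix_(idx, idx)], mu[idx])  # re-fit weights on the enlarged set
    return selected, w
```

For two well-separated clusters, for example `X = np.array([[0.0], [0.1], [-0.1], [10.0], [10.1], [9.9]])`, calling `protodash_sketch(X, 2)` should pick the central point of each cluster (indices 0 and 3): once one cluster is covered, the gradient of its other members drops to nearly zero, so the greedy step moves to the uncovered cluster.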
Signal Recovery in Perturbed Fourier Compressed Sensing
In many applications in compressed sensing, the measurement matrix is a
Fourier matrix, i.e., it measures the Fourier transform of the underlying
signal at some specified `base' frequencies $\{u_i\}_{i=1}^{M}$, where $M$ is the
number of measurements. However, due to system calibration errors, the system
may measure the Fourier transform at frequencies $\{u_i + \delta_i\}_{i=1}^{M}$
that are different from the base frequencies, where the perturbations
$\{\delta_i\}_{i=1}^{M}$ are unknown. Ignoring perturbations of this nature can
lead to major errors in signal recovery. In this paper, we present a simple but
effective alternating minimization algorithm to recover the perturbations in
the frequencies in situ with the signal, which we assume is sparse or
compressible in some known basis. In many cases, the perturbations
$\{\delta_i\}_{i=1}^{M}$ can be expressed in terms of a small number of unique
parameters $P \ll M$. We demonstrate that in such cases, the method yields
results of excellent quality that are several times better than those of
baseline algorithms (which are based on existing off-grid methods from the
recent literature on direction-of-arrival (DOA) estimation, modified to suit
the computational problem in this paper). Our results are also robust to noise
in the measurement values. We also provide theoretical results for (1) the
convergence of our algorithm and (2) the uniqueness of its solution under some
restrictions.
Comment: New theoretical results about uniqueness and convergence now included.
More challenging experiments now include
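The abstract does not give the algorithm's update rules, so the following is only a minimal sketch of the general alternating-minimization idea. It assumes a k-sparse signal x of length N in the canonical basis, measurements y_j = sum_t x_t exp(-2*pi*i*(u_j + delta_j)*t/N), perturbations bounded by a known `max_shift`, and one free perturbation per measurement (not the reduced P-parameter model). The x-step uses orthogonal matching pursuit and the delta-step uses a per-measurement grid search; both are illustrative choices, and all names (`fourier_rows`, `omp`, `alt_min`, `max_shift`, `grid`, `iters`) are hypothetical. The loop keeps the best iterate by residual norm, so the returned fit is never worse than the fit obtained by ignoring the perturbations.

```python
import numpy as np

def fourier_rows(freqs, n):
    """Rows of an off-grid DFT matrix: A[j, t] = exp(-2*pi*i * freqs[j] * t / n)."""
    t = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(freqs, t) / n)

def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse least-squares fit of y ~ A x (k >= 1)."""
    residual, support = y.copy(), []
    for _ in range(k):
        corr = np.abs(A.conj().T @ residual)
        if support:
            corr[support] = -1.0  # exclude columns that are already in the support
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = coef
    return x

def alt_min(y, base, n, k, max_shift=0.5, grid=101, iters=10):
    """Alternate a sparse x-step (OMP) with a per-measurement grid search over delta."""
    shifts = np.linspace(-max_shift, max_shift, grid)  # candidate perturbation values
    delta = np.zeros(len(base))
    A = fourier_rows(base, n)  # start from the unperturbed base frequencies
    x = omp(A, y, k)
    best = (np.linalg.norm(y - A @ x), x, delta.copy())
    for _ in range(iters):
        # delta-step: delta_j only affects measurement j, so each is searched independently
        for j, u in enumerate(base):
            preds = fourier_rows(u + shifts, n) @ x
            delta[j] = shifts[np.argmin(np.abs(y[j] - preds))]
        # x-step: re-fit the sparse signal at the current perturbed frequencies
        A = fourier_rows(base + delta, n)
        x = omp(A, y, k)
        err = np.linalg.norm(y - A @ x)
        if err < best[0]:  # keep the best iterate seen so far
            best = (err, x, delta.copy())
    return best  # (residual norm, x estimate, delta estimate)
```

Because each delta_j enters only the j-th measurement in this per-measurement model, the delta-step splits into M independent one-dimensional searches, which keeps it cheap; a model with P shared parameters would instead search jointly over those P values.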