Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an
underlying complex data distribution communicate meaningful insights to humans
in domains where simple explanations are hard to extract. In this paper we
present algorithms with strong theoretical guarantees to mine these data sets
and select prototypes, a.k.a. representatives, that optimally describe them. Our
work notably generalizes the recent work by Kim et al. (2016): in addition
to selecting prototypes, we also associate non-negative weights that are
indicative of their importance. This extension provides a single coherent
framework under which both prototypes and criticisms (i.e. outliers) can be
found. Furthermore, our framework works with any symmetric positive definite
kernel, thus addressing one of the key open questions laid out in Kim et al.
(2016). By establishing that our objective function enjoys the key property of
weak submodularity, we present a fast ProtoDash algorithm and also
derive approximation guarantees for it. We demonstrate the efficacy of
our method on diverse domains such as retail, digit recognition (MNIST) and
40 publicly available health questionnaires obtained from the Centers for
Disease Control (CDC) website maintained by the US Dept. of Health. We validate
the results quantitatively as well as qualitatively based on expert feedback
and recently published scientific studies on public health, thus showcasing the
power of our technique in providing actionability (for retail), utility (for
MNIST) and insight (on CDC datasets), which arguably are the hallmarks of an
effective data mining method.
Comment: Accepted for publication in International Conference on Data Mining
(ICDM) 201
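As a rough illustration of the greedy selection under weak submodularity that the abstract describes, the sketch below picks prototypes one at a time by the largest gradient of a kernel-based objective and refits non-negative importance weights on the selected set. This is an illustrative simplification, not the authors' published ProtoDash implementation; the RBF kernel, step sizes, and iteration counts are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def protodash_sketch(X, m, gamma=1.0):
    """Greedy weighted prototype selection (sketch of the ProtoDash idea).

    Approximately maximizes l(w) = w^T mu - 0.5 w^T K w over non-negative
    weights w supported on at most m prototypes, where mu_i is the mean
    kernel similarity of candidate i to all data points.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    mu = K.mean(axis=1)          # mean similarity to the data distribution
    S, w = [], np.zeros(0)
    for _ in range(m):
        # Gradient of l at the current w; exclude already-selected candidates.
        grad = mu - (K[:, S] @ w if S else np.zeros(n))
        grad[S] = -np.inf
        S.append(int(np.argmax(grad)))
        # Refit non-negative weights on the selected set (projected gradient).
        Ks, mus = K[np.ix_(S, S)], mu[S]
        w = np.full(len(S), 1.0 / len(S))
        step = 1.0 / np.linalg.norm(Ks, 2)
        for _ in range(200):
            w = np.maximum(0.0, w + step * (mus - Ks @ w))
    return S, w
```

The non-negative weights are what distinguish this from unweighted prototype selection: a large weight marks an important representative, while candidates that never earn positive weight behave like the criticisms/outliers mentioned in the abstract.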
Spectral Unmixing with Multiple Dictionaries
Spectral unmixing aims at recovering the spectral signatures of materials,
called endmembers, mixed in a hyperspectral or multispectral image, along with
their abundances. A typical assumption is that the image contains one pure
pixel per endmember, in which case spectral unmixing reduces to identifying
these pixels. Many fully automated methods have been proposed in recent years,
but little work has been done to allow users to select areas where pure pixels
are present, either manually or with a segmentation algorithm. Additionally, in
a non-blind approach, several spectral libraries may be available rather than a
single one, with a fixed number (or an upper or lower bound) of endmembers to
choose from each. In this paper, we propose a multiple-dictionary constrained
low-rank matrix approximation model that addresses these two problems. We propose
an algorithm to compute this model, dubbed M2PALS, and its performance is
discussed on both synthetic and real hyperspectral images.
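To make the multiple-dictionary setting concrete, the sketch below selects a fixed number of endmembers from each of several spectral libraries (here by a simple correlation score) and then fits non-negative abundances. This one-pass heuristic is a stand-in for illustration only, not the authors' M2PALS algorithm; all function names, the scoring rule, and the solver are assumptions.

```python
import numpy as np

def nnls_pgd(A, B, iters=300):
    # Non-negative least squares min ||A X - B||_F via projected gradient.
    X = np.zeros((A.shape[1], B.shape[1]))
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    for _ in range(iters):
        X = np.maximum(0.0, X - step * (A.T @ (A @ X - B)))
    return X

def multi_dictionary_unmix(Y, dicts, counts):
    """Sketch of spectral unmixing with several libraries (hypothetical).

    Y      : (bands, pixels) image matrix
    dicts  : list of (bands, atoms_k) spectral libraries
    counts : number of endmembers to draw from each library
    Scores each atom by its mean |correlation| with the pixels, keeps the
    top counts[k] atoms per library, then fits non-negative abundances.
    """
    Yn = Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-12)
    chosen = []
    for D, c in zip(dicts, counts):
        Dn = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
        score = np.abs(Dn.T @ Yn).mean(axis=1)   # mean |correlation| per atom
        chosen.append(D[:, np.argsort(score)[-c:]])
    E = np.hstack(chosen)                        # selected endmembers
    A = nnls_pgd(E, Y)                           # non-negative abundances
    return E, A
```

Enforcing "exactly counts[k] atoms from library k" is what distinguishes this setting from ordinary sparse unmixing over one concatenated dictionary, where the solver is free to draw all endmembers from a single library.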
Correntropy Maximization via ADMM - Application to Robust Hyperspectral Unmixing
In hyperspectral images, some spectral bands suffer from low signal-to-noise
ratio due to noisy acquisition and atmospheric effects, thus requiring robust
techniques for the unmixing problem. This paper presents a robust supervised
spectral unmixing approach for hyperspectral images. The robustness is achieved
by writing the unmixing problem as the maximization of the correntropy
criterion subject to the most commonly used constraints. Two unmixing problems
are derived: the first problem considers the fully-constrained unmixing, with
both the non-negativity and sum-to-one constraints, while the second one deals
with non-negativity and a sparsity-promoting constraint on the abundances. The
corresponding optimization problems are solved efficiently using an alternating
direction method of multipliers (ADMM) approach. Experiments on synthetic and
real hyperspectral images validate the performance of the proposed algorithms
for different scenarios, demonstrating that the correntropy-based unmixing is
robust to outlier bands.
Comment: 23 page
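The robustness mechanism behind correntropy maximization can be sketched with a half-quadratic reweighting loop: bands whose residuals are large receive exponentially small weights, so noisy bands barely influence the fit. This is a simplified stand-in for intuition, not the authors' ADMM solver, and it enforces only the non-negativity constraint; sigma and the iteration counts are assumptions.

```python
import numpy as np

def correntropy_unmix(y, E, sigma=0.1, iters=50):
    """Robust supervised unmixing of one pixel y given known endmembers E.

    Approximately maximizes sum_b exp(-(y - E a)_b^2 / (2 sigma^2)) subject
    to a >= 0, via half-quadratic reweighting: each outer pass computes
    per-band weights from the residuals, then takes projected-gradient
    steps on the resulting weighted least-squares problem.
    """
    a = np.full(E.shape[1], 1.0 / E.shape[1])
    for _ in range(iters):
        r = y - E @ a
        w = np.exp(-r ** 2 / (2 * sigma ** 2))   # outlier bands -> weight ~0
        step = 1.0 / (np.linalg.norm((E * w[:, None]).T @ E, 2) + 1e-12)
        for _ in range(50):
            a = np.maximum(0.0, a - step * (E.T @ (w * (E @ a - y))))
    return a
```

A plain least-squares fit would let a single corrupted band drag the whole abundance estimate; here that band's weight collapses to near zero after the first reweighting pass.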
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
The goal of supervised feature selection is to find a subset of input
features that are responsible for predicting output values. The least absolute
shrinkage and selection operator (Lasso) allows computationally efficient
feature selection based on linear dependency between input features and output
values. In this paper, we consider a feature-wise kernelized Lasso for
capturing non-linear input-output dependency. We first show that, with
particular choices of kernel functions, non-redundant features with strong
statistical dependence on output values can be found in terms of kernel-based
independence measures. We then show that the globally optimal solution can be
efficiently computed; this makes the approach scalable to high-dimensional
problems. The effectiveness of the proposed method is demonstrated through
feature selection experiments with thousands of features.
Comment: 18 page
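The feature-wise kernelized Lasso described above can be sketched as a non-negative Lasso over vectorized, centered Gram matrices: each feature contributes one centered kernel matrix, and the regression target is the centered kernel matrix of the output. The sketch below uses an RBF kernel and projected ISTA; kernel widths, the regularization level, and the solver are assumptions rather than the authors' exact setup.

```python
import numpy as np

def centered_gram(v, gamma=1.0):
    # Centered RBF Gram matrix of one variable (HSIC building block).
    K = np.exp(-gamma * (v[:, None] - v[None, :]) ** 2)
    H = np.eye(len(v)) - 1.0 / len(v)
    return H @ K @ H

def hsic_lasso(X, y, lam=0.1, iters=500):
    """Feature-wise kernelized Lasso (HSIC Lasso) sketch.

    Solves min_{a >= 0} 0.5 ||vec(Lbar) - sum_k a_k vec(Kbar_k)||^2 + lam ||a||_1,
    where Kbar_k and Lbar are centered Gram matrices of feature k and the
    output. Non-redundant, output-dependent features receive positive weight.
    """
    n, d = X.shape
    Phi = np.stack([centered_gram(X[:, k]).ravel() for k in range(d)], axis=1)
    Phi /= np.linalg.norm(Phi, axis=0, keepdims=True) + 1e-12  # unit columns
    l = centered_gram(y).ravel()
    a = np.zeros(d)
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2)
    for _ in range(iters):
        grad = Phi.T @ (Phi @ a - l)
        a = np.maximum(0.0, a - step * (grad + lam))  # non-negative ISTA step
    return a  # nonzero entries mark selected features
```

Because each feature enters only through its own Gram matrix, the problem stays a convex Lasso in d variables even though the input-output dependency being captured is non-linear.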