Learning Big (Image) Data via Coresets for Dictionaries
Signal and image processing have seen an explosion of interest in recent years in a new form of signal/image characterization via the concept of sparsity with respect to a dictionary. An active field of research is dictionary learning: the representation of a given large set of vectors (e.g. signals or images) as linear combinations of only a few vectors (patterns). To further reduce the size of the representation, the combinations are usually required to be sparse, i.e., each signal is a linear combination of only a small number of patterns.
This paper suggests a new computational approach to the problem of dictionary learning, known in computational geometry as coresets. A coreset for dictionary learning is a small smart non-uniform sample from the input signals such that the quality of any given dictionary with respect to the input can be approximated via the coreset. In particular, the optimal dictionary for the input can be approximated by learning the coreset. Since the coreset is small, the learning is faster. Moreover, using merge-and-reduce, the coreset can be constructed for streaming signals that do not fit in memory and can also be computed in parallel.
We apply our coresets to dictionary learning of images using the K-SVD algorithm and bound their size and approximation error analytically. Our simulations demonstrate gains of up to a factor of 60 in computational time with the same, and even better, performance. We also demonstrate our ability to perform computations on larger patches and high-definition images, where the traditional approach breaks down.
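The merge-and-reduce scheme mentioned above can be illustrated as follows. This is a minimal sketch: the `reduce_coreset` step below is a generic importance-sampling stand-in for the paper's sensitivity-based sample, and all function names are illustrative, not from the paper.

```python
import numpy as np

def reduce_coreset(points, weights, m, rng):
    """Stand-in reduction step: importance-sample m points with replacement
    and reweight so the total weight is preserved. The paper's construction
    uses a smarter, sensitivity-based sampling distribution."""
    if len(points) <= m:
        return points, weights
    p = weights / weights.sum()
    idx = rng.choice(len(points), size=m, p=p)
    return points[idx], weights[idx] / (m * p[idx])

def merge_and_reduce(stream, m, rng):
    """Process a stream of chunks, keeping one coreset per 'level' so only
    O(log n) coresets of size at most m are in memory at any time."""
    levels = {}  # level -> (points, weights)
    for chunk in stream:
        c = (np.asarray(chunk), np.ones(len(chunk)))
        lvl = 0
        while lvl in levels:  # carry-propagate, like binary addition
            pts_other, w_other = levels.pop(lvl)
            merged_pts = np.vstack([c[0], pts_other])
            merged_w = np.concatenate([c[1], w_other])
            c = reduce_coreset(merged_pts, merged_w, m, rng)
            lvl += 1
        levels[lvl] = c
    pts = np.vstack([c[0] for c in levels.values()])
    wts = np.concatenate([c[1] for c in levels.values()])
    return reduce_coreset(pts, wts, m, rng)

rng = np.random.default_rng(0)
stream = (rng.normal(size=(64, 8)) for _ in range(16))  # 16 chunks of 64 signals
pts, wts = merge_and_reduce(stream, m=100, rng=rng)
```

Because the sampling probabilities are proportional to the weights, every sampled point receives the same weight W/m, so the total weight of the stream is preserved exactly through each reduction.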
Training Gaussian Mixture Models at Scale via Coresets
How can we train a statistical mixture model on a massive data set? In this
work we show how to construct coresets for mixtures of Gaussians. A coreset is
a weighted subset of the data, which guarantees that models fitting the coreset
also provide a good fit for the original data set. We show that, perhaps
surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension
and the number of mixture components, while being independent of the data set
size. Hence, one can harness computationally intensive algorithms to compute a
good approximation on a significantly smaller data set. More importantly, such
coresets can be efficiently constructed both in distributed and streaming
settings and do not impose restrictions on the data generating process. Our
results rely on a novel reduction of statistical estimation to problems in
computational geometry and new combinatorial complexity results for mixtures of
Gaussians. Empirical evaluation on several real-world datasets suggests that
our coreset-based approach enables significant reduction in training-time with
negligible approximation error.
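The reweighted-subset idea behind such coresets can be sketched as follows, assuming a deliberately crude sensitivity proxy (a uniform term plus the normalized squared distance to the data mean); the paper's actual sampling distribution and size bounds are tighter, and the helper name is illustrative.

```python
import numpy as np

def gmm_coreset(X, m, rng):
    """Toy sensitivity-based coreset: sample points with probability
    proportional to a crude sensitivity proxy, then reweight by the
    inverse sampling probability so sums over the full data set are
    estimated without bias."""
    n = len(X)
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    s = 1.0 / n + d2 / d2.sum()   # crude sensitivity proxy
    p = s / s.sum()
    idx = rng.choice(n, size=m, p=p)
    return X[idx], 1.0 / (m * p[idx])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 1.0, size=(5000, 2)),
               rng.normal(3.0, 1.0, size=(5000, 2))])  # two clusters
C, w = gmm_coreset(X, m=200, rng=rng)
# The weighted coreset mean approximates the full-data mean.
approx_mean = (w[:, None] * C).sum(axis=0) / w.sum()
```

A model-fitting routine that accepts per-point weights can then be run on the 200 weighted points instead of all 10,000.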
Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
Sketching algorithms have recently proven to be a powerful approach both for
designing low-space streaming algorithms as well as fast polynomial time
approximation schemes (PTAS). In this work, we develop new techniques to extend
the applicability of sketching-based approaches to the sparse dictionary
learning and the Euclidean k-means clustering problems. In particular, we
initiate the study of the challenging setting where the dictionary/clustering
assignment for each of the input points must be output, which has
surprisingly received little attention in prior work. On the fast algorithms
front, we obtain a new approach for designing PTAS's for the k-means
clustering problem, which generalizes to the first PTAS for the sparse
dictionary learning problem. On the streaming algorithms front, we obtain new
upper bounds and lower bounds for dictionary learning and k-means clustering.
In particular, given a design matrix in a turnstile stream, we show space
upper bounds for sparse dictionary learning and for k-means clustering, as
well as a space upper bound for k-means clustering on random-order row
insertion streams with a natural "bounded sensitivity" assumption. On the
lower bounds side, we obtain a general lower bound for k-means clustering,
as well as a lower bound for algorithms which can estimate the cost of a
single fixed set of candidate centers.
Comment: To appear in NeurIPS 202
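As an illustration of the turnstile model the abstract refers to, here is a minimal CountSketch of a design matrix maintained under arbitrary-order additive updates. This is a generic linear sketch shown for illustration only, not the specific sketch constructed in the paper.

```python
import numpy as np

def turnstile_countsketch(updates, k, n, d, rng):
    """Maintain S @ A in O(k * d) space, where A is an n x d design matrix
    given as a turnstile stream of (row, col, delta) updates and S is an
    implicit k x n CountSketch: one hash bucket h(i) and sign g(i) per row."""
    h = rng.integers(0, k, size=n)       # bucket per input row
    g = rng.choice([-1.0, 1.0], size=n)  # sign per input row
    SA = np.zeros((k, d))
    for i, j, delta in updates:
        SA[h[i], j] += g[i] * delta      # O(1) work per stream update
    return SA, h, g

rng = np.random.default_rng(2)
n, d, k = 200, 5, 64
A = rng.normal(size=(n, d))
updates = [(i, j, A[i, j]) for i in range(n) for j in range(d)]
updates += [(0, 0, 5.0), (0, 0, -5.0)]  # turnstile allows cancelling updates
SA, h, g = turnstile_countsketch(updates, k, n, d, rng)
```

Because the sketch is linear, the order of updates is irrelevant and positive and negative deltas cancel, which is exactly what makes the turnstile setting workable.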
DESIGN OF COMPACT AND DISCRIMINATIVE DICTIONARIES
The objective of this research work is to design compact and discriminative dictionaries
for effective classification. The motivation stems from the fact that learned dictionaries
inherently contain redundant atoms, because the aim of dictionary
learning is reconstruction, not classification. In this thesis, we propose methods to obtain
a minimum number of discriminative dictionary atoms that yield effective classification
at reduced computational time.
First, we propose a classification scheme where an example is assigned to a class
based on a weighted combination of maximum projection and minimum reconstruction
error. Here, the dictionary is learned from the input data by K-SVD, which alternates
between sparse coding and dictionary update: orthogonal matching pursuit (OMP) is
used for sparse coding, and singular value decomposition (SVD) for the dictionary
update. Although this classification scheme is effective, there is still scope to
improve the learned dictionary by removing redundant atoms, because our goal is not reconstruction.
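The K-SVD alternation described above (OMP for sparse coding, a rank-1 SVD for each atom update) can be sketched in a few lines of numpy. This is a bare-bones illustration under random data, not the optimized implementation used in practice.

```python
import numpy as np

def omp(D, y, s):
    """Greedy orthogonal matching pursuit: select up to s atoms of D,
    then least-squares fit y on the selected atoms."""
    residual, support, coef = y.copy(), [], np.zeros(0)
    for _ in range(s):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def ksvd_step(D, Y, s):
    """One K-SVD iteration: sparse-code every signal with OMP, then update
    each atom (and its coefficients) by a rank-1 SVD of the residual
    restricted to the signals that actually use that atom."""
    X = np.column_stack([omp(D, y, s) for y in Y.T])
    for j in range(D.shape[1]):
        users = np.nonzero(X[j])[0]
        if users.size == 0:
            continue
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        U, S, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]
        X[j, users] = S[0] * Vt[0]
    return D, X

rng = np.random.default_rng(3)
Y = rng.normal(size=(16, 100))          # 100 signals of dimension 16
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
for _ in range(3):
    D, X = ksvd_step(D, Y, s=3)
err = np.linalg.norm(Y - D @ X)
```

Each atom update is locally optimal for the Frobenius reconstruction error while keeping the sparsity pattern fixed, which is what makes the alternation effective for reconstruction but indifferent to class discrimination.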
In order to remove such redundant atoms, we propose two approaches
based on information theory to obtain compact discriminative dictionaries. In the
first approach, we remove redundant atoms from the dictionary while maintaining
discriminative information. Specifically, we pose a constrained optimization problem
that minimizes the mutual information between the optimized dictionary and the initial
dictionary while maximizing the mutual information between the class labels and the
optimized dictionary. This quantifies the information lost between the initial and
the optimized dictionary. To compute this information loss, we use the Jensen-Shannon
divergence with adaptive weights to compare the class distributions of each dictionary
atom. The advantage of the Jensen-Shannon divergence is that it is computationally
cheaper than calculating the information loss directly from mutual information.
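A weighted Jensen-Shannon divergence between the class distributions of two atoms can be sketched as follows. The scalar weight `w` and the per-class counts are illustrative placeholders; the thesis derives the adaptive weights from the data.

```python
import numpy as np

def js_divergence(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence between two discrete class
    distributions. w is the weight on p; w = 0.5 gives the standard
    symmetric JSD. With base-2 logs the value lies in [0, 1]."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = w * p + (1.0 - w) * q

    def kl(a, b):  # KL divergence, ignoring zero-probability terms
        mask = a > 0
        return float((a[mask] * np.log2(a[mask] / b[mask])).sum())

    return w * kl(p, m) + (1.0 - w) * kl(q, m)

# Hypothetical per-class usage counts for two dictionary atoms.
atom_a = [30, 5, 5]   # used mostly by class 0
atom_b = [4, 28, 8]   # used mostly by class 1
d = js_divergence(atom_a, atom_b)
```

Unlike mutual information between dictionaries, this only requires normalizing and comparing two small histograms per atom pair, which is where the computational saving comes from.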