1,095 research outputs found
Sketching for Large-Scale Learning of Mixture Models
Learning parameters from voluminous data can be prohibitive in terms of
memory and computational requirements. We propose a "compressive learning"
framework where we estimate model parameters from a sketch of the training
data. This sketch is a collection of generalized moments of the underlying
probability distribution of the data. It can be computed in a single pass on
the training set, and is easily computable on streams or distributed datasets.
The proposed framework shares similarities with compressive sensing, which aims
at drastically reducing the dimension of high-dimensional signals while
preserving the ability to reconstruct them. To perform the estimation task, we
derive an iterative algorithm analogous to sparse reconstruction algorithms in
the context of linear inverse problems. We exemplify our framework with the
compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics
on the choice of the sketching procedure and theoretical guarantees of
reconstruction. We experimentally show on synthetic data that the proposed
algorithm yields results comparable to the classical Expectation-Maximization
(EM) technique while requiring significantly less memory and fewer computations
when the number of database elements is large. We further demonstrate the
potential of the approach on real large-scale data (over 10 8 training samples)
for the task of model-based speaker verification. Finally, we draw some
connections between the proposed framework and approximate Hilbert space
embedding of probability distributions using random features. We show that the
proposed sketching operator can be seen as an innovative method to design
translation-invariant kernels adapted to the analysis of GMMs. We also use this
theoretical framework to derive information preservation guarantees, in the
spirit of infinite-dimensional compressive sensing
Asymmetric compressive learning guarantees with applications to quantized sketches
The compressive learning framework reduces the computational cost of training
on large-scale datasets. In a sketching phase, the data is first compressed to
a lightweight sketch vector, obtained by mapping the data samples through a
well-chosen feature map, and averaging those contributions. In a learning
phase, the desired model parameters are then extracted from this sketch by
solving an optimization problem, which also involves a feature map. When the
feature map is identical during the sketching and learning phases, formal
statistical guarantees (excess risk bounds) have been proven.
However, the desirable properties of the feature map are different during
sketching and learning (e.g. quantized outputs, and differentiability,
respectively). We thus study the relaxation where this map is allowed to be
different for each phase. First, we prove that the existing guarantees carry
over to this asymmetric scheme, up to a controlled error term, provided some
Limited Projected Distortion (LPD) property holds. We then instantiate this
framework to the setting of quantized sketches, by proving that the LPD indeed
holds for binary sketch contributions. Finally, we further validate the
approach with numerical simulations, including a large-scale application in
audio event classification
Preconditioned Data Sparsification for Big Data with Applications to PCA and K-means
We analyze a compression scheme for large data sets that randomly keeps a
small percentage of the components of each data sample. The benefit is that the
output is a sparse matrix and therefore subsequent processing, such as PCA or
K-means, is significantly faster, especially in a distributed-data setting.
Furthermore, the sampling is single-pass and applicable to streaming data. The
sampling mechanism is a variant of previous methods proposed in the literature
combined with a randomized preconditioning to smooth the data. We provide
guarantees for PCA in terms of the covariance matrix, and guarantees for
K-means in terms of the error in the center estimators at a given step. We
present numerical evidence to show both that our bounds are nearly tight and
that our algorithms provide a real benefit when applied to standard test data
sets, as well as providing certain benefits over related sampling approaches.Comment: 28 pages, 10 figure
Projected gradient descent for non-convex sparse spike estimation
We propose a new algorithm for sparse spike estimation from Fourier
measurements. Based on theoretical results on non-convex optimization
techniques for off-the-grid sparse spike estimation, we present a projected
gradient descent algorithm coupled with a spectral initialization procedure.
Our algorithm permits to estimate the positions of large numbers of Diracs in
2d from random Fourier measurements. We present, along with the algorithm,
theoretical qualitative insights explaining the success of our algorithm. This
opens a new direction for practical off-the-grid spike estimation with
theoretical guarantees in imaging applications
Convexity in source separation: Models, geometry, and algorithms
Source separation or demixing is the process of extracting multiple
components entangled within a signal. Contemporary signal processing presents a
host of difficult source separation problems, from interference cancellation to
background subtraction, blind deconvolution, and even dictionary learning.
Despite the recent progress in each of these applications, advances in
high-throughput sensor technology place demixing algorithms under pressure to
accommodate extremely high-dimensional signals, separate an ever larger number
of sources, and cope with more sophisticated signal and mixing models. These
difficulties are exacerbated by the need for real-time action in automated
decision-making systems.
Recent advances in convex optimization provide a simple framework for
efficiently solving numerous difficult demixing problems. This article provides
an overview of the emerging field, explains the theory that governs the
underlying procedures, and surveys algorithms that solve them efficiently. We
aim to equip practitioners with a toolkit for constructing their own demixing
algorithms that work, as well as concrete intuition for why they work
Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery
PCA is one of the most widely used dimension reduction techniques. A related
easier problem is "subspace learning" or "subspace estimation". Given
relatively clean data, both are easily solved via singular value decomposition
(SVD). The problem of subspace learning or PCA in the presence of outliers is
called robust subspace learning or robust PCA (RPCA). For long data sequences,
if one tries to use a single lower dimensional subspace to represent the data,
the required subspace dimension may end up being quite large. For such data, a
better model is to assume that it lies in a low-dimensional subspace that can
change over time, albeit gradually. The problem of tracking such data (and the
subspaces) while being robust to outliers is called robust subspace tracking
(RST). This article provides a magazine-style overview of the entire field of
robust subspace learning and tracking. In particular solutions for three
problems are discussed in detail: RPCA via sparse+low-rank matrix decomposition
(S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an
entire data vector is either an outlier or an inlier. The S+LR formulation
instead assumes that outliers occur on only a few data vector indices and hence
are well modeled as sparse corruptions.Comment: To appear, IEEE Signal Processing Magazine, July 201
- …