2,213 research outputs found
Sample Complexity Analysis for Learning Overcomplete Latent Variable Models through Tensor Methods
We provide guarantees for learning latent variable models emphasizing on the
overcomplete regime, where the dimensionality of the latent space can exceed
the observed dimensionality. In particular, we consider multiview mixtures,
spherical Gaussian mixtures, ICA, and sparse coding models. We provide tight
concentration bounds for empirical moments through novel covering arguments. We
analyze parameter recovery through a simple tensor power update algorithm. In
the semi-supervised setting, we exploit the label or prior information to get a
rough estimate of the model parameters, and then refine it using the tensor
method on unlabeled samples. We establish that learning is possible when the
number of components scales as , where is the observed
dimension, and is the order of the observed moment employed in the tensor
method. Our concentration bound analysis also leads to minimax sample
complexity for semi-supervised learning of spherical Gaussian mixtures. In the
unsupervised setting, we use a simple initialization algorithm based on SVD of
the tensor slices, and provide guarantees under the stricter condition that
(where constant can be larger than ), where the
tensor method recovers the components under a polynomial running time (and
exponential in ). Our analysis establishes that a wide range of
overcomplete latent variable models can be learned efficiently with low
computational and sample complexity through tensor decomposition methods.Comment: Title change
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as sparse
dictionary learning, principal component analysis (PCA), non-negative matrix
factorization (NMF), -means clustering, etc., rely on the factorization of a
matrix obtained by concatenating high-dimensional vectors from a training
collection. While the idealized task would be to optimize the expected quality
of the factors over the underlying distribution of training vectors, it is
achieved in practice by minimizing an empirical average over the considered
collection. The focus of this paper is to provide sample complexity estimates
to uniformly control how much the empirical average deviates from the expected
cost function. Standard arguments imply that the performance of the empirical
predictor also exhibit such guarantees. The level of genericity of the approach
encompasses several possible constraints on the factors (tensor product
structure, shift-invariance, sparsity \ldots), thus providing a unified
perspective on the sample complexity of several widely used matrix
factorization schemes. The derived generalization bounds behave proportional to
w.r.t.\ the number of samples for the considered matrix
factorization techniques.Comment: to appea
Training Input-Output Recurrent Neural Networks through Spectral Methods
We consider the problem of training input-output recurrent neural networks
(RNN) for sequence labeling tasks. We propose a novel spectral approach for
learning the network parameters. It is based on decomposition of the
cross-moment tensor between the output and a non-linear transformation of the
input, based on score functions. We guarantee consistent learning with
polynomial sample and computational complexity under transparent conditions
such as non-degeneracy of model parameters, polynomial activations for the
neurons, and a Markovian evolution of the input sequence. We also extend our
results to Bidirectional RNN which uses both previous and future information to
output the label at each time point, and is employed in many NLP tasks such as
POS tagging
Learning Co-Sparse Analysis Operators with Separable Structures
In the co-sparse analysis model a set of filters is applied to a signal out
of the signal class of interest yielding sparse filter responses. As such, it
may serve as a prior in inverse problems, or for structural analysis of signals
that are known to belong to the signal class. The more the model is adapted to
the class, the more reliable it is for these purposes. The task of learning
such operators for a given class is therefore a crucial problem. In many
applications, it is also required that the filter responses are obtained in a
timely manner, which can be achieved by filters with a separable structure. Not
only can operators of this sort be efficiently used for computing the filter
responses, but they also have the advantage that less training samples are
required to obtain a reliable estimate of the operator. The first contribution
of this work is to give theoretical evidence for this claim by providing an
upper bound for the sample complexity of the learning process. The second is a
stochastic gradient descent (SGD) method designed to learn an analysis operator
with separable structures, which includes a novel and efficient step size
selection rule. Numerical experiments are provided that link the sample
complexity to the convergence speed of the SGD algorithm.Comment: 11 pages double column, 4 figures, 3 table
Spectral dimensionality reduction for HMMs
Hidden Markov Models (HMMs) can be accurately approximated using
co-occurrence frequencies of pairs and triples of observations by using a fast
spectral method in contrast to the usual slow methods like EM or Gibbs
sampling. We provide a new spectral method which significantly reduces the
number of model parameters that need to be estimated, and generates a sample
complexity that does not depend on the size of the observation vocabulary. We
present an elementary proof giving bounds on the relative accuracy of
probability estimates from our model. (Correlaries show our bounds can be
weakened to provide either L1 bounds or KL bounds which provide easier direct
comparisons to previous work.) Our theorem uses conditions that are checkable
from the data, instead of putting conditions on the unobservable Markov
transition matrix
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international - Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 has gathered about
70 international participants and has featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
- âŠ