Sparse Sequential Dirichlet Coding
This short paper describes a simple coding technique, Sparse Sequential
Dirichlet Coding, for multi-alphabet memoryless sources. It is appropriate in
situations where only a small, unknown subset of the possible alphabet symbols
can be expected to occur in any particular data sequence. We provide a
competitive analysis which shows that the performance of Sparse Sequential
Dirichlet Coding will be close to that of a Sequential Dirichlet Coder that
knows in advance the exact subset of occurring alphabet symbols. Empirically we
show that our technique can perform similarly to the more computationally
demanding Sequential Sub-Alphabet Estimator, while using fewer computational
resources.

Comment: 7 pages
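The sketch below illustrates the general flavor of this approach rather than the exact coding rule from the paper: a sequential Dirichlet-style predictor that spends probability mass only on symbols seen so far and reserves a uniform "escape" mass for the unseen remainder of a size-M alphabet. The class name, smoothing constant, and escape rule are assumptions made for clarity.

```python
import math

class SparseSequentialDirichletSketch:
    """Illustrative sketch, not the paper's estimator: Dirichlet-style counts
    over observed symbols plus a uniform escape mass over unseen symbols."""

    def __init__(self, alphabet_size, alpha=0.5):
        self.M = alphabet_size      # full alphabet size
        self.alpha = alpha          # smoothing parameter (assumed value)
        self.counts = {}            # counts for symbols observed so far
        self.n = 0                  # total symbols processed

    def prob(self, symbol):
        """Predictive probability of `symbol` given the sequence so far."""
        m = len(self.counts)                      # distinct symbols seen
        denom = self.n + self.alpha * (m + 1)     # +1 reserves escape mass
        if symbol in self.counts:
            return (self.counts[symbol] + self.alpha) / denom
        unseen = self.M - m
        # escape mass shared uniformly over the unseen symbols
        return (self.alpha / denom) / unseen if unseen > 0 else 0.0

    def update(self, symbol):
        self.counts[symbol] = self.counts.get(symbol, 0) + 1
        self.n += 1

# Usage: ideal code length in bits for a toy sequence over a 256-symbol alphabet.
model = SparseSequentialDirichletSketch(alphabet_size=256)
bits = 0.0
for sym in [3, 7, 3, 3, 7, 42]:
    bits += -math.log2(model.prob(sym))
    model.update(sym)
print(f"ideal code length: {bits:.2f} bits")
```

The arithmetic code length of a sequence under such a predictor is the sum of the negative log-probabilities accumulated above; this is the quantity a competitive analysis compares against a coder that knows the occurring sub-alphabet in advance.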
Sparse Coding on Stereo Video for Object Detection
Deep Convolutional Neural Networks (DCNN) require millions of labeled
training examples for image classification and object detection tasks, which
restrict these models to domains where such datasets are available. In this
paper, we explore the use of unsupervised sparse coding applied to stereo-video
data to help alleviate the need for large amounts of labeled data. We show that
replacing a typical supervised convolutional layer with an unsupervised
sparse-coding layer within a DCNN allows for better performance on a car
detection task when only a limited number of labeled training examples is
available. Furthermore, the network that incorporates sparse coding allows for
more consistent performance over varying initializations and ordering of
training examples when compared to a fully supervised DCNN. Finally, we compare
activations between the unsupervised sparse-coding layer and the supervised
convolutional layer, and show that the sparse representation exhibits an
encoding that is depth selective, whereas encodings from the convolutional
layer do not exhibit such selectivity. These results indicate promise for using
unsupervised sparse-coding approaches in real-world computer vision tasks in
domains with limited labeled training data.
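As a rough illustration of what a sparse-coding layer computes (the stereo-video pipeline, dictionary learning procedure, and hyperparameters from the paper are not reproduced here), the sketch below encodes an input patch as a LASSO-style sparse code over a fixed dictionary using ISTA; `lam` and `n_iter` are illustrative assumptions.

```python
import numpy as np

def ista_encode(D, x, lam=0.1, n_iter=100):
    """Return sparse code a minimizing 0.5*||x - D a||^2 + lam*||a||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

# Example: encode a random patch with a random overcomplete dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary columns
x = rng.standard_normal(64)
code = ista_encode(D, x)
print(np.count_nonzero(code), "active elements out of", code.size)
```

In a network, codes like this would replace the activations of the first convolutional layer, with the dictionary learned without labels from the input data.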
On the Sample Complexity of Predictive Sparse Coding
The goal of predictive sparse coding is to learn a representation of examples
as sparse linear combinations of elements from a dictionary, such that a
learned hypothesis linear in the new representation performs well on a
predictive task. Predictive sparse coding algorithms recently have demonstrated
impressive performance on a variety of supervised tasks, but their
generalization properties have not been studied. We establish the first
generalization error bounds for predictive sparse coding, covering two
settings: 1) the overcomplete setting, where the number of features k exceeds
the original dimensionality d; and 2) the high or infinite-dimensional setting,
where only dimension-free bounds are useful. Both learning bounds intimately
depend on stability properties of the learned sparse encoder, as measured on
the training sample. Consequently, we first present a fundamental stability
result for the LASSO, a result characterizing the stability of the sparse codes
with respect to perturbations to the dictionary. In the overcomplete setting,
we present an estimation error bound that decays as $\tilde{O}(\sqrt{dk/m})$ with
respect to d and k. In the high or infinite-dimensional setting, we show a
dimension-free bound that is $\tilde{O}(\sqrt{k^2 s/m})$ with respect to k and
s, where s is an upper bound on the number of non-zeros in the sparse code for
any training data point.

Comment: The Sparse Coding Stability Theorem from version 1 has been relaxed
considerably using a new notion of coding margin. The old Sparse Coding
Stability Theorem is still present, now as Theorem 2. The presentation of all
proofs has been simplified and improved considerably, and the paper has been
reorganized. An empirical analysis shows the new coding margin is non-trivial
on a real dataset.
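A minimal sketch of the predictive sparse coding setup these bounds refer to (not the learning algorithm analysed in the paper): each example is encoded as a sparse code under a dictionary via a LASSO solve, and a linear hypothesis is then fit on the codes. The random dictionary, scikit-learn calls, and parameter values are illustrative assumptions; in predictive sparse coding the dictionary and the predictor are learned jointly.

```python
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
d, k, m = 20, 50, 200                      # input dim, dictionary size, sample size
D = rng.standard_normal((k, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms (rows)

X = rng.standard_normal((m, d))
y = X @ rng.standard_normal(d)             # synthetic regression targets

# LASSO encoding of every example under the (here fixed) dictionary D.
coder = SparseCoder(dictionary=D, transform_algorithm="lasso_cd",
                    transform_alpha=0.1)
Z = coder.transform(X)                     # sparse codes, shape (m, k)

predictor = Ridge(alpha=1.0).fit(Z, y)     # linear hypothesis on the codes
s = int(np.count_nonzero(Z, axis=1).max()) # max support size: the s in the bound
print("max non-zeros per code:", s)
```

The quantities d, k, m, and s printed or set above correspond directly to the dimensionality, dictionary size, sample size, and sparsity level appearing in the two bounds.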
Sparse Coding and Autoencoders
In "Dictionary Learning" one tries to recover incoherent matrices (typically overcomplete and whose columns are assumed
to be normalized) and sparse vectors with a small
support of size for some while having access to observations
where . In this work we undertake a rigorous
analysis of whether gradient descent on the squared loss of an autoencoder can
solve the dictionary learning problem. The "Autoencoder" architecture we
consider is a mapping with a single
ReLU activation layer of size .
Under very mild distributional assumptions on , we prove that the norm
of the expected gradient of the standard squared loss function is
asymptotically (in sparse code dimension) negligible for all points in a small
neighborhood of . This is supported with experimental evidence using
synthetic data. We also conduct experiments to suggest that is a local
minimum. Along the way we prove that a layer of ReLU gates can be set up to
automatically recover the support of the sparse codes. This property holds
independent of the loss function. We believe that it could be of independent
interest.

Comment: In this new version of the paper, with a small change in the
distributional assumptions, we are able to prove the asymptotic criticality of
a neighbourhood of the ground truth dictionary for even just the standard
squared loss of the ReLU autoencoder (unlike the regularized loss in the older
version).
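To make the objective concrete, here is a small sketch, under assumptions (tied encoder/decoder weights and a fixed negative bias, which may differ from the paper's exact parameterization), of a single-hidden-layer ReLU autoencoder evaluated on dictionary-learning-style data $y = A^* x^*$ with sparse $x^*$. Gradient descent on this squared loss is the procedure whose expected gradient is analysed above.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def reconstruction_loss(W, b, Y):
    """Mean squared loss of a tied-weight ReLU autoencoder on the columns of Y."""
    H = relu(W.T @ Y + b[:, None])         # hidden codes, one per column of Y
    Y_hat = W @ H                          # linear decoder reusing W
    return 0.5 * np.mean(np.sum((Y - Y_hat) ** 2, axis=0))

# Synthetic data in the dictionary-learning style: y = A* x* with s-sparse x*.
rng = np.random.default_rng(0)
n, h, s, m = 32, 128, 4, 500
A_star = rng.standard_normal((n, h))
A_star /= np.linalg.norm(A_star, axis=0)   # unit-norm dictionary columns
X = np.zeros((h, m))
for j in range(m):
    idx = rng.choice(h, size=s, replace=False)
    X[idx, j] = rng.standard_normal(s)     # s-sparse codes
Y = A_star @ X

b = -0.5 * np.ones(h)                      # illustrative negative bias (assumed)
print("loss at A*:", reconstruction_loss(A_star, b, Y))
print("loss at random W:", reconstruction_loss(rng.standard_normal((n, h)), b, Y))
```

Evaluating the loss at the ground-truth dictionary versus a random initialization gives a quick sanity check of the claim that $A^*$ sits in a favorable region of the loss landscape.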
