
    Sparse Sequential Dirichlet Coding

    This short paper describes a simple coding technique, Sparse Sequential Dirichlet Coding, for multi-alphabet memoryless sources. It is appropriate in situations where only a small, unknown subset of the possible alphabet symbols can be expected to occur in any particular data sequence. We provide a competitive analysis that shows that the performance of Sparse Sequential Dirichlet Coding will be close to that of a Sequential Dirichlet Coder that knows in advance the exact subset of occurring alphabet symbols. Empirically we show that our technique can perform similarly to the more computationally demanding Sequential Sub-Alphabet Estimator, while using fewer computational resources. Comment: 7 pages
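    To make the baseline concrete, the sketch below illustrates the Sequential Dirichlet Coder that the competitive analysis compares against: with a symmetric Dirichlet(alpha) prior over a K-symbol alphabet, the probability assigned to symbol a after counts c is (c[a] + alpha) / (n + K * alpha). This is only a hedged illustration of the baseline; the sparse construction itself, and the function names used here, are not taken from the paper.

```python
import numpy as np

def sequential_dirichlet_bits(seq, alphabet_size, alpha=0.5):
    # Code length in bits that a Sequential Dirichlet Coder with a symmetric
    # Dirichlet(alpha) prior assigns to `seq` over `alphabet_size` symbols.
    # Predictive probability of symbol a after counts c:
    #   P(a) = (c[a] + alpha) / (n + alphabet_size * alpha)
    counts = np.zeros(alphabet_size)
    bits = 0.0
    for a in seq:
        p = (counts[a] + alpha) / (counts.sum() + alphabet_size * alpha)
        bits -= np.log2(p)
        counts[a] += 1
    return bits

# Only 2 of 256 possible symbols occur; a coder that knows the occurring
# subset in advance codes over a 2-symbol alphabet and pays far fewer bits.
# The sparse coder's guarantee is stated relative to this second quantity.
seq = [3, 3, 7, 3, 7, 7, 3, 3]
full_alphabet = sequential_dirichlet_bits(seq, 256)
known_subset = sequential_dirichlet_bits([0 if s == 3 else 1 for s in seq], 2)
```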

    Sparse Coding on Stereo Video for Object Detection

    Deep Convolutional Neural Networks (DCNNs) require millions of labeled training examples for image classification and object detection tasks, which restricts these models to domains where such datasets are available. In this paper, we explore the use of unsupervised sparse coding applied to stereo-video data to help alleviate the need for large amounts of labeled data. We show that replacing a typical supervised convolutional layer with an unsupervised sparse-coding layer within a DCNN allows for better performance on a car detection task when only a limited number of labeled training examples is available. Furthermore, the network that incorporates sparse coding allows for more consistent performance over varying initializations and orderings of training examples when compared to a fully supervised DCNN. Finally, we compare activations between the unsupervised sparse-coding layer and the supervised convolutional layer, and show that the sparse representation exhibits an encoding that is depth selective, whereas encodings from the convolutional layer do not exhibit such selectivity. These results indicate promise for using unsupervised sparse-coding approaches in real-world computer vision tasks in domains with limited labeled training data.
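    The abstract does not spell out the sparse-coding formulation or solver used for the unsupervised layer; the sketch below is a generic illustration of what such a layer computes, inferring a sparse code for a flattened stereo patch under a fixed dictionary via ISTA (iterative soft-thresholding). The patch size, penalty lam, and choice of solver are assumptions made for illustration only.

```python
import numpy as np

def ista_encode(x, D, lam=0.1, n_iter=100):
    # Sparse code z for input x under dictionary D, minimizing
    #   0.5 * ||x - D z||^2 + lam * ||z||_1
    # by ISTA.  In a sparse-coding layer, z plays the role of the activation
    # that a supervised convolutional layer would otherwise produce.
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)               # gradient of the reconstruction term
        z -= grad / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return z

# x: flattened stereo patch (left and right views concatenated);
# D: dictionary learned without labels, columns normalized to unit norm.
rng = np.random.default_rng(0)
D = rng.normal(size=(2 * 8 * 8, 128))
D /= np.linalg.norm(D, axis=0)
x = rng.normal(size=2 * 8 * 8)
z = ista_encode(x, D)
```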

    On the Sample Complexity of Predictive Sparse Coding

    The goal of predictive sparse coding is to learn a representation of examples as sparse linear combinations of elements from a dictionary, such that a learned hypothesis linear in the new representation performs well on a predictive task. Predictive sparse coding algorithms have recently demonstrated impressive performance on a variety of supervised tasks, but their generalization properties have not been studied. We establish the first generalization error bounds for predictive sparse coding, covering two settings: 1) the overcomplete setting, where the number of features k exceeds the original dimensionality d; and 2) the high- or infinite-dimensional setting, where only dimension-free bounds are useful. Both learning bounds intimately depend on stability properties of the learned sparse encoder, as measured on the training sample. Consequently, we first present a fundamental stability result for the LASSO that characterizes the stability of the sparse codes with respect to perturbations of the dictionary. In the overcomplete setting, we present an estimation error bound that decays as $\tilde{O}(\sqrt{dk/m})$ with respect to d and k. In the high- or infinite-dimensional setting, we show a dimension-free bound that is $\tilde{O}(\sqrt{k^2 s/m})$ with respect to k and s, where s is an upper bound on the number of non-zeros in the sparse code for any training data point. Comment: The Sparse Coding Stability Theorem from version 1 has been relaxed considerably using a new notion of coding margin; the old Sparse Coding Stability Theorem remains in the new version as Theorem 2. Presentation of all proofs simplified/improved considerably and paper reorganized. An empirical analysis shows that the new coding margin is non-trivial on a real dataset.
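    As a concrete reference point, the sketch below shows the LASSO encoder whose stability the bounds depend on: each example is encoded as $\arg\min_z \frac{1}{2}\|x - Dz\|_2^2 + \lambda\|z\|_1$, and a linear hypothesis is then fit on the codes. This decoupled pipeline, the scikit-learn solvers, and all parameter values are illustrative assumptions; the paper analyzes the jointly learned dictionary and predictor.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def sparse_code(X, D, lam=0.1):
    # LASSO encoder: each row of X is represented as a sparse combination of
    # the columns of D.  Returns one sparse code per example (rows of output).
    enc = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    return np.stack([enc.fit(D, x).coef_ for x in X])

# Illustrative two-stage pipeline: fix a dictionary, encode the training set,
# then fit a linear hypothesis on the resulting sparse representation.
rng = np.random.default_rng(0)
d, k, m = 20, 50, 200                     # input dim, dictionary size, sample size
D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
X = rng.normal(size=(m, d))
y = rng.normal(size=m)
Z = sparse_code(X, D)                     # sparse codes, shape (m, k)
w = Ridge(alpha=1.0).fit(Z, y)            # linear hypothesis on the codes
```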

    Sparse Coding and Autoencoders

    In "Dictionary Learning" one tries to recover incoherent matrices ARn×hA^* \in \mathbb{R}^{n \times h} (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors xRhx^* \in \mathbb{R}^h with a small support of size hph^p for some 0<p<10 <p < 1 while having access to observations yRny \in \mathbb{R}^n where y=Axy = A^*x^*. In this work we undertake a rigorous analysis of whether gradient descent on the squared loss of an autoencoder can solve the dictionary learning problem. The "Autoencoder" architecture we consider is a RnRn\mathbb{R}^n \rightarrow \mathbb{R}^n mapping with a single ReLU activation layer of size hh. Under very mild distributional assumptions on xx^*, we prove that the norm of the expected gradient of the standard squared loss function is asymptotically (in sparse code dimension) negligible for all points in a small neighborhood of AA^*. This is supported with experimental evidence using synthetic data. We also conduct experiments to suggest that AA^* is a local minimum. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. This property holds independent of the loss function. We believe that it could be of independent interest.Comment: In this new version of the paper with a small change in the distributional assumptions we are actually able to prove the asymptotic criticality of a neighbourhood of the ground truth dictionary for even just the standard squared loss of the ReLU autoencoder (unlike the regularized loss in the older version