25,001 research outputs found

Minimalistic Unsupervised Learning with the Sparse Manifold Transform

Author: Chen Yubei
LeCun Yann
Ma Yi
Olshausen Bruno
Yun Zeyu
Publication date: 30/09/2022

We describe a minimalistic and interpretable method for unsupervised learning that, without resorting to data augmentation, hyperparameter tuning, or other engineering designs, achieves performance close to state-of-the-art (SOTA) self-supervised learning (SSL) methods. Our approach leverages the sparse manifold transform, which unifies sparse coding, manifold learning, and slow feature analysis. With a one-layer deterministic sparse manifold transform, one can achieve 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10 and 53.2% on CIFAR-100. With a simple gray-scale augmentation, the model gets 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100. These results significantly close the gap between simplistic "white-box" methods and the SOTA methods. Additionally, we provide visualization to explain how an unsupervised representation transform is formed. The proposed method is closely connected to latent-embedding self-supervised methods and can be treated as the simplest form of VICReg. Though there remains a small performance gap between our simple constructive model and SOTA methods, the evidence points to this as a promising direction for achieving a principled and white-box approach to unsupervised learning.
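The KNN top-1 accuracy quoted in this abstract scores a fixed representation by classifying each test embedding from its nearest training embeddings. The following is a minimal sketch of such an evaluation, assuming Euclidean distance and an unweighted majority vote; the paper's exact protocol may differ, and the names `knn_predict` and `top1_accuracy` are hypothetical.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def top1_accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


# Toy embeddings: two well-separated clusters with integer class labels.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.2, 0.1), (5.5, 5.5)]
test_y = [0, 1]
print(top1_accuracy(train_x, train_y, test_x, test_y, k=3))
```

In practice the embeddings would come from the learned transform rather than from hand-written points, and the choice of `k` and distance measure affects the reported number.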

arXiv.org e-Print Archive

Toward a unified theory of sparse dimensionality reduction in Euclidean space

Author: Avron H.
Bühlmann P.
Candès E.
Hegde C.
Lu Y.
Paul S.
Talagrand M.
Woodruff D. P.
Publication date: 01/01/2015

Let $\Phi\in\mathbb{R}^{m\times n}$ be a sparse Johnson-Lindenstrauss transform [KN14] with $s$ non-zeroes per column. For a subset $T$ of the unit sphere, $\varepsilon\in(0,1/2)$ given, we study settings for $m,s$ required to ensure

$$\mathop{\mathbb{E}}_\Phi \sup_{x\in T} \left|\|\Phi x\|_2^2 - 1 \right| < \varepsilon ,$$

i.e. so that $\Phi$ preserves the norm of every $x\in T$ simultaneously and multiplicatively up to $1+\varepsilon$. We introduce a new complexity parameter, which depends on the geometry of $T$, and show that it suffices to choose $s$ and $m$ such that this parameter is small. Our result is a sparse analog of Gordon's theorem, which was concerned with a dense $\Phi$ having i.i.d. Gaussian entries. We qualitatively unify several results related to the Johnson-Lindenstrauss lemma, subspace embeddings, and Fourier-based restricted isometries. Our work also implies new results in using the sparse Johnson-Lindenstrauss transform in numerical linear algebra, classical and model-based compressed sensing, manifold learning, and constrained least squares problems such as the Lasso.
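To make the quantity in this abstract concrete, the sketch below builds a sparse matrix with exactly s non-zeroes per column and measures the norm distortion over a finite set T. It assumes one common style of construction (s distinct random rows per column, random signs, entries of magnitude 1/sqrt(s)); this is an illustration rather than the precise [KN14] construction, and the function names are hypothetical.

```python
import math
import random


def sparse_jl_columns(m, n, s, seed=0):
    """Return Phi as n sparse columns, each a dict {row index: value} with s non-zeroes."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(s)  # every column then has unit Euclidean norm
    columns = []
    for _ in range(n):
        rows = rng.sample(range(m), s)  # s distinct rows for this column
        columns.append({r: rng.choice((-scale, scale)) for r in rows})
    return columns


def apply_phi(columns, m, x):
    """Compute y = Phi x, touching only the non-zero entries of Phi."""
    y = [0.0] * m
    for x_j, column in zip(x, columns):
        if x_j != 0.0:
            for r, value in column.items():
                y[r] += value * x_j
    return y


def norm_distortion(columns, m, x):
    """Return | ||Phi x||_2^2 - 1 | for a unit vector x."""
    y = apply_phi(columns, m, x)
    return abs(sum(v * v for v in y) - 1.0)


def sup_distortion(columns, m, T):
    """Worst-case distortion over a finite set T of unit vectors."""
    return max(norm_distortion(columns, m, x) for x in T)


if __name__ == "__main__":
    m, n, s = 64, 200, 8
    phi = sparse_jl_columns(m, n, s, seed=1)
    rng = random.Random(2)
    T = []
    for _ in range(20):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        T.append([v / norm for v in g])  # random points on the unit sphere
    print(sup_distortion(phi, m, T))
```

This evaluates the supremum for a single draw of the matrix; averaging `sup_distortion` over several seeds would give a rough empirical estimate of the expectation in the displayed inequality.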

arXiv.org e-Print Archive

CiteSeerX

Crossref

Publikationsserver der RWTH Aachen University

Utrecht University Repository

Sampling in the Analysis Transform Domain

Author: Giryes Raja
Publication date: 23/03/2015

Many signal and image processing applications have benefited remarkably from the fact that the underlying signals reside in a low-dimensional subspace. One of the main models for such low dimensionality is sparsity. Within this framework there are two main options for sparse modeling: synthesis and analysis, where the first is considered the standard paradigm, to which much more research has been dedicated. In it, the signals are assumed to have a sparse representation under a given dictionary. In the analysis approach, on the other hand, sparsity is measured in the coefficients of the signal after applying a certain transformation, the analysis dictionary, to it. Though several algorithms with some theory have been developed for this framework, they are outnumbered by the ones proposed for the synthesis methodology. Given that the analysis dictionary is either a frame or the two-dimensional finite difference operator, we propose a new sampling scheme for signals from the analysis model that allows them to be recovered from their samples using any existing algorithm from the synthesis model. The advantage of this new sampling strategy is that it makes the existing synthesis methods, with their theory, also available for signals from the analysis framework.

Comment: 13 pages, 2 figures
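The difference between the two sparsity models described in this abstract can be shown on a toy signal. The sketch below is only an illustration under simplifying assumptions: it uses a one-dimensional finite difference operator (the abstract refers to the two-dimensional one), and the helper names are hypothetical. It does not implement the proposed sampling scheme.

```python
def finite_difference(x):
    """Analysis operator: (Omega x)[i] = x[i + 1] - x[i]."""
    return [b - a for a, b in zip(x, x[1:])]


def synthesize(atoms, z):
    """Synthesis model: x = D z, with the dictionary D given as a list of atoms (columns)."""
    n = len(atoms[0])
    x = [0.0] * n
    for atom, coeff in zip(atoms, z):
        if coeff != 0:
            for i in range(n):
                x[i] += coeff * atom[i]
    return x


def count_nonzeros(v, tol=1e-12):
    """Number of entries whose magnitude exceeds a small tolerance."""
    return sum(1 for c in v if abs(c) > tol)


# Analysis sparsity: a piecewise-constant signal is dense in the sample
# domain, but its finite differences vanish except at the two jumps.
signal = [1, 1, 1, 4, 4, 2, 2, 2]
coeffs = finite_difference(signal)
print(coeffs)                  # [0, 0, 3, 0, -2, 0, 0]
print(count_nonzeros(coeffs))  # 2

# Synthesis sparsity: the signal is a combination of few dictionary atoms.
atoms = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
z = [0, 0, 0, 2]               # only one atom is active
print(synthesize(atoms, z))    # [2.0, 2.0, 2.0]
```

In the analysis case sparsity is a property of the transformed signal, while in the synthesis case it is a property of the coefficient vector that generates the signal.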

arXiv.org e-Print Archive

CiteSeerX

Simultaneous Codeword Optimization (SimCO) for Dictionary Update and Learning

Author: Tao Xu
Wei Dai
Wenwu Wang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

We consider the data-driven dictionary learning problem. The goal is to seek an over-complete dictionary from which every training signal can be best approximated by a linear combination of only a few codewords. This task is often achieved by iteratively executing two operations: sparse coding and dictionary update. In the literature, there are two benchmark mechanisms to update a dictionary. The first approach, such as the MOD algorithm, is characterized by searching for the optimal codewords while fixing the sparse coefficients. In the second approach, represented by the K-SVD method, one codeword and the related sparse coefficients are simultaneously updated while all other codewords and coefficients remain unchanged. We propose a novel framework that generalizes the aforementioned two methods. The unique feature of our approach is that one can update an arbitrary set of codewords and the corresponding sparse coefficients simultaneously: when sparse coefficients are fixed, the underlying optimization problem is similar to that in the MOD algorithm; when only one codeword is selected for update, it can be proved that the proposed algorithm is equivalent to the K-SVD method; and more importantly, our method allows us to update all codewords and all sparse coefficients simultaneously, hence the term simultaneous codeword optimization (SimCO). Under the proposed framework, we design two algorithms, namely, primitive and regularized SimCO. We implement these two algorithms based on a simple gradient descent mechanism. Simulations are provided to demonstrate the performance of the proposed algorithms, as compared with two baseline algorithms, MOD and K-SVD. Results show that regularized SimCO is particularly appealing in terms of both learning performance and running speed.

Comment: 13 pages
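The first baseline mentioned in this abstract, a MOD-style update, fixes the sparse coefficients and solves a least-squares problem for the codewords. The sketch below illustrates that single step only, not SimCO itself. It assumes signals are stored as the columns of X, coefficients as the K-by-N matrix A, and that A A^T is invertible; codeword normalisation, which is usually applied afterwards, is omitted. All function names are hypothetical.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def solve(G, b):
    """Solve G d = b by Gaussian elimination with partial pivoting."""
    k = len(G)
    aug = [list(G[i]) + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    d = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * d[j] for j in range(i + 1, k))
        d[i] = (aug[i][k] - tail) / aug[i][i]
    return d


def mod_update(X, A):
    """Least-squares dictionary update: minimise ||X - D A||_F over D with A fixed.

    The minimiser is D = X A^T (A A^T)^{-1}.
    """
    At = transpose(A)
    G = matmul(A, At)  # K x K Gram matrix of the coefficients
    B = matmul(X, At)  # n x K
    # G is symmetric, so each row d of D solves G d = b for the matching row b of B.
    return [solve(G, b) for b in B]


# Toy check: signals generated exactly from a known dictionary are fitted exactly.
D_true = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # 2 x 3, over-complete
A = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
X = matmul(D_true, A)
print(mod_update(X, A))
```

K-SVD and SimCO differ from this step in that they also change the sparse coefficients during the dictionary update, for one codeword or for an arbitrary set of codewords respectively.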

arXiv.org e-Print Archive

CiteSeerX

Surrey Research Insight