
7 research outputs found

EMP-SSL: Towards Self-Supervised Learning in One Training Epoch

Authors: Chen Yubei, LeCun Yann, Ma Yi, Tong Shengbang
Publication date: 08/04/2023

Recently, self-supervised learning (SSL) has achieved tremendous success in learning image representations. Despite the empirical success, most self-supervised learning methods are rather "inefficient" learners, typically taking hundreds of training epochs to fully converge. In this work, we show that the key to efficient self-supervised learning is to increase the number of crops from each image instance. Leveraging one of the state-of-the-art SSL methods, we introduce a simplistic form of self-supervised learning called Extreme-Multi-Patch Self-Supervised-Learning (EMP-SSL) that does not rely on many heuristic techniques for SSL, such as weight sharing between the branches, feature-wise normalization, output quantization, and stop gradient, and that reduces the training epochs by two orders of magnitude. We show that the proposed method is able to converge to 85.1% on CIFAR-10, 58.5% on CIFAR-100, 38.1% on Tiny ImageNet and 58.5% on ImageNet-100 in just one epoch. Furthermore, the proposed method achieves 91.5% on CIFAR-10, 70.1% on CIFAR-100, 51.5% on Tiny ImageNet and 78.9% on ImageNet-100 with linear probing in less than ten training epochs. In addition, we show that EMP-SSL has significantly better transferability to out-of-domain datasets than baseline SSL methods. We will release the code at https://github.com/tsb0601/EMP-SSL.

arXiv.org e-Print Archive

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Authors: Chu Tianzhe, Dai Xili, Ding Tianjiao, Haeffele Benjamin David, Ma Yi, Tong Shengbang, Vidal Rene
Publication date: 08/06/2023

The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, a fundamental and classic machine learning problem, still lacks an effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful feature representations of large pre-trained models such as CLIP and clusters images effectively and efficiently at scale. We show that the pre-trained features become significantly more structured when the rate reduction objective is further optimized. The resulting features may significantly improve the clustering accuracy, e.g., from 57% to 66% on ImageNet-1k. Furthermore, by leveraging CLIP's image-text binding, we show how the new clustering method leads to a simple yet effective self-labeling algorithm that successfully works on large unlabeled datasets such as MS-COCO and LAION-Aesthetics. We will release the code at https://github.com/LeslieTrue/CPP. Comment: 21 pages, 13 figures

arXiv.org e-Print Archive

Unsupervised Manifold Linearizing and Clustering

Authors: Chan Kwan Ho Ryan, Dai Xili, Ding Tianjiao, Haeffele Benjamin D., Ma Yi, Tong Shengbang
Publication date: 24/08/2023

We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunately, many real-world datasets such as natural images cannot be well approximated by linear subspaces. On the other hand, numerous works have attempted to learn an appropriate transformation of the data, such that data is mapped from a union of general non-linear manifolds to a union of linear subspaces (with points from the same manifold being mapped to the same subspace). However, many existing works have limitations such as assuming knowledge of the membership of samples to clusters, requiring high sampling density, or being shown theoretically to learn trivial representations. In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. We give a parameterization of such a representation and membership, allowing efficient mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods, and further learns a latent linear representation of the data.

arXiv.org e-Print Archive

White-Box Transformers via Sparse Rate Reduction

Authors: Buchanan Sam, Chu Tianzhe, Haeffele Benjamin D., Ma Yi, Pai Druv, Tong Shengbang, Wu Ziyang, Yu Yaodong
Publication date: 01/06/2023

In this paper, we contend that the objective of representation learning is to compress and transform the distribution of the data, say sets of tokens, towards a mixture of low-dimensional Gaussian distributions supported on incoherent subspaces. The quality of the final representation can be measured by a unified objective function called sparse rate reduction. From this perspective, popular deep networks such as transformers can be naturally viewed as realizing iterative schemes to optimize this objective incrementally. Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens. This leads to a family of white-box transformer-like deep network architectures which are mathematically fully interpretable. Despite their simplicity, experiments show that these networks indeed learn to optimize the designed objective: they compress and sparsify representations of large-scale real-world vision datasets such as ImageNet, and achieve performance very close to thoroughly engineered transformers such as ViT. Code is at https://github.com/Ma-Lab-Berkeley/CRATE. Comment: 33 pages, 11 figures

arXiv.org e-Print Archive


CTRL: Closed-Loop Transcription to an LDR via Minimaxing Rate Reduction

Authors: Chan Kwan Ho Ryan, Dai Xili, Li Mingyang, Ma Yi, Psenka Michael, Shum Heung-Yeung, Tong Shengbang, Wu Ziyang, Yu Yaodong, Yuan Xiaojun, Zhai Pengyuan
Publication venue: eScholarship, University of California
Publication date: 01/01/2022

This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a Closed-loop Transcription between a multi-class, multi-dimensional data distribution and a Linear discriminative representation (CTRL) in the feature space that consists of multiple independent multi-dimensional linear subspaces. Specifically, we argue that the optimal encoding and decoding mappings sought can be formulated as a two-player minimax game between the encoder and decoder for the learned representation. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids the expensive evaluation and minimization of approximate distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the setting of learning a representation that is both discriminative and generative for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate the tremendous potential of this new closed-loop formulation: under fair comparison, the visual quality of the learned decoder and the classification performance of the encoder are competitive with, and arguably better than, existing methods based on GAN, VAE, or a combination of both. Unlike existing generative models, the so-learned features of the multiple classes are structured instead of hidden: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space, and diverse visual attributes within each class are modeled by the independent principal components within each subspace.

eScholarship - University of California

CTRL: Closed-Loop Transcription to an LDR via Minimaxing Rate Reduction

Authors: Heung-Yeung Shum, Kwan Ho Ryan Chan, Michael Psenka, Mingyang Li, Pengyuan Zhai, Shengbang Tong, Xiaojun Yuan, Xili Dai, Yaodong Yu, Yi Ma, Ziyang Wu
Publication venue: MDPI AG
Publication date: 01/01/2022

Multidisciplinary Digital Publishing Institute

arXiv.org e-Print Archive

Ezid

Directory of Open Access Journals

eScholarship - University of California

Impact of the Three Gorges project on ecological environment changes and snail distribution in Dongting Lake area

Authors: A Teklehaimanot, CN Lange, DJ Gray, DP McManus, EY Seto, Feiyue Li, Guanghui Ren, H Dang, H Lin, H Yin, HM Zhu, Hongzhuan Tan, J Utzinger, J Zheng, JY Wu, Kaiping Cai, L Yiping, L Zhenglong, LD Wang, P Steinmann, QB Tong, RC Spear, S Hayashi, S Liang, Shujuan Ma, Uwem Friday Ekpo, W Tianping, W Yuerong, XN Zhou, Xunya Hou, Y Yamashiki, YB Zhou, Yiyi Li, YS Li, Z Shengbang
Publication venue: Public Library of Science (PLoS)

Crossref