85 research outputs found
Information Maximization Clustering via Multi-View Self-Labelling
Image clustering is a particularly challenging computer vision task, which
aims to generate annotations without human supervision. Recent advances focus
on the use of self-supervised learning strategies in image clustering, by first
learning valuable semantics and then clustering the image representations.
These multiple-phase algorithms, however, increase the computational time and
their final performance is reliant on the first stage. By extending the
self-supervised approach, we propose a novel single-phase clustering method
that simultaneously learns meaningful representations and assigns the
corresponding annotations. This is achieved by integrating a discrete
representation into the self-supervised paradigm through a classifier net.
Specifically, the proposed clustering objective employs mutual information, and
maximizes the dependency between the integrated discrete representation and a
discrete probability distribution. The discrete probability distribution is
derived through the self-supervised process by comparing the learnt latent
representation with a set of trainable prototypes. To enhance the learning
performance of the classifier, we jointly apply the mutual information across
multi-crop views. Our empirical results show that the proposed framework
outperforms state-of-the-art techniques, with average accuracies of 89.1% and
49.0% on the CIFAR-10 and CIFAR-100/20 datasets, respectively. Finally, the
proposed method also demonstrates attractive robustness to parameter settings,
making it readily applicable to other datasets.
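The abstract above does not spell out its clustering objective, so the following is only a minimal sketch of the generic quantity it names: the mutual information between two discrete distributions, estimated from a batch of paired soft assignments. The function name `mutual_information` and the list-of-lists input format are assumptions made for illustration, not the authors' implementation.

```python
import math

def mutual_information(p, q):
    """Estimate I(A; B) in nats from paired soft assignments.

    p and q are lists of n probability vectors, e.g. a classifier output and
    a prototype-based distribution for the same n images (hypothetical inputs).
    """
    n, ka, kb = len(p), len(p[0]), len(q[0])
    # Empirical joint distribution: batch average of the outer products.
    joint = [[sum(p[i][a] * q[i][b] for i in range(n)) / n for b in range(kb)]
             for a in range(ka)]
    pa = [sum(row) for row in joint]                                # marginal of A
    pb = [sum(joint[a][b] for a in range(ka)) for b in range(kb)]   # marginal of B
    mi = 0.0
    for a in range(ka):
        for b in range(kb):
            if joint[a][b] > 0.0:  # zero-probability cells contribute nothing
                mi += joint[a][b] * math.log(joint[a][b] / (pa[a] * pb[b]))
    return mi
```

Under this estimate, two identical, balanced, one-hot assignments over k classes reach the maximum value log k, while uninformative uniform assignments give zero.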
Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models
The advent of large pre-trained models has brought about a paradigm shift in
both visual representation learning and natural language processing. However,
clustering unlabeled images, as a fundamental and classic machine learning
problem, still lacks an effective solution, particularly for large-scale datasets.
In this paper, we propose a novel image clustering pipeline that leverages the
powerful feature representation of large pre-trained models such as CLIP and
clusters images effectively and efficiently at scale. We show that the
pre-trained features become significantly more structured when further optimized
with the rate reduction objective. The resulting features may significantly improve
the clustering accuracy, e.g., from 57% to 66% on ImageNet-1k. Furthermore,
by leveraging CLIP's image-text binding, we show how the new clustering method
leads to a simple yet effective self-labeling algorithm that successfully works
on unlabeled large datasets such as MS-COCO and LAION-Aesthetics. We will
release the code at https://github.com/LeslieTrue/CPP.
Comment: 21 pages, 13 figures
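The abstract above names the rate reduction objective without stating it. For orientation, the form commonly given in the maximal coding rate reduction (MCR²) literature is sketched below; the symbols Z, Π_j, ε, d, n and k are notation assumed here, and the variant actually optimized in the paper may differ.

```latex
% Z \in \mathbb{R}^{d \times n}: n feature vectors of dimension d (assumed notation)
% \Pi_j: diagonal n x n membership matrix of cluster j; \epsilon: allowed distortion
R(Z, \epsilon) = \frac{1}{2} \log\det\!\left(I + \frac{d}{n\epsilon^2} Z Z^\top\right)

R_c(Z, \epsilon \mid \Pi) = \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2n}
  \log\det\!\left(I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^2} Z \Pi_j Z^\top\right)

\Delta R(Z, \Pi, \epsilon) = R(Z, \epsilon) - R_c(Z, \epsilon \mid \Pi)
```

Maximizing ΔR expands the coding rate of the whole feature set while compressing the rate of each cluster.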
Self-supervised object detection from audio-visual correspondence
We tackle the problem of learning object detectors without supervision.
Unlike weakly-supervised object detection, we do not assume
image-level class labels. Instead, we extract a supervisory signal from
audio-visual data, using the audio component to "teach" the object detector.
While this problem is related to sound source localisation, it is considerably
harder because the detector must classify the objects by type, enumerate each
instance of the object, and do so even when the object is silent. We tackle
this problem by first designing a self-supervised framework with a contrastive
objective that jointly learns to classify and localise objects. Then, without
using any supervision, we simply use these self-supervised labels and boxes to
train an image-based object detector. With this, we outperform previous
unsupervised and weakly-supervised detectors for the task of object detection
and sound source localization. We also show that we can align this detector to
ground-truth classes with as little as one label per pseudo-class, and show how
our method can learn to detect generic objects that go beyond instruments, such
as airplanes and cats.
Comment: Under review
A Saliency-based Clustering Framework for Identifying Aberrant Predictions
In machine learning, classification tasks serve as the cornerstone of a wide
range of real-world applications. Reliable, trustworthy classification is
particularly intricate in biomedical settings, where the ground truth is often
inherently uncertain and relies on high degrees of human expertise for
labeling. Traditional metrics such as precision and recall, while valuable, are
insufficient for capturing the nuances of these ambiguous scenarios. Here we
introduce the concept of aberrant predictions, emphasizing that the nature of
classification errors is as critical as their frequency. We propose a novel,
efficient training methodology aimed at both reducing the misclassification
rate and discerning aberrant predictions. Our framework demonstrates a
substantial improvement in model performance, achieving a 20% increase in
precision. We apply this methodology to the less-explored domain of veterinary
radiology, where the stakes are high but which has not been studied as
extensively as human medicine. By focusing on the identification and mitigation of
aberrant predictions, we enhance the utility and trustworthiness of machine
learning classifiers in high-stakes, real-world scenarios, including new
applications in the veterinary world.
ContraCluster: Learning to Classify without Labels by Contrastive Self-Supervision and Prototype-Based Semi-Supervision
The recent advances in representation learning inspire us to take on the
challenging problem of unsupervised image classification tasks in a principled
way. We propose ContraCluster, an unsupervised image classification method that
combines clustering with the power of contrastive self-supervised learning.
ContraCluster consists of three stages: (1) contrastive self-supervised
pre-training (CPT), (2) contrastive prototype sampling (CPS), and (3)
prototype-based semi-supervised fine-tuning (PB-SFT). CPS can select highly
accurate, categorically prototypical images in an embedding space learned by
contrastive learning. We use sampled prototypes as noisy labeled data to
perform semi-supervised fine-tuning (PB-SFT), leveraging a small set of prototypes
and a large pool of unlabeled data to further enhance the accuracy. We demonstrate
empirically that ContraCluster achieves new state-of-the-art results for
standard benchmark datasets including CIFAR-10, STL-10, and ImageNet-10. For
example, ContraCluster achieves about 90.8% accuracy for CIFAR-10, which
outperforms DAC (52.2%), IIC (61.7%), and SCAN (87.6%) by a large margin.
Without any labels, ContraCluster achieves 90.8% accuracy, which is
comparable to the 95.8% of the best supervised counterpart.
Comment: Accepted at ICPR 202
Shuffle & Divide: Contrastive Learning for Long Text
We propose a self-supervised learning method for long text documents based on
contrastive learning. A key to our method is Shuffle and Divide (SaD), a simple
text augmentation algorithm that sets up a pretext task required for
contrastive updates to BERT-based document embedding. SaD splits a document
into two sub-documents containing randomly shuffled words from the entire
document. The sub-documents are considered positive examples, leaving all
other documents in the corpus as negatives. After SaD, we repeat the
contrastive update and clustering phases until convergence. Labeling text
documents is a time-consuming, cumbersome task, and our method can help
reduce human effort, one of the most expensive resources in AI. We have
empirically evaluated our method by performing unsupervised text classification
on the 20 Newsgroups, Reuters-21578, BBC, and BBCSport datasets. In particular,
our method improves on the current state of the art, SS-SB-MT, on 20 Newsgroups by
20.94% in accuracy. We also achieve state-of-the-art performance on
Reuters-21578 and exceptionally high accuracy (over 95%) for
unsupervised classification on the BBC and BBCSport datasets.
Comment: Accepted at ICPR 202
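The Shuffle and Divide step described in the abstract above (shuffle a document's words, then split them into two sub-documents treated as a positive pair) can be sketched as follows. Whitespace tokenization, the equal-halves split, and the `seed` parameter are assumptions for illustration; the abstract does not specify these details.

```python
import random

def shuffle_and_divide(document, seed=None):
    """Return two sub-documents made of the document's randomly shuffled words.

    Assumes whitespace tokenization and a split at the midpoint; both are
    illustrative choices, not details confirmed by the abstract.
    """
    rng = random.Random(seed)      # local generator, so global state is untouched
    words = document.split()
    rng.shuffle(words)             # shuffle words across the entire document
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])
```

In a contrastive setup, the two returned strings would serve as a positive pair for the same document, with sub-documents from other documents in the corpus acting as negatives.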
Self-Supervised Classification Network
We present Self-Classifier -- a novel self-supervised end-to-end
classification learning approach. Self-Classifier learns labels and
representations simultaneously in a single-stage end-to-end manner by
optimizing for same-class prediction of two augmented views of the same sample.
To guarantee non-degenerate solutions (i.e., solutions where all labels are
assigned to the same class) we propose a mathematically motivated variant of
the cross-entropy loss that has a uniform prior asserted on the predicted
labels. In our theoretical analysis we prove that degenerate solutions are not
in the set of optimal solutions of our approach. Self-Classifier is simple to
implement and scalable. Unlike other popular unsupervised classification and
contrastive representation learning approaches, it does not require any form of
pre-training, expectation maximization, pseudo-labelling, external clustering,
a second network, stop-gradient operation or negative pairs. Despite its
simplicity, our approach sets a new state of the art for unsupervised
classification of ImageNet, and even achieves results comparable to the state
of the art for unsupervised representation learning. Code:
https://github.com/elad-amrani/self-classifier
Comment: Update method and add experiment
Self-supervised adversarial masking for 3D point cloud representation learning
Self-supervised methods have been proven effective for learning deep
representations of 3D point cloud data. Although recent methods in this domain
often rely on random masking of inputs, the results of this approach can be
improved. We introduce PointCAM, a novel adversarial method for learning a
masking function for point clouds. Our model utilizes a self-distillation
framework with an online tokenizer for 3D point clouds. Compared to previous
techniques that optimize patch-level and object-level objectives, we postulate
applying an auxiliary network that learns how to select masks instead of
choosing them randomly. Our results show that the learned masking function
achieves state-of-the-art or competitive performance on various downstream
tasks. The source code is available at https://github.com/szacho/pointcam