AugDMC: Data Augmentation Guided Deep Multiple Clustering
Clustering aims to group similar objects together while separating dissimilar
ones. In this way, structures hidden in data can be identified to help
understand the data in an unsupervised manner. Traditional clustering methods
such as k-means provide only a single clustering for a given dataset. Deep
clustering methods, such as those based on auto-encoders, have shown better
performance but still provide a single clustering. However, a given dataset
might have multiple clustering structures, each representing a unique
perspective on the data. Therefore, multiple clustering methods have been
developed to discover several independent structures hidden in data. Although
deep multiple clustering methods provide better performance, efficiently
capturing the alternative perspectives in data remains an open problem. In this
paper, we propose AugDMC, a novel data Augmentation guided Deep Multiple
Clustering method, to tackle this challenge. Specifically, AugDMC leverages
data augmentations to automatically extract features related to a certain
aspect of the data using self-supervised prototype-based representation
learning, since different aspects of the data can be preserved under different
data augmentations. Moreover, a stable optimization strategy is proposed to
alleviate the instability arising from different augmentations. Multiple
clusterings based on different aspects of the data can then be obtained.
Experimental results on three real-world datasets against state-of-the-art
methods validate the effectiveness of the proposed method.
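The core idea can be illustrated with a minimal, hypothetical NumPy sketch (not the authors' implementation): two hand-crafted "augmentations" each preserve a different attribute of a toy two-attribute dataset, and clustering each augmented view recovers a different grouping. The toy data, the masking augmentations, and the plain k-means routine are all illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means with farthest-point initialisation (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):  # pick each next center as the point farthest from all centers
        d = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy data with two independent aspects: feature 0 is "color", feature 1 is "shape"
rng = np.random.default_rng(1)
color = np.repeat([0.0, 10.0], 20)              # two color groups
shape = np.tile(np.repeat([0.0, 10.0], 10), 2)  # two shape groups, crossed with color
X = np.stack([color, shape], axis=1) + rng.normal(0.0, 0.3, (40, 2))

# Two "augmentations", each preserving only one aspect of the data
color_view = X * np.array([1.0, 0.0])  # destroys shape information
shape_view = X * np.array([0.0, 1.0])  # destroys color information

# Clustering each augmented view yields a different, equally valid partition
color_labels = kmeans(color_view, 2)
shape_labels = kmeans(shape_view, 2)
```

On this toy example the same data yields two distinct clusterings, `color_labels` and `shape_labels`, depending on which aspect the augmentation preserves.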
DivClust: Controlling Diversity in Deep Clustering
Clustering has been a major research topic in the field of machine learning,
one to which Deep Learning has recently been applied with significant success.
However, one aspect of clustering that existing deep clustering methods do not
address is that of efficiently producing multiple, diverse partitionings of a
given dataset. This is particularly important, as a diverse set of base
clusterings is necessary for consensus clustering, which has been found to
produce better and more robust results than relying on a single clustering. To
address this gap, we propose DivClust, a diversity-controlling loss that can be
incorporated into existing deep clustering frameworks to produce multiple
clusterings with the desired degree of diversity. We conduct experiments with
multiple datasets and deep clustering frameworks and show that: a) our method
effectively controls diversity across frameworks and datasets at very small
additional computational cost, b) the sets of clusterings learned by DivClust
include solutions that significantly outperform single-clustering baselines,
and c) using an off-the-shelf consensus clustering algorithm, DivClust produces
consensus clustering solutions that consistently outperform single-clustering
baselines, effectively improving the performance of the base deep clustering
framework.

Comment: Accepted for publication in CVPR 202
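The idea of a loss that bounds inter-clustering similarity can be sketched in plain NumPy (a hypothetical illustration; the cosine-based matching below is an assumption for clarity, not DivClust's exact formulation):

```python
import numpy as np

def clustering_similarity(P, Q):
    """Aggregate similarity between two soft clusterings.

    P, Q: (n_samples, n_clusters) soft assignment matrices. Each cluster
    of P is matched to its most similar cluster of Q via cosine similarity
    of the assignment columns; the best-match similarities are averaged.
    """
    A = P / np.linalg.norm(P, axis=0, keepdims=True)
    B = Q / np.linalg.norm(Q, axis=0, keepdims=True)
    sim = A.T @ B                    # (k_P, k_Q) cluster-to-cluster similarity
    return sim.max(axis=1).mean()

def diversity_penalty(P, Q, target):
    """Hinge penalty: non-zero only when the two clusterings are more similar
    than the desired upper bound `target` (lower target demands more diversity)."""
    return max(0.0, clustering_similarity(P, Q) - target)

# Hard cluster assignments written as one-hot soft assignments
labels_a = np.array([0, 0, 1, 1, 2, 2, 3, 3])
labels_b = np.array([0, 1, 0, 1, 2, 3, 2, 3])  # a different partition of the same samples
P = np.eye(4)[labels_a]
Q = np.eye(4)[labels_b]
```

In a deep clustering framework a penalty of this kind would be added to the clustering objective for every pair of clustering heads, so that optimization keeps inter-clustering similarity below the chosen bound.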
Deep Embedded Non-Redundant Clustering
Complex data types like images can be clustered in multiple valid ways. Non-redundant clustering aims at extracting those meaningful groupings by discouraging redundancy between clusterings. Unfortunately, clustering images directly in pixel space has been shown to work unsatisfactorily. This has increased interest in combining the high representational power of deep learning with clustering, termed deep clustering. Algorithms of this type combine the non-linear embedding of an autoencoder with a clustering objective and optimize both simultaneously. None of these algorithms tries to find multiple non-redundant clusterings. In this paper, we propose the novel Embedded Non-Redundant Clustering algorithm (ENRC). It is the first algorithm that combines neural-network-based representation learning with non-redundant clustering. ENRC can find multiple highly non-redundant clusterings of different dimensionalities within a data set. This is achieved by (softly) assigning each dimension of the embedded space to the different clusterings. For instance, in image data sets it can group the objects by color, material and shape, without the need for explicit feature engineering. We show the viability of ENRC in extensive experiments and empirically demonstrate the advantage of combining non-linear representation learning with non-redundant clustering.
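The soft assignment of embedding dimensions to clusterings can be sketched as follows (a hypothetical NumPy illustration with fixed assignment weights; in ENRC these weights are learned jointly with the autoencoder, and the k-means routine here is a stand-in for the learned clustering objective):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means with farthest-point initialisation (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):  # pick each next center as the point farthest from all centers
        d = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy 4-D "embedded space": dims 0-1 encode color, dims 2-3 encode shape
rng = np.random.default_rng(0)
n = 40
color = rng.integers(0, 2, n)
shape = rng.integers(0, 2, n)
Z = np.empty((n, 4))
Z[:, :2] = color[:, None] * 5.0 + rng.normal(0.0, 0.3, (n, 2))
Z[:, 2:] = shape[:, None] * 5.0 + rng.normal(0.0, 0.3, (n, 2))

# Soft assignment of each embedding dimension to one of two clusterings
# (softmax over clusterings; fixed here for illustration, learned in ENRC)
logits = np.array([[4.0, 4.0, -4.0, -4.0],   # clustering 0 favours the color dims
                   [-4.0, -4.0, 4.0, 4.0]])  # clustering 1 favours the shape dims
beta = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Each clustering operates on the embedding scaled by its dimension weights
labels_color = kmeans(Z * np.sqrt(beta[0]), 2)
labels_shape = kmeans(Z * np.sqrt(beta[1]), 2)
```

Because each clustering effectively sees only its own weighted subspace, the two label sets recover the color and shape groupings independently, which is the non-redundancy the dimension assignment is meant to achieve.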