Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data
Clustering high-dimensional spatiotemporal data using an unsupervised
approach is a challenging problem for many data-driven applications. Existing
state-of-the-art methods for unsupervised clustering use different similarity
and distance functions but focus on either spatial or temporal features of the
data. Concentrating on joint deep representation learning of spatial and
temporal features, we propose Deep Spatiotemporal Clustering (DSC), a novel
algorithm for the temporal clustering of high-dimensional spatiotemporal data
using an unsupervised deep learning method. Inspired by the U-net architecture,
DSC utilizes an autoencoder integrating CNN-RNN layers to learn latent
representations of the spatiotemporal data. DSC also includes a unique layer
for cluster assignment on latent representations that uses the Student's
t-distribution. By optimizing the clustering loss and data reconstruction loss
simultaneously, the algorithm gradually improves clustering assignments and the
nonlinear mapping between low-dimensional latent feature space and
high-dimensional original data space. A multivariate spatiotemporal climate
dataset is used to evaluate the efficacy of the proposed method. Our extensive
experiments show that our approach outperforms both conventional and deep
learning-based unsupervised clustering algorithms. Additionally, we compared
the proposed model with several variants (a CNN encoder, a CNN autoencoder, a
CNN-RNN encoder, a CNN-RNN autoencoder, etc.) to gain insight into the benefit
of combining CNN and RNN layers in the autoencoder; the proposed technique
outperforms all of these variants in terms of clustering quality.
Comment: 16 pages, 2 figures
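The Student's t cluster-assignment layer described above follows the well-known DEC-style formulation. Below is a minimal NumPy sketch of that idea; the `alpha` degrees-of-freedom parameter and the sharpened target distribution are standard DEC conventions assumed for illustration, not details taken from this abstract:

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster assignments via a Student's t kernel (DEC-style).

    z:         (n, d) latent representations from the encoder
    centroids: (k, d) learnable cluster centers
    alpha:     degrees of freedom of the Student's t-distribution
    """
    # Squared distance between every latent point and every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)  # each row sums to 1

def target_distribution(q):
    """Sharpened target distribution used by the clustering (KL) loss."""
    w = (q ** 2) / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 8))          # stand-in for encoder outputs
centroids = rng.normal(size=(3, 8))
q = soft_assign(z, centroids)
p = target_distribution(q)
kl = float((p * np.log(p / q)).sum())  # clustering loss term, >= 0
```

In the full method this KL term would be minimized jointly with the autoencoder's reconstruction loss, gradually refining both the latent mapping and the assignments.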
Improved Spectral Clustering via Embedded Label Propagation
Spectral clustering is a key research topic in machine learning and data mining. Most existing spectral clustering algorithms are built upon Gaussian Laplacian matrices, which are sensitive to parameters. We propose a novel parameter-free, distance-consistent Locally Linear Embedding (LLE). The proposed distance-consistent LLE guarantees that edges between closer data points receive greater weight. Furthermore, we propose an improved spectral clustering method via embedded label propagation. Our algorithm builds upon two advancements of the state of the art: 1) label propagation, which propagates a node's labels to neighboring nodes according to their proximity; and 2) manifold learning, which is widely used for its capacity to leverage the manifold structure of data points. First, we perform standard spectral clustering on the original data and assign each cluster to the k nearest data points. Next, we propagate labels through dense, unlabeled data regions. Extensive experiments on various datasets validate the superiority of the proposed algorithm over current state-of-the-art spectral clustering algorithms.
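The label-propagation step can be sketched in NumPy as the classic normalized-graph update F ← αSF + (1−α)Y. The Gaussian affinity, the value of `alpha`, and the two-blob toy data below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9, iters=50):
    """Propagate seed labels through a similarity graph.

    W:     (n, n) symmetric affinity matrix (e.g. from a kNN graph)
    Y:     (n, k) one-hot seed labels; all-zero rows are unlabeled
    alpha: weight on propagated information vs. the fixed seeds
    """
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))        # D^{-1/2} W D^{-1/2}
    F = Y.copy().astype(float)
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1)

# Two dense blobs; seed one point per cluster, then propagate
# labels through the dense unlabeled regions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
W = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))  # Gaussian affinity
np.fill_diagonal(W, 0.0)
Y = np.zeros((40, 2))
Y[0, 0] = 1   # seed for blob 1
Y[20, 1] = 1  # seed for blob 2
labels = propagate_labels(W, Y)
```

Because within-blob affinities dominate the weak cross-blob edges, each blob inherits its seed's label.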
Balanced k-Means and Min-Cut Clustering
Clustering is an effective data mining technique for generating groups of interest. Among the various clustering approaches, the families of k-means and min-cut algorithms are the most popular due to their simplicity and efficacy. The classical k-means algorithm partitions a number of data points into several subsets by iteratively updating the cluster centers and the associated data points. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to place a minority of the data points into a small subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering on balanced datasets, we propose to impose an exclusive lasso regularizer on k-means and min-cut to regulate the balance of the clustering results. By optimizing objective functions built atop the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms over state-of-the-art clustering algorithms.
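For a one-hot assignment matrix, the exclusive lasso reduces to the sum of squared cluster sizes, so each assignment pays a marginal penalty of 2n_j + 1 for joining cluster j. The sketch below illustrates that balancing effect with a greedy sequential assignment rule; the update schedule and the `gamma` weight are our illustrative choices, not the paper's actual optimization procedure:

```python
import numpy as np

def balanced_kmeans(X, k, gamma=1.0, iters=20, seed=0):
    """k-means with an exclusive-lasso-style size penalty (sketch).

    Each point is assigned to the cluster minimizing
    squared distance + gamma * (2 * n_j + 1), i.e. the marginal
    increase of sum_j n_j^2, which pushes cluster sizes to balance.
    """
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]  # initial centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        sizes = np.zeros(k)
        for i in rng.permutation(len(X)):
            d2 = ((X[i] - C) ** 2).sum(axis=1)
            cost = d2 + gamma * (2 * sizes + 1)  # distance + size penalty
            labels[i] = int(cost.argmin())
            sizes[labels[i]] += 1
        for j in range(k):                        # recompute centers
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
    return labels, C

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
labels, C = balanced_kmeans(X, 3, gamma=5.0)
counts = np.bincount(labels, minlength=3)  # near-equal cluster sizes
```

With a large `gamma` the size penalty dominates and the three clusters end up close to 20 points each; with `gamma=0` the sketch degenerates to plain k-means.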
Variational Clustering: Leveraging Variational Autoencoders for Image Clustering
Recent advances in deep learning have shown their ability to learn strong
feature representations for images. The task of image clustering naturally
requires good feature representations to capture the distribution of the data
and subsequently differentiate data points from one another. Often these two
aspects are dealt with independently and thus traditional feature learning
alone does not suffice in partitioning the data meaningfully. Variational
Autoencoders (VAEs) naturally lend themselves to learning data distributions in
a latent space. Since we wish to efficiently discriminate between different
clusters in the data, we propose a method based on VAEs where we use a Gaussian
Mixture prior to help cluster the images accurately. We jointly learn the
parameters of both the prior and the posterior distributions. Our method
represents a true Gaussian Mixture VAE. This way, our method simultaneously
learns a prior that captures the latent distribution of the images and a
posterior to help discriminate well between data points. We also propose a
novel reparametrization of the latent space consisting of a mixture of discrete
and continuous variables. One key takeaway is that our method generalizes
better across different datasets without using any pre-training or learnt
models, unlike existing methods, allowing it to be trained from scratch in an
end-to-end manner. We verify the method's efficacy and generalizability
experimentally, achieving state-of-the-art results among unsupervised methods
on a variety of datasets. To the best of our knowledge, we are the first to
pursue image clustering using VAEs in a purely unsupervised manner on real
image datasets.
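The Gaussian mixture prior over the latent space can be sketched in NumPy: the per-component responsibilities q(c|z) play the role of cluster assignments. The diagonal-covariance parametrization and the toy parameters below are assumptions for illustration, not the paper's learned values:

```python
import numpy as np

def gmm_prior_logpdf(z, pi, mu, sigma):
    """Log-density of latents under a Gaussian mixture prior.

    z:  (n, d) latent samples from the encoder
    pi: (k,) mixture weights
    mu, sigma: (k, d) component means / standard deviations (diagonal cov)
    Returns log p(z) and the responsibilities q(c|z).
    """
    # log N(z | mu_c, diag(sigma_c^2)) for every sample and component
    log_comp = -0.5 * (((z[:, None] - mu[None]) / sigma[None]) ** 2
                       + 2 * np.log(sigma[None])
                       + np.log(2 * np.pi)).sum(axis=2)
    log_joint = np.log(pi)[None] + log_comp       # log pi_c + log N(...)
    m = log_joint.max(axis=1, keepdims=True)      # stable log-sum-exp
    log_pz = m.squeeze(1) + np.log(np.exp(log_joint - m).sum(axis=1))
    resp = np.exp(log_joint - log_pz[:, None])    # q(c|z): soft assignments
    return log_pz, resp

rng = np.random.default_rng(3)
mu = np.array([[0.0, 0.0], [4.0, 4.0]])   # two well-separated components
sigma = np.ones((2, 2))
pi = np.array([0.5, 0.5])
z = rng.normal(0, 1, size=(50, 2))        # samples near the first component
log_pz, resp = gmm_prior_logpdf(z, pi, mu, sigma)
```

In a full Gaussian Mixture VAE, `log_pz` would enter the ELBO in place of the standard-normal prior term, and `pi`, `mu`, and `sigma` would be learned jointly with the encoder and decoder.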