Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data
Clustering high-dimensional spatiotemporal data using an unsupervised
approach is a challenging problem for many data-driven applications. Existing
state-of-the-art methods for unsupervised clustering use different similarity
and distance functions but focus on either spatial or temporal features of the
data. Concentrating on joint deep representation learning of spatial and
temporal features, we propose Deep Spatiotemporal Clustering (DSC), a novel
algorithm for the temporal clustering of high-dimensional spatiotemporal data
using an unsupervised deep learning method. Inspired by the U-net architecture,
DSC utilizes an autoencoder integrating CNN-RNN layers to learn latent
representations of the spatiotemporal data. DSC also includes a unique layer
for cluster assignment on latent representations that uses the Student's
t-distribution. By optimizing the clustering loss and data reconstruction loss
simultaneously, the algorithm gradually improves clustering assignments and the
nonlinear mapping between low-dimensional latent feature space and
high-dimensional original data space. A multivariate spatiotemporal climate
dataset is used to evaluate the efficacy of the proposed method. Our extensive
experiments show our approach outperforms both conventional and deep
learning-based unsupervised clustering algorithms. Additionally, we compared
the proposed model with several variants (CNN encoder, CNN autoencoder,
CNN-RNN encoder, CNN-RNN autoencoder, etc.) to assess the benefit of using
both CNN and RNN layers in the autoencoder; our proposed technique outperforms
all of these variants in terms of clustering results.
Comment: 16 pages, 2 figures
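The cluster-assignment layer described above follows the soft-assignment scheme popularized by Deep Embedded Clustering: similarity between each latent point and each cluster centroid is measured with a Student's t-kernel, and a sharpened target distribution drives the clustering loss. A minimal numpy sketch of those two ingredients (function names and the toy data are illustrative, not from the paper):

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    # Student's t-kernel similarity between latent points z (n, d)
    # and cluster centroids (k, d); rows of q sum to 1.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    # Sharpened auxiliary distribution p used as the target
    # in the KL clustering loss.
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

# Two well-separated toy "latent" blobs and matching centroids.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
q = soft_assign(z, mu)
p = target_distribution(q)
labels = q.argmax(axis=1)
```

In the full algorithm these assignments sit on top of the CNN-RNN autoencoder's latent space, and the KL divergence between p and q is minimized jointly with the reconstruction loss.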
Natural data structure extracted from neighborhood-similarity graphs
'Big' high-dimensional data are commonly analyzed in low dimensions, after a
dimensionality-reduction step that inherently distorts the data structure.
Clustering methods are often used for the same purpose. These
methods also introduce a bias, either by starting from the assumption of a
particular geometric form of the clusters, or by using iterative schemes to
enhance cluster contours, with uncontrollable consequences. The goal of data
analysis should, however, be to encode and detect structural data features at
all scales and densities simultaneously, without assuming a parametric form of
data point distances, or modifying them. We propose a novel approach that
directly encodes data point neighborhood similarities as a sparse graph. Our
non-iterative framework permits a transparent interpretation of data, without
altering the original data dimension and metric. Several natural and synthetic
data applications demonstrate the efficacy of our novel approach.
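The core idea above, encoding neighborhood similarities directly as a sparse graph without altering the original dimension or metric, can be sketched with a k-nearest-neighbor graph; the similarity weight 1/(1+d) below is an illustrative choice, not the one from the paper:

```python
import numpy as np

def knn_similarity_graph(X, k=3):
    # Sparse neighborhood graph: each point keeps only its k nearest
    # neighbors, weighted by similarity; the original metric is untouched.
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    graph = {}
    for i in range(n):
        nbrs = np.argsort(d[i])[1:k + 1]  # skip the point itself
        graph[i] = {int(j): 1.0 / (1.0 + d[i, j]) for j in nbrs}
    return graph

# Two small groups of points; the sparse graph links only within-group.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
g = knn_similarity_graph(X, k=2)
```

Because only k entries per point are kept, the graph stays sparse and structure at different densities is encoded without any iterative contour-enhancement step.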
Adaptive Manifold Clustering
Clustering methods seek to partition data such that elements are more similar
to elements in the same cluster than to elements in different clusters. The
main challenge in this task is the lack of a unified definition of a cluster,
especially for high dimensional data. Different methods and approaches have
been proposed to address this problem. This paper continues the study
originated by Efimov, Adamyan and Spokoiny (2019) where a novel approach to
adaptive nonparametric clustering called Adaptive Weights Clustering (AWC) was
offered. The method allows analyzing high-dimensional data with an unknown
number of unbalanced clusters of arbitrary shape under very weak modeling
assumptions. The procedure demonstrates state-of-the-art performance and is
very efficient even for large data dimension D. However, the theoretical study
in Efimov, Adamyan and Spokoiny (2019) is very limited and did not really
address the question of efficiency. This paper makes a significant step in
understanding the remarkable performance of the AWC procedure, particularly in
high dimension. The approach is based on combining the ideas of adaptive
clustering and manifold learning. The manifold hypothesis states that
high-dimensional data can be well approximated by a d-dimensional manifold for
small d, which helps to overcome the curse of dimensionality and yields sharp
bounds on the cluster separation that depend only on the intrinsic dimension
d. We also address the problem of parameter tuning. Our general theoretical
results are illustrated by some numerical experiments.
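To see why clusters of arbitrary shape are recoverable under the manifold hypothesis, here is a toy stand-in (not the AWC procedure itself, which uses an adaptive statistical test on pairwise weights): connect points closer than a fixed eps and take connected components. Two one-dimensional arcs embedded in the plane, which no centroid-based method would separate correctly, then come apart cleanly:

```python
import numpy as np
from collections import deque

def threshold_clusters(X, eps):
    # Simplified stand-in for adaptive weights: w_ij = 1 iff
    # ||x_i - x_j|| <= eps; clusters are the connected components.
    # This captures only the arbitrary-shape aspect, not AWC's
    # data-driven choice of the local scale.
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    labels, cur = -np.ones(n, int), 0
    for s in range(n):
        if labels[s] >= 0:
            continue
        q = deque([s])
        labels[s] = cur
        while q:  # breadth-first search over the eps-graph
            i = q.popleft()
            for j in np.nonzero((d[i] <= eps) & (labels < 0))[0]:
                labels[j] = cur
                q.append(j)
        cur += 1
    return labels

t = np.linspace(0, np.pi, 30)
arc1 = np.c_[np.cos(t), np.sin(t)]        # 1-d manifold in 2-d
arc2 = np.c_[np.cos(t) + 3.0, np.sin(t)]  # a second, shifted arc
labels = threshold_clusters(np.vstack([arc1, arc2]), eps=0.3)
```

The separation bound here depends on the spacing along each arc (intrinsic dimension d = 1), not on the ambient dimension, which is the intuition behind the sharper bounds discussed above.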
Visual comparison of clustering using the Link-based Clustering Method (LbCM) without predetermining initial centroid information
High-dimensional data are difficult to view in a two-dimensional plot, so a mechanism that reduces the data to a small set of salient features that represent it well is essential. We reduce N-dimensional data to two dimensions using a combination of Information Gain (IG) and Principal Component Analysis (PCA), and then apply link-based clustering, the novel technique presented in this work, to determine the linked clusters automatically using a visual approach. The Link-based Clustering Method (LbCM) is applied to the two-dimensional data to determine the clusters automatically. The significance of the method is that it does not require prior information such as the number of linked clusters. The combination of IG and PCA for feature selection is also useful for dealing with high-dimensional data. LbCM detects the number of linked clusters automatically by analyzing the X-Y coordinate positions of the points and visual information such as the gaps between points and the two extreme points on both axes. Since the number of clusters is represented visually in two dimensions, LbCM performance can be compared visually.
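The pipeline described above, reduce to two dimensions and then detect clusters from visual gaps, can be sketched as follows. The PCA projection and the single-axis gap test are simplified stand-ins (the IG feature-selection step is omitted, and the gap threshold is a hand-picked parameter rather than the paper's automatic criterion):

```python
import numpy as np

def pca_2d(X):
    # Project onto the two leading principal components; this is the
    # dimensionality-reduction half of the IG-PCA step.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def gap_split(coord, min_gap):
    # Visual gap detection along one axis: a cluster boundary is placed
    # wherever consecutive sorted coordinates are more than min_gap apart.
    order = np.argsort(coord)
    s = coord[order]
    labels = np.zeros(len(s), int)
    labels[1:] = np.cumsum(np.diff(s) > min_gap)
    out = np.empty(len(s), int)
    out[order] = labels
    return out

# Two 5-dimensional blobs, projected to 2-d and split by the gap on PC1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (20, 5)), rng.normal(4, 0.2, (20, 5))])
Y = pca_2d(X)
labels = gap_split(Y[:, 0], min_gap=1.0)
```

The full LbCM additionally uses gaps along both axes and the extreme points to link clusters, but the sketch shows why no initial centroid information is needed: the cluster count emerges from the gap structure of the 2-D embedding itself.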