Federated Deep Multi-View Clustering with Global Self-Supervision
Federated multi-view clustering has the potential to learn a global
clustering model from data distributed across multiple devices. In this
setting, label information is unknown and data privacy must be preserved,
leading to two major challenges. First, views on different clients often have
feature heterogeneity, and mining their complementary cluster information is
not trivial. Second, the storage and usage of data from multiple clients in a
distributed environment can lead to incompleteness of multi-view data. To
address these challenges, we propose a novel federated deep multi-view
clustering method that can mine complementary cluster structures from multiple
clients, while dealing with data incompleteness and privacy concerns.
Specifically, in the server environment, we propose sample alignment and data
extension techniques to explore the complementary cluster structures of
multiple views. The server then distributes global prototypes and global
pseudo-labels to each client as global self-supervised information. In the
client environment, multiple clients use the global self-supervised information
and deep autoencoders to learn view-specific cluster assignments and embedded
features, which are then uploaded to the server for refining the global
self-supervised information. Finally, the results of our extensive experiments
demonstrate that our proposed method exhibits superior performance in
addressing the challenges of incomplete multi-view data in distributed
environments.
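The server–client exchange described above can be illustrated with a deliberately simplified sketch of one communication round. This is not the paper's method: the deep autoencoders, sample alignment, and data extension steps are omitted, clients operate directly on raw features, and the function names (`client_update`, `server_refine`) are hypothetical.

```python
import numpy as np

def client_update(X_local, global_prototypes):
    """Assign local samples to the nearest global prototype (acting as
    global pseudo-labels) and upload only per-cluster means -- never the
    raw samples, which preserves a basic notion of data privacy."""
    d = np.linalg.norm(X_local[:, None, :] - global_prototypes[None, :, :], axis=2)
    pseudo_labels = d.argmin(axis=1)
    stats = {j: X_local[pseudo_labels == j].mean(axis=0)
             for j in range(len(global_prototypes))
             if np.any(pseudo_labels == j)}
    return pseudo_labels, stats

def server_refine(all_client_stats, global_prototypes):
    """Refine each global prototype by averaging the per-cluster means
    uploaded by the clients (a FedAvg-style aggregation); the refined
    prototypes are then redistributed for the next round."""
    refined = global_prototypes.copy()
    for j in range(len(global_prototypes)):
        contribs = [s[j] for s in all_client_stats if j in s]
        if contribs:
            refined[j] = np.mean(contribs, axis=0)
    return refined
```

In a full system this round would repeat until the global pseudo-labels stabilize; how the initial prototypes are chosen on the server is assumed, not taken from the paper.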
Tensor-based Intrinsic Subspace Representation Learning for Multi-view Clustering
Multi-view clustering has attracted considerable attention, and many approaches have
been proposed over the past few years. Nevertheless, most existing algorithms take
only the consensus information among different views into consideration for
clustering. This can hinder multi-view clustering performance in real-life
applications, since different views usually exhibit diverse statistical properties.
To address this problem, we propose a novel Tensor-based Intrinsic Subspace
Representation Learning (TISRL) method for multi-view clustering in this paper.
Concretely, a rank-preserving decomposition is first proposed to deal effectively
with the diverse statistical information contained in different views. Then, to
achieve the intrinsic subspace representation, a low-rank tensor constraint based
on the tensor singular value decomposition (t-SVD) is utilized in our method. In
this way, the specific information contained in different views is fully
investigated by the rank-preserving decomposition, and the high-order correlations
of multi-view data are mined by the low-rank tensor constraint. The objective
function is optimized by an alternating direction minimization algorithm based on
the augmented Lagrangian multiplier method. Experimental results on nine commonly
used real-world multi-view datasets illustrate the superiority of TISRL.
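A t-SVD-based low-rank tensor constraint is typically enforced inside an augmented-Lagrangian solver through a tensor singular value thresholding (t-SVT) step: FFT along the third mode, soft-threshold the singular values of each frontal slice, and invert. A minimal sketch of that operator (the function name and the choice of soft-thresholding parameter `tau` are ours, not from the paper):

```python
import numpy as np

def tsvd_svt(T, tau):
    """Singular value thresholding under the t-SVD tensor nuclear norm:
    transform to the Fourier domain along mode 3, shrink the singular
    values of each frontal slice by tau, then transform back."""
    Tf = np.fft.fft(T, axis=2)
    out = np.zeros_like(Tf)                     # complex, same shape as Tf
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        s = np.maximum(s - tau, 0.0)            # soft-threshold the spectrum
        out[:, :, k] = (U * s) @ Vh             # rebuild U @ diag(s) @ Vh
    # Conjugate symmetry of the slices is preserved, so the result is real
    # up to numerical round-off for a real input tensor.
    return np.real(np.fft.ifft(out, axis=2))
```

With `tau = 0` the operator is the identity, and a large `tau` annihilates the tensor, which makes its shrinkage behavior easy to sanity-check.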
Improved K-means clustering algorithms : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, New Zealand
The K-means clustering algorithm divides samples into subsets with the goal of maximizing intra-subset similarity and inter-subset dissimilarity, where similarity measures the relationship between two samples. As an unsupervised learning technique, K-means is one of the most widely used clustering algorithms and has been applied in a variety of areas such as artificial intelligence, data mining, biology, psychology, marketing, and medicine.
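For reference, the alternating assignment/update loop at the heart of K-means can be sketched as follows. This is a minimal illustration, not the thesis's algorithm; the farthest-point initialization used here is just one of many possible choices:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: one random sample, then repeatedly
    # the point farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[int(d.argmax())])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iters):
        # Assignment step: each sample joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members;
        # empty clusters keep their previous centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

The result's dependence on the initialization, the (Euclidean) similarity measure, and the predefined `k` is visible directly in this loop, which is exactly the set of issues the thesis targets.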
The K-means clustering algorithm is not robust: its result depends on the initialization, the similarity measure, and the predefined number of clusters. Previous research has addressed some of these issues individually, but not within a unified framework, and fixing one issue alone does not guarantee the best performance. Improving K-means, one of the most famous and widely used clustering algorithms, by solving these issues simultaneously is therefore both challenging and significant.
This thesis conducts extensive research on the K-means clustering algorithm, aiming to improve it.
First, we propose the Initialization-Similarity (IS) clustering algorithm to address the initialization and similarity-measure issues of K-means in a unified way. Specifically, we fix the initialization of the clustering by using sum-of-norms (SON) regularization, which outputs a new representation of the original samples, and we learn the similarity matrix based on the data distribution. The derived new representation is then used to conduct K-means clustering.
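The sum-of-norms idea can be sketched briefly: the convex objective 0.5·||U − X||² + λ·Σᵢ<ⱼ ||uᵢ − uⱼ||₂ pulls the row representations uᵢ toward their samples while fusing nearby rows together, so the fused rows serve as an initialization-free representation for a final K-means step. The sketch below solves it with plain subgradient descent, which is a simplification; the thesis's actual optimization scheme may differ, and the function name is hypothetical.

```python
import numpy as np

def son_representation(X, lam=0.5, lr=0.1, n_iters=200):
    """Sum-of-norms (convex) clustering by subgradient descent on
    0.5*||U - X||_F^2 + lam * sum_{i<j} ||u_i - u_j||_2.
    Rows of the returned U fuse as lam grows, yielding the new
    sample representation on which K-means is then run."""
    U = X.astype(float).copy()
    n = len(X)
    for _ in range(n_iters):
        grad = U - X                              # data-fidelity gradient
        for i in range(n):
            diff = U[i] - U                       # (n, d) pairwise differences
            norms = np.linalg.norm(diff, axis=1)
            mask = norms > 1e-12                  # skip (near-)fused pairs
            # Subgradient of the fusion penalty: sum of unit vectors.
            grad[i] += lam * (diff[mask] / norms[mask, None]).sum(axis=0)
        U -= lr * grad
    return U
```

On data with two well-separated groups, rows within a group end up much closer together than in the input, while the groups themselves stay far apart.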
Second, we propose a Joint Feature Selection with Dynamic Spectral (FSDS) clustering algorithm to address cluster-number determination, the similarity measure, and the robustness of the clustering, by selecting effective features and reducing the influence of outliers simultaneously. Specifically, we learn the similarity matrix from the data distribution and impose a rank constraint on the Laplacian matrix of the learned similarity matrix so that the number of clusters is output automatically. Furthermore, the proposed algorithm employs the L2,1-norm as a sparsity constraint on the regularization term and on the loss function, to remove redundant features and to reduce the influence of outliers, respectively.
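The L2,1-norm mentioned above sums the Euclidean norms of a matrix's rows; penalizing it drives entire rows to zero, which is why it selects features (when applied to a projection matrix) and damps outliers (when applied row-wise to residuals). A small sketch of the norm and its row-wise shrinkage (proximal) operator, with names of our own choosing:

```python
import numpy as np

def l21_norm(W):
    """L2,1-norm: the sum of the L2 norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

def l21_prox(W, tau):
    """Proximal operator of tau * ||W||_{2,1}: every row is scaled
    toward zero, and rows with norm <= tau vanish entirely -- the
    row-sparsity that discards redundant features."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return W * scale
```

For example, shrinking `[[3, 4], [0.1, 0.1]]` with `tau = 1` rescales the large row to `[2.4, 3.2]` and zeroes out the small one.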
Third, we propose a Joint Robust Multi-view (JRM) spectral clustering algorithm that clusters multi-view data while addressing the initialization issue, cluster-number determination, similarity-measure learning, redundant-feature removal, and the reduction of outlier influence in a unified way.
Finally, the proposed algorithms outperform state-of-the-art clustering algorithms on real-world data sets. Moreover, we theoretically prove the convergence of the proposed optimization methods for the proposed objective functions.