Late Fusion Multi-view Clustering via Global and Local Alignment Maximization
Multi-view clustering (MVC) optimally integrates complementary information
from different views to improve clustering performance. Although they
demonstrate promising performance in various applications, most existing
approaches directly fuse multiple pre-specified similarities to learn an
optimal similarity matrix for clustering, which can lead to over-complicated
optimization and high computational cost. In this paper, we propose late
fusion MVC via alignment maximization to address these issues. To do so, we
first reveal the theoretical connection between existing k-means clustering and
the alignment between the base partitions and the consensus partition. Based on
this observation, we propose a simple but effective multi-view algorithm termed
LF-MVC-GAM. It optimally fuses multiple sources of information at the partition
level from each individual view and maximally aligns the consensus partition
with these weighted base partitions. Such an alignment helps integrate
partition-level information and significantly reduces the computational
complexity by substantially simplifying the optimization procedure. We then
design another variant, LF-MVC-LAM, to further improve the clustering
performance by preserving the local intrinsic structure among multiple
partition spaces. After that, we develop two three-step iterative algorithms to
solve the resulting optimization problems with theoretically guaranteed
convergence. Further, we provide a generalization error bound analysis of the
proposed algorithms. Extensive experiments on eighteen multi-view benchmark
datasets, ranging from small- to large-scale, demonstrate the effectiveness
and efficiency of the proposed LF-MVC-GAM and LF-MVC-LAM. The
code of the proposed algorithms is publicly available at
https://github.com/wangsiwei2010/latefusionalignment
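The three-step iterative scheme described above can be sketched as block-coordinate ascent on an alignment objective. This is a minimal sketch, not the authors' released code: it assumes each base partition H_p is an n x k matrix with orthonormal columns (for example, top-k eigenvectors from kernel k-means on one view) and the objective max Tr(H^T sum_p beta_p H_p W_p) with H^T H = I, W_p^T W_p = I, ||beta||_2 = 1 and beta >= 0. Any regularisation term and LF-MVC-LAM's local-structure term are omitted, and the function names are hypothetical.

```python
import numpy as np


def _polar(m):
    """Return U @ Vt from the thin SVD of m: the orthonormal X maximising Tr(X^T m)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def late_fusion_alignment(base_partitions, n_iter=100, tol=1e-10):
    """Sketch of alternating maximisation of Tr(H^T sum_p beta_p H_p W_p).

    base_partitions: list of n x k matrices with orthonormal columns.
    Returns the consensus partition H, view weights beta, and rotations W_p.
    """
    m = len(base_partitions)
    k = base_partitions[0].shape[1]
    beta = np.full(m, 1.0 / np.sqrt(m))          # uniform initial view weights
    rotations = [np.eye(k) for _ in range(m)]    # identity initial rotations
    prev = -np.inf
    for _ in range(n_iter):
        # Step 1: consensus partition H from the weighted, rotated base partitions.
        fused = sum(b * hp @ wp for b, hp, wp in zip(beta, base_partitions, rotations))
        h = _polar(fused)
        # Step 2: per-view rotation W_p, an orthogonal Procrustes problem on H_p^T H.
        rotations = [_polar(hp.T @ h) for hp in base_partitions]
        # Step 3: view weights. After step 2 each alignment delta_p is a sum of
        # singular values, hence non-negative, so beta = delta / ||delta|| is optimal.
        delta = np.array([np.trace(h.T @ hp @ wp) for hp, wp in zip(base_partitions, rotations)])
        beta = delta / np.linalg.norm(delta)
        obj = float(beta @ delta)                # objective value; non-decreasing
        if obj - prev < tol:
            break
        prev = obj
    return h, beta, rotations
```

Each step solves its subproblem exactly, so the objective never decreases. Cluster labels would then typically be obtained by running k-means on the rows of H.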
DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks
Deep clustering has recently emerged as a promising technique for complex
data clustering. Despite considerable progress, previous deep clustering
works mostly build or learn the final clustering using only a single
layer of representation, e.g., by performing K-means clustering on the last
fully-connected layer or by attaching a clustering loss to a specific
layer, thereby neglecting the possibility of jointly leveraging multi-layer
representations to enhance deep clustering performance. In view of this,
this paper presents a Deep Clustering via Ensembles (DeepCluE) approach, which
bridges the gap between deep clustering and ensemble clustering by harnessing
the power of multiple layers in deep neural networks. In particular, we utilize
a weight-sharing convolutional neural network as the backbone, which is trained
with both the instance-level contrastive learning (via an instance projector)
and the cluster-level contrastive learning (via a cluster projector) in an
unsupervised manner. Thereafter, multiple layers of feature representations are
extracted from the trained network, upon which the ensemble clustering process
is further conducted. Specifically, a set of diversified base clusterings is
generated from the multi-layer representations via a highly efficient
clusterer. Then the reliability of clusters in multiple base clusterings is
automatically estimated by exploiting an entropy-based criterion, based on
which the set of base clusterings is reformulated into a weighted-cluster
bipartite graph. By partitioning this bipartite graph via transfer cut, the
final consensus clustering can be obtained. Experimental results on six image
datasets confirm the advantages of DeepCluE over state-of-the-art deep
clustering approaches.
Comment: To appear in IEEE Transactions on Emerging Topics in Computational
Intelligence
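The entropy-based reliability estimate and the weighted-cluster bipartite graph can be sketched as follows. This sketch assumes a reliability index of the form exp(-H(C) / (theta * M)), in the style of the entropy-based cluster index used in locally weighted ensemble clustering, where H(C) sums the entropy of cluster C with respect to each of the M base clusterings. The exact criterion used in DeepCluE, the value of theta, and the function names are assumptions. Transfer cut itself is not implemented; the returned matrix is the object-by-cluster bipartite graph it would partition.

```python
import numpy as np


def cluster_uncertainty(base_labels):
    """Entropy of every cluster of every base clustering w.r.t. all base clusterings."""
    base_labels = [np.asarray(lab) for lab in base_labels]
    # One boolean membership mask per cluster, across all base clusterings.
    masks = [lab == c for lab in base_labels for c in np.unique(lab)]
    entropy = np.zeros(len(masks))
    for idx, mask in enumerate(masks):
        for lab in base_labels:
            # Fraction of this cluster's members falling into each cluster of `lab`.
            _, counts = np.unique(lab[mask], return_counts=True)
            p = counts / mask.sum()
            entropy[idx] -= np.sum(p * np.log2(p))
    return masks, entropy


def weighted_cluster_bipartite(base_labels, theta=0.5):
    """Object-by-cluster matrix whose nonzero entries are cluster reliabilities."""
    masks, entropy = cluster_uncertainty(base_labels)
    # Low uncertainty (consistent clusters) -> reliability close to 1.
    reliability = np.exp(-entropy / (theta * len(base_labels)))
    graph = np.zeros((len(base_labels[0]), len(masks)))
    for col, (mask, weight) in enumerate(zip(masks, reliability)):
        graph[mask, col] = weight
    return graph, reliability
```

A cluster that every base clustering keeps intact gets entropy 0 and reliability 1; a cluster that other base clusterings split apart is down-weighted in the bipartite graph.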
Improving Clustering Methods By Exploiting Richness Of Text Data
Clustering is an unsupervised machine learning technique that involves discovering clusters (groups) of similar objects in unlabeled data, and it is generally considered an NP-hard problem. Clustering methods are widely used in a variety of disciplines to analyze different types of data, and even a small improvement in a clustering method can have a ripple effect, advancing research in multiple fields.
Clustering any type of data is challenging, and there are many open research questions. The clustering problem is exacerbated for text data because of additional challenges such as capturing the semantics of a document, handling the rich features of text data, and dealing with the well-known curse of dimensionality.
In this thesis, we investigate the limitations of existing text clustering methods and address them with five new text clustering methods: Query Sense Clustering (QSC), Dirichlet Weighted K-means (DWKM), Multi-View Multi-Objective Evolutionary Algorithm (MMOEA), Multi-objective Document Clustering (MDC) and Multi-Objective Multi-View Ensemble Clustering (MOMVEC). These five methods show that exploiting rich features allows text clustering methods to outperform existing state-of-the-art text clustering methods.
The first new text clustering method, QSC, exploits user queries (one of the rich features of text data) to generate better-quality clusters and cluster labels.
The second text clustering method, DWKM, uses a probability-based weighting scheme to formulate a semantically weighted distance measure that improves the clustering results.
The third text clustering method, MMOEA, is based on a multi-objective evolutionary algorithm. MMOEA exploits rich features to generate a diverse set of candidate clustering solutions and forms a better clustering solution using a cluster-oriented approach.
The fourth and fifth text clustering methods, MDC and MOMVEC, address the limitations of MMOEA. MDC and MOMVEC differ in the implementation of their multi-objective evolutionary approaches.
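Multi-objective evolutionary clustering methods such as those above typically compare candidate clusterings on several objectives at once and keep those that no other candidate beats on every objective. The Pareto-dominance test at the core of that selection can be sketched as follows. This is a generic sketch, not the operators of MMOEA, MDC or MOMVEC; it assumes every objective (for example, a hypothetical within-cluster scatter and a negated separation score) is to be minimised.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(objectives: List[Sequence[float]]) -> List[int]:
    """Indices of candidate clusterings not dominated by any other candidate."""
    return [
        i
        for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]


# Five hypothetical candidates scored on two objectives (lower is better).
candidates = [(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)]
print(pareto_front(candidates))  # [0, 1, 2]
```

Candidates (4, 4) and (2, 3) are both dominated by (2, 2), so only the trade-off set remains for the evolutionary operators to recombine.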
All five methods are compared with existing state-of-the-art methods. The results show that the newly developed text clustering methods outperform existing methods, achieving improvements of up to 16% in some comparisons. In general, almost all of the newly developed clustering algorithms showed statistically significant improvements over existing methods.
The key ideas of the thesis highlight that exploiting user queries improves Search Result Clustering (SRC); utilizing rich features in weighting schemes and distance measures improves soft subspace clustering; utilizing multiple views and a multi-objective cluster-oriented method improves clustering ensemble methods; and better evolutionary operators and objective functions improve multi-objective evolutionary clustering ensemble methods.
The new text clustering methods introduced in this thesis can be widely applied in various domains that involve the analysis of text data. The contributions of this thesis, which include five new text clustering methods, will help not only researchers in the data mining field but also a wide range of researchers in other fields.
Multi-view Graph Embedding with Hub Detection for Brain Network Analysis
Multi-view graph embedding has become a widely studied problem in the area of
graph learning. Most of the existing works on multi-view graph embedding aim to
find a shared common node embedding across all the views of the graph by
combining the different views in a specific way. Hub detection, another
essential topic in graph mining, has also drawn extensive attention in recent
years, especially in the context of brain network analysis. Both graph
embedding and hub detection relate to the node clustering structure of graphs.
Multi-view graph embedding usually reflects the node clustering structure of
the graph based on the multiple views, while hubs are the boundary-spanning
nodes across different node clusters in the graph and thus may potentially
influence the clustering structure of the graph. However, none of the existing
works on multi-view graph embedding consider the hubs when learning the
multi-view embeddings. In this paper, we propose to incorporate the hub
detection task into the multi-view graph embedding framework so that the two
tasks could benefit each other. Specifically, we propose an auto-weighted
framework of Multi-view Graph Embedding with Hub Detection (MVGE-HD) for brain
network analysis. The MVGE-HD framework learns a unified graph embedding across
all the views while reducing the potential influence of the hubs on blurring
the boundaries between node clusters in the graph, thus leading to a clear and
discriminative node clustering structure for the graph. We apply MVGE-HD to two
real multi-view brain network datasets (i.e., HIV and Bipolar). The
experimental results demonstrate the superior performance of the proposed
framework in brain network analysis for clinical investigation and application.
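The idea of hubs as boundary-spanning nodes can be made concrete with the participation coefficient, a widely used node-role measure in brain network analysis: P_i = 1 - sum_s (k_is / k_i)^2, where k_i is node i's total edge weight and k_is the part of it going to cluster s. It is shown here only to illustrate what "boundary-spanning" means; it is not MVGE-HD's hub detection mechanism, and applying it to one view with given cluster labels is an assumption of this sketch.

```python
import numpy as np


def participation_coefficient(adj, labels):
    """P_i = 1 - sum_s (k_is / k_i)^2 for a weighted, undirected adjacency matrix.

    Nodes whose edges stay inside one cluster score 0; nodes whose edges are
    spread evenly across clusters score close to 1. Isolated nodes get 0.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degree = adj.sum(axis=1)
    safe_degree = np.where(degree > 0, degree, 1.0)  # avoid division by zero
    share_sq = np.zeros_like(degree)
    for c in np.unique(labels):
        # Squared share of each node's edge weight that lands in cluster c.
        share_sq += (adj[:, labels == c].sum(axis=1) / safe_degree) ** 2
    pc = 1.0 - share_sq
    pc[degree == 0] = 0.0
    return pc
```

For two triangles joined by a single bridge edge, the two bridge endpoints each have two edges inside their own cluster and one edge outside it, giving P = 1 - (4/9 + 1/9) = 4/9, while every other node scores 0.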
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology to identify
clusters based on their associations with subspaces in high-dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and well
accepted by the scientific community, SSC algorithms are relatively new but
have been gaining more attention in recent years due to their better
adaptability. In this paper, a comprehensive survey of existing SSC algorithms
and their recent development is presented. The SSC algorithms are classified
systematically into three main categories, namely, conventional SSC (CSSC),
independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these
algorithms are highlighted and the potential future development of SSC is also
discussed.
Comment: This paper has been published in Information Sciences Journal in 201
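Soft subspace clustering assigns every cluster a weight per feature instead of selecting a hard subset of features. One representative of the entropy-weighting family, entropy-weighted k-means (EWKM), can be sketched as follows, assuming its usual update rules: assignment by weighted squared Euclidean distance, cluster means as centroids, and weights w_lj proportional to exp(-D_lj / gamma), where D_lj is cluster l's dispersion on feature j. It is one example of the kind of algorithm the survey covers, not a method proposed by the survey; the function name and defaults are illustrative.

```python
import numpy as np


def ewkm(x, k, gamma=1.0, n_iter=20, seed=0):
    """Entropy-weighted k-means sketch: per-cluster feature weights via a softmax of -dispersion."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    centers = x[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)  # start with equal feature weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment: weighted squared distance to every centre, shape (n, k).
        diff_sq = (x[:, None, :] - centers[None, :, :]) ** 2
        labels = (diff_sq * weights[None, :, :]).sum(axis=2).argmin(axis=1)
        for l in range(k):
            members = x[labels == l]
            if len(members) == 0:
                continue  # keep the previous centre and weights for an empty cluster
            centers[l] = members.mean(axis=0)
            # Dispersion of cluster l along each feature.
            dispersion = ((members - centers[l]) ** 2).sum(axis=0)
            # Features with small dispersion receive large weight; shift for stability.
            z = -dispersion / gamma
            w = np.exp(z - z.max())
            weights[l] = w / w.sum()
    return labels, centers, weights
```

The parameter gamma controls how sharply weights concentrate: a small gamma focuses each cluster on its few most compact features, while a large gamma pushes the weights toward uniform.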
Multi-view constrained clustering with an incomplete mapping between views
Multi-view learning algorithms typically assume a complete bipartite mapping
between the different views in order to exchange information during the
learning process. However, many applications provide only a partial mapping
between the views, creating a challenge for current methods. To address this
problem, we propose a multi-view algorithm based on constrained clustering that
can operate with an incomplete mapping. Given a set of pairwise constraints in
each view, our approach propagates these constraints using a local similarity
measure to those instances that can be mapped to the other views, allowing the
propagated constraints to be transferred across views via the partial mapping.
It uses co-EM to iteratively estimate the propagation within each view based on
the current clustering model, transfer the constraints across views, and then
update the clustering model. By alternating the learning process between views,
this approach produces a unified clustering model that is consistent with all
views. We show that this approach significantly improves clustering performance
over several other methods for transferring constraints and allows multi-view
clustering to be reliably applied when given a limited mapping between the
views. Our evaluation reveals that the propagated constraints have high
precision with respect to the true clusters in the data, explaining their
benefit to clustering performance in both single- and multi-view learning
scenarios.
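The step of propagating constraints within one view and transferring them across the partial mapping can be sketched as follows. It is a simplified illustration, not the paper's algorithm. It assumes must-link constraints only, a Gaussian (RBF) similarity with bandwidth sigma as the local similarity measure, a fixed confidence threshold, and a partial mapping given as a dict from view-A indices to view-B indices. The co-EM loop that re-estimates the propagation from the current clustering model is omitted, and all names are hypothetical.

```python
import numpy as np


def rbf(a, b, sigma):
    """Gaussian similarity between two feature vectors."""
    return float(np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / (2 * sigma ** 2)))


def propagate_and_transfer(x_a, must_links, mapping, sigma=1.0, threshold=0.5):
    """Propagate view-A must-links to mapped neighbours, then carry them into view B.

    x_a: (n, d) features of view A; must_links: list of (i, j) index pairs in view A;
    mapping: partial dict {view-A index: view-B index}.
    Returns {(p, q): weight} for must-links between view-B instances.
    """
    mapped = list(mapping)  # only these instances can be transferred
    transferred = {}
    for i, j in must_links:
        for u in mapped:
            w_u = rbf(x_a[i], x_a[u], sigma)  # how strongly endpoint i extends to u
            if w_u < threshold:
                continue
            for v in mapped:
                if v == u:
                    continue
                w = w_u * rbf(x_a[j], x_a[v], sigma)  # joint confidence for (u, v)
                if w < threshold:
                    continue
                key = tuple(sorted((mapping[u], mapping[v])))
                transferred[key] = max(transferred.get(key, 0.0), w)
    return transferred
```

For example, if instances 0 and 2 of view A must link but only their near neighbours 1 and 3 have counterparts in view B, the constraint reaches view B as a slightly down-weighted must-link between those counterparts.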