Clustering Ensemble Meets Low-rank Tensor Approximation
This paper explores the clustering ensemble problem, which aims to combine
multiple base clusterings to achieve better performance than any individual
base clustering. Existing clustering ensemble methods generally construct a
co-association matrix, which indicates the pairwise similarity between samples,
as the weighted linear combination of the connective matrices from different
base clusterings, and the resulting co-association matrix is then adopted as
the input of an off-the-shelf clustering algorithm, e.g., spectral clustering.
However, the co-association matrix may be dominated by poor base clusterings,
resulting in inferior performance. In this paper, we propose a novel low-rank
tensor approximation-based method to solve the problem from a global
perspective. Specifically, by inspecting whether two samples are clustered to
an identical cluster under different base clusterings, we derive a
coherent-link matrix, which contains limited but highly reliable relationships
between samples. We then stack the coherent-link matrix and the co-association
matrix to form a three-dimensional tensor, whose low-rank property is
exploited to propagate the information of the coherent-link matrix to
the co-association matrix, producing a refined co-association matrix. We
formulate the proposed method as a convex constrained optimization problem and
solve it efficiently. Experimental results over 7 benchmark data sets show that
the proposed model achieves a breakthrough in clustering performance, compared
with 12 state-of-the-art methods. To the best of our knowledge, this is the
first work to explore the potential of low-rank tensors for clustering ensemble,
which is fundamentally different from previous approaches.
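The two matrices described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the weights default to uniform, and the coherent-link matrix is assumed here to mark only those pairs that every base clustering places in the same cluster. The low-rank tensor refinement step is not shown.

```python
import numpy as np

def connective_matrix(labels):
    """Binary n x n matrix: entry (i, j) is 1 if samples i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def co_association(base_clusterings, weights=None):
    """Weighted linear combination of the connective matrices.

    `weights` is an assumption of this sketch; uniform weights are used
    when none are given.
    """
    mats = np.stack([connective_matrix(l) for l in base_clusterings])
    m = len(base_clusterings)
    if weights is None:
        weights = np.full(m, 1.0 / m)
    weights = np.asarray(weights, dtype=float)
    # Contract the clustering axis: sum_k weights[k] * mats[k]
    return np.tensordot(weights, mats, axes=1)

def coherent_link(base_clusterings):
    """Assumed definition: 1 only where ALL base clusterings co-cluster the pair."""
    mats = np.stack([connective_matrix(l) for l in base_clusterings])
    return mats.min(axis=0)

# Three toy base clusterings of four samples (label values are arbitrary).
base = [[0, 0, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0]]

A = co_association(base)   # pairwise similarity in [0, 1]
C = coherent_link(base)    # sparse but fully agreed-upon links
```

In this toy example, samples 0 and 1 are grouped together by all three base clusterings, so they are linked in both matrices, while samples 2 and 3 are grouped together by only two of the three and therefore appear in the co-association matrix but not in the coherent-link matrix.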
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
potentially better and more robust clustering and has received increasing
attention in recent years. Existing clustering ensemble approaches have two
main limitations. Firstly, many approaches lack the
ability to weight the base clusterings without access to the original data and
can be affected significantly by low-quality or even ill-formed clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source-aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA), respectively. Experiments on eight real-world
datasets demonstrate the effectiveness and robustness of the proposed methods.
Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests a far-reaching impact of our framework throughout all areas
of science.