Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
potentially better and more robust clustering and has received increasing
attention in recent years. Existing clustering ensemble approaches have two
main limitations. Firstly, many approaches lack the
ability to weight the base clusterings without access to the original data and
can be affected significantly by low-quality, or even ill, clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source-aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA) respectively. The experiments are conducted on eight
real-world datasets. The experimental results demonstrate the effectiveness and
robustness of the proposed methods.
Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
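As a rough illustration of the weighted evidence accumulation idea described in the abstract above, the following sketch builds a weighted co-association matrix from several base clusterings. It is not the authors' WEAC implementation: the function name `weighted_coassociation` is hypothetical, and the weights are passed in by hand rather than computed with NCAI, whose formula the abstract does not give.

```python
def weighted_coassociation(base_clusterings, weights):
    """Weighted co-association matrix for a set of base clusterings.

    base_clusterings: list of label lists, one label per data instance.
    weights: one non-negative weight per base clustering (assumed given;
             the abstract derives them from NCAI, which is not reproduced here).
    Entry [i][j] is the weight-normalized share of base clusterings that
    place instances i and j in the same cluster.
    """
    n = len(base_clusterings[0])
    total = float(sum(weights))
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_clusterings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w / total
    return co


# Three base clusterings of four instances; the first is trusted twice as much.
base = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
co = weighted_coassociation(base, weights=[2.0, 1.0, 1.0])
```

A consensus clustering could then be obtained by running, for example, agglomerative clustering on this matrix treated as a similarity matrix; that step is left out of the sketch.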
DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks
Deep clustering has recently emerged as a promising technique for complex
data clustering. Despite the considerable progress, previous deep clustering
works mostly build or learn the final clustering by only utilizing a single
layer of representation, e.g., by performing the K-means clustering on the last
fully-connected layer or by associating some clustering loss to a specific
layer, which neglects the possibility of jointly leveraging multi-layer
representations for enhancing the deep clustering performance. In view of this,
this paper presents a Deep Clustering via Ensembles (DeepCluE) approach, which
bridges the gap between deep clustering and ensemble clustering by harnessing
the power of multiple layers in deep neural networks. In particular, we utilize
a weight-sharing convolutional neural network as the backbone, which is trained
with both the instance-level contrastive learning (via an instance projector)
and the cluster-level contrastive learning (via a cluster projector) in an
unsupervised manner. Thereafter, multiple layers of feature representations are
extracted from the trained network, upon which the ensemble clustering process
is further conducted. Specifically, a set of diversified base clusterings is
generated from the multi-layer representations via a highly efficient
clusterer. Then the reliability of clusters in multiple base clusterings is
automatically estimated by exploiting an entropy-based criterion, based on
which the set of base clusterings is reformulated into a weighted-cluster
bipartite graph. By partitioning this bipartite graph via transfer cut, the
final consensus clustering can be obtained. Experimental results on six image
datasets confirm the advantages of DeepCluE over the state-of-the-art deep
clustering approaches.
Comment: To appear in IEEE Transactions on Emerging Topics in Computational
Intelligence
Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery
Copyright © 2013 Abu-Jamous et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the
original author and source are credited.
Clustering analysis has a growing role in the study of co-expressed genes for
gene discovery. Conventional binary and fuzzy clustering do not embrace the
biological reality that some genes may be irrelevant for a problem and not be
assigned to a cluster, while other genes may participate in several biological
functions and should simultaneously belong to multiple clusters. Also, these
algorithms cannot generate tight clusters that focus on their cores or wide
clusters that overlap and contain all possibly relevant genes. In this paper, a
new clustering paradigm is proposed. In this paradigm, all three eventualities
of a gene being exclusively assigned to a single cluster, being assigned to
multiple clusters, and being not assigned to any cluster are possible. These
possibilities are realised through the primary novelty of the introduction of
tunable binarization techniques. Results from multiple clustering experiments
are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which
is then binarized to obtain the final binary partitions. This is referred to as
Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been
tested with a set of synthetic datasets and a set of five real yeast cell-cycle
datasets. The results demonstrate its validity in generating relevant tight,
wide, and complementary clusters that can meet requirements of different gene
discovery studies.
National Institute for Health Research
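The aggregate-then-binarize idea in the Bi-CoPaM abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published method: the function names are hypothetical, cluster indices are assumed to be already aligned across runs, and a single membership threshold is used as one simple example of a tunable binarization rule.

```python
def consensus_partition_matrix(partitions):
    """Average several binary partition matrices into one fuzzy CoPaM.

    partitions: list of K x N 0/1 matrices (K clusters, N genes), with
    cluster rows assumed to be aligned across runs beforehand.
    """
    runs = len(partitions)
    k = len(partitions[0])
    n = len(partitions[0][0])
    return [[sum(p[c][g] for p in partitions) / runs for g in range(n)]
            for c in range(k)]


def binarize_by_threshold(copam, threshold):
    """Assign gene g to cluster c when its fuzzy membership reaches threshold.

    High thresholds give tight clusters and may leave genes unassigned;
    low thresholds give wide clusters that may overlap.
    """
    return [[1 if m >= threshold else 0 for m in row] for row in copam]


# Three clustering runs over 3 genes and 2 clusters; gene 1 is ambiguous.
runs = [
    [[1, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 1]],
]
copam = consensus_partition_matrix(runs)
tight = binarize_by_threshold(copam, 1.0)
wide = binarize_by_threshold(copam, 0.3)
```

With these toy inputs, the tight setting leaves gene 1 out of both clusters, while the wide setting places it in both, which mirrors the three eventualities named in the abstract.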
Relaxation dynamics of maximally clustered networks
We study the relaxation dynamics of fully clustered networks (maximal number
of triangles) to an unclustered state under two different edge dynamics: the
double-edge swap, corresponding to degree-preserving randomization of the
configuration model, and single-edge replacement, corresponding to full
randomization of the Erdős–Rényi random graph. We derive expressions for
the time evolution of the degree distribution, edge multiplicity distribution
and clustering coefficient. We show that under both dynamics networks undergo a
continuous phase transition in which a giant connected component is formed. We
calculate the position of the phase transition analytically using the
Erdős–Rényi phenomenology.
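The double-edge swap mentioned in the abstract above can be sketched on a plain edge list. This is an illustrative sketch only, not the authors' simulation code: the function names are hypothetical, multi-edges are allowed (the abstract tracks an edge multiplicity distribution), and swaps that would create a self-loop are simply skipped, which is an assumption made here for simplicity.

```python
import random
from collections import Counter


def double_edge_swap(edges, n_swaps, seed=0):
    """Attempt n_swaps degree-preserving double-edge swaps.

    Two edges (a, b) and (c, d) are picked at random and rewired to
    (a, d) and (c, b). Every node keeps its degree. Attempts that would
    create a self-loop are skipped (an assumption of this sketch).
    """
    rng = random.Random(seed)
    edges = list(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # rewiring would produce a self-loop
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Two disjoint triangles: a small, fully clustered starting network.
start = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
shuffled = double_edge_swap(start, n_swaps=50, seed=1)
```

Because each swap only exchanges endpoints between two edges, the degree sequence and the number of edges are unchanged, while triangles are gradually destroyed.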
Evolution in the Clustering of Galaxies for Z < 1
Measuring the evolution in the clustering of galaxies over a large redshift
range is a challenging problem. For a two-dimensional galaxy catalog, however,
we can measure the galaxy-galaxy angular correlation function which provides
information on the density distribution of galaxies. By utilizing photometric
redshifts, we can measure the angular correlation function in redshift shells
(Brunner 1997, Connolly et al. 1998) which minimizes the galaxy projection
effect, and allows for a measurement of the evolution in the correlation
strength with redshift. In these proceedings, we present some preliminary
results which extend our previous work using more accurate photometric
redshifts, and also incorporate absolute magnitudes, so that we can measure the
evolution of clustering with either redshift or intrinsic luminosity.
Comment: 6 pages, 6 figures, requires paspconf.sty. To be published in
"Photometric Redshifts and High Redshift Galaxies", eds. R. Weymann, L.
Storrie-Lombardi, M. Sawicki & R. Brunner, (San Francisco: ASP Conference
Series
Coupling geometry on binary bipartite networks: hypotheses testing on pattern geometry and nestedness
Upon a matrix representation of a binary bipartite network, via the
permutation invariance, a coupling geometry is computed to approximate the
minimum energy macrostate of a network's system. Such a macrostate is supposed
to constitute the intrinsic structures of the system, so that the coupling
geometry should be taken as information contents, or even the nonparametric
minimum sufficient statistics of the network data. Then pertinent null and
alternative hypotheses, such as nestedness, are to be formulated according to
the macrostate. That is, any efficient testing statistic needs to be a function
of this coupling geometry. These conceptual architectures and mechanisms are by
and large still missing from the community ecology literature, which has left
misconceptions prevalent in this research area. Here the algorithmically
computed coupling geometry is shown to consist of deterministic multiscale
block patterns, which are framed by two marginal ultrametric trees on row and
column axes, and stochastic uniform randomness within each block found on the
finest scale. Functionally a series of increasingly larger ensembles of matrix
mimicries is derived by conforming to the multiscale block configurations. Here
matrix mimicking is meant to be subject to constraints of row and column sums
sequences. Based on such a series of ensembles, a profile of distributions
becomes a natural device for checking the validity of testing statistics or
structural indexes. An energy-based index is used for testing whether network
data indeed contains structural geometry. A new, block-based version of the
nestedness index is also proposed. Its validity is checked and compared with the existing
ones. A computing paradigm, called Data Mechanics, and its application on one
real data network are illustrated throughout the developments and discussions
in this paper.