5,335 research outputs found
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
probably better and more robust clustering and has been receiving an increasing
attention in recent years. There are mainly two aspects of limitations in the
existing clustering ensemble approaches. Firstly, many approaches lack the
ability to weight the base clusterings without access to the original data and
can be affected significantly by the low-quality, or even ill clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA) respectively. The experiments are conducted on eight
real-world datasets. The experimental results demonstrate the effectiveness and
robustness of the proposed methods.Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
Ultra-Scalable Spectral Clustering and Ensemble Clustering
This paper focuses on scalability and robustness of spectral clustering for
extremely large-scale datasets with limited resources. Two novel algorithms are
proposed, namely, ultra-scalable spectral clustering (U-SPEC) and
ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative
selection strategy and a fast approximation method for K-nearest
representatives are proposed for the construction of a sparse affinity
sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the
transfer cut is then utilized to efficiently partition the graph and obtain the
clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated
into an ensemble clustering framework to enhance the robustness of U-SPEC while
maintaining high efficiency. Based on the ensemble generation via multiple
U-SEPC's, a new bipartite graph is constructed between objects and base
clusters and then efficiently partitioned to achieve the consensus clustering
result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time
and space complexity, and are capable of robustly and efficiently partitioning
ten-million-level nonlinearly-separable datasets on a PC with 64GB memory.
Experiments on various large-scale datasets have demonstrated the scalability
and robustness of our algorithms. The MATLAB code and experimental data are
available at https://www.researchgate.net/publication/330760669.Comment: To appear in IEEE Transactions on Knowledge and Data Engineering,
201
Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks
Deep clustering has recently attracted significant attention. Despite the
remarkable progress, most of the previous deep clustering works still suffer
from two limitations. First, many of them focus on some distribution-based
clustering loss, lacking the ability to exploit sample-wise (or
augmentation-wise) relationships via contrastive learning. Second, they often
neglect the indirect sample-wise structure information, overlooking the rich
possibilities of multi-scale neighborhood structure learning. In view of this,
this paper presents a new deep clustering approach termed Image clustering with
contrastive learning and multi-scale Graph Convolutional Networks (IcicleGCN),
which bridges the gap between convolutional neural network (CNN) and graph
convolutional network (GCN) as well as the gap between contrastive learning and
multi-scale neighborhood structure learning for the image clustering task. The
proposed IcicleGCN framework consists of four main modules, namely, the
CNN-based backbone, the Instance Similarity Module (ISM), the Joint Cluster
Structure Learning and Instance reconstruction Module (JC-SLIM), and the
Multi-scale GCN module (M-GCN). Specifically, with two random augmentations
performed on each image, the backbone network with two weight-sharing views is
utilized to learn the representations for the augmented samples, which are then
fed to ISM and JC-SLIM for instance-level and cluster-level contrastive
learning, respectively. Further, to enforce multi-scale neighborhood structure
learning, two streams of GCNs and an auto-encoder are simultaneously trained
via (i) the layer-wise interaction with representation fusion and (ii) the
joint self-adaptive learning that ensures their last-layer output distributions
to be consistent. Experiments on multiple image datasets demonstrate the
superior clustering performance of IcicleGCN over the state-of-the-art
Engineering a portable riboswitch-LacP hybrid device for two-way gene regulation
Riboswitches are RNA-based regulatory devices that mediate ligand-dependent control of gene expression. However, there has been limited success in rationally designing riboswitches. Moreover, most previous riboswitches are confined to a particular gene and only perform one-way regulation. Here, we used a library screening strategy for efficient creation of ON and OFF riboswitches of lacI on the chromosome of Escherichia coli. We then engineered a riboswitch-LacP hybrid device to achieve portable gene control in response to theophylline and IPTG. Moreover, this device regulated target expression in a ātwo-wayā manner: the default state of target expression was ON; the expression was switched off by adding theophylline and restored to the ON state by adding IPTG without changing growth medium. We showcased the portability and two-way regulation of this device by applying it to the small RNA CsrB and the RpoS protein. Finally, the use of the hybrid device uncovered an inhibitory role of RpoS in acetate assimilation, a function which is otherwise neglected using conventional genetic approaches. Overall, this work establishes a portable riboswitch-LacP device that achieves sequential OFF-and-ON gene regulation. The two-way control of gene expression has various potential scientific and biotechnological applications and helps reveal novel gene functions
- ā¦