Partitioning Complex Networks via Size-constrained Clustering
The most commonly used method to tackle the graph partitioning problem in
practice is the multilevel approach. During a coarsening phase, a multilevel
graph partitioning algorithm reduces the graph size by iteratively contracting
nodes and edges until the graph is small enough to be partitioned by some other
algorithm. A partition of the input graph is then constructed by successively
transferring the solution to the next finer graph and applying a local search
algorithm to improve the current solution.
In this paper, we describe a novel approach to partition graphs effectively,
especially if the networks have a highly irregular structure. More precisely,
our algorithm provides graph coarsening by iteratively contracting
size-constrained clusterings that are computed using a label propagation
algorithm. The same algorithm that provides the size-constrained clusterings
can also be used during uncoarsening as a fast and simple local search
algorithm.
Depending on the algorithm's configuration, we are able to compute partitions
of very high quality outperforming all competitors, or partitions that are
comparable in quality to the best competitor, hMetis, while being
nearly an order of magnitude faster on average. The fastest configuration
partitions the largest graph available to us with 3.3 billion edges using a
single machine in about ten minutes while cutting fewer than half as many
edges as the fastest competitor, kMetis.
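The size-constrained label propagation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `adj` dictionary layout, unit node weights, the fixed visiting order, and the `rounds` limit are all assumptions made for illustration.

```python
from collections import defaultdict

def size_constrained_label_propagation(adj, max_size, rounds=5):
    """adj: dict node -> dict neighbor -> edge weight (undirected graph).

    Hypothetical helper: every node has unit weight, and a node may only
    join a cluster if that cluster stays within max_size nodes.
    """
    label = {v: v for v in adj}   # every node starts in its own cluster
    size = {v: 1 for v in adj}    # current number of nodes per cluster
    for _ in range(rounds):
        moved = False
        for v in adj:             # fixed visiting order (an assumption)
            # Total edge weight from v to each neighboring cluster.
            strength = defaultdict(float)
            for u, w in adj[v].items():
                strength[label[u]] += w
            best, best_w = label[v], strength.get(label[v], 0.0)
            for c, w in strength.items():
                # Move only to a strictly stronger cluster that has room.
                if c != label[v] and w > best_w and size[c] + 1 <= max_size:
                    best, best_w = c, w
            if best != label[v]:
                size[label[v]] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:             # converged: no node changed its label
            break
    return label

# Two triangles joined by the single edge 2-3.
graph = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
clusters = size_constrained_label_propagation(graph, max_size=3)
```

In a multilevel scheme, each resulting cluster would be contracted into a single node of the next coarser graph; the size bound keeps any contracted node from becoming too heavy for a balanced partition.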
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
probably better and more robust clustering and has received increasing
attention in recent years. Existing clustering ensemble approaches have two
main limitations. Firstly, many approaches lack the
ability to weight the base clusterings without access to the original data and
can be affected significantly by low-quality, or even ill, clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA) respectively. The experiments are conducted on eight
real-world datasets. The experimental results demonstrate the effectiveness and
robustness of the proposed methods.
Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
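The idea behind weighted evidence accumulation can be sketched as a weighted co-association matrix, in which each base clustering votes on whether two instances belong together and its vote is scaled by a reliability weight. The sketch below assumes the weights are already given; the abstract does not define how NCAI is computed, so that step is not reproduced here, and the function name is hypothetical.

```python
def weighted_coassociation(base_clusterings, weights):
    """Return an n x n matrix whose (i, j) entry is the weighted fraction
    of base clusterings that put instances i and j in the same cluster.

    base_clusterings: list of label lists, one per base clustering.
    weights: one non-negative weight per base clustering (assumed given).
    """
    n = len(base_clusterings[0])
    total = sum(weights)
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_clusterings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w / total   # normalized vote of this clustering
    return co

base = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
co = weighted_coassociation(base, weights=[2, 1, 1])
```

A consensus clustering would then typically be obtained by running a hierarchical clustering method on this matrix, treating its entries as similarities.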
Ultra-Scalable Spectral Clustering and Ensemble Clustering
This paper focuses on scalability and robustness of spectral clustering for
extremely large-scale datasets with limited resources. Two novel algorithms are
proposed, namely, ultra-scalable spectral clustering (U-SPEC) and
ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative
selection strategy and a fast approximation method for K-nearest
representatives are proposed for the construction of a sparse affinity
sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the
transfer cut is then utilized to efficiently partition the graph and obtain the
clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated
into an ensemble clustering framework to enhance the robustness of U-SPEC while
maintaining high efficiency. Based on the ensemble generation via multiple
U-SPECs, a new bipartite graph is constructed between objects and base
clusters and then efficiently partitioned to achieve the consensus clustering
result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time
and space complexity, and are capable of robustly and efficiently partitioning
ten-million-level nonlinearly-separable datasets on a PC with 64GB memory.
Experiments on various large-scale datasets have demonstrated the scalability
and robustness of our algorithms. The MATLAB code and experimental data are
available at https://www.researchgate.net/publication/330760669.
Comment: To appear in IEEE Transactions on Knowledge and Data Engineering,
201
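The sparse affinity sub-matrix between objects and their K-nearest representatives can be illustrated as follows. This is a simplified sketch, not the U-SPEC code: it takes the representatives as given and uses an exact brute-force search, whereas the abstract describes a hybrid selection strategy and a fast approximation. The Gaussian kernel, the `sigma` parameter, and the function name are assumptions.

```python
import math

def sparse_affinity(points, reps, k, sigma=1.0):
    """For each object, keep Gaussian affinities to its k nearest
    representatives only; all other entries are implicitly zero.

    Returns one dict per object: {representative index: affinity}.
    """
    rows = []
    for x in points:
        # Exact distances to every representative (brute force).
        dists = sorted((math.dist(x, r), j) for j, r in enumerate(reps))
        rows.append({j: math.exp(-d * d / (2 * sigma ** 2))
                     for d, j in dists[:k]})
    return rows

points = [(0.0, 0.0), (10.0, 10.0)]
reps = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
affinity = sparse_affinity(points, reps, k=2)
```

Each row has exactly `k` non-zero entries, so the matrix can be read as a bipartite graph with objects on one side and representatives on the other, which is the structure the abstract partitions.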