Ultra-Scalable Spectral Clustering and Ensemble Clustering
This paper focuses on scalability and robustness of spectral clustering for
extremely large-scale datasets with limited resources. Two novel algorithms are
proposed, namely, ultra-scalable spectral clustering (U-SPEC) and
ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative
selection strategy and a fast approximation method for K-nearest
representatives are proposed for the construction of a sparse affinity
sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the
transfer cut is then utilized to efficiently partition the graph and obtain the
clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated
into an ensemble clustering framework to enhance the robustness of U-SPEC while
maintaining high efficiency. Based on the ensemble generation via multiple
U-SPEC clusterers, a new bipartite graph is constructed between objects and base
clusters and then efficiently partitioned to achieve the consensus clustering
result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time
and space complexity, and are capable of robustly and efficiently partitioning
ten-million-level nonlinearly-separable datasets on a PC with 64GB memory.
Experiments on various large-scale datasets have demonstrated the scalability
and robustness of our algorithms. The MATLAB code and experimental data are
available at https://www.researchgate.net/publication/330760669.
Comment: To appear in IEEE Transactions on Knowledge and Data Engineering,
201
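The U-SPEC pipeline described in this abstract (pick representatives, link each object to its K nearest representatives, partition the resulting bipartite graph) can be illustrated with a much simplified sketch. Everything below is an assumption made for illustration and is not the authors' MATLAB code: representatives are chosen by plain random sampling rather than the hybrid strategy, nearest representatives are found exactly rather than approximately, the affinity matrix is stored densely, and an SVD of the normalised bipartite matrix stands in for transfer cut. The names `bipartite_spectral`, `kmeans`, `n_reps` and `n_neighbors` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def bipartite_spectral(X, n_clusters, n_reps=20, n_neighbors=5, seed=0):
    """Cluster X through an object-by-representative bipartite graph."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Assumption: representatives are a uniform random sample of the objects.
    reps = X[rng.choice(n, size=n_reps, replace=False)]

    # Exact squared distances from every object to every representative.
    d2 = ((X[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    knn_d2 = d2[rows, nearest]

    # Gaussian affinities to the K nearest representatives only.
    # Stored densely here for brevity; at scale this would be a sparse matrix.
    sigma2 = knn_d2.mean()
    B = np.zeros((n, n_reps))
    B[rows, nearest] = np.exp(-knn_d2 / (2.0 * sigma2))

    # Degree-normalise both sides of the bipartite graph.
    dx = B.sum(axis=1)
    dr = B.sum(axis=0)
    dr[dr == 0] = 1.0  # guard against a representative with no edges
    Bn = B / np.sqrt(dx)[:, None] / np.sqrt(dr)[None, :]

    # The leading left singular vectors give the spectral embedding of objects.
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)
    embedding = U[:, :n_clusters] / np.sqrt(dx)[:, None]
    return kmeans(embedding, n_clusters)
```

The key point the sketch tries to show is that the decomposition is performed on an N-by-p matrix with p much smaller than N, which is where the near-linear cost in N comes from; the full N-by-N affinity matrix is never formed.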
On Instance Weighted Clustering Ensembles
© ESANN, 2023. This is the accepted manuscript version of an article which has
been published in final form at: www.esann.org/proceedings/2023
Ensemble clustering is a technique which combines multiple clustering results,
and instance weighting is a technique which highlights important instances in a
dataset. Both techniques are known to enhance clustering performance and
robustness. In this research, ensembles and instance weighting are integrated
with the spectral clustering algorithm. We believe this is the first attempt at
creating diversity in the generative mechanism using density based instance
weighting for a spectral ensemble. The proposed approach is empirically
validated using synthetic datasets comparing against spectral and a spectral
ensemble with random instance weighting. Results show that using the instance
weighted sub-sampling approach as the generative mechanism for an ensemble of
spectral clustering leads to improved clustering performance on datasets with
imbalanced clusters.
Peer reviewed
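The abstract does not say how the density-based weights are computed or how sub-samples are drawn, so the following is only a guess at what the generative step might look like. It assumes that density is estimated from the mean distance to the k nearest neighbours and that denser instances are sampled with higher probability; the actual paper may weight in the opposite direction or use another estimator. The function names and parameters are hypothetical.

```python
import numpy as np


def density_weights(X, k=5):
    """Sampling probabilities from a k-nearest-neighbour density estimate."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # After sorting each row, column 0 is the point itself (distance 0).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # Assumption: a smaller mean neighbour distance means a higher density.
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    return density / density.sum()


def weighted_subsamples(X, n_subsamples=10, size=50, k=5, seed=0):
    """Draw index sets without replacement, biased by instance weight."""
    rng = np.random.default_rng(seed)
    w = density_weights(X, k=k)
    n = X.shape[0]
    return [rng.choice(n, size=size, replace=False, p=w)
            for _ in range(n_subsamples)]
```

Each index set would then be clustered by a base spectral clusterer, and the differences between the draws supply the diversity that the ensemble relies on.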
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measure corresponds to building a dendrogram with the single
linkage criterion and defining specific forms of a level function and a
distance function over it. Therefore, we extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features in the spirit of multi-layered architectures to obtain
the final features. Finally, we demonstrate the effectiveness of our approach
via several numerical studies.
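As a small illustration of the Minimax distance that this abstract starts from: the Minimax distance between two objects is the smallest achievable "largest hop" over all paths joining them, and it coincides with the height at which the two objects are merged in a single-linkage dendrogram. The sketch below computes it directly with a Floyd-Warshall-style update, not via a dendrogram, so it is not the paper's framework. The function name `minimax_distances` is hypothetical, and the cubic cost makes it suitable only for small inputs.

```python
import numpy as np


def minimax_distances(D):
    """All-pairs Minimax (bottleneck path) distances from a distance matrix D."""
    M = np.array(D, dtype=float)
    n = M.shape[0]
    for k in range(n):
        # Going through k costs the larger of the two legs; keep it if smaller.
        via_k = np.maximum(M[:, [k]], M[[k], :])
        M = np.minimum(M, via_k)
    return M


# Four points on a line: three close together and one far away.
x = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
M = minimax_distances(D)
print(M[0, 2])  # 1.0: the path 0 -> 1 -> 2 never hops more than 1
print(M[0, 3])  # 8.0: every path must cross the gap between 2 and 10
```

Objects inside an elongated or chained cluster end up close under this measure even when their direct distance is large, because only the largest hop along the best path counts.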