6,040 research outputs found
Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce
The kernel -means is an effective method for data clustering which extends
the commonly-used -means algorithm to work on a similarity matrix over
complex data structures. The kernel -means algorithm is however
computationally very complex as it requires the complete data matrix to be
calculated and stored. Further, the kernelized nature of the kernel -means
algorithm hinders the parallelization of its computations on modern
infrastructures for distributed computing. In this paper, we are defining a
family of kernel-based low-dimensional embeddings that allows for scaling
kernel -means on MapReduce via an efficient and unified parallelization
strategy. Afterwards, we propose two methods for low-dimensional embedding that
adhere to our definition of the embedding family. Exploiting the proposed
parallelization strategy, we present two scalable MapReduce algorithms for
kernel -means. We demonstrate the effectiveness and efficiency of the
proposed algorithms through an empirical evaluation on benchmark data sets.Comment: Appears in Proceedings of the SIAM International Conference on Data
Mining (SDM), 201
A Convex Formulation for Spectral Shrunk Clustering
Spectral clustering is a fundamental technique in the field of data mining
and information processing. Most existing spectral clustering algorithms
integrate dimensionality reduction into the clustering process assisted by
manifold learning in the original space. However, the manifold in
reduced-dimensional subspace is likely to exhibit altered properties in
contrast with the original space. Thus, applying manifold information obtained
from the original space to the clustering process in a low-dimensional subspace
is prone to inferior performance. Aiming to address this issue, we propose a
novel convex algorithm that mines the manifold structure in the low-dimensional
subspace. In addition, our unified learning process makes the manifold learning
particularly tailored for the clustering. Compared with other related methods,
the proposed algorithm results in more structured clustering result. To
validate the efficacy of the proposed algorithm, we perform extensive
experiments on several benchmark datasets in comparison with some
state-of-the-art clustering approaches. The experimental results demonstrate
that the proposed algorithm has quite promising clustering performance.Comment: AAAI201
- …