A Faster $k$-means++ Algorithm

Abstract

$k$-means++ is an important algorithm for choosing initial cluster centers for the $k$-means clustering algorithm. In this work, we present a new algorithm that solves the $k$-means++ problem in near-optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $\widetilde{O}(k)$ iterations, and each iteration takes $\widetilde{O}(ndk)$ time, so the overall running time is $\widetilde{O}(ndk^2)$. We propose a new algorithm, \textsc{FastKmeans++}, that takes only $\widetilde{O}(nd + nk^2)$ time in total.
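For context, the classical $k$-means++ seeding step ($D^2$ sampling) can be sketched as follows. This is only an illustration of the standard seeding idea that $k$-means++ is built on; it is not \textsc{FastKmeans++}, and it is not the state-of-the-art baseline whose running time is quoted above. The names `sq_dist` and `kmeanspp_seed` are hypothetical, and the code assumes points are given as tuples of floats.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (illustrative sketch only).

    Each round scans all n points once in d dimensions, so this sketch
    costs O(n d) per chosen center and O(n d k) overall.
    """
    rng = rng or random.Random(0)
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(n)]]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(n)
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choices(range(n), weights=d2, k=1)[0]
        c = points[idx]
        centers.append(c)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, c)) for old, p in zip(d2, points)]
    return centers
```

A call such as `kmeanspp_seed(points, 3)` returns three of the input points, which can then be handed to a $k$-means routine as initial centers.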
