Scalable K-Medoids via True Error Bound and Familywise Bandits

Authors: Agarwal Saurabh, Babu Aravindakshan, Babu Sudarshan, Chandrasekaran Hariharan
Publication date: 29/10/2019

K-Medoids (KM) is a standard clustering method, used extensively on semi-metric data. Error analyses of KM have traditionally used an in-sample notion of error, which can be far from the true error and suffer from a generalization gap. We formalize the true K-Medoid error based on the underlying data distribution. We decompose the true error into two fundamental statistical problems: minimum estimation (ME) and minimum mean estimation (MME). We provide a convergence result for MME, showing that the MME error decreases no slower than $\Theta(\frac{1}{n^{\frac{2}{3}}})$, where $n$ is a measure of sample size. Inspired by this bound, we propose a computationally efficient, distributed KM algorithm, MCPAM. MCPAM has expected runtime $\mathcal{O}(km)$, where $k$ is the number of medoids and $m$ is the number of samples. MCPAM provides massive computational savings for a small tradeoff in accuracy. We verify the quality and scaling properties of MCPAM on various datasets and achieve the hitherto unachieved feat of calculating the KM of 1 billion points on semi-metric spaces.
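
As a rough illustration of two quantities named in the abstract, the Python sketch below computes an in-sample K-Medoids error and selects a medoid by estimating each candidate's mean distance from a random sample of reference points, which is one simple way to approach minimum mean estimation. It is not MCPAM. The function names, the sampling scheme, and the toy data are assumptions made here for illustration only.

```python
import random


def in_sample_error(points, medoids, dist):
    """Mean distance from each point to its nearest medoid (in-sample error)."""
    return sum(min(dist(p, m) for m in medoids) for p in points) / len(points)


def sampled_medoid(candidates, points, dist, n_ref, rng):
    """Return the candidate with the smallest estimated mean distance.

    The mean distance of each candidate is estimated from n_ref reference
    points drawn without replacement from `points`. With n_ref == len(points)
    the estimate is exact. (Hypothetical helper, not the paper's algorithm.)
    """
    refs = rng.sample(points, n_ref)

    def estimated_mean(c):
        return sum(dist(c, r) for r in refs) / n_ref

    return min(candidates, key=estimated_mean)


def abs_dist(a, b):
    """Absolute difference, a simple distance on the real line."""
    return abs(a - b)


# Toy one-dimensional data with a single outlier (illustrative only).
toy_points = [0.0, 1.0, 2.0, 3.0, 100.0]
rng = random.Random(0)

exact = sampled_medoid(toy_points, toy_points, abs_dist, len(toy_points), rng)
approx = sampled_medoid(toy_points, toy_points, abs_dist, 3, rng)
print("exact medoid:", exact)
print("medoid from 3 reference points:", approx)
print("in-sample error, 1 medoid:", in_sample_error(toy_points, [exact], abs_dist))
```

Using fewer reference points lowers the cost per candidate but makes the selected medoid depend on the sample, which is the kind of accuracy-for-computation tradeoff the abstract describes.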

arXiv.org e-Print Archive