    A Framework for Statistical Clustering with a Constant Time Approximation Algorithms for K-Median Clustering

    We consider a framework in which the clustering algorithm receives as input a sample drawn i.i.d. from some unknown, arbitrary distribution, and has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clusterings that approximate the optimal clustering. We show that K-median clustering, as well as the Vector Quantization problem, satisfies these conditions. In particular, our results apply to the sampling-based approximate clustering scenario. As a corollary, we get a sampling-based algorithm for the K-median clustering problem that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the running time of our algorithm is independent of the Euclidean dimension.
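
The sampling-based idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes that candidate centers are restricted to the sampled points, and that the sample size `m` is chosen by the caller from the desired accuracy and confidence (the abstract gives no formula, so none is assumed here). The function names `kmedian_cost` and `sample_kmedian` are hypothetical. The point it shows is that the search touches only the `m` sampled points, so its cost does not grow with the size of the full data set.

```python
import itertools
import math
import random


def kmedian_cost(points, centers):
    """Average Euclidean distance from each point to its nearest center."""
    total = sum(min(math.dist(p, c) for c in centers) for p in points)
    return total / len(points)


def sample_kmedian(data, k, m, rng):
    """Pick k centers by exhaustive search over an i.i.d. sample of size m.

    Assumption (not from the source): centers are restricted to sampled
    points, and m is supplied by the caller.
    """
    # Draw m points independently, with replacement (an i.i.d. sample
    # from the empirical distribution of `data`).
    sample = [rng.choice(data) for _ in range(m)]
    candidates = sorted(set(sample))
    if len(candidates) < k:
        raise ValueError("sample contains fewer than k distinct points")
    # Exhaustive search: work depends on m and k only, not on len(data).
    best = min(
        itertools.combinations(candidates, k),
        key=lambda centers: kmedian_cost(sample, centers),
    )
    return list(best)
```

The quality of the returned centers would then be judged by `kmedian_cost` on the full data set (or, in the framework above, with respect to the underlying distribution), not on the sample used to choose them.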