Quickshift++: Provably Good Initializations for Sample-Based Mean Shift
We provide initial seedings to the Quick Shift clustering algorithm, which
approximate the locally high-density regions of the data. Such seedings act as
more stable and expressive cluster-cores than the singleton modes found by
Quick Shift. We establish statistical consistency guarantees for this
modification. We then show strong clustering performance on real datasets as
well as promising applications to image segmentation.
Comment: ICML 2018. Code release: https://github.com/google/quickshif
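For readers unfamiliar with the baseline, the following is a minimal sketch of plain Quick Shift, not of Quickshift++ itself: each point is linked to its nearest neighbor of strictly higher estimated density, and the roots of the resulting trees are the singleton modes the abstract refers to. The Gaussian kernel density estimate, the function name `quick_shift`, and the parameters `bandwidth` and `tau` are assumptions made for illustration; the cluster-core seeding that the paper proposes is not implemented here.

```python
import math


def quick_shift(points, bandwidth=1.0, tau=3.0):
    """Label each point with the index of the mode (tree root) it links to.

    Illustrative sketch only: the density estimate is an unnormalized
    Gaussian kernel sum, and links longer than `tau` are not allowed.
    """
    n = len(points)
    # Unnormalized Gaussian kernel density estimate at every sample.
    density = [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]

    # Link each point to its nearest neighbor with strictly higher density,
    # provided that neighbor lies closer than tau. Points with no such
    # neighbor remain their own parent and act as modes.
    parent = list(range(n))
    for i in range(n):
        best = tau
        for j in range(n):
            d = math.dist(points[i], points[j])
            if density[j] > density[i] and d < best:
                best, parent[i] = d, j

    # Density strictly increases along parent links, so no cycles can occur.
    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(n)]
```

Calling `quick_shift(points, bandwidth=0.5, tau=1.0)` on two well-separated groups of points should assign one root index per group. This sketch runs in quadratic time and is meant only to show the linking rule.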
Generalized Markov Chain Monte Carlo Initialization for Clustering Gaussian Mixtures Using K-means
Gaussian mixtures are widely regarded as a good model of real-world data, so a clustering algorithm that handles such mixtures efficiently is expected to work well in practice. K-means is popular for these applications because it is easy to implement and scales well, yet it is sensitive to poor seeding. Moreover, if the Gaussian mixture has overlapping clusters, k-means cannot separate them unless the initial conditions are good. K-means++ is a good seeding method, but it has high time complexity. It can be made faster by using Markov chain Monte Carlo sampling. This paper proposes a method that improves seed quality while retaining the speed of the sampling technique. The desired effects are demonstrated on several Gaussian mixtures.
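The abstract does not describe the proposed method in detail, so the sketch below only illustrates the general idea it builds on: approximating the distance-squared sampling of k-means++ with a short Markov chain instead of a full pass over the data. It assumes a Metropolis-Hastings chain with a uniform proposal; the function names `sq_dist_to_centers` and `mcmc_seeding` and the parameter `chain_length` are hypothetical, and this is not the paper's improved seeding.

```python
import math
import random


def sq_dist_to_centers(x, centers):
    """Squared distance from x to its closest already-chosen center."""
    return min(math.dist(x, c) ** 2 for c in centers)


def mcmc_seeding(points, k, chain_length=50, rng=None):
    """Pick k seeds, approximating k-means++ sampling with a Markov chain.

    Illustrative sketch: the first center is uniform; each later center is
    the final state of a Metropolis-Hastings chain whose stationary
    distribution is proportional to the squared distance to the centers.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        x = rng.choice(points)
        dx = sq_dist_to_centers(x, centers)
        for _ in range(chain_length - 1):
            y = rng.choice(points)
            dy = sq_dist_to_centers(y, centers)
            # Uniform proposal, so the acceptance probability is
            # min(1, dy / dx). A state at distance zero is always left.
            if dx == 0 or dy / dx > rng.random():
                x, dx = y, dy
        centers.append(x)
    return centers
```

Each new seed costs `chain_length` distance evaluations against the current centers rather than one evaluation per data point, which is where the speedup over exact k-means++ seeding comes from. Longer chains follow the target distribution more closely at higher cost.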