On Generalization Bounds for Projective Clustering

Abstract

Given a set of points, clustering consists of finding a partition of the point set into $k$ clusters such that the center to which each point is assigned is as close as possible. Most commonly, centers are points themselves, which leads to the famous $k$-median and $k$-means objectives. One may also choose centers to be $j$-dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set $P$ of $n$ samples drawn independently from some unknown but fixed distribution $\mathcal{D}$, how quickly does a solution computed on $P$ converge to the optimal clustering of $\mathcal{D}$? We give several near-optimal results. In particular, for center-based objectives, we show a convergence rate of $\tilde{O}\left(\sqrt{k/n}\right)$. This matches the known optimal bounds of [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] and [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] for $k$-means and extends them to other important objectives such as $k$-median. For subspace clustering with $j$-dimensional subspaces, we show a convergence rate of $\tilde{O}\left(\sqrt{\frac{kj^2}{n}}\right)$. These are the first provable bounds for most of these problems. For the specific case of projective clustering, which generalizes $k$-means, we show that a convergence rate of $\Omega\left(\sqrt{\frac{kj}{n}}\right)$ is necessary, thereby proving that the bounds from [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] are essentially optimal.
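
The quantity whose convergence is being measured can be written out as an excess risk. The following is a sketch in assumed notation that does not appear in the abstract: $S$ denotes a candidate set of $k$ centers, $S_P$ the solution computed on the sample $P$, and $z$ the exponent of the objective ($z = 1$ for $k$-median, $z = 2$ for $k$-means). The paper's own definitions and normalization may differ.

```latex
% Assumed notation (not taken from the abstract): S, S_P, z, cost, R, R_P.
% Per-point cost of a candidate set S of k centers:
\[
  \mathrm{cost}(p, S) \;=\; \min_{s \in S} \lVert p - s \rVert^{z},
  \qquad z = 1 \ (k\text{-median}), \quad z = 2 \ (k\text{-means}).
\]
% Empirical risk on the sample P and population risk under D:
\[
  R_P(S) \;=\; \frac{1}{n} \sum_{p \in P} \mathrm{cost}(p, S),
  \qquad
  R_{\mathcal{D}}(S) \;=\; \mathbb{E}_{p \sim \mathcal{D}}\bigl[\mathrm{cost}(p, S)\bigr].
\]
% Excess risk of the solution S_P computed on P; a "convergence rate"
% is a bound on this quantity as a function of n (and of k, j):
\[
  R_{\mathcal{D}}(S_P) \;-\; \min_{|S| = k} R_{\mathcal{D}}(S).
\]
```

Under this reading, a rate of $\tilde{O}\left(\sqrt{k/n}\right)$ says the excess risk shrinks at that speed as $n$ grows. For subspace clustering, the same template would apply with each $s \in S$ replaced by a $j$-dimensional subspace and $\lVert p - s \rVert$ by the distance from $p$ to that subspace.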
