Location of Repository

Given a set F of n positive functions over a ground set X, we consider the problem of computing x ∗ that minimizes the expression ∑ f∈F f(x), over x ∈ X. A typical application is shape fitting, where we wish to approximate a set P of n elements (say, points) by a shape x from a (possibly infinite) family X of shapes. Here, each point p ∈ P corresponds to a function f such that f(x) is the distance from p to x, and we seek a shape x that minimizes the sum of distances from each point in P. In the k-clustering variant, each x ∈ X is a tuple ofk shapes, andf(x) is the distance frompto its closest shape inx. Our main result is a unified framework for constructing coresets and approximate clustering for such general sets of functions. To achieve our results, we forge a link between the classic and well defined notion of ε-approximations from the theory of PAC Learning and VC dimension, to the relatively new (and not so consistent) paradigm of coresets, which are some kind of “compressed representation " of the input set F. Using traditional techniques, a coreset usually implies an LTAS (linear time approximation scheme) for the corresponding optimization problem, which can be computed in parallel, via one pass over the data, and using only polylogarithmic space (i.e, in the streaming model). For several function families F for which coresets are known not to exist, or the corresponding (approximate) optimization problems are hard, our framework yields bicriteria approximations, or coresets that are large, but contained in a low-dimensional space. We demonstrate our unified framework by applying it on projective clustering problems. We obtain new coreset constructions and significantly smaller coresets, over the ones tha

Year: 2011

OAI identifier:
oai:CiteSeerX.psu:10.1.1.353.4460

Provided by:
CiteSeerX

Download PDF:To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.