
    Approximation and Streaming Algorithms for Projective Clustering via Random Projections

    Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k$, $q$ and a norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P} d(p, \mathcal{F})^\rho)^{1/\rho}$ is minimized; here $d(p, \mathcal{F})$ denotes the (Euclidean) distance of $p$ to the closest flat in $\mathcal{F}$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret $f_k^q(P,\infty)$ to be $\max_{r\in P} d(r, \mathcal{F})$. When $\rho = 1, 2$ and $\infty$ and $q = 0$, the problem corresponds to the $k$-median, $k$-means and $k$-center clustering problems, respectively. For every $0 < \epsilon < 1$, $S \subset P$ and $\rho \ge 1$, we show that the orthogonal projection of $P$ onto a randomly chosen flat of dimension $O(((q+1)^2\log(1/\epsilon)/\epsilon^3) \log n)$ will $\epsilon$-approximate $f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$ onto an $O(((q+1)^2 \log((q+1)/\epsilon)/\epsilon^3) \log n)$-dimensional randomly chosen subspace $\epsilon$-approximates projective clusterings for every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace is independent of the number of clusters $k$. Using this dimension-reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of $n$ points, we show how to compute an $\epsilon$-approximate projective clustering for every $k$ and $\rho$ simultaneously using only $O((n+d)((q+1)^2\log((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to standard streaming algorithms with $\Omega(kd)$ space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude. Comment: Canadian Conference on Computational Geometry (CCCG 2015).
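
    The dimension-reduction step described in this abstract can be sketched concretely. The numpy snippet below is a minimal illustration, not the paper's coreset or streaming construction: points and one fixed candidate $q$-flat are projected onto a random $m$-dimensional subspace, and the $\rho = 1$ cost of that flat is compared before and after. The concrete values of n, d, q and m are illustrative choices, not the bounds stated above.

        import numpy as np

        def random_orthonormal_basis(d, m, rng):
            # Columns form an orthonormal basis of a uniformly random
            # m-dimensional subspace of R^d (QR of a Gaussian matrix).
            g = rng.standard_normal((d, m))
            q, _ = np.linalg.qr(g)
            return q                                      # shape (d, m)

        def distances_to_flat(P, basis):
            # Euclidean distance of each row of P to the flat (through the
            # origin) spanned by the columns of `basis`.
            proj = P @ basis @ basis.T
            return np.linalg.norm(P - proj, axis=1)

        rng = np.random.default_rng(0)
        n, d, q, m = 2000, 400, 2, 60                     # m: projected dimension (illustrative)
        P = rng.standard_normal((n, d))

        flat = random_orthonormal_basis(d, q, rng)        # one fixed candidate q-flat
        cost_original = distances_to_flat(P, flat).sum()  # rho = 1 cost of this flat

        S = random_orthonormal_basis(d, m, rng)           # random m-dimensional subspace
        scale = np.sqrt(d / m)                            # usual JL-style rescaling
        P_low = (P @ S) * scale                           # points expressed inside the subspace
        flat_low, _ = np.linalg.qr(S.T @ flat)            # the flat expressed inside the subspace
        cost_projected = distances_to_flat(P_low, flat_low).sum()

        print(cost_original, cost_projected)              # comparable up to a modest relative error

    The paper quantifies how large the projected dimension must be for the costs to agree within a factor of $1 \pm \epsilon$ for every candidate flat; the sketch only checks one flat empirically.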

    Measured descent: A new embedding method for finite metrics

    We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Frechet embeddings for finite metrics, due to [Bourgain, 1985] and [Rao, 1999]. We prove that any $n$-point metric space $(X,d)$ embeds in Hilbert space with distortion $O(\sqrt{\alpha_X \log n})$, where $\alpha_X$ is a geometric estimate on the decomposability of $X$. As an immediate corollary, we obtain an $O(\sqrt{(\log \lambda_X) \log n})$ distortion embedding, where $\lambda_X$ is the doubling constant of $X$. Since $\lambda_X \le n$, this result recovers Bourgain's theorem, but when the metric $X$ is, in a sense, ``low-dimensional,'' improved bounds are achieved. Our embeddings are volume-respecting for subsets of arbitrary size. One consequence is the existence of $(k, O(\log n))$ volume-respecting embeddings for all $1 \leq k \leq n$, which is the best possible, and answers positively a question posed by U. Feige. Our techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted $n$-point planar graph embeds in $\ell_\infty^{O(\log n)}$ with $O(1)$ distortion. The $O(\log n)$ bound on the dimension is optimal, and improves upon the previously known bound of $O((\log n)^2)$. Comment: 17 pages. No figures. Appeared in FOCS '04. To appear in Geometric & Functional Analysis. This version fixes a subtle error in Section 2.
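
    As background for the kind of construction this paper refines, a Bourgain-style Frechet embedding can be sketched in a few lines: each coordinate maps a point $x$ to its distance $d(x, S)$ to a random subset $S$, with subsets drawn at geometrically varying densities. The snippet below is a rough illustration of that baseline, not the measured-descent construction; the number of repetitions per scale and the l_2 distortion check are illustrative choices.

        import numpy as np

        def frechet_embedding(D, rng, reps_per_scale=10):
            # D: (n, n) matrix of pairwise distances of a finite metric space.
            # Each coordinate of the embedding is x -> d(x, S) for a random
            # subset S, sampled at density 2^{-t} for each scale t.
            n = D.shape[0]
            columns = []
            for t in range(1, int(np.ceil(np.log2(n))) + 1):
                for _ in range(reps_per_scale):
                    S = np.flatnonzero(rng.random(n) < 2.0 ** (-t))
                    if S.size:
                        columns.append(D[:, S].min(axis=1))   # d(x, S) for every x at once
            return np.column_stack(columns)                   # n points embedded in R^m

        # usage: embed a random Euclidean point set and estimate the distortion
        rng = np.random.default_rng(1)
        X = rng.standard_normal((128, 3))
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        E = frechet_embedding(D, rng)
        i, j = np.triu_indices(len(X), k=1)
        ratios = np.linalg.norm(E[i] - E[j], axis=1) / D[i, j]
        print(ratios.max() / ratios.min())                    # empirical distortion (scale-free)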

    Is margin preserved after random projection?

    Random projections have been applied in many machine learning algorithms. However, whether the margin is preserved after random projection is non-trivial and not well studied. In this paper we analyse margin distortion after random projection, and give conditions under which the margin is preserved for binary classification problems. We also extend our analysis to the margin for multiclass problems, and provide theoretical bounds on the multiclass margin of the projected data. Comment: ICML201
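
    A quick way to see the question concretely: build unit-norm, linearly separable data with a known margin, apply a Gaussian random projection, and compare the margin before and after. The snippet below is an assumed synthetic setup for illustration, not the paper's analysis or its exact conditions; the dimensions, margin gamma and projected dimension k are arbitrary choices.

        import numpy as np

        rng = np.random.default_rng(2)
        n, d, k = 200, 1000, 500                     # k: projected dimension (illustrative)
        gamma = 0.9                                  # margin of the synthetic unit-norm data

        # Unit-norm, linearly separable data with margin >= gamma w.r.t. a unit normal w.
        w = rng.standard_normal(d); w /= np.linalg.norm(w)
        y = rng.choice([-1.0, 1.0], size=n)
        m_i = gamma + (1.0 - gamma) * rng.random(n)  # per-point margins in [gamma, 1]
        U = rng.standard_normal((n, d))
        U -= np.outer(U @ w, w)                      # components orthogonal to w
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        X = np.outer(y * m_i, w) + np.sqrt(1.0 - m_i**2)[:, None] * U

        def margin(X, y, w):
            # Smallest signed distance of the labelled points to the hyperplane {x : w.x = 0}.
            return np.min(y * (X @ w)) / np.linalg.norm(w)

        R = rng.standard_normal((d, k)) / np.sqrt(k)      # Gaussian random projection
        print(margin(X, y, w), margin(X @ R, y, w @ R))   # margins before and after projection

    Empirically, the projected margin stays positive and reasonably close to the original only when k is large relative to 1/gamma^2 for unit-norm points; shrinking k (or letting point norms grow relative to the margin) degrades it, which is the kind of condition the paper makes precise.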