265 research outputs found
Approximation and Streaming Algorithms for Projective Clustering via Random Projections
Let P be a set of n points in R^d. In the projective
clustering problem, given k, q and a norm rho \in [1, \infty], we have to
compute a set F of k q-dimensional flats such that
(\sum_{p \in P} d(p, F)^rho)^{1/rho} is minimized; here d(p, F)
represents the (Euclidean) distance of p to the closest flat in
F. We let f_k^q(P, rho) denote the minimal value and interpret
f_k^q(P, \infty) to be max_{p \in P} d(p, F). When rho = 1, 2 and \infty and
q = 0, the problem corresponds to the k-median, k-means and the
k-center clustering problems respectively.
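To make the objective concrete, here is a small illustrative helper (hypothetical code, not from the paper) that evaluates the cost above for a candidate family of q-flats, each given by an anchor point and an orthonormal basis; q = 0 recovers the point-center cases.

```python
import numpy as np

def dist_to_flat(p, anchor, basis):
    # Distance from point p to the affine flat {anchor + basis @ t}.
    # basis: d x q matrix with orthonormal columns (q = 0 -> a single point).
    r = p - anchor
    return np.linalg.norm(r - basis @ (basis.T @ r))

def clustering_cost(P, flats, rho=2.0):
    # (sum_p min_F d(p, F)^rho)^(1/rho); rho = inf gives the k-center objective.
    d = np.array([min(dist_to_flat(p, a, B) for a, B in flats) for p in P])
    return float(d.max()) if np.isinf(rho) else float((d ** rho).sum() ** (1.0 / rho))
```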
For every 0 < epsilon < 1, S \subseteq P and rho \ge 1, we show that the
orthogonal projection of P onto a randomly chosen flat of dimension
O(((q+1)^2 log(1/epsilon)/epsilon^3) log n) will epsilon-approximate
f_1^q(S, rho). This result combines the concepts of geometric coresets and
subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence,
an orthogonal projection of P to an O(((q+1)^2 log((q+1)/epsilon)/epsilon^3)
log n)-dimensional randomly chosen subspace epsilon-approximates projective
clusterings for every k and rho simultaneously. Note that the dimension of
this subspace is independent of the number of clusters k.
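As a quick numerical illustration of the q = 0 case (a sketch with ad hoc sizes, not the paper's construction): project the points onto a random m-dimensional subspace and check that the cost of any fixed set of centers is roughly preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, m = 1000, 500, 5, 40          # m chosen ad hoc; think O(log n / epsilon^2)

P = rng.normal(size=(n, d))            # point set
C = rng.normal(size=(k, d))            # an arbitrary fixed set of k centers (0-flats)

Q, _ = np.linalg.qr(rng.normal(size=(d, m)))   # orthonormal basis of a random subspace
scale = np.sqrt(d / m)                 # rescaling that preserves norms in expectation

def cost(X, C, rho=2):
    D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # point-center distances
    return (D.min(axis=1) ** rho).sum() ** (1 / rho)

print(cost(P, C), cost(P @ Q * scale, C @ Q * scale))  # the two costs should be close
```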
Using this dimension reduction result, we obtain new approximation and
streaming algorithms for projective clustering problems. For example, given a
stream of n points, we show how to compute an epsilon-approximate
projective clustering for every k and rho simultaneously using only
O(((q+1)^2 log((q+1)/epsilon)/epsilon^3)(n + d) log n) space. Compared to
standard streaming algorithms with Omega(nd) space requirement, our approach
is a significant improvement when the number of input points and their
dimensions are of the same order of magnitude.
Comment: Canadian Conference on Computational Geometry (CCCG 2015)
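A minimal sketch of the space accounting behind that comparison (illustrative parameters, not the paper's algorithm): store the d x m projection matrix once, then keep only the m-dimensional image of each arriving point, O((n + d)m) numbers rather than the O(nd) needed to retain the raw stream.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 10_000, 50                          # ambient vs projected dimension (ad hoc)
Q, _ = np.linalg.qr(rng.normal(size=(d, m)))   # fixed random subspace, stored once
scale = np.sqrt(d / m)

sketch = []                                # grows by m floats per point, not d
for _ in range(1000):                      # simulate a stream of points
    x = rng.normal(size=d)                 # next point arrives ...
    sketch.append(scale * (x @ Q))         # ... keep only its projection
# a projective-clustering routine can now be run on np.array(sketch)
```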
Measured descent: A new embedding method for finite metrics
We devise a new embedding technique, which we call measured descent, based on
decomposing a metric space locally, at varying speeds, according to the density
of some probability measure. This provides a refined and unified framework for
the two primary methods of constructing Frechet embeddings for finite metrics,
due to [Bourgain, 1985] and [Rao, 1999]. We prove that any n-point metric space
(X,d) embeds in Hilbert space with distortion O(sqrt{alpha_X log n}), where
alpha_X is a geometric estimate on the decomposability of X. As an immediate
corollary, we obtain an O(sqrt{(log lambda_X) \log n}) distortion embedding,
where \lambda_X is the doubling constant of X. Since \lambda_X\le n, this
result recovers Bourgain's theorem, but when the metric X is, in a sense,
``low-dimensional,'' improved bounds are achieved.
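For context, here is a toy Frechet-style embedding in the spirit of [Bourgain, 1985] (a sketch only, not measured descent itself): each coordinate is the distance to a random subset, with subset sizes spanning geometric scales.

```python
import numpy as np

def frechet_embed(D, reps=5, seed=2):
    # D: n x n matrix of pairwise distances of a finite metric (X, d).
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    coords, t = [], 1
    while t <= n:                           # subset sizes 1, 2, 4, ..., up to n
        for _ in range(reps):               # a few random subsets per scale
            S = rng.choice(n, size=t, replace=False)
            coords.append(D[:, S].min(axis=1))   # coordinate x -> d(x, S)
        t *= 2
    return np.stack(coords, axis=1)         # n points in R^{O(reps log n)}
```

Each coordinate map x -> d(x, S) is 1-Lipschitz, so the embedding expands no pair by more than a fixed factor after normalization; Bourgain's analysis bounds the contraction by O(log n).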
Our embeddings are volume-respecting for subsets of arbitrary size. One
consequence is the existence of (k, O(log n)) volume-respecting embeddings for
all 1 \leq k \leq n, which is the best possible, and answers positively a
question posed by U. Feige. Our techniques are also used to answer positively a
question of Y. Rabinovich, showing that any weighted n-point planar graph
embeds in l_\infty^{O(log n)} with O(1) distortion. The O(log n) bound on the
dimension is optimal, and improves upon the previously known bound of O((log
n)^2).
Comment: 17 pages. No figures. Appeared in FOCS '04. To appear in Geometric &
Functional Analysis. This version fixes a subtle error in Section 2.
Is margin preserved after random projection?
Random projections have been applied in many machine learning algorithms.
However, whether margin is preserved after random projection is non-trivial and
not well studied. In this paper we analyse margin distortion after random
projection, and give conditions for margin preservation in binary
classification problems. We also extend the analysis to multiclass problems,
and provide theoretical bounds on the multiclass margin of the projected
data.
Comment: ICML 2012
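A quick empirical check in the spirit of the question (hypothetical names and sizes; not the paper's analysis): compare the normalized margin of a fixed linear separator before and after a Gaussian random projection.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 500, 1000, 100                     # sample size, ambient and projected dims
w = rng.normal(size=d); w /= np.linalg.norm(w)   # a fixed unit-norm separator
X = rng.normal(size=(n, d))
y = np.sign(X @ w)                           # labels realized by w

margin = (y * (X @ w)).min()                 # margin of w on (X, y)

R = rng.normal(size=(d, m)) / np.sqrt(m)     # Gaussian JL projection
Xp, wp = X @ R, w @ R                        # project both data and separator
proj_margin = (y * (Xp @ wp)).min() / np.linalg.norm(wp)
print(f"margin before: {margin:.4f}, after projection: {proj_margin:.4f}")
```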