57,716 research outputs found
Net and Prune: A Linear Time Algorithm for Euclidean Distance Problems
We provide a general framework for getting expected linear time constant
factor approximations (and in many cases FPTAS's) to several well known
problems in Computational Geometry, such as $k$-center clustering and farthest
nearest neighbor. The new approach is robust to variations in the input
problem, and yet it is simple, elegant and practical. In particular, many of
these well studied problems which fit easily into our framework, either
previously had no linear time approximation algorithm, or required rather
involved algorithms and analysis. A short list of the problems we consider
include farthest nearest neighbor, $k$-center clustering, smallest disk
enclosing $k$ points, $k$th largest distance, $k$th smallest $m$-nearest
neighbor distance, $k$th heaviest edge in the MST and other spanning forest
type problems, problems involving upward closed set systems, and more. Finally,
we show how to extend our framework such that the linear running time bound
holds with high probability.
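As a rough illustration of the kind of primitive such a net-and-prune framework rests on, here is a hedged numpy sketch of a grid-based net: snap points to cells of side length $r$ and keep one representative per cell, so every input point has a survivor within $r\sqrt{d}$. The function name and the 1-center decision example are illustrative assumptions, not the paper's actual construction.

```python
# Illustrative sketch only (assumed names, not the paper's code): a grid
# "net" computed in expected linear time by hashing cell coordinates.
import numpy as np

def grid_net(points: np.ndarray, r: float) -> np.ndarray:
    """Keep one representative per grid cell of side r; the survivors
    form an (r * sqrt(d))-net of the input point set."""
    reps = {}
    for p in points:
        cell = tuple(np.floor(p / r).astype(np.int64))
        if cell not in reps:  # first point to land in a cell represents it
            reps[cell] = p
    return np.array(list(reps.values()))

# Hypothetical usage: a decision step for 1-center at radius r could test
# whether the much smaller set grid_net(P, r) fits in a ball of radius O(r).
P = np.random.rand(10_000, 2)
net = grid_net(P, 0.05)
```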
Approximation and Streaming Algorithms for Projective Clustering via Random Projections
Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective
clustering problem, given $k$, $q$ and a norm $\rho \in [1,\infty]$, we have to
compute a set $F$ of $k$ $q$-dimensional flats such that
$\big(\sum_{p \in P} d(p,F)^{\rho}\big)^{1/\rho}$ is minimized; here
$d(p,F)$ represents the (Euclidean) distance of $p$ to the closest flat in
$F$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret
$f_k^q(P,\infty)$ to be $\max_{p \in P} d(p,F)$. When $\rho = 1, 2$ and
$\infty$ and $q = 0$, the problem corresponds to the $k$-median, $k$-means and
the $k$-center clustering problems respectively.
For every $0 < \epsilon < 1$, $k$ and $\rho$, we show that the
orthogonal projection of $P$ onto a randomly chosen flat of dimension
$O\big((q+1)^2 \log(1/\epsilon)/\epsilon^3 \cdot \log n\big)$ will
$\epsilon$-approximate $f_k^q(P,\rho)$. This result combines the concepts of
geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss
Lemma. As a consequence, an orthogonal projection of $P$ to an
$O\big((q+1)^2 \log(1/\epsilon)/\epsilon^3 \cdot \log n\big)$-dimensional
randomly chosen subspace $\epsilon$-approximates projective clusterings for
every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace
is independent of the number of clusters $k$.
Using this dimension reduction result, we obtain new approximation and
streaming algorithms for projective clustering problems. For example, given a
stream of $n$ points, we show how to compute an $\epsilon$-approximate
projective clustering for every $k$ and $\rho$ simultaneously using only
$O\big((n+d)(q+1)^2 \log(1/\epsilon)/\epsilon^3 \cdot \log n\big)$ space.
Compared to standard streaming algorithms with $\Omega(nd)$ space requirement,
our approach is a significant improvement when the number of input points and
their dimensions are of the same order of magnitude.

Comment: Canadian Conference on Computational Geometry (CCCG 2015)
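To make the dimension-reduction step of this abstract concrete, here is a minimal numpy sketch (assumed helper name; the target dimension m is a free parameter standing in for the paper's bound) of projecting a point set onto a uniformly random m-dimensional subspace.

```python
# Sketch under assumptions: a random subspace drawn via QR of a Gaussian
# matrix; m stands in for the paper's target-dimension bound.
import numpy as np

def random_subspace_projection(P: np.ndarray, m: int, seed=None) -> np.ndarray:
    """Project the n x d array P onto a random m-dimensional subspace of R^d."""
    rng = np.random.default_rng(seed)
    d = P.shape[1]
    G = rng.standard_normal((d, m))
    Q, _ = np.linalg.qr(G)  # orthonormal basis of a random m-dim subspace
    return P @ Q            # coordinates of each point in that subspace

P = np.random.rand(1000, 500)
P_low = random_subspace_projection(P, m=40)  # choose m per the stated bound
```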
On Variants of k-means Clustering
\textit{Clustering problems} often arise in fields such as data mining and
machine learning, where a collection of objects must be grouped into similar
groups with respect to a similarity (or dissimilarity) measure. Among the
clustering problems, \textit{$k$-means} clustering in particular has received
much attention from researchers. Despite the fact that $k$-means is a very
well studied problem, its status in the plane is still open. In particular, it
is unknown whether it admits a PTAS in the plane. The best known approximation
bound achievable in polynomial time is $9+\epsilon$.
In this paper, we consider the following variant of $k$-means. Given a set
$C$ of points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $F$ of
points in $\mathbb{R}^d$ that minimizes the quantity
$f \cdot |F| + \sum_{p \in C} \min_{q \in F} \lVert p - q \rVert^2$. For any
fixed dimension $d$, we design a local search PTAS for this problem. We also
give a "bi-criterion" local search algorithm for $k$-means which uses
$(1+\epsilon)k$ centers and yields a solution whose cost is at most
$(1+\epsilon)$ times the cost of an optimal $k$-means solution. The algorithm
runs in polynomial time for any fixed dimension.
The contribution of this paper is two-fold. On the one hand, we are able to
handle the squares of distances in an elegant manner, which yields a near
optimal approximation bound. This leads us towards a better understanding of
the $k$-means problem. On the other hand, our analysis of local search might
also be useful for other geometric problems. This is important considering
that very little is known about the local search method for geometric
approximation.

Comment: 15 pages
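For intuition about the local search paradigm this paper analyzes, here is a hedged single-swap sketch for plain k-means (the paper's PTAS swaps larger subsets and handles the facility-style objective; all names are illustrative, and candidate centers are restricted to input points for simplicity).

```python
# Illustrative single-swap local search for k-means; not the paper's
# algorithm, just the general template its analysis refines.
import numpy as np

def kmeans_cost(P, centers):
    # sum over points of squared distance to the nearest center
    d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def local_search_kmeans(P, k, eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), k, replace=False)].copy()
    improved = True
    while improved:
        improved, cost = False, kmeans_cost(P, centers)
        for i in range(k):           # try replacing each current center...
            for p in P:              # ...with each input point
                trial = centers.copy()
                trial[i] = p
                new_cost = kmeans_cost(P, trial)
                # accept only significant improvements, ensuring termination
                if new_cost < (1 - eps / k) * cost:
                    centers, cost, improved = trial, new_cost, True
    return centers

centers = local_search_kmeans(np.random.rand(200, 2), k=3)
```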
The Bane of Low-Dimensionality Clustering
In this paper, we give a conditional lower bound of $2^{\Omega(n^{1-1/d})}$ on the
running time for the classic k-median and k-means clustering objectives (where
n is the size of the input), even in low-dimensional Euclidean space of
dimension four, assuming the Exponential Time Hypothesis (ETH). We also
consider k-median (and k-means) with penalties where each point need not be
assigned to a center, in which case it must pay a penalty, and extend our lower
bound to at least three-dimensional Euclidean space.
This stands in stark contrast to many other geometric problems such as the
traveling salesman problem, or computing an independent set of unit spheres.
While these problems benefit from the so-called (limited) blessing of
dimensionality, as they can be solved in time $n^{O(n^{1-1/d})}$ or
$2^{O(n^{1-1/d})}$ in $d$ dimensions, our work shows that widely-used
clustering objectives have a lower bound of $2^{\Omega(n^{1-1/d})}$, even in
dimension four.
We complete the picture by considering the two-dimensional case: we show that
there is no algorithm that solves the penalized version in time less than
$n^{o(\sqrt{n})}$, and provide a matching upper bound of $n^{O(\sqrt{n})}$.
The main tool we use to establish these lower bounds is the placement of
points on the moment curve, which takes its inspiration from constructions of
point sets yielding Delaunay complexes of high complexity.
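The moment curve mentioned above is simple to write down; a quick sketch (function name assumed) of placing points on $t \mapsto (t, t^2, \ldots, t^d)$:

```python
# Sketch: points on the moment curve t -> (t, t^2, ..., t^d); such
# placements are what yield Delaunay complexes of high complexity.
import numpy as np

def moment_curve_points(ts, d):
    """One point (t, t^2, ..., t^d) in R^d per parameter value t."""
    ts = np.asarray(ts, dtype=float)
    return np.stack([ts ** j for j in range(1, d + 1)], axis=1)

pts = moment_curve_points(np.linspace(0.1, 1.0, 20), d=4)  # 20 points in R^4
```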
Making Laplacians commute
In this paper, we construct multimodal spectral geometry by finding a pair of
closest commuting operators (CCO) to a given pair of Laplacians. The CCOs are
jointly diagonalizable and hence have the same eigenbasis. Our construction
naturally extends classical data analysis tools based on spectral geometry,
such as diffusion maps and spectral clustering. We provide several synthetic
and real examples of applications in dimensionality reduction, shape analysis,
and clustering, demonstrating that our method better captures the inherent
structure of multi-modal data.
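As a rough illustration of why joint diagonalizability yields commuting operators, here is a hedged numpy sketch: fix one shared eigenbasis (here, crudely, that of L1 + L2, whereas the paper optimizes over the basis itself) and drop each operator's off-diagonal terms in it; the two projected operators then commute exactly.

```python
# Crude surrogate for the closest-commuting-operators idea; the fixed
# choice of basis below is an assumption for illustration only.
import numpy as np

def commutator_norm(A, B):
    return np.linalg.norm(A @ B - B @ A)

def crude_cco(L1, L2):
    """Re-express both symmetric operators in one shared eigenbasis and
    keep only their diagonal parts, forcing them to commute."""
    _, U = np.linalg.eigh(L1 + L2)       # shared candidate eigenbasis
    D1 = np.diag(np.diag(U.T @ L1 @ U))
    D2 = np.diag(np.diag(U.T @ L2 @ U))
    return U @ D1 @ U.T, U @ D2 @ U.T    # jointly diagonalizable pair

A, B = np.random.rand(5, 5), np.random.rand(5, 5)
L1, L2 = A + A.T, B + B.T                # two arbitrary symmetric operators
C1, C2 = crude_cco(L1, L2)
assert commutator_norm(C1, C2) < 1e-8    # the projected pair commutes
```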
Regression on fixed-rank positive semidefinite matrices: a Riemannian approach
The paper addresses the problem of learning a regression model parameterized
by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear
nature of the search space and on scalability to high-dimensional problems. The
mathematical developments rely on the theory of gradient descent algorithms
adapted to the Riemannian geometry that underlies the set of fixed-rank
positive semidefinite matrices. In contrast with previous contributions in the
literature, no restrictions are imposed on the range space of the learned
matrix. The resulting algorithms maintain a linear complexity in the problem
size and enjoy important invariance properties. We apply the proposed
algorithms to the problem of learning a distance function parameterized by a
positive semidefinite matrix. Good performance is observed on classical
benchmarks.
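A minimal sketch of the underlying idea, under assumptions (synthetic data and plain Euclidean gradient descent on a factor G with W = G Gᵀ, rather than the paper's Riemannian algorithm): the factorization keeps the learned matrix positive semidefinite with rank at most r throughout.

```python
# Hedged sketch: fixed-rank PSD constraint enforced via the factorization
# W = G G^T; the data model and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 10, 3, 500
X = rng.standard_normal((n, d))
M = rng.standard_normal((d, r))
W_true = M @ M.T                                 # ground-truth PSD, rank r
y = np.einsum('ni,ij,nj->n', X, W_true, X)       # targets y_i = x_i^T W x_i

G = 0.1 * rng.standard_normal((d, r))            # learned factor, W = G G^T
lr = 1e-3
for _ in range(2000):
    pred = np.einsum('ni,ij,nj->n', X, G @ G.T, X)
    resid = pred - y
    # gradient of mean((x^T G G^T x - y)^2) with respect to G
    grad = 4 * (X.T * resid) @ X @ G / n
    G -= lr * grad
# after training, G @ G.T approximates W_true (up to rotations of G)
```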