Net and Prune: A Linear Time Algorithm for Euclidean Distance Problems

Author: Har-Peled Sariel
Raichel Benjamin
Publication venue
Publication date: 25/09/2014
Field of study

We provide a general framework for getting expected linear time constant factor approximations (and in many cases FPTAS's) to several well known problems in Computational Geometry, such as $k$-center clustering and farthest nearest neighbor. The new approach is robust to variations in the input problem, and yet it is simple, elegant and practical. In particular, many of these well studied problems, which fit easily into our framework, either previously had no linear time approximation algorithm or required rather involved algorithms and analysis. A short list of the problems we consider includes farthest nearest neighbor, $k$-center clustering, smallest disk enclosing $k$ points, $k$th largest distance, $k$th smallest $m$-nearest neighbor distance, $k$th heaviest edge in the MST and other spanning forest type problems, problems involving upward closed set systems, and more. Finally, we show how to extend our framework such that the linear running time bound holds with high probability.

arXiv.org e-Print Archive

CiteSeerX

Approximation and Streaming Algorithms for Projective Clustering via Random Projections

Author: Kerber Michael
Raghvendra Sharath
Publication venue
Publication date: 08/07/2014
Field of study

Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k, q$ and norm $\rho \in [1,\infty]$, we have to compute a set $\mathcal{F}$ of $k$ $q$-dimensional flats such that $(\sum_{p\in P}d(p, \mathcal{F})^\rho)^{1/\rho}$ is minimized; here $d(p, \mathcal{F})$ represents the (Euclidean) distance of $p$ to the closest flat in $\mathcal{F}$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret $f_k^q(P,\infty)$ to be $\max_{r\in P}d(r, \mathcal{F})$. When $\rho=1,2$ and $\infty$ and $q=0$, the problem corresponds to the $k$-median, $k$-mean and the $k$-center clustering problems respectively. For every $0 < \epsilon < 1$, $S\subset P$ and $\rho \ge 1$, we show that the orthogonal projection of $P$ onto a randomly chosen flat of dimension $O(((q+1)^2\log(1/\epsilon)/\epsilon^3) \log n)$ will $\epsilon$-approximate $f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$ to an $O(((q+1)^2 \log ((q+1)/\epsilon)/\epsilon^3) \log n)$ dimensional randomly chosen subspace $\epsilon$-approximates projective clusterings for every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace is independent of the number of clusters $k$. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of $n$ points, we show how to compute an $\epsilon$-approximate projective clustering for every $k$ and $\rho$ simultaneously using only $O((n+d)((q+1)^2\log ((q+1)/\epsilon))/\epsilon^3 \log n)$ space. Compared to standard streaming algorithms with $\Omega(kd)$ space requirement, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude.

Comment: Canadian Conference on Computational Geometry (CCCG 2015)

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

On Variants of k-means Clustering

Author: Bandyapadhyay Sayan
Varadarajan Kasturi
Publication venue
Publication date: 09/12/2015
Field of study

Clustering problems often arise in fields such as data mining and machine learning, where the goal is to group a collection of objects into similar groups with respect to a similarity (or dissimilarity) measure. Among the clustering problems, $k$-means clustering in particular has received much attention from researchers. Despite the fact that $k$-means is a very well studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound in polynomial time is $9+\epsilon$. In this paper, we consider the following variant of $k$-means. Given a set $C$ of points in $\mathbb{R}^d$ and a real $f > 0$, find a finite set $F$ of points in $\mathbb{R}^d$ that minimizes the quantity $f \cdot |F|+\sum_{p\in C} \min_{q \in F} \|p-q\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a "bi-criterion" local search algorithm for $k$-means which uses $(1+\epsilon)k$ centers and yields a solution whose cost is at most $(1+\epsilon)$ times the cost of an optimal $k$-means solution. The algorithm runs in polynomial time for any fixed dimension. The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, which yields a near optimal approximation bound. This leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that very little is known about the local search method for geometric approximation.

Comment: 15 pages
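To make the objective of this variant concrete, here is a minimal sketch that only evaluates the quantity $f \cdot |F|+\sum_{p\in C} \min_{q \in F} \|p-q\|^2$ for a given candidate set $F$. It does not implement the local search PTAS from the paper; the function name `facility_cost` and the sample points are assumptions made for illustration.

```python
def facility_cost(points, centers, f):
    """Return f * |F| plus the sum, over each p in C, of its squared
    Euclidean distance to the nearest center q in F."""
    if not centers:
        raise ValueError("F must contain at least one center")
    total = f * len(centers)
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in centers)
    return total


if __name__ == "__main__":
    # Two well separated pairs of points in the plane (hypothetical data).
    C = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(facility_cost(C, [(5, 1)], f=3))           # one shared center
    print(facility_cost(C, [(0, 1), (10, 1)], f=3))  # one center per pair
```

On this data a single center costs 3 + 4 * 26 = 107, while two centers cost 6 + 4 * 1 = 10, which shows how the per-center charge $f$ trades off against the squared-distance term.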

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

The Bane of Low-Dimensionality Clustering

Author: Cohen-Addad Vincent
de Mesmay Arnaud
Rotenberg Eva
Roytman Alan
Publication venue
Publication date: 03/11/2017
Field of study

In this paper, we give a conditional lower bound of $n^{\Omega(k)}$ on running time for the classic k-median and k-means clustering objectives (where n is the size of the input), even in low-dimensional Euclidean space of dimension four, assuming the Exponential Time Hypothesis (ETH). We also consider k-median (and k-means) with penalties where each point need not be assigned to a center, in which case it must pay a penalty, and extend our lower bound to at least three-dimensional Euclidean space. This stands in stark contrast to many other geometric problems such as the traveling salesman problem, or computing an independent set of unit spheres. While these problems benefit from the so-called (limited) blessing of dimensionality, as they can be solved in time $n^{O(k^{1-1/d})}$ or $2^{n^{1-1/d}}$ in d dimensions, our work shows that widely-used clustering objectives have a lower bound of $n^{\Omega(k)}$, even in dimension four. We complete the picture by considering the two-dimensional case: we show that there is no algorithm that solves the penalized version in time less than $n^{o(\sqrt{k})}$, and provide a matching upper bound of $n^{O(\sqrt{k})}$. The main tool we use to establish these lower bounds is the placement of points on the moment curve, which takes its inspiration from constructions of point sets yielding Delaunay complexes of high complexity.
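For readers unfamiliar with the moment curve mentioned above, it is the curve $t \mapsto (t, t^2, \ldots, t^d)$ in $\mathbb{R}^d$. The sketch below only generates points on that curve; it does not reproduce the paper's lower-bound construction, and the function name `moment_curve_point` and the sample parameters are assumptions for illustration.

```python
def moment_curve_point(t, d):
    """Return the point (t, t^2, ..., t^d) on the moment curve in R^d."""
    return tuple(t ** i for i in range(1, d + 1))


if __name__ == "__main__":
    # Five points on the moment curve in dimension four.
    for t in range(1, 6):
        print(moment_curve_point(t, 4))
```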

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Copenhagen University Research Information System

Online Research Database In Technology

Making Laplacians commute

Author: Bronstein Michael M.
Glashoff Klaus
Loring Terry A.
Publication venue
Publication date: 19/07/2013
Field of study

In this paper, we construct multimodal spectral geometry by finding a pair of closest commuting operators (CCO) to a given pair of Laplacians. The CCOs are jointly diagonalizable and hence have the same eigenbasis. Our construction naturally extends classical data analysis tools based on spectral geometry, such as diffusion maps and spectral clustering. We provide several synthetic and real examples of applications in dimensionality reduction, shape analysis, and clustering, demonstrating that our method better captures the inherent structure of multi-modal data.

arXiv.org e-Print Archive

CiteSeerX

Regression on fixed-rank positive semidefinite matrices: a Riemannian approach

Author: Bonnabel Silvere
Meyer Gilles
Sepulchre Rodolphe
Publication venue
Publication date: 31/01/2011
Field of study

The paper addresses the problem of learning a regression model parameterized by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear nature of the search space and on scalability to high-dimensional problems. The mathematical developments rely on the theory of gradient descent algorithms adapted to the Riemannian geometry that underlies the set of fixed-rank positive semidefinite matrices. In contrast with previous contributions in the literature, no restrictions are imposed on the range space of the learned matrix. The resulting algorithms maintain a linear complexity in the problem size and enjoy important invariance properties. We apply the proposed algorithms to the problem of learning a distance function parameterized by a positive semidefinite matrix. Good performance is observed on classical benchmarks.

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège