32 research outputs found
An -Time Algorithm for Computing Maximum Independent Set in Graphs with Bounded Degree 3
We give a polynomial-space algorithm for computing Maximum Independent Set in graphs with bounded degree 3, whose running time improves all previously known bounds for the problem.
A Quantum Approximation Scheme for k-Means
We give a quantum approximation scheme (i.e., a $(1+\varepsilon)$-approximation for every $\varepsilon > 0$) for the classical
$k$-means clustering problem in the QRAM model with a running time that has
only polylogarithmic dependence on the number of data points. More
specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^d$ stored in a
QRAM data structure, our quantum algorithm runs in time $\tilde{O}\big(2^{\tilde{O}(k/\varepsilon)} \eta^2 d\big)$ and with high probability
outputs a set $C$ of $k$ centers such that $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$. Here $C_{OPT}$ denotes the optimal $k$ centers,
$cost(.)$ denotes the standard $k$-means cost function (i.e., the sum of the
squared distances of points to the closest center), and $\eta$ is the aspect
ratio (i.e., the ratio of the maximum distance to the minimum distance). This is the
first quantum algorithm with a polylogarithmic running time that gives a
provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means
problem. Also, unlike previous works on unsupervised learning, our quantum
algorithm does not require quantum linear algebra subroutines and has a running
time independent of parameters (e.g., condition number) that appear in such
procedures.
A simple D^2-sampling based PTAS for k-means and other Clustering Problems
Given a set of points $P \subseteq \mathbb{R}^d$, the $k$-means clustering
problem is to find a set $C$ of $k$ {\em centers} such that the objective function $\sum_{x \in P} d(x, C)^2$,
where $d(x, C)$ denotes the distance between $x$ and the closest center in $C$,
is minimized. This is one of the most prominent objective functions that have
been studied with respect to clustering.
$D^2$-sampling \cite{ArthurV07} is a simple non-uniform sampling technique
for choosing points from a set of points. It works as follows: given a set of
points $P \subseteq \mathbb{R}^d$, the first point is chosen uniformly at
random from $P$. Subsequently, a point from $P$ is chosen as the next sample
with probability proportional to the square of the distance of this point to
the nearest previously sampled points.
$D^2$-sampling has been shown to have nice properties with respect to the
$k$-means clustering problem. Arthur and Vassilvitskii \cite{ArthurV07} show
that $k$ points chosen as centers from $P$ using $D^2$-sampling give an
$O(\log k)$ approximation in expectation. Ailon et al. \cite{AJMonteleoni09}
and Aggarwal et al. \cite{AggarwalDK09} extended the results of \cite{ArthurV07}
to show that $O(k)$ points chosen as centers using $D^2$-sampling give a constant factor
approximation to the $k$-means objective function with high probability. In
this paper, we further demonstrate the power of $D^2$-sampling by giving a
simple randomized $(1+\epsilon)$-approximation algorithm that uses
$D^2$-sampling at its core.
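The sampling procedure described above is easy to state in code. Below is a minimal Python sketch of $D^2$-sampling together with the $k$-means objective it targets; the function names are illustrative, not from the paper.

```python
import random

def d2_sample(points, k):
    """Pick k points from `points` via D^2-sampling (the k-means++ seeding rule):
    the first center is uniform at random; each subsequent center is drawn with
    probability proportional to the squared distance to the nearest center so far."""
    centers = [random.choice(points)]
    while len(centers) < k:
        # Squared distance of each point to its nearest already-chosen center.
        weights = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
                   for p in points]
        # Draw the next center with probability proportional to these weights.
        centers.append(random.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans_cost(points, centers):
    """Standard k-means objective: sum of squared distances to the closest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)
```

Running `d2_sample` $k$ times yields exactly the seeding analyzed by Arthur and Vassilvitskii; far-away points get proportionally larger weights, which is what drives the approximation guarantees cited above.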
Hardness of Approximation for Euclidean k-Median
The Euclidean k-median problem is defined in the following manner: given a set $P$ of $n$ points in $d$-dimensional Euclidean space $\mathbb{R}^d$, and an integer $k$, find a set $C \subset \mathbb{R}^d$ of $k$ points (called centers) such that the cost function $\Phi(C, P) \equiv \sum_{x \in P} \min_{c \in C} \|x - c\|_2$ is minimized. The Euclidean k-means problem is defined similarly by replacing the distance with squared Euclidean distance in the cost function. Various hardness of approximation results are known for the Euclidean k-means problem [Pranjal Awasthi et al., 2015; Euiwoong Lee et al., 2017; Vincent Cohen-Addad and Karthik C. S., 2019]. However, no hardness of approximation result was known for the Euclidean k-median problem. In this work, assuming the unique games conjecture (UGC), we provide the hardness of approximation result for the Euclidean k-median problem in $O(\log k)$ dimensional space. This solves an open question posed explicitly in the work of Awasthi et al. [Pranjal Awasthi et al., 2015].
Furthermore, we study the hardness of approximation for the Euclidean k-means/k-median problems in the bi-criteria setting where an algorithm is allowed to choose more than $k$ centers. That is, bi-criteria approximation algorithms are allowed to output $\beta k$ centers (for constant $\beta > 1$) and the approximation ratio is computed with respect to the optimal k-means/k-median cost. We show the hardness of bi-criteria approximation result for the Euclidean k-median problem for any $\beta < 1.015$, assuming UGC. We also show a similar hardness of bi-criteria approximation result for the Euclidean k-means problem with a stronger bound of $\beta < 1.28$, again assuming UGC.
FPT Approximation for Constrained Metric k-Median/Means
The metric $k$-median problem over a metric space $(\mathcal{X}, d)$ is
defined as follows: given a set $L \subseteq \mathcal{X}$ of facility locations
and a set $C \subseteq \mathcal{X}$ of clients, open a set $F \subseteq L$ of $k$
facilities such that the total service cost, defined as $\sum_{x \in C} \min_{f \in F} d(x, f)$, is minimised. The metric $k$-means
problem is defined similarly using squared distances. In many applications
there are additional constraints that any solution needs to satisfy. This gives
rise to different constrained versions of the problem, such as the $r$-gather,
fault-tolerant, and outlier $k$-means/$k$-median problems. Surprisingly, for many of
these constrained problems, no constant factor approximation algorithm is known. We
give FPT algorithms with constant approximation guarantees for a range of
constrained $k$-median/means problems. For some of the constrained problems,
ours is the first constant factor approximation algorithm, whereas for others,
we improve or match the approximation guarantee of previous works. We work
within the unified framework of Ding and Xu that allows us to simultaneously
obtain algorithms for a range of constrained problems. In particular, we obtain
a $(3+\varepsilon)$-approximation and a $(9+\varepsilon)$-approximation for the
constrained versions of the $k$-median and $k$-means problems respectively in
FPT time. In many practical settings of the $k$-median/means problem, one is
allowed to open a facility at any client location, i.e., $C \subseteq L$. For
this special case, our algorithm gives a $(2+\varepsilon)$-approximation and a
$(4+\varepsilon)$-approximation for the constrained versions of the $k$-median and
$k$-means problems respectively in FPT time. Since our algorithm is based on a
simple sampling technique, it can also be converted to a constant-pass,
log-space streaming algorithm.
Faster Algorithms for the Constrained k-Means Problem
The classical center based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. For instance, consider the r-gather clustering problem, where there is an additional constraint that each of the clusters should have at least r points, or the capacitated clustering problem, where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters O_1, ..., O_k are an arbitrary partition of the dataset and the goal is to output k centers c_1, ..., c_k such that the objective function sum_{i=1}^{k} sum_{x in O_{i}} ||x - c_{i}||^2 is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of k centers will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such k centers such that at least one of these k centers behaves well. Given an error parameter epsilon > 0, let l denote the size of the smallest list of k-centers such that at least one of the k-centers gives a (1+epsilon) approximation w.r.t. the objective function above. In this paper, we show an upper bound on l by giving a randomized algorithm that outputs a list of 2^{~O(k/epsilon)} k-centers. We also give a closely matching lower bound of 2^{~Omega(k/sqrt{epsilon})}. Moreover, our algorithm runs in time O(n * d * 2^{~O(k/epsilon)}). This is a significant improvement over the previous result of Ding and Xu, who gave an algorithm with running time O(n * d * (log{n})^{k} * 2^{poly(k/epsilon)}) that outputs a list of size O((log{n})^k * 2^{poly(k/epsilon)}).
Our techniques generalize to the k-median problem and to many other settings where non-Euclidean distance measures are involved.
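Since the optimal clusters can be an arbitrary partition, the natural way to consume such a list of candidate k-center sets is to evaluate each candidate against a constraint-feasible partition and keep the best. A small Python sketch of that evaluation step (function names are illustrative; the brute-force center-to-cluster matching is only sensible for small k):

```python
from itertools import permutations

def constrained_cost(partition, centers):
    """Objective from the constrained k-means formulation above:
    sum over clusters O_i of squared distances to the center matched to O_i.
    Tries every center-to-cluster matching, which is fine for small k."""
    def sq(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    best = float('inf')
    for perm in permutations(centers):
        cost = sum(sq(x, perm[i])
                   for i, cluster in enumerate(partition) for x in cluster)
        best = min(best, cost)
    return best

def best_from_list(partition, candidate_lists):
    """Given a list of candidate k-center sets (as produced by a list
    algorithm), return the cost of the best candidate for this partition."""
    return min(constrained_cost(partition, cand) for cand in candidate_lists)
```

The guarantee in the abstract says that for the list the algorithm outputs, `best_from_list` is within a (1+epsilon) factor of the optimum for the intended partition.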
On the Distribution of the Fourier Spectrum of Halfspaces
Bourgain showed that any noise stable Boolean function $f$ can be
well-approximated by a junta. In this note we give an exponential sharpening of
the parameters of Bourgain's result under the additional assumption that $f$ is
a halfspace.
Approximate Clustering with Same-Cluster Queries
Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?", where the answers to these queries are assumed to be consistent with a unique optimal solution. There are many clustering contexts where such same cluster queries are feasible. Ashtiani et al. exhibited the power of such queries by showing that any instance of the k-means clustering problem, with additional margin assumption, can be solved efficiently if one is allowed to make O(k^2 log{k} + k log{n}) same-cluster queries. This is interesting since the k-means problem, even with the margin assumption, is NP-hard.
In this paper, we extend the work of Ashtiani et al. to the approximation setting by showing that a small number of such same-cluster queries enables one to get a polynomial-time (1+eps)-approximation algorithm for the k-means problem without any margin assumption on the input dataset. Again, this is interesting since the k-means problem is NP-hard to approximate within a factor (1+c) for a fixed constant 0 < c < 1. The number of same-cluster queries used by the algorithm is poly(k/eps), which is independent of the size n of the dataset. Our algorithm is based on the D^2-sampling technique, also known as the k-means++ seeding algorithm. We also give a conditional lower bound on the number of same-cluster queries, showing that if the Exponential Time Hypothesis (ETH) holds, then any such efficient query algorithm needs to make Omega(k/poly log k) same-cluster queries. Our algorithm can be extended to the case where the query answers are wrong with some bounded probability. Another result we show for the k-means++ seeding is that a small modification of the k-means++ seeding within the SSAC framework converts it into a constant factor approximation algorithm instead of the well-known O(log k)-approximation algorithm.
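To make the query model concrete, here is a toy Python sketch in which exhaustive use of a same-cluster oracle recovers a partition. This is only an illustration of the kind of oracle the SSAC framework assumes, not the paper's algorithm (which combines a bounded number of queries with D^2-sampling); all names are illustrative.

```python
def cluster_with_queries(points, same_cluster):
    """Partition `points` using a same-cluster oracle:
    same_cluster(x, y) answers whether x and y lie in the same optimal cluster.
    Each new point is compared against one representative per known cluster,
    so a point costs at most (#clusters so far) queries."""
    clusters = []  # each cluster is a list; clusters[i][0] is its representative
    for p in points:
        for cluster in clusters:
            if same_cluster(cluster[0], p):
                cluster.append(p)
                break
        else:
            clusters.append([p])  # p starts a new cluster
    return clusters
```

This exhaustive scheme uses O(nk) queries; the point of the paper's results is that poly(k/eps) queries, independent of n, already suffice for a (1+eps)-approximation.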