Search CORE

50,957 research outputs found

Dependent randomized rounding for clustering and partition systems with knapsack constraints

Author: Harris David G.
Pensyl Thomas
Srinivasan Aravind
Trinh Khoa
Publication venue
Publication date: 01/04/2020
Field of study

Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on fairness in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with "knapsack" and "partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool, which may be of independent interest, is a new tail bound analogous to Feige (2006) for sums of random variables with unbounded variances. Such bounds are very useful in inferring properties of large networks using few samples

arXiv.org e-Print Archive

Optimistic Concurrency Control for Distributed Unsupervised Learning

Author: Broderick Tamara
Gonzalez Joseph E.
Jegelka Stefanie
Jordan Michael I.
Pan Xinghao
Publication venue
Publication date: 01/01/2013
Field of study

Research on distributed machine learning algorithms has focused primarily on one of two extremes - algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this "optimistic concurrency control" paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.Comment: 25 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX

The Bane of Low-Dimensionality Clustering

Author: Cohen-Addad Vincent
de Mesmay Arnaud
Rotenberg Eva
Roytman Alan
Publication venue
Publication date: 03/11/2017
Field of study

In this paper, we give a conditional lower bound of

n^{\Omega(k)}

on running time for the classic k-median and k-means clustering objectives (where n is the size of the input), even in low-dimensional Euclidean space of dimension four, assuming the Exponential Time Hypothesis (ETH). We also consider k-median (and k-means) with penalties where each point need not be assigned to a center, in which case it must pay a penalty, and extend our lower bound to at least three-dimensional Euclidean space. This stands in stark contrast to many other geometric problems such as the traveling salesman problem, or computing an independent set of unit spheres. While these problems benefit from the so-called (limited) blessing of dimensionality, as they can be solved in time

n^{O(k^{1-1/d})}

2^{n^{1-1/d}}

in d dimensions, our work shows that widely-used clustering objectives have a lower bound of

n^{\Omega(k)}

, even in dimension four. We complete the picture by considering the two-dimensional case: we show that there is no algorithm that solves the penalized version in time less than

n^{o(\sqrt{k})}

, and provide a matching upper bound of

n^{O(\sqrt{k})}

. The main tool we use to establish these lower bounds is the placement of points on the moment curve, which takes its inspiration from constructions of point sets yielding Delaunay complexes of high complexity

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Copenhagen University Research Information System

Online Research Database In Technology