Dependent randomized rounding for clustering and partition systems with knapsack constraints
Clustering problems are fundamental to unsupervised learning. There is an
increased emphasis on fairness in machine learning and AI; one representative
notion of fairness is that no single demographic group should be
over-represented among the cluster centers. This requirement, and much more general
clustering problems, can be formulated with "knapsack" and "partition"
constraints. We develop new randomized algorithms targeting such problems, and
study two in particular: multi-knapsack median and multi-knapsack center. Our
rounding algorithms give new approximation and pseudo-approximation algorithms
for these problems. One key technical tool, which may be of independent
interest, is a new tail bound analogous to Feige (2006) for sums of random
variables with unbounded variances. Such bounds are very useful for inferring
properties of large networks using few samples.
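The abstract does not spell out the rounding procedure. To illustrate what "dependent" rounding means, here is a minimal Python sketch of the classic pairwise (level-set) dependent rounding scheme. It is a generic textbook technique, not the paper's algorithm. It turns fractional center-opening values into a 0/1 choice while preserving each value's expectation and, when the values sum to an integer, the exact total (for example, exactly k centers, or a fixed number of centers inside one demographic group of a partition constraint). Weighted knapsack constraints hold only in expectation under this scheme, which is where tail bounds come in. The name `dependent_round` and the tolerance `EPS` are illustrative choices.

```python
import random
from typing import List, Optional

EPS = 1e-9  # illustrative tolerance for treating a value as integral


def dependent_round(x: List[float], rng: Optional[random.Random] = None) -> List[int]:
    """Pairwise dependent rounding of a fractional vector x in [0, 1]^n.

    Each step takes two fractional coordinates and shifts mass between them
    so that (a) their sum is unchanged, (b) each coordinate's expectation is
    unchanged, and (c) at least one of them becomes 0 or 1.
    """
    rng = rng or random.Random()
    x = list(x)
    while True:
        frac = [i for i, v in enumerate(x) if EPS < v < 1 - EPS]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        alpha = min(1 - x[i], x[j])  # most mass that can move from j to i
        beta = min(x[i], 1 - x[j])   # most mass that can move from i to j
        # Choosing the first move with probability beta / (alpha + beta)
        # makes the expected change of x[i] (and of x[j]) exactly zero.
        if rng.random() < beta / (alpha + beta):
            x[i] += alpha
            x[j] -= alpha
        else:
            x[i] -= beta
            x[j] += beta
    out = []
    for v in x:
        if v <= EPS:
            out.append(0)
        elif v >= 1 - EPS:
            out.append(1)
        else:
            # Only reachable when sum(x) was not an integer: round the
            # single leftover coordinate independently.
            out.append(1 if rng.random() < v else 0)
    return out
```

For example, rounding `[0.5, 0.5, 0.5, 0.5]` always opens exactly two centers, and each one is opened with probability 1/2.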
Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primarily on
one of two extremes: algorithms that obey strict concurrency constraints or
algorithms that obey few or no such constraints. We consider an intermediate
alternative in which algorithms optimistically assume that conflicts are
unlikely and, if conflicts do arise, invoke a conflict-resolution protocol.
We view this "optimistic concurrency control" paradigm as particularly
appropriate for large-scale machine learning algorithms, especially in the
unsupervised setting. We demonstrate our approach in three problem areas:
clustering, feature learning and online facility location. We evaluate our
methods via large-scale experiments in a cluster computing environment.
Comment: 25 pages, 5 figures
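To make the paradigm concrete, the following minimal Python sketch applies optimistic concurrency control to the center-creation step of a DP-means-style clustering. Several assumptions are involved. A point farther than a threshold `lam` from every existing center becomes a new center. Workers propose centers in parallel against a snapshot of the current centers (the optimistic step). A serial validator then rejects any proposal already covered by a center accepted earlier in the same epoch (the conflict-resolution step). The names `propose`, `validate`, `occ_cluster`, `lam` and `epoch_size` are hypothetical. Threads stand in for cluster machines, and the step that re-estimates centers as cluster means is omitted.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List, Tuple

Point = Tuple[float, float]


def propose(batch: List[Point], centers: List[Point], lam: float) -> List[Point]:
    # Optimistic step: read a snapshot of the centers without locking and
    # propose every point that no snapshot center covers.
    return [p for p in batch if all(math.dist(p, c) > lam for c in centers)]


def validate(proposals: List[Point], centers: List[Point], lam: float) -> List[Point]:
    # Conflict resolution (serial): proposals already passed the snapshot
    # check, so they only need comparing with centers accepted this epoch.
    accepted: List[Point] = []
    for p in proposals:
        if all(math.dist(p, c) > lam for c in accepted):
            accepted.append(p)
        # Otherwise p is covered by an accepted center and is dropped.
    return centers + accepted


def occ_cluster(points: List[Point], lam: float, workers: int = 4,
                epoch_size: int = 32) -> List[Point]:
    centers: List[Point] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(points), epoch_size):
            epoch = points[start:start + epoch_size]
            batches = [epoch[w::workers] for w in range(workers)]
            snapshot = list(centers)  # every worker sees the same state
            step = partial(propose, centers=snapshot, lam=lam)
            proposals = [p for found in pool.map(step, batches) for p in found]
            centers = validate(proposals, centers, lam)
    return centers
```

Because proposals are checked first against the snapshot and then against each other, the output keeps the invariants of a serial pass. Every point lies within `lam` of some center, and no two centers are within `lam` of each other.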
The Bane of Low-Dimensionality Clustering
In this paper, we give a conditional lower bound of $n^{\Omega(k)}$ on
running time for the classic k-median and k-means clustering objectives (where
n is the size of the input), even in low-dimensional Euclidean space of
dimension four, assuming the Exponential Time Hypothesis (ETH). We also
consider k-median (and k-means) with penalties where each point need not be
assigned to a center, in which case it must pay a penalty, and extend our lower
bound to at least three-dimensional Euclidean space.
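The penalized objective can be written down directly. The following Python sketch makes two simplifying assumptions that are not taken from the paper: a single uniform penalty, and centers restricted to input points. It computes the penalized k-median cost (or the penalized k-means cost with `squared=True`) and solves small instances by enumerating all k-subsets of candidate centers. For comparison with the $n^{\Omega(k)}$ lower bound, this enumeration takes roughly $\binom{n}{k} \cdot nk = n^{O(k)}$ time. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def penalized_cost(points: Sequence[Point], centers: Sequence[Point],
                   penalty: float, squared: bool = False) -> float:
    total = 0.0
    for p in points:
        d = min((math.dist(p, c) for c in centers), default=math.inf)
        if squared:
            d = d * d  # k-means uses squared distances
        # Each point either connects to its nearest center or pays the penalty.
        total += min(d, penalty)
    return total


def brute_force_penalized(points: Sequence[Point], k: int, penalty: float,
                          squared: bool = False):
    best = (math.inf, ())
    for centers in combinations(points, k):  # all k-subsets of input points
        cost = penalized_cost(points, centers, penalty, squared)
        if cost < best[0]:
            best = (cost, centers)
    return best
```

For instance, on the one-dimensional points 0, 1, 10, 11 and 100 with k = 2 and penalty 5, the best cost is 7. Two points connect at distance 1, and the outlier at 100 pays the penalty instead of being served.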
This stands in stark contrast to many other geometric problems such as the
traveling salesman problem or computing an independent set of unit spheres.
While these problems benefit from the so-called (limited) blessing of
dimensionality, as they can be solved in time $n^{O(k^{1-1/d})}$ or $2^{n^{1-1/d}}$
in d dimensions, our work shows that widely used clustering
objectives have a lower bound of $n^{\Omega(k)}$, even in dimension four.
We complete the picture by considering the two-dimensional case: we show that
there is no algorithm that solves the penalized version in time less than
$n^{o(\sqrt{k})}$, and provide a matching upper bound of $n^{O(\sqrt{k})}$.
The main tool we use to establish these lower bounds is the placement of
points on the moment curve, which takes its inspiration from constructions of
point sets yielding Delaunay complexes of high complexity.
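The moment curve in $\mathbb{R}^d$ is $\gamma(t) = (t, t^2, \ldots, t^d)$. A hyperplane $a \cdot x = b$ meets it where $a_1 t + \cdots + a_d t^d - b = 0$, a nonzero polynomial of degree at most d, so it meets the curve in at most d points. Hence any d + 1 distinct points on the curve are affinely independent. The Python sketch below checks this exactly for d = 4 through the Vandermonde determinant $\det[1, t_i, \ldots, t_i^d] = \prod_{i<j}(t_j - t_i)$. It only illustrates the curve itself. The paper's actual placement of points is not reproduced, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def moment_curve(t, d):
    """Point gamma(t) = (t, t^2, ..., t^d) with exact rational coordinates."""
    return [Fraction(t) ** i for i in range(1, d + 1)]


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def affine_det(ts, d):
    """Determinant of the rows (1, gamma(t)); it is nonzero iff the d + 1
    points gamma(t), t in ts, are affinely independent."""
    return det([[Fraction(1)] + moment_curve(t, d) for t in ts])


# Every 5 of 8 sample points on the curve in R^4 are affinely independent.
all_independent = all(affine_det(c, 4) != 0 for c in combinations(range(8), 5))
```

Exact `Fraction` arithmetic is used so that a zero determinant can be distinguished from floating-point round-off.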
- …