Proportional Approval Voting, Harmonic k-median, and Negative Association
We study a generic framework that provides a unified view of two important classes of problems: (i) extensions of the k-median problem where clients are interested in having multiple facilities in their vicinity (e.g., because, with some small probability, the closest facility might be malfunctioning and so unavailable for use), and (ii) finding winners according to some appealing multiwinner election rules, i.e., election systems aimed at choosing representative bodies, such as parliaments, based on the preferences of a population of voters over individual candidates. Each problem in our framework is associated with a vector of weights: we show that the approximability of the problem depends on structural properties of these vectors. We specifically focus on the harmonic sequence of weights, since it results in particularly appealing properties of the considered problem. In particular, the objective function interpreted in a multiwinner election setup corresponds to the well-known Proportional Approval Voting (PAV) rule.
Our main result is that, due to the specific (harmonic) structure of the weights, the problem admits a constant factor approximation. This is surprising since the problem can be interpreted as a variant of the k-median problem in which the connection costs are not assumed to satisfy the triangle inequality. To the best of our knowledge, this is the first constant factor approximation algorithm for a variant of k-median that does not require this assumption. The algorithm we propose is based on dependent rounding [Srinivasan, FOCS'01] applied to the solution of a natural LP-relaxation of the problem. The rounding process is well known to produce distributions over integral solutions satisfying Negative Correlation (NC), which is usually sufficient for the analysis of the approximation guarantees offered by rounding procedures. In our analysis, however, we need the fact that a carefully implemented rounding process satisfies a stronger property, called Negative Association (NA), which allows us to apply standard concentration bounds to conditional random variables.
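To make the dependent-rounding idea concrete, here is a minimal pair-rounding sketch: it repeatedly picks two fractional coordinates and shifts mass between them, preserving the total sum and each marginal. This is an illustrative Srinivasan-style procedure, not the paper's exact algorithm, and the variable names are my own.

```python
import random

def dependent_rounding(x, rng=random.Random(0)):
    """Round fractional x in [0,1]^n to {0,1}^n.
    Each pairwise step preserves the sum of the vector and, in expectation,
    each coordinate, so E[X_i] = x_i (the marginals of the LP solution).
    Illustrative sketch of Srinivasan-style dependent rounding."""
    x = list(x)
    eps = 1e-9
    while True:
        frac = [i for i, v in enumerate(x) if eps < v < 1 - eps]
        if not frac:
            break
        if len(frac) == 1:
            # lone fractional entry (sum was not integral): round it alone,
            # still preserving its marginal
            i = frac[0]
            x[i] = 1.0 if rng.random() < x[i] else 0.0
            continue
        i, j = frac[0], frac[1]
        # largest mass shifts that keep both coordinates inside [0,1]
        a = min(1 - x[i], x[j])   # candidate move: +a to x[i], -a from x[j]
        b = min(x[i], 1 - x[j])   # candidate move: -b from x[i], +b to x[j]
        # choosing the +a move with probability b/(a+b) makes the expected
        # change of each coordinate zero
        if rng.random() < b / (a + b):
            x[i], x[j] = x[i] + a, x[j] - a
        else:
            x[i], x[j] = x[i] - b, x[j] + b
    return [round(v) for v in x]
```

Each step fixes at least one coordinate to 0 or 1, so the loop terminates after at most n pairwise steps; the stronger NC/NA properties concern how the resulting indicator variables covary, which this sketch does not attempt to verify.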
Learning mixtures of separated nonspherical Gaussians
Mixtures of Gaussian (or normal) distributions arise in a variety of
application areas. Many heuristics have been proposed for the task of finding
the component Gaussians given samples from the mixture, such as the EM
algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy.
Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial
time. We present the first algorithm that provably learns the component
Gaussians in time that is polynomial in the dimension. The Gaussians may have
arbitrary shape, but they must satisfy a ``separation condition'' which places
a lower bound on the distance between the centers of any two component
Gaussians. The mathematical results at the heart of our proof are ``distance
concentration'' results--proved using isoperimetric inequalities--which
establish bounds on the probability distribution of the distance between a pair
of points generated according to the mixture. We also formalize the more
general problem of max-likelihood fit of a Gaussian mixture to unstructured
data.

Comment: Published at http://dx.doi.org/10.1214/105051604000000512 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
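For contrast with the provable algorithm above, the classical EM heuristic of Dempster, Laird and Rubin is short enough to sketch in full. The following 1-D version, with the two components well separated as in the paper's "separation condition", is illustrative only; the initialization and data are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated 1-D Gaussians, mimicking a separation condition
data = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])

def em_1d(x, k=2, iters=50):
    """Plain EM for a 1-D Gaussian mixture; a heuristic with no
    polynomial-time guarantee, unlike the algorithm of the paper."""
    mu = np.array([-1.0, 1.0])      # crude initialization
    sigma = np.ones(k)
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        dens = (w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        n_k = resp.sum(axis=0)
        w = n_k / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
    return w, mu, sigma

w, mu, sigma = em_1d(data)
```

With centers this far apart, EM recovers the component means quickly; without separation, it can get stuck in poor local optima, which is exactly why it carries no provable guarantee.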
Concentration for independent random variables with heavy tails
If a random variable is not exponentially integrable, it is known that no
concentration inequality holds for an infinite sequence of independent copies.
Under mild conditions, we establish concentration inequalities for finite
sequences of independent copies, with good dependence in
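A standard way to see that heavy-tailed sums can still concentrate is the median-of-means estimator, which works with only a finite variance. The snippet below demonstrates this general theme on Pareto data; it illustrates the phenomenon, not the specific inequalities of the abstract.

```python
import random
import statistics

def median_of_means(xs, k):
    """Split xs into k blocks, average each block, take the median.
    This estimator concentrates around the mean even for heavy-tailed
    data that is not exponentially integrable, provided the variance
    is finite -- an illustrative sketch, not the paper's result."""
    m = len(xs) // k
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(block_means)

rng = random.Random(0)
# Pareto(alpha=2.5): finite variance, but tails too heavy for
# exponential integrability
alpha = 2.5
true_mean = alpha / (alpha - 1)   # = 5/3 for this distribution
sample = [rng.paretovariate(alpha) for _ in range(10000)]
est = median_of_means(sample, k=20)
```

The median step discards the few blocks corrupted by extreme draws, which is what buys concentration despite the heavy tails.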
Transport Inequalities. A Survey
This is a survey of recent developments in the area of transport
inequalities. We investigate their consequences in terms of concentration and
deviation inequalities and sketch their links with other functional
inequalities and also large deviation theory.

Comment: Proceedings of the conference Inhomogeneous Random Systems 2009; 82 pages.
Dependent randomized rounding for clustering and partition systems with knapsack constraints
Clustering problems are fundamental to unsupervised learning. There is an
increased emphasis on fairness in machine learning and AI; one representative
notion of fairness is that no single demographic group should be
over-represented among the cluster-centers. This, and much more general
clustering problems, can be formulated with "knapsack" and "partition"
constraints. We develop new randomized algorithms targeting such problems, and
study two in particular: multi-knapsack median and multi-knapsack center. Our
rounding algorithms give new approximation and pseudo-approximation algorithms
for these problems. One key technical tool, which may be of independent
interest, is a new tail bound, analogous to that of Feige (2006), for sums of
random variables with unbounded variances. Such bounds are very useful in
inferring properties of large networks using few samples.
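To make the fairness constraint concrete, here is a small greedy sketch of k-center selection with a cap on how many centers each demographic group may contribute. This is a hypothetical illustrative heuristic for the partition-constrained setup, not the LP-rounding algorithm of the paper; the function and data are my own.

```python
import math

def capped_k_center(points, groups, k, cap):
    """Greedy farthest-point k-center respecting a partition constraint:
    at most `cap` centers drawn from each demographic group.
    Illustrative heuristic only -- the paper uses dependent randomized
    rounding of an LP, not this greedy rule."""
    counts = {groups[0]: 1}
    centers = [0]                     # seed with the first point
    while len(centers) < k:
        # pick the farthest point whose group still has capacity
        best, best_d = None, -1.0
        for i, p in enumerate(points):
            if counts.get(groups[i], 0) >= cap:
                continue
            d = min(math.dist(p, points[c]) for c in centers)
            if d > best_d:
                best, best_d = i, d
        if best is None:              # no eligible point remains
            break
        centers.append(best)
        counts[groups[best]] = counts.get(groups[best], 0) + 1
    return centers
```

Note that already-chosen points have distance zero to the center set, so they are never re-selected; the cap check is what encodes the "no group over-represented among cluster-centers" notion from the abstract.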