Proportionally Representative Clustering
In recent years, there has been a surge in effort to formalize notions of
fairness in machine learning. We focus on clustering -- one of the fundamental
tasks in unsupervised machine learning. We propose a new axiom, "proportional
representation fairness" (PRF), that is designed for clustering problems where
the selection of centroids reflects the distribution of data points and how
tightly they are clustered together. Our fairness concept is not satisfied by
existing fair clustering algorithms. We design efficient algorithms to achieve
PRF both for unconstrained and discrete clustering problems. Our algorithm for
the unconstrained setting is also the first known polynomial-time approximation
algorithm for the well-studied Proportional Fairness (PF) axiom (Chen, Fain,
Lyu, and Munagala, ICML, 2019). Our algorithm for the discrete setting also
matches the best known approximation factor for PF.

Comment: Revised version includes a new author (Jeremy Vollen) and new results.
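The Proportional Fairness (PF) axiom referenced above can be made concrete with a small checker. The sketch below assumes the standard Chen et al. (ICML 2019) definition: a solution is blocked if some candidate center y attracts a coalition of at least ceil(n/k) points that are each more than a factor rho closer to y than to their nearest open center. Function names and the Euclidean metric are illustrative choices, not the paper's code:

```python
import math

def is_proportionally_fair(points, open_centers, candidates, k, rho=1.0):
    """Check rho-approximate Proportional Fairness (Chen et al., ICML 2019).

    Returns False iff some candidate center y has a blocking coalition of
    at least ceil(n/k) points, each more than a factor rho closer to y
    than to its nearest open center."""
    n = len(points)
    quota = math.ceil(n / k)
    for y in candidates:
        blockers = sum(
            1 for x in points
            if rho * math.dist(x, y) < min(math.dist(x, c) for c in open_centers)
        )
        if blockers >= quota:
            return False  # this coalition would jointly deviate to y
    return True
```

For example, with two tight clusters of two points each and k = 2, placing both centers on one cluster leaves the far cluster as a blocking coalition, while one center per cluster passes the check.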
Approximation Algorithms for Fair Range Clustering
This paper studies the fair range clustering problem in which the data points
are from different demographic groups and the goal is to pick k centers with
the minimum clustering cost such that each group is at least minimally
represented in the centers set and no group dominates the centers set. More
precisely, given a set of points P in a metric space where each point
belongs to one of t different demographic groups (i.e., P = P_1 ∪ ... ∪ P_t)
and a set of t intervals [α_1, β_1], ..., [α_t, β_t] on the desired number of
centers from each group, the goal is to pick a set of k centers C with minimum
ℓ_p-clustering cost (i.e., (Σ_{v ∈ P} d(v, C)^p)^{1/p}) such that for
each group i ∈ [t], |C ∩ P_i| ∈ [α_i, β_i]. In particular,
fair range ℓ_p-clustering captures fair range k-center, k-median
and k-means as its special cases. In this work, we provide efficient constant
factor approximation algorithms for fair range ℓ_p-clustering for all
values of p ∈ [1, ∞).

Comment: ICML 2023.
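The objective and constraints of this problem are easy to state in code. A minimal sketch of the ℓ_p clustering cost and the per-group interval check, with Euclidean distance standing in for the general metric and all names my own:

```python
import math

def lp_clustering_cost(points, centers, p):
    """(sum_v d(v, C)^p)^(1/p): p=1 gives k-median, p=2 a k-means-style
    objective, and p=inf the k-center (max-distance) objective."""
    dists = [min(math.dist(v, c) for c in centers) for v in points]
    if math.isinf(p):
        return max(dists)
    return sum(d ** p for d in dists) ** (1.0 / p)

def satisfies_range_constraints(centers, group_of, intervals):
    """Each group g must contribute between alpha_g and beta_g centers,
    where intervals maps g -> (alpha_g, beta_g)."""
    counts = {}
    for c in centers:
        g = group_of[c]
        counts[g] = counts.get(g, 0) + 1
    return all(lo <= counts.get(g, 0) <= hi for g, (lo, hi) in intervals.items())
```

A fair range clustering algorithm must minimize the first quantity subject to the second returning True.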
Proportional Fairness in Clustering: A Social Choice Perspective
We study the proportional clustering problem of Chen et al. [ICML'19] and
relate it to the area of multiwinner voting in computational social choice. We
show that any clustering satisfying a weak proportionality notion of Brill and
Peters [EC'23] simultaneously obtains the best known approximations to the
proportional fairness notion of Chen et al. [ICML'19], but also to individual
fairness [Jung et al., FORC'20] and the "core" [Li et al., ICML'21]. In fact, we
show that any approximation to proportional fairness is also an approximation
to individual fairness and vice versa. Finally, we also study stronger notions
of proportional representation, in which deviations can occur not only to a
single candidate center but to multiple ones, and show that stronger
proportionality notions of Brill and Peters [EC'23] imply approximations to
these stronger guarantees.
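Individual fairness [Jung et al., FORC'20], one of the notions related here, fixes for each point x the radius r(x) of the smallest ball around x containing at least ceil(n/k) points (including x itself) and asks that x have a center within alpha * r(x). A brute-force sketch that measures the achieved factor alpha, assuming that definition, with illustrative names and Euclidean distance:

```python
import math

def fairness_radius(x, points, k):
    """r(x): smallest radius around x whose ball contains ceil(n/k) points
    of the instance, including x itself."""
    quota = math.ceil(len(points) / k)
    dists = sorted(math.dist(x, y) for y in points)
    return dists[quota - 1]

def individual_fairness_factor(points, centers, k):
    """Smallest alpha such that every point has a center within alpha * r(x)."""
    worst = 0.0
    for x in points:
        r = fairness_radius(x, points, k)
        d = min(math.dist(x, c) for c in centers)
        if r > 0:
            worst = max(worst, d / r)
    return worst
```

The equivalence claimed in the abstract says that a solution doing well under this measure also does well under proportional fairness, and vice versa.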
Proportional Representation in Metric Spaces and Low-Distortion Committee Selection
We introduce a novel definition for a small set R of k points being
"representative" of a larger set in a metric space. Given a set V (e.g.,
documents or voters) to represent, and a set C of possible representatives, our
criterion requires that for any subset S comprising a theta fraction of V, the
average distance of S to their best theta*k points in R should not be more than
a factor gamma compared to their average distance to the best theta*k points
among all of C. This definition is a strengthening of proportional fairness and
core fairness, but - different from those notions - requires that large
cohesive clusters be represented proportionally to their size.
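Under one plausible reading of this criterion (the cost of S against a pool is the minimum, over subsets T of theta*k points from that pool, of the average distance from each point of S to its nearest point of T), a brute-force checker for tiny instances could look like the sketch below. All names are my own, the metric is Euclidean, and the runtime is exponential; this is only meant to make the definition concrete:

```python
import math
from itertools import combinations

def best_avg_dist(S, pool, t):
    """min over t-subsets T of pool of the average distance from S to T,
    where each point of S uses its nearest point in T."""
    best = math.inf
    for T in combinations(pool, t):
        avg = sum(min(math.dist(v, c) for c in T) for v in S) / len(S)
        best = min(best, avg)
    return best

def representation_factor(V, C, R, k, theta):
    """Worst ratio, over all theta-fraction subsets S of V, between S's best
    average distance within R and within all of C (brute force)."""
    s = math.ceil(theta * len(V))
    t = max(1, math.floor(theta * k))
    worst = 1.0
    for S in combinations(V, s):
        cost_R = best_avg_dist(S, R, min(t, len(R)))
        cost_C = best_avg_dist(S, C, min(t, len(C)))
        if cost_C > 0:
            worst = max(worst, cost_R / cost_C)
        elif cost_R > 0:
            return math.inf
    return worst
```

The paper's (alpha, gamma) framework then relaxes this by pretending R has size only k/alpha when stating the constraints.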
Since there are instances for which - unless gamma is polynomially large - no
solutions exist, we study this notion in a resource augmentation framework,
implicitly stating the constraints for a set R of size k as though its size
were only k/alpha, for alpha > 1. Furthermore, motivated by the application to
elections, we mostly focus on the "ordinal" model, where the algorithm does not
learn the actual distances; instead, it learns only, for each point v in V and
each pair of candidates c, c', which of c, c' is closer to v. Our main result
is that the Expanding Approvals Rule (EAR) of Aziz and Lee is (alpha, gamma)-
representative with gamma <= 1 + 6.71 * alpha / (alpha - 1).
Our results lead to three notable byproducts. First, we show that the EAR
achieves constant proportional fairness in the ordinal model, giving the first
positive result on metric proportional fairness with ordinal information.
Second, we show that for the core fairness objective, the EAR achieves the same
asymptotic tradeoff between resource augmentation and approximation as the
recent results of Li et al., which used full knowledge of the metric. Finally,
our results imply a very simple single-winner voting rule with metric
distortion at most 44.

Comment: 24 pages; accepted to AAAI 2024.
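The Expanding Approvals Rule itself admits a compact sketch. The version below is a simplification of the Aziz–Lee rule, not the exact variant analyzed in the paper: voters start with unit budget, the rank threshold t expands one step at a time, each voter "approves" her top-t candidates, and any candidate whose approvers hold a quota n/k of budget is elected, with the quota charged proportionally to the approvers. Tie-breaking and the charging scheme are my own simplifications:

```python
def expanding_approvals(rankings, k):
    """Simplified Expanding Approvals Rule sketch.

    rankings[i] is voter i's ranking (a permutation of the candidates,
    most preferred first). Returns a list of k elected candidates."""
    n = len(rankings)
    m = len(rankings[0])
    quota = n / k
    budget = [1.0] * n
    elected = []
    for t in range(1, m + 1):
        changed = True
        while changed and len(elected) < k:
            changed = False
            for c in rankings[0]:  # fixed candidate order as a tie-break
                if c in elected:
                    continue
                approvers = [i for i in range(n) if c in rankings[i][:t]]
                support = sum(budget[i] for i in approvers)
                if support >= quota - 1e-9:
                    # charge a total of quota, proportionally to budgets
                    for i in approvers:
                        budget[i] -= budget[i] * (quota / support)
                    elected.append(c)
                    changed = True
        if len(elected) == k:
            break
    # fallback: fill any remaining seats if no candidate reaches the quota
    for c in rankings[0]:
        if len(elected) == k:
            break
        if c not in elected:
            elected.append(c)
    return elected
```

Note that the rule consumes only ordinal information, which is exactly why it fits the model above: it never needs the actual distances.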
Multi-Winner Voting with Approval Preferences
From fundamental concepts and results to recent advances in computational social choice, this open access book provides a thorough and in-depth look at multi-winner voting based on approval preferences. The main focus is on axiomatic analysis, algorithmic results and several applications that are relevant in artificial intelligence, computer science and elections of any kind.

What is the best way to select a set of candidates for a shortlist, for an executive committee, or for product recommendations? Multi-winner voting is the process of selecting a fixed-size set of candidates based on the preferences expressed by the voters. A wide variety of decision processes in settings ranging from politics (parliamentary elections) to the design of modern computer applications (collaborative filtering, dynamic Q&A platforms, diversity in search results, etc.) share the problem of identifying a representative subset of alternatives. The study of multi-winner voting provides the principled analysis of this task.

Approval-based committee voting rules (in short: ABC rules) are multi-winner voting rules particularly suitable for practical use. Their usability is founded on the straightforward form in which the voters can express preferences: voters simply have to differentiate between approved and disapproved candidates. Proposals for ABC rules are numerous, some dating back to the late 19th century while others have been introduced only very recently. This book explains and discusses these rules, highlighting their individual strengths and weaknesses. With the help of this book, the reader will be able to choose a suitable ABC voting rule in a principled fashion, participate in, and stay up to date with, the ongoing research on this topic.
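As a concrete taste of an ABC rule, Proportional Approval Voting (PAV) scores a committee by giving each voter the harmonic weight 1 + 1/2 + ... + 1/j when j of her approved candidates are seated. A brute-force sketch (exponential in the number of candidates, so toy inputs only; names are my own):

```python
from itertools import combinations

def pav_score(committee, approvals):
    """PAV score: a voter approving j committee members contributes
    the harmonic weight 1 + 1/2 + ... + 1/j."""
    W = set(committee)
    return sum(sum(1.0 / (i + 1) for i in range(len(A & W)))
               for A in approvals)

def pav_winner(candidates, approvals, k):
    """Brute-force PAV-optimal committee of size k."""
    return max(combinations(candidates, k),
               key=lambda W: pav_score(W, approvals))
```

The diminishing harmonic weights are what give PAV its proportionality: once a voter already has representatives on the committee, seating another of her approved candidates counts for less.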
On Algorithmic Fairness and Stochastic Models for Combinatorial Optimization and Unsupervised Machine Learning
Combinatorial optimization and unsupervised machine learning problems have been extensively studied and are relatively well-understood. Examples of such problems that play a central role in this work are clustering problems and problems of finding cuts in graphs. The goal of the research presented in this dissertation is to introduce novel variants of the aforementioned problems, by generalizing their classic variants along two, not necessarily disjoint, directions. The first direction involves incorporating fairness aspects into a problem's specifications, and the second involves the introduction of some form of randomness in the problem definition, e.g., stochastic uncertainty about the problem's parameters.
Fairness in the design of algorithms and in machine learning has received a significant amount of attention during the last few years, mainly due to the realization that standard optimization approaches can frequently lead to severely unfair outcomes that can potentially hurt the individuals or the groups involved in the corresponding application. As far as considerations of fairness are concerned, in this work we begin by presenting two novel individually-fair clustering models, together with algorithms with provable guarantees for them. The first such model exploits randomness in order to provide fair solutions, while the second is purely deterministic. The high-level motivation behind both is to treat similar individuals similarly. Moving forward, we focus on a graph cut problem that captures situations of disaster containment in a network. For this problem we introduce two novel fair variants. The first variant focuses on demographic fairness, while the second considers a probabilistic notion of individual fairness. Again, we give algorithms with provable guarantees for the newly introduced variants.
In the next part of this thesis we turn our attention to generalizing problems through the introduction of stochasticity. At first, we present algorithmic results for a computational epidemiology problem, whose goal is to control the stochastic diffusion of a disease in a contact network. This problem can be interpreted as a stochastic generalization of a static graph cut problem. Finally, this dissertation also includes work on a well-known paradigm in stochastic optimization, namely the two-stage stochastic setting with recourse. Two-stage problems capture a wide variety of applications revolving around the trade-off between provisioning and rapid response. In this setting, we present a family of clustering problems that had not yet been studied in the literature, and for this family we show novel algorithmic techniques that provide constant factor approximation algorithms.
We conclude the dissertation with a discussion of open problems and future research directions in the general area of algorithmic fairness.