9 research outputs found

    Proportionally Fair Clustering Revisited


    Proportionally Representative Clustering

    In recent years, there has been a surge of effort to formalize notions of fairness in machine learning. We focus on clustering, one of the fundamental tasks in unsupervised machine learning. We propose a new axiom, "proportional representation fairness" (PRF), designed for clustering problems in which the selection of centroids should reflect the distribution of data points and how tightly they are clustered together. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms to achieve PRF both for unconstrained and discrete clustering problems. Our algorithm for the unconstrained setting is also the first known polynomial-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom (Chen, Fain, Lyu, and Munagala, ICML 2019). Our algorithm for the discrete setting also matches the best known approximation factor for PF.
    Comment: Revised version includes a new author (Jeremy Vollen) and new results: the unconstrained-setting algorithm is also the first known polynomial-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom (Chen, Fain, Lyu, and Munagala, ICML 2019), and the discrete-setting algorithm matches the best known approximation factor for PF.
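    For reference, the PF axiom mentioned above can be stated roughly as follows, as we recall it from Chen, Fain, Lyu, and Munagala (the notation is ours and the paper's exact formulation may differ): a set $X$ of $k$ centers for $n$ data points $N$ is proportionally fair if no coalition of at least $\lceil n/k \rceil$ points could all get strictly closer by deviating to a common alternative center.

```latex
% Proportional Fairness (PF), paraphrased: no blocking coalition exists.
% Here d(i, X) = min_{x in X} d(i, x) is the distance to the nearest center.
\[
\nexists\, S \subseteq N \ \text{and center } y:\quad
|S| \ge \lceil n/k \rceil
\quad\text{and}\quad
d(i, y) < d(i, X) \ \text{for all } i \in S .
\]
```

    An approximate version relaxes the blocking condition to $\rho \cdot d(i, y) < d(i, X)$, which is the sense in which the approximation results above are stated.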

    Approximation Algorithms for Fair Range Clustering

    This paper studies the fair range clustering problem, in which the data points come from different demographic groups and the goal is to pick $k$ centers with minimum clustering cost such that each group is at least minimally represented in the center set and no group dominates it. More precisely, given a set of $n$ points in a metric space $(P, d)$, where each point belongs to one of $\ell$ different demographic groups (i.e., $P = P_1 \uplus P_2 \uplus \cdots \uplus P_\ell$), and a set of $\ell$ intervals $[\alpha_1, \beta_1], \cdots, [\alpha_\ell, \beta_\ell]$ on the desired number of centers from each group, the goal is to pick a set of $k$ centers $C$ with minimum $\ell_p$-clustering cost (i.e., $(\sum_{v \in P} d(v, C)^p)^{1/p}$) such that $|C \cap P_i| \in [\alpha_i, \beta_i]$ for each group $i \in [\ell]$. In particular, fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median, and $k$-means as special cases. In this work, we provide efficient constant-factor approximation algorithms for fair range $\ell_p$-clustering for all values of $p \in [1, \infty)$.
    Comment: ICML 2023
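    To make the objective and constraint concrete, here is a minimal sketch that directly instantiates the definitions above; it is not code from the paper, and the Euclidean metric and all names are our own illustrative choices.

```python
import numpy as np

def lp_cost(points: np.ndarray, centers: np.ndarray, p: float) -> float:
    """l_p clustering cost (sum over v of d(v, C)^p)^(1/p), Euclidean metric."""
    # d(v, C): distance from each point to its nearest chosen center
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    return float((dists ** p).sum() ** (1.0 / p))

def in_range(center_groups: list, intervals: dict) -> bool:
    """Check |C ∩ P_i| in [alpha_i, beta_i] for every group i."""
    counts = {}
    for g in center_groups:
        counts[g] = counts.get(g, 0) + 1
    return all(lo <= counts.get(i, 0) <= hi for i, (lo, hi) in intervals.items())
```

    A candidate solution is feasible exactly when `in_range` holds, and the algorithms in the paper minimize `lp_cost` over such feasible center sets.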

    Proportional Fairness in Clustering: A Social Choice Perspective

    We study the proportional clustering problem of Chen et al. [ICML'19] and relate it to the area of multiwinner voting in computational social choice. We show that any clustering satisfying a weak proportionality notion of Brill and Peters [EC'23] simultaneously obtains the best known approximations not only to the proportional fairness notion of Chen et al. [ICML'19], but also to individual fairness [Jung et al., FORC'20] and the "core" [Li et al., ICML'21]. In fact, we show that any approximation to proportional fairness is also an approximation to individual fairness, and vice versa. Finally, we also study stronger notions of proportional representation, in which deviations involve not just a single candidate center but multiple ones, and show that stronger proportionality notions of Brill and Peters [EC'23] imply approximations to these stronger guarantees.
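    For context, the individual-fairness notion referenced here can be checked directly, as we recall the definition of Jung et al.: every point should have a center within the radius of its $\lceil n/k \rceil$ nearest neighbors. The sketch below is illustrative only; the distance matrix `D` and all names are our assumptions.

```python
import math
import numpy as np

def fair_radius(D: np.ndarray, k: int) -> np.ndarray:
    """r(v): distance from v to its ceil(n/k)-th nearest point (counting v itself)."""
    n = D.shape[0]
    m = math.ceil(n / k)
    return np.sort(D, axis=1)[:, m - 1]

def if_factor(D: np.ndarray, centers: list, k: int) -> float:
    """Smallest alpha with d(v, C) <= alpha * r(v) for every point v."""
    r = fair_radius(D, k)
    d_to_C = D[:, centers].min(axis=1)
    return float((d_to_C / r).max())  # assumes r(v) > 0 for all v
```

    The paper's equivalence result says that a bound on this factor translates into a bound on the proportional fairness approximation, and vice versa.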

    Proportional Representation in Metric Spaces and Low-Distortion Committee Selection

    We introduce a novel definition for a small set $R$ of $k$ points being "representative" of a larger set in a metric space. Given a set $V$ (e.g., documents or voters) to represent, and a set $C$ of possible representatives, our criterion requires that for any subset $S$ comprising a $\theta$ fraction of $V$, the average distance of $S$ to their best $\theta k$ points in $R$ should be at most a factor $\gamma$ larger than their average distance to the best $\theta k$ points among all of $C$. This definition is a strengthening of proportional fairness and core fairness, but, unlike those notions, requires that large cohesive clusters be represented proportionally to their size. Since there are instances for which no solution exists unless $\gamma$ is polynomially large, we study this notion in a resource augmentation framework, implicitly stating the constraints for a set $R$ of size $k$ as though its size were only $k/\alpha$, for $\alpha > 1$. Furthermore, motivated by the application to elections, we mostly focus on the "ordinal" model, where the algorithm does not learn the actual distances; instead, it learns only, for each point $v \in V$ and each pair of candidates $c, c'$, which of $c, c'$ is closer to $v$. Our main result is that the Expanding Approvals Rule (EAR) of Aziz and Lee is $(\alpha, \gamma)$-representative with $\gamma \le 1 + 6.71 \cdot \alpha/(\alpha - 1)$. Our results lead to three notable byproducts. First, we show that the EAR achieves constant proportional fairness in the ordinal model, giving the first positive result on metric proportional fairness with ordinal information. Second, we show that for the core fairness objective, the EAR achieves the same asymptotic tradeoff between resource augmentation and approximation as the recent results of Li et al., which used full knowledge of the metric. Finally, our results imply a very simple single-winner voting rule with metric distortion at most 44.
    Comment: 24 pages, accepted to AAAI 2024
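    One plausible formalization of the representativeness criterion described above, based on our reading of the abstract (the paper's exact quantifiers, in particular how the resource augmentation parameter $\alpha$ enters, may differ):

```latex
% (alpha, gamma)-representativeness, as we read the abstract: R of size k is
% judged as though it had only k/alpha slots. For every large coalition S:
\[
\forall\, S \subseteq V,\ |S| \ge \theta |V| :\quad
\min_{\substack{T \subseteq R \\ |T| = \lfloor \theta k/\alpha \rfloor}}
\frac{1}{|S|} \sum_{v \in S} d(v, T)
\;\le\; \gamma \cdot
\min_{\substack{T' \subseteq C \\ |T'| = \lfloor \theta k \rfloor}}
\frac{1}{|S|} \sum_{v \in S} d(v, T') ,
\]
% where d(v, T) = min_{t in T} d(v, t). The main theorem then bounds
% gamma <= 1 + 6.71 * alpha / (alpha - 1) for the EAR.
```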

    Multi-Winner Voting with Approval Preferences

    From fundamental concepts and results to recent advances in computational social choice, this open access book provides a thorough and in-depth look at multi-winner voting based on approval preferences. The main focus is on axiomatic analysis, algorithmic results, and several applications that are relevant in artificial intelligence, computer science, and elections of any kind. What is the best way to select a set of candidates for a shortlist, for an executive committee, or for product recommendations? Multi-winner voting is the process of selecting a fixed-size set of candidates based on the preferences expressed by the voters. A wide variety of decision processes in settings ranging from politics (parliamentary elections) to the design of modern computer applications (collaborative filtering, dynamic Q&A platforms, diversity in search results, etc.) share the problem of identifying a representative subset of alternatives. The study of multi-winner voting provides the principled analysis of this task. Approval-based committee voting rules (in short: ABC rules) are multi-winner voting rules particularly suitable for practical use. Their usability is founded on the straightforward form in which the voters can express preferences: voters simply have to differentiate between approved and disapproved candidates. Proposals for ABC rules are numerous, some dating back to the late 19th century, while others have been introduced only very recently. This book explains and discusses these rules, highlighting their individual strengths and weaknesses. With the help of this book, the reader will be able to choose a suitable ABC voting rule in a principled fashion, and to participate in, and stay up to date with, the ongoing research on this topic.
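    To make the input format of ABC rules concrete, here is a sketch of the simplest such rule, utilitarian Approval Voting (AV): pick the $k$ candidates approved by the most voters. This is shown for illustration only; the book covers many richer rules (e.g., proportional ones) with stronger representation guarantees.

```python
from collections import Counter

def approval_voting(ballots: list[set[str]], k: int) -> list[str]:
    """Each ballot is the set of candidates a voter approves; ties broken arbitrarily."""
    scores = Counter(c for ballot in ballots for c in ballot)
    return [c for c, _ in scores.most_common(k)]

# Example: 3 voters, committee of size 2.
print(approval_voting([{"a", "b"}, {"a", "c"}, {"b"}], 2))  # e.g. ['a', 'b']
```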

    On Algorithmic Fairness and Stochastic Models for Combinatorial Optimization and Unsupervised Machine Learning

    Combinatorial optimization and unsupervised machine learning problems have been extensively studied and are relatively well understood. Examples of such problems that play a central role in this work are clustering problems and problems of finding cuts in graphs. The goal of the research presented in this dissertation is to introduce novel variants of the aforementioned problems, by generalizing their classic variants in two, not necessarily disjoint, directions. The first direction involves incorporating fairness aspects into a problem's specifications, and the second involves the introduction of some form of randomness into the problem definition, e.g., stochastic uncertainty about the problem's parameters. Fairness in the design of algorithms and in machine learning has received a significant amount of attention during the last few years, mainly due to the realization that standard optimization approaches can frequently lead to severely unfair outcomes that can potentially hurt the individuals or the groups involved in the corresponding application. As far as considerations of fairness are concerned, in this work we begin by presenting two novel individually-fair clustering models, together with algorithms with provable guarantees for them. The first such model exploits randomness in order to provide fair solutions, while the second is purely deterministic. The high-level motivation behind both of them is to treat similar individuals similarly. Moving forward, we focus on a graph cut problem that captures situations of disaster containment in a network. For this problem we introduce two novel fair variants. The first variant focuses on demographic fairness, while the second considers a probabilistic notion of individual fairness. Again, we give algorithms with provable guarantees for the newly introduced variants. In the next part of this thesis we turn our attention to generalizing problems through the introduction of stochasticity. First, we present algorithmic results for a computational epidemiology problem, whose goal is to control the stochastic diffusion of a disease in a contact network. This problem can be interpreted as a stochastic generalization of a static graph cut problem. Finally, this dissertation also includes work on a well-known paradigm in stochastic optimization, namely the two-stage stochastic setting with recourse. Two-stage problems capture a wide variety of applications revolving around the trade-off between provisioning and rapid response. In this setting, we present a family of clustering problems that had not yet been studied in the literature, and for this family we show novel algorithmic techniques that provide constant-factor approximation algorithms. We conclude the dissertation with a discussion of open problems and future research directions in the general area of algorithmic fairness.
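    For context, the two-stage stochastic setting with recourse mentioned above has the standard textbook form below; this is the generic template, not a formulation specific to this dissertation.

```latex
% Generic two-stage stochastic program with recourse: commit to first-stage
% decisions x at cost c(x); after the scenario xi is revealed, pay the
% cheapest feasible second-stage (recourse) cost q(y; x, xi).
\[
\min_{x \in X} \; c(x) \;+\; \mathbb{E}_{\xi}\!\left[\, \min_{y \in Y(x,\, \xi)} q(y;\, x,\, \xi) \right]
\]
```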