
    Proportionally Fair Clustering Revisited


    Proportionally Representative Clustering

    In recent years, there has been a surge of effort to formalize notions of fairness in machine learning. We focus on clustering, one of the fundamental tasks in unsupervised machine learning. We propose a new axiom, "proportional representation fairness" (PRF), designed for clustering problems where the selection of centroids reflects the distribution of data points and how tightly they are clustered together. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms to achieve PRF both for unconstrained and discrete clustering problems. Our algorithm for the unconstrained setting is also the first known polynomial-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom (Chen, Fain, Lyu, and Munagala, ICML 2019). Our algorithm for the discrete setting also matches the best known approximation factor for PF. Comment: Revised version includes a new author (Jeremy Vollen) and new results.
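    For context, here is a hedged paraphrase of the Proportional Fairness (PF) axiom that this abstract refers to (Chen, Fain, Lyu, and Munagala, ICML 2019), stated with an approximation factor; the notation (N, X, d, rho) is ours and may differ from the papers listed here.

        % Sketch of \rho-approximate Proportional Fairness; notation is ours.
        % N: the n data points, X: the k chosen centers,
        % d(i, X) = \min_{x \in X} d(i, x).
        % X is \rho-approximately proportionally fair if no coalition S of at
        % least ceil(n/k) points, together with an alternative center y, has
        \[
          \rho \cdot d(i, y) \;<\; d(i, X) \quad \text{for all } i \in S,
          \qquad |S| \ge \lceil n/k \rceil .
        \]
        % With \rho = 1 this is the exact PF axiom.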

    Service in Your Neighborhood: Fairness in Center Location

    When selecting locations for a set of centers, standard clustering algorithms may place unfair burden on some individuals and neighborhoods. We formulate a fairness concept that takes local population densities into account. In particular, given k centers to locate and a population of size n, we define the "neighborhood radius" of an individual i as the minimum radius of a ball centered at i that contains at least n/k individuals. Our objective is to ensure that each individual has a center within at most a small constant factor of her neighborhood radius. We present several theoretical results: we show that optimizing this factor is NP-hard; we give an approximation algorithm that guarantees a factor of at most 2 in all metric spaces; and we prove matching lower bounds in some metric spaces. We apply a variant of this algorithm to real-world address data, showing that it differs markedly from standard clustering algorithms, outperforms them on our objective function, and balances the load between centers more evenly.
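    As a minimal illustration of the neighborhood-radius objective above, the following Python sketch computes each point's neighborhood radius and the worst-case factor for a given center set; it assumes Euclidean points stored in NumPy arrays, counts the point itself inside its ball, and uses a brute-force pairwise distance computation, so it is not the paper's algorithm.

        import numpy as np

        def neighborhood_radius(points, k):
            # Smallest radius of a ball around each point containing at least
            # ceil(n / k) points; the point itself is counted, which is one
            # reading of the abstract's definition (an assumption here).
            n = len(points)
            m = int(np.ceil(n / k))
            dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
            return np.sort(dists, axis=1)[:, m - 1]

        def fairness_factor(points, centers, k):
            # Largest ratio d(i, nearest center) / neighborhood_radius(i);
            # the abstract's objective is to keep this bounded by a small
            # constant. Assumes distinct points so all radii are positive.
            radii = neighborhood_radius(points, k)
            d_to_center = np.linalg.norm(points[:, None, :] - centers[None, :, :],
                                         axis=-1).min(axis=1)
            return float(np.max(d_to_center / radii))

    For example, the factor returned for centers produced by a standard k-means run can be compared against the factor-2 guarantee mentioned in the abstract.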

    Proportional Fairness in Clustering: A Social Choice Perspective

    We study the proportional clustering problem of Chen et al. [ICML'19] and relate it to the area of multiwinner voting in computational social choice. We show that any clustering satisfying a weak proportionality notion of Brill and Peters [EC'23] simultaneously obtains the best known approximations not only to the proportional fairness notion of Chen et al. [ICML'19], but also to individual fairness [Jung et al., FORC'20] and the "core" [Li et al., ICML'21]. In fact, we show that any approximation to proportional fairness is also an approximation to individual fairness and vice versa. Finally, we also study stronger notions of proportional representation, in which deviations are not to a single candidate center only but to multiple candidate centers, and we show that stronger proportionality notions of Brill and Peters [EC'23] imply approximations to these stronger guarantees.
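    For comparison with the PF statement sketched earlier, here is a rough paraphrase of the individual fairness notion cited above (Jung et al., FORC'20); r(i) is the "neighborhood radius" from the preceding abstract, and the notation is ours.

        % Sketch of \alpha-approximate individual fairness; notation is ours.
        % r(i): smallest radius of a ball around point i containing at least
        % ceil(n/k) of the n points; X: the chosen centers.
        \[
          d(i, X) \;\le\; \alpha \cdot r(i) \qquad \text{for every point } i .
        \]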

    Approximation Algorithms for Fair Range Clustering

    This paper studies the fair range clustering problem, in which the data points are from different demographic groups and the goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the center set and no group dominates the center set. More precisely, given a set of $n$ points in a metric space $(P, d)$ where each point belongs to one of $\ell$ different demographic groups (i.e., $P = P_1 \uplus P_2 \uplus \cdots \uplus P_\ell$) and a set of $\ell$ intervals $[\alpha_1, \beta_1], \ldots, [\alpha_\ell, \beta_\ell]$ on the desired number of centers from each group, the goal is to pick a set of $k$ centers $C$ with minimum $\ell_p$-clustering cost (i.e., $(\sum_{v \in P} d(v, C)^p)^{1/p}$) such that $|C \cap P_i| \in [\alpha_i, \beta_i]$ for each group $i \in [\ell]$. In particular, the fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median, and $k$-means as its special cases. In this work, we provide efficient constant-factor approximation algorithms for fair range $\ell_p$-clustering for all values of $p \in [1, \infty)$. Comment: ICML 202
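    As a small illustration of the objective and constraints just defined, the following Python sketch evaluates the $\ell_p$-clustering cost of a candidate center set and checks the per-group range constraints; it assumes Euclidean points and integer group labels, the helper names are ours, and it is only a feasibility and cost check, not the paper's approximation algorithm.

        import numpy as np

        def lp_clustering_cost(points, centers, p):
            # (sum_v d(v, C)^p)^(1/p), where d(v, C) is the distance from v
            # to its nearest center.
            d_to_c = np.linalg.norm(points[:, None, :] - centers[None, :, :],
                                    axis=-1).min(axis=1)
            return float((d_to_c ** p).sum() ** (1.0 / p))

        def satisfies_range_constraints(center_groups, alphas, betas):
            # Check |C ∩ P_i| ∈ [alpha_i, beta_i] for every group i, given the
            # group label of each chosen center.
            center_groups = np.asarray(center_groups)
            for i, (a, b) in enumerate(zip(alphas, betas)):
                count = int((center_groups == i).sum())
                if not a <= count <= b:
                    return False
            return True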

    Changes in epidemiological patterns of sea lice infestation on farmed Atlantic salmon, Salmo salar L., in Scotland between 1996 and 2006

    Analyses of a unique database containing sea lice records over an 11-year period provide evidence of changing infestation patterns in Scotland. The data, collected from more than 50 commercial Atlantic salmon farms, indicate that both species of sea lice commonly found in Scotland, Lepeophtheirus salmonis and Caligus elongatus, have declined on farms over the past decade. Reductions for both species have been particularly marked since 2001, when more effective veterinary medicines became available. Treatment data were also available in the database, and these show a growing trend towards the use of the in-feed medication emamectin benzoate (Slice), particularly in the first year of the salmon production cycle. However, this trend towards single-product use has not been sustained in 2006, the latest year for which data are available. There is some evidence of region-to-region variation within Scotland, with the Western Isles experiencing higher levels of infestation. However, compared to the levels observed between 1996 and 2000, all regions have benefited from reduced lice infestation, with the overall pattern showing a particular reduction in the second and third quarters of the second year of production.

    Role of homeostasis in learning sparse representations

    Neurons in the input layer of primary visual cortex in primates develop edge-like receptive fields. One approach to understanding the emergence of this response is to state that neural activity has to represent sensory data efficiently with respect to the statistics of natural scenes. Furthermore, it is believed that such efficient coding is achieved through competition across neurons so as to generate a sparse representation, that is, one in which a relatively small number of neurons are simultaneously active. Indeed, different models of sparse coding, coupled with Hebbian learning and homeostasis, have been proposed that successfully match the observed emergent response. However, the specific role of homeostasis in learning such sparse representations is still largely unknown. By quantitatively assessing the efficiency of the neural representation during learning, we derive a cooperative homeostasis mechanism that optimally tunes the competition between neurons within the sparse coding algorithm. We apply this homeostasis while learning small patches taken from natural images and compare its efficiency with state-of-the-art algorithms. Results show that while different sparse coding algorithms give similar coding results, homeostasis provides an optimal balance for the representation of natural images within the population of neurons. Competition in sparse coding is optimized when it is fair. By contributing to optimizing statistical competition across neurons, homeostasis is crucial in providing a more efficient solution to the emergence of independent components.
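    As a toy illustration of the mechanism discussed above, the Python sketch below couples a matching-pursuit-style sparse coder with a simple multiplicative homeostatic gain that boosts rarely selected atoms; this simplified rule is only a stand-in for the cooperative homeostasis the paper derives, and all function and parameter names are ours.

        import numpy as np

        def sparse_code_with_homeostasis(patches, n_atoms=64, n_active=5,
                                         n_epochs=10, lr=0.05, eta=0.01, seed=0):
            # patches: array of shape (num_patches, patch_dim).
            rng = np.random.default_rng(seed)
            dim = patches.shape[1]
            D = rng.standard_normal((n_atoms, dim))          # random dictionary
            D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit-norm atoms
            gain = np.ones(n_atoms)                          # homeostatic gains
            usage = np.full(n_atoms, 1.0 / n_atoms)          # selection frequency

            for _ in range(n_epochs):
                for x in patches:
                    residual = x.copy()
                    chosen = []
                    for _ in range(n_active):
                        # Gain-modulated competition: under-used atoms are boosted.
                        scores = gain * (D @ residual)
                        j = int(np.argmax(np.abs(scores)))
                        a = D[j] @ residual
                        residual -= a * D[j]
                        chosen.append((j, a))
                    # Hebbian-style update of the selected atoms.
                    for j, a in chosen:
                        D[j] += lr * a * residual
                        D[j] /= np.linalg.norm(D[j])
                    # Homeostasis: track selection frequency and boost rare atoms.
                    sel = np.zeros(n_atoms)
                    sel[[j for j, _ in chosen]] = 1.0 / len(chosen)
                    usage = (1 - eta) * usage + eta * sel
                    gain = (1.0 / n_atoms) / (usage + 1e-8)
            return D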