Approximation Algorithms for Continuous Clustering and Facility Location Problems
We consider the approximability of center-based clustering problems where the
points to be clustered lie in a metric space, and no candidate centers are
specified. We call such problems "continuous", to distinguish them from
"discrete" clustering, where candidate centers are specified. For many
objectives, one can reduce the continuous case to the discrete case, and use an
α-approximation algorithm for the discrete case to get a (β·α)-approximation
for the continuous case, where β depends on the objective: e.g. for k-median,
β = 2, and for k-means, β = 4. Our motivating question is whether this gap of β
is inherent, or whether there are better algorithms for continuous clustering
than simply reducing to the discrete case. In a recent SODA 2021 paper,
Cohen-Addad, Karthik, and Lee prove a factor-2 and a factor-4 hardness,
respectively, for continuous k-median and k-means, even when the number of
centers is a constant. The discrete case for a constant number of centers is
exactly solvable in polytime, so the factor-β loss seems unavoidable in some
regimes.
In this paper, we approach continuous clustering via the round-or-cut
framework. For four continuous clustering problems, we outperform the reduction
to the discrete case. Notably, for the problem λ-UFL, where β = 2 and the
discrete case has a hardness of 1.27, we obtain an approximation ratio of
2.32 < 2 × 1.27 for the continuous case. Also, for continuous k-means, where
the best known approximation ratio for the discrete case is 9, we obtain an
approximation ratio of 32 < 4 × 9. The key challenge is that most algorithms
for discrete clustering, including the state of the art, depend on linear
programs that become infinite-sized in the continuous case. To overcome this,
we design new linear programs for the continuous case which are amenable to the
round-or-cut framework.
Comment: 24 pages, 0 figures. Full version of ESA 2022 paper
https://drops.dagstuhl.de/opus/volltexte/2022/16971 . This version adds a
link to the conference version and fixes minor formatting issues
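The discretization loss from reducing continuous to discrete clustering can be seen in a toy sketch (a hypothetical one-dimensional instance, not the paper's round-or-cut algorithm): restricting candidate centers to the input points yields a discrete instance whose optimum is within a bounded factor of the continuous optimum.

```python
# Toy illustration (not the paper's algorithm): reducing continuous
# 1-median / 1-means to the discrete case by restricting candidate
# centers to the input points themselves.
points = [0.0, 1.0, 2.0, 10.0]

def cost(center, p):
    """Sum of distances (p=1, median) or squared distances (p=2, means)."""
    return sum(abs(x - center) ** p for x in points)

# Continuous optimum on the line: a median point for p=1, the centroid for p=2.
cont_median = sorted(points)[len(points) // 2 - 1]  # any point between the middle two is optimal
cont_mean = sum(points) / len(points)

# Discrete optimum: the best center among the input points.
disc_median = min(points, key=lambda c: cost(c, 1))
disc_mean = min(points, key=lambda c: cost(c, 2))

ratio_median = cost(disc_median, 1) / cost(cont_median, 1)
ratio_means = cost(disc_mean, 2) / cost(cont_mean, 2)

# Discretization loses at most a factor 2 for median
# (and at most a factor 4 for means in a general metric).
assert ratio_median <= 2.0
assert ratio_means <= 4.0
```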
Sanitized Clustering against Confounding Bias
Real-world datasets inevitably contain biases that arise from different
sources or conditions during data collection. Consequently, such inconsistency
itself acts as a confounding factor that disturbs the cluster analysis.
Existing methods eliminate the biases by projecting the data onto the
orthogonal complement of the subspace spanned by the confounding factor before
clustering. There, the clustering factor of interest and the confounding
factor are treated coarsely in the raw feature space, where the correlation
between the data and the confounding factor is assumed to be linear for the
sake of convenient solutions. These approaches are thus limited in scope, as data in
real applications is usually complex and non-linearly correlated with the
confounding factor. This paper presents a new clustering framework named
Sanitized Clustering Against confounding Bias (SCAB), which removes the
confounding factor in the semantic latent space of complex data through a
non-linear dependence measure. To be specific, we eliminate the bias
information in the latent space by minimizing the mutual information between
the confounding factor and the latent representation delivered by a
Variational Auto-Encoder (VAE). Meanwhile, a clustering module is introduced to cluster
over the purified latent representations. Extensive experiments on complex
datasets demonstrate that our SCAB achieves a significant gain in clustering
performance by removing the confounding bias. The code is available at
\url{https://github.com/EvaFlower/SCAB}.
Comment: Machine Learning, in press
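For contrast with SCAB's non-linear approach, a minimal sketch of the linear projection baseline the abstract describes (synthetic data, assuming a single observed scalar confounder): removing the component of the data explained by the confounder leaves a representation that is linearly uncorrelated with it.

```python
# Sketch of the linear debiasing baseline the abstract contrasts with
# (not SCAB itself): project the data onto the orthogonal complement
# of the subspace spanned by the confounding factor, then cluster.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # data matrix, one row per sample
c = rng.normal(size=(100, 1))   # observed confounding factor (one column)

# Least-squares coefficients of X on c, then remove the explained component.
B = (c.T @ X) / (c.T @ c)       # shape (1, 5)
X_debiased = X - c @ B

# After projection, X_debiased is (linearly) uncorrelated with c;
# non-linear dependence, which SCAB targets, may of course remain.
assert np.allclose(c.T @ X_debiased, 0.0, atol=1e-8)
```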
Models and Mechanisms for Fairness in Location Data Processing
Location data use has become pervasive in the last decade due to the advent
of mobile apps, as well as novel areas such as smart health, smart cities, etc.
At the same time, significant concerns have surfaced with respect to fairness
in data processing. Individuals from certain population segments may be
unfairly treated when being considered for loan or job applications, access to
public resources, or other types of services. In the case of location data,
fairness is an important concern, given that an individual's whereabouts are
often correlated with sensitive attributes, e.g., race, income, education.
While fairness has received significant attention recently, e.g., in the case
of machine learning, there is little focus on the challenges of achieving
fairness when dealing with location data. Due to their characteristics and
specific type of processing algorithms, location data pose important fairness
challenges that must be addressed in a comprehensive and effective manner. In
this paper, we adapt existing fairness models to suit the specific properties
of location data and spatial processing. We focus on individual fairness, which
is more difficult to achieve, and more relevant for most location data
processing scenarios. First, we devise a novel building block to achieve
fairness in the form of fair polynomials. Then, we propose two mechanisms based
on fair polynomials that achieve individual fairness, corresponding to two
common interaction types based on location data. Extensive experimental results
on real data show that the proposed mechanisms achieve individual location
fairness without sacrificing utility.
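The fair-polynomial mechanisms themselves are not spelled out in the abstract; as background, here is a minimal sketch of the individual-fairness notion the paper builds on (the Dwork-style "similar individuals receive similar outcomes" condition, with hypothetical locations and outcome functions):

```python
# Generic (Dwork-style) individual-fairness check: an outcome function
# is individually fair if outcomes for any two individuals differ by at
# most a Lipschitz multiple of the distance between them.
from itertools import combinations

def is_individually_fair(points, outcome, lipschitz=1.0):
    """True iff |outcome(p) - outcome(q)| <= lipschitz * dist(p, q)
    for every pair of individuals: similar locations, similar treatment."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    return all(
        abs(outcome(p) - outcome(q)) <= lipschitz * dist(p, q)
        for p, q in combinations(points, 2)
    )

locations = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
assert is_individually_fair(locations, lambda p: 0.5 * (p[0] + p[1]))
assert not is_individually_fair(locations, lambda p: 10.0 * p[0])
```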
Approximation Algorithms for Fair Range Clustering
This paper studies the fair range clustering problem, in which the data points
are from different demographic groups and the goal is to pick k centers with
the minimum clustering cost such that each group is at least minimally
represented in the centers set and no group dominates the centers set. More
precisely, given a set of n points in a metric space (P, d) where each point
belongs to one of the ℓ different demographics (i.e., P = P_1 ⊎ P_2 ⊎ ... ⊎ P_ℓ)
and a set of ℓ intervals [α_1, β_1], ..., [α_ℓ, β_ℓ] on the desired number of
centers from each group, the goal is to pick a set of k centers C with minimum
ℓ_p-clustering cost (i.e., (Σ_{v ∈ P} d(v, C)^p)^{1/p}) such that for
each group i ∈ [ℓ], |C ∩ P_i| ∈ [α_i, β_i]. In particular,
the fair range ℓ_p-clustering captures fair range k-center, k-median
and k-means as its special cases. In this work, we provide efficient constant-factor
approximation algorithms for fair range ℓ_p-clustering for all
values of p ∈ [1, ∞).
Comment: ICML 2023
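The interval constraints and the clustering objective described above can be made concrete in a small sketch (toy instance with a hypothetical center choice; this is not the paper's approximation algorithm):

```python
# Toy sketch of the fair range constraint and the l_p-clustering
# objective: each demographic group must contribute a number of
# centers inside its interval, and the cost is the l_p objective.
import math

points = {  # point -> demographic group
    (0.0, 0.0): 0, (1.0, 0.0): 0, (0.0, 5.0): 1, (1.0, 5.0): 1,
}
intervals = {0: (1, 1), 1: (1, 2)}  # group -> (alpha_i, beta_i)

def lp_cost(centers, p=2):
    """(sum over points of d(v, C)^p)^(1/p); p=1 is k-median-style,
    p=2 corresponds to the squared-distance family."""
    total = sum(
        min(math.dist(v, c) for c in centers) ** p for v in points
    )
    return total ** (1.0 / p)

def is_fair(centers):
    """Group i must contribute between alpha_i and beta_i centers."""
    for g, (alpha, beta) in intervals.items():
        count = sum(1 for c in centers if points[c] == g)
        if not alpha <= count <= beta:
            return False
    return True

centers = [(0.0, 0.0), (0.0, 5.0)]  # one center from each group
assert is_fair(centers)
assert abs(lp_cost(centers) - math.sqrt(2.0)) < 1e-9
```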
Fair Clustering via Hierarchical Fair-Dirichlet Process
The advent of ML-driven decision-making and policy formation has led to an
increasing focus on algorithmic fairness. As clustering is one of the most
commonly used unsupervised machine learning approaches, there has naturally
been a proliferation of literature on {\em fair clustering}. A popular notion
of fairness in clustering mandates the clusters to be {\em balanced}, i.e.,
each level of a protected attribute must be approximately equally represented
in each cluster. Building upon the original framework, this literature has
rapidly expanded in various aspects. In this article, we offer a novel
model-based formulation of fair clustering, complementing the existing
literature, which is almost exclusively based on optimizing appropriate
objective functions.
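The "balanced" notion of fairness mentioned above can be illustrated with a small sketch (hypothetical data; balance taken as the common min-ratio definition, and assuming every group appears in every cluster):

```python
# Balance of a clustering: for each cluster, the ratio of the least-
# to the most-represented protected group; the overall balance is the
# minimum over clusters (1.0 = perfectly balanced).
from collections import Counter

def balance(cluster_ids, groups):
    """Assumes every group appears in every cluster; a missing group
    would make the cluster's true balance zero."""
    per_cluster = {}
    for cid, g in zip(cluster_ids, groups):
        per_cluster.setdefault(cid, Counter())[g] += 1
    return min(
        min(c.values()) / max(c.values()) for c in per_cluster.values()
    )

# Two clusters, protected attribute in {"a", "b"}.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
groups_balanced = ["a", "b", "a", "b", "a", "b", "a", "b"]
groups_skewed = ["a", "a", "a", "b", "b", "b", "b", "a"]

assert balance(clusters, groups_balanced) == 1.0
assert abs(balance(clusters, groups_skewed) - 1.0 / 3.0) < 1e-12
```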
Spectral Normalized-Cut Graph Partitioning with Fairness Constraints
Normalized-cut graph partitioning aims to divide the set of nodes in a graph
into disjoint clusters to minimize the fraction of the total edges between
any cluster and all other clusters. In this paper, we consider a fair variant
of the partitioning problem wherein nodes are characterized by a categorical
sensitive attribute (e.g., gender or race) indicating membership to different
demographic groups. Our goal is to ensure that each group is approximately
proportionally represented in each cluster while minimizing the normalized cut
value. To resolve this problem, we propose a two-phase spectral algorithm
called FNM. In the first phase, we add an augmented Lagrangian term based on
our fairness criteria to the objective function for obtaining a fairer spectral
node embedding. Then, in the second phase, we design a rounding scheme to
produce clusters from the fair embedding that effectively trades off
fairness and partition quality. Through comprehensive experiments on nine
benchmark datasets, we demonstrate the superior performance of FNM compared
with three baseline methods.
Comment: 17 pages, 7 figures, accepted to the 26th European Conference on
Artificial Intelligence (ECAI 2023)
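As background for the objective being minimized, a minimal sketch of computing the normalized-cut value of a partition on a toy graph (FNM's spectral relaxation and fair rounding scheme are not reproduced here):

```python
# Normalized cut of a partition: sum over clusters S of
# cut(S, V \ S) / vol(S), where vol(S) is the total degree of S.
import numpy as np

# Adjacency matrix of a small graph with two natural clusters
# {0, 1, 2} and {3, 4, 5}, joined by a single bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

def ncut(A, clusters):
    total = 0.0
    for S in clusters:
        mask = np.zeros(len(A), dtype=bool)
        mask[list(S)] = True
        cut = A[mask][:, ~mask].sum()   # edge weight leaving S
        vol = A[mask].sum()             # total degree of S
        total += cut / vol
    return total

good = ncut(A, [[0, 1, 2], [3, 4, 5]])  # natural partition
bad = ncut(A, [[0, 3], [1, 2, 4, 5]])   # cuts many edges
assert good < bad
```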