Fair Algorithms for Hierarchical Agglomerative Clustering
Hierarchical Agglomerative Clustering (HAC) algorithms are widely used in
modern data science and seek to partition a dataset into clusters while
generating a hierarchical relationship between the data samples.
HAC algorithms are employed in many applications, such as biology, natural
language processing, and recommender systems. Thus, it is imperative to ensure
that these algorithms are fair -- even if the dataset contains biases against
certain protected groups, the cluster outputs generated should not discriminate
against samples from any of these groups. However, recent work in clustering
fairness has mostly focused on center-based clustering algorithms, such as
k-median and k-means clustering. In this paper, we propose fair algorithms for
performing HAC that 1) enforce fairness constraints irrespective of the
distance linkage criterion used, 2) generalize to any natural measure of
clustering fairness for HAC, 3) work for multiple protected groups, and 4) have
running times competitive with vanilla HAC. Through extensive experiments on
multiple real-world UCI datasets, we show that our proposed algorithm finds
fairer clusterings than vanilla HAC as well as other state-of-the-art fair
clustering approaches.
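The abstract summarizes HAC at a high level: start with singleton clusters and repeatedly merge the closest pair under a chosen linkage criterion. Below is a minimal, illustrative Python sketch of a fairness-aware variant; it is not the paper's algorithm. The balance-style check in `is_balanced`, the `alpha` threshold, and the greedy fallback to the closest pair are all assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def linkage_distance(X, a, b, criterion="average"):
    """Distance between clusters a and b (index lists) under a linkage criterion."""
    d = np.linalg.norm(X[a][:, None, :] - X[b][None, :, :], axis=-1)
    if criterion == "single":
        return d.min()
    if criterion == "complete":
        return d.max()
    return d.mean()  # average linkage

def is_balanced(indices, groups, alpha=0.8):
    """Toy fairness check (assumed): no protected group exceeds an alpha share."""
    _, counts = np.unique(groups[indices], return_counts=True)
    return counts.max() / counts.sum() <= alpha

def fair_hac(X, groups, k, criterion="average", alpha=0.8):
    """Greedy HAC over points X (array) and group labels (array), down to k clusters."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        pairs = sorted(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(X, clusters[p[0]], clusters[p[1]], criterion),
        )
        # Take the closest balance-preserving merge; fall back to the closest pair.
        i, j = next((p for p in pairs
                     if is_balanced(clusters[p[0]] + clusters[p[1]], groups, alpha)),
                    pairs[0])
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deletion does not shift clusters[i]
    return clusters
```

For example, `fair_hac(X, groups, k=3, criterion="single")` swaps in single linkage without touching the fairness logic, which mirrors the abstract's claim that the constraint is independent of the linkage criterion.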
Towards Robust and Fair Machine Learning
Recent advances in Machine Learning (ML) and Deep Learning (DL) have resulted in the widespread adoption of models across various application pipelines. However, despite these performance improvements, ML/DL models have been shown to be vulnerable to adversarial inputs that can degrade their functionality. Concerns over these issues have prompted researchers to study model robustness from multiple perspectives, such as privacy, fairness, security, and interpretability. In this thesis, we build upon these ideas of robustness by investigating adversarial and social robustness for a number of different learning models and problem settings. We first study the adversarial robustness of unsupervised clustering models by proposing novel poisoning and evasion attacks for both deep and classical models. We then study the social robustness of models in the context of fairness, and propose the antidote data problem for fair clustering as well as the fair video summarization problem. Finally, we investigate two problems at the intersection of adversarial and social robustness. We propose a new robust fair clustering method that can jointly ensure adversarial and social robustness, and data selection approaches that can improve interpretability and optimize the utility, fairness, and robustness of classification models. Through the concepts and ideas proposed in this thesis, we aim to lay the groundwork for analyzing and ensuring the robustness of future ML/DL models.
Suspicion-Free Adversarial Attacks on Clustering Algorithms
Clustering algorithms are used in a large number of applications and play an important role in modern machine learning -- yet, unlike in supervised learning, adversarial attacks on clustering algorithms have been broadly overlooked. In this paper, we seek to bridge this gap by proposing a black-box adversarial attack on clustering models with linearly separable clusters. Our attack works by perturbing a single sample close to the decision boundary, which leads to the misclustering of multiple unperturbed samples, which we call spill-over adversarial samples. We theoretically show the existence of such adversarial samples for K-Means clustering. Our attack is especially strong because (1) we ensure the perturbed sample is not an outlier, and is hence hard to detect, and (2) the exact metric used for clustering is not known to the attacker. We theoretically justify that the attack can succeed without knowledge of the true metric. We conclude by providing empirical results on a number of datasets and clustering algorithms. To the best of our knowledge, this is the first work that generates spill-over adversarial samples without knowledge of the true metric, ensures that the perturbed sample is not an outlier, and theoretically proves the above.
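A toy, white-box illustration of the spill-over effect on K-Means is sketched below; the paper's attack is black-box and metric-agnostic, which this sketch does not reproduce. The synthetic data, the perturbation step size, and the label-alignment heuristic are all assumptions, and with well-separated clusters the observed spill-over may be small.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two linearly separable (but adjacent) Gaussian clusters in 2D.
X = np.vstack([rng.normal([0.0, 0.0], 0.9, (50, 2)),
               rng.normal([2.5, 0.0], 0.9, (50, 2))])

base = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
c0, c1 = base.cluster_centers_  # cluster_centers_[i] corresponds to label i

# Choose the cluster-0 sample nearest to the opposing centroid (near the boundary).
dist_to_c1 = np.linalg.norm(X - c1, axis=1)
idx = int(np.argmin(np.where(base.labels_ == 0, dist_to_c1, np.inf)))

# Perturb only that single sample, moving it part of the way toward the
# opposing centroid; this shifts both centroids, hence the decision boundary.
X_adv = X.copy()
X_adv[idx] += 0.9 * (c1 - X_adv[idx])

adv = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_adv)

# Align labels (2-cluster case) and count unperturbed samples that flipped:
# these are the spill-over adversarial samples.
mask = np.arange(len(X)) != idx
agree = (adv.labels_[mask] == base.labels_[mask]).mean()
aligned = adv.labels_ if agree >= 0.5 else 1 - adv.labels_
print("spill-over misclustered samples:", int((aligned[mask] != base.labels_[mask]).sum()))
```

In the paper's threat model the attacker additionally keeps the perturbed sample from being an outlier and does not know the clustering metric; this sketch only demonstrates the spill-over phenomenon itself.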
Auditing YouTube's recommendation system for ideologically congenial, extreme, and problematic recommendations
Algorithms of social media platforms are often criticized for recommending ideologically congenial and radical content to their users. Despite these concerns, evidence on such filter bubbles and rabbit holes of radicalization is inconclusive. We conduct an audit of the platform using 100,000 sock puppets that allow us to systematically, and at scale, isolate the influence of the algorithm on recommendations. We test 1) whether recommended videos are congenial with regard to users' ideology, especially deeper in the watch trail, and 2) whether recommendations deeper in the trail become progressively more extreme and come from problematic channels. We find that YouTube's algorithm recommends congenial content to its partisan users, although some moderate and cross-cutting exposure is possible, and that congenial recommendations increase deeper in the trail for right-leaning users. We do not find meaningful increases in the ideological extremity of recommendations deeper in the trail, yet we show that a growing proportion of recommendations comes from channels categorized as problematic (e.g., IDW, Alt-right, Conspiracy, and QAnon), with this increase being most pronounced among the very-right users. Although the proportion of these problematic recommendations is low (a maximum of 2.5%), they are still encountered by over 36.1% of users, and by up to 40% of very-right users.