Kernel functions based on triplet comparisons
Given only information in the form of similarity triplets "Object A is more
similar to object B than to object C" about a data set, we propose two ways of
defining a kernel function on the data set. While previous approaches construct
a low-dimensional Euclidean embedding of the data set that reflects the given
similarity triplets, we aim to define kernel functions that correspond to
high-dimensional embeddings. These kernel functions can subsequently be used to
apply any kernel method to the data set.
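The abstract leaves the two constructions unspecified here, so the following is only a minimal sketch of the general idea in Python: each object is represented by a vector of its triplet answers, and the kernel is the inner product of these (normalized) vectors. The tuple convention (a, b, c) meaning "a is more similar to b than to c", the feature map, and the normalization are all assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def triplet_kernel(triplets, n):
    """Illustrative kernel from similarity triplets (a, b, c), each read as
    'object a is more similar to object b than to object c'.

    Each object a gets a feature vector indexed by ordered pairs (b, c):
    +1 where (a, b, c) was observed, -1 for the reversed pair. The kernel
    is the inner product of row-normalized feature vectors, so it is
    positive semidefinite by construction.
    """
    F = np.zeros((n, n * n))
    for a, b, c in triplets:
        F[a, b * n + c] += 1.0   # a is closer to b than to c
        F[a, c * n + b] -= 1.0   # mirrored entry for the reversed pair
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0      # leave objects with no triplets at zero
    F = F / norms
    return F @ F.T

# Usage: a 4x4 kernel matrix from three triplets over objects 0..3.
K = triplet_kernel([(0, 1, 2), (1, 0, 3), (2, 3, 0)], n=4)
```

The resulting matrix can then be handed to any kernel method (e.g., a kernel SVM) in place of an explicit Euclidean embedding, which is the point of the high-dimensional view.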
Are Two Heads the Same as One? Identifying Disparate Treatment in Fair Neural Networks
We show that deep neural networks that satisfy demographic parity do so
through a form of race or gender awareness, and that the more we force a
network to be fair, the more accurately we can recover race or gender from the
internal state of the network. Based on this observation, we propose a simple
two-stage solution for enforcing fairness. First, we train a two-headed network
to predict the protected attribute (such as race or gender) alongside the
original task, and second, we enforce demographic parity by taking a weighted
sum of the heads. In the end, this approach creates a single-headed network
with the same backbone architecture as the original network. Our approach has
near-identical performance compared to existing regularization-based or
preprocessing methods, but offers greater stability and higher accuracy where
near-exact demographic parity is required. To cement the relationship between these
two approaches, we show that an unfair and optimally accurate classifier can be
recovered by taking a weighted sum of a fair classifier and a classifier
predicting the protected attribute. We use this to argue that both the fairness
approaches and our explicit formulation demonstrate disparate treatment and
that, consequently, they are likely to be unlawful in a wide range of
scenarios under US law.
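As a sketch only, the two-stage idea could be prototyped in PyTorch as below. The backbone sizes, the sign convention in the weighted sum, and the name merge_heads are assumptions for illustration; in particular, how the mixing weight alpha is tuned to reach demographic parity is not shown.

```python
import torch
import torch.nn as nn

class TwoHeadedNet(nn.Module):
    """Shared backbone with one head for the task and one for the
    protected attribute (stage one of the two-stage approach)."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.task_head = nn.Linear(hidden, 1)  # original prediction task
        self.attr_head = nn.Linear(hidden, 1)  # protected attribute

    def forward(self, x):
        z = self.backbone(x)
        return self.task_head(z), self.attr_head(z)

def merge_heads(net, alpha):
    """Stage two: collapse both heads into one by a weighted sum of
    their weights, yielding a single-headed network with the same
    backbone. The subtraction direction is an assumption."""
    merged = nn.Linear(net.task_head.in_features, 1)
    with torch.no_grad():
        merged.weight.copy_(net.task_head.weight - alpha * net.attr_head.weight)
        merged.bias.copy_(net.task_head.bias - alpha * net.attr_head.bias)
    return nn.Sequential(net.backbone, merged)
```

Because the merge only touches the final linear layer, the deployed model is architecturally identical to an unconstrained single-headed network, matching the abstract's claim that the result is a single-headed network with the original backbone.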
Active Sampling for Min-Max Fairness
We propose simple active sampling and reweighting strategies for optimizing
min-max fairness that can be applied to any classification or regression model
learned via loss minimization. The key intuition behind our approach is to use
at each timestep a datapoint from the group that is worst off under the current
model for updating the model. The ease of implementation and the generality of
our robust formulation make it an attractive option for improving model
performance on disadvantaged groups. For convex learning problems, such as
linear or logistic regression, we provide a fine-grained analysis, proving the
rate of convergence to a min-max fair solution.
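For concreteness, a hedged sketch of the sampling loop for logistic regression follows; recomputing all per-group losses at every step is a simplification for clarity, and the learning rate and loss are illustrative choices rather than the paper's analyzed algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minmax_active_sgd(X, y, groups, steps=5000, lr=0.1, seed=0):
    """At each time step, find the group that is worst off under the
    current model and take an SGD step on a random point from it."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    group_ids = np.unique(groups)
    for _ in range(steps):
        # Average logistic loss of each group under the current model.
        p = sigmoid(X @ w)
        losses = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        worst = max(group_ids, key=lambda g: losses[groups == g].mean())
        # Sample a point from the worst-off group and update on it.
        i = rng.choice(np.flatnonzero(groups == worst))
        w -= lr * (sigmoid(X[i] @ w) - y[i]) * X[i]
    return w
```

The same loop applies to any model learned via loss minimization by swapping in that model's gradient step, which is what makes the strategy model-agnostic.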
Individual Preference Stability for Clustering
In this paper, we propose a natural notion of individual preference (IP)
stability for clustering, which asks that every data point, on average, is
closer to the points in its own cluster than to the points in any other
cluster. Our notion can be motivated from several perspectives, including game
theory and algorithmic fairness. We study several questions related to our
proposed notion. We first show that deciding whether a given data set allows
for an IP-stable clustering in general is NP-hard. As a result, we explore the
design of efficient algorithms for finding IP-stable clusterings in some
restricted metric spaces. We present a polytime algorithm to find a clustering
satisfying exact IP-stability on the real line, and an efficient algorithm to
find an IP-stable 2-clustering for a tree metric. We also consider relaxing the
stability constraint, requiring only that no data point be, on average, much
farther from its own cluster than from any other cluster. For this case, we provide polytime
algorithms with different guarantees. We evaluate some of our algorithms and
several standard clustering approaches on real data sets.
Comment: Accepted to ICML'22. This is a full version of the ICML paper, as
well as a substantially improved version of arXiv:2006.0496.
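Since the IP-stability condition is stated pointwise, it is straightforward to verify for a given clustering. The sketch below assumes a precomputed pairwise distance matrix D and integer cluster labels; treating points in singleton clusters as trivially stable is an assumption, as the abstract does not address that case.

```python
import numpy as np

def is_ip_stable(D, labels):
    """Check whether a clustering is IP-stable: every point must be,
    on average, at least as close to its own cluster (excluding
    itself) as to each other cluster."""
    labels = np.asarray(labels)
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False  # exclude the point's zero self-distance
        if not own.any():
            continue    # singleton cluster: treated as trivially stable
        own_avg = D[i, own].mean()
        for c in np.unique(labels):
            other = labels == c
            if c != labels[i] and D[i, other].mean() < own_avg:
                return False
    return True
```

A check like this is also a natural evaluation metric for the experiments the abstract mentions, e.g., counting how many points violate the condition under standard clustering methods.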