Protecting the Protected Group: Circumventing Harmful Fairness
Machine Learning (ML) algorithms shape our lives. Banks use them to determine
whether we are good borrowers; IT companies delegate recruitment decisions to
them; police apply ML for crime prediction; and judges base their verdicts on ML.
However, real-world examples show that such automated decisions tend to
discriminate against protected groups. This potential discrimination has drawn
considerable attention both in the media and in the research community. Quite a
few formal notions of fairness have been proposed, which take the form of
constraints a "fair" algorithm must satisfy. We focus on scenarios where
fairness is imposed on a
self-interested party (e.g., a bank that maximizes its revenue). We find that
the disadvantaged protected group can be worse off after imposing a fairness
constraint. We introduce a family of \textit{Welfare-Equalizing} fairness
constraints that equalize per-capita welfare of protected groups, and include
\textit{Demographic Parity} and \textit{Equal Opportunity} as particular cases.
In this family, we characterize conditions under which the fairness constraint
helps the disadvantaged group. We also characterize the structure of the
optimal \textit{Welfare-Equalizing} classifier for the self-interested party,
and provide an algorithm to compute it. Overall, our
\textit{Welfare-Equalizing} fairness approach provides a unified framework for
discussing fairness in classification in the presence of a self-interested
party.
Comment: Published in AAAI 202
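The per-capita welfare idea described above can be illustrated with a toy sketch. This is a minimal illustration only: the function names and the choice of welfare function are assumptions, not the paper's definitions.

```python
import numpy as np

def per_capita_welfare(y_pred, groups, welfare):
    """Average welfare per member of each protected group."""
    return {g: welfare(y_pred[groups == g]).mean()
            for g in np.unique(groups)}

# Demographic Parity arises as the special case welfare(y) = y:
# per-capita welfare is then each group's acceptance rate.
y_pred = np.array([1, 0, 1, 1, 0, 1])   # classifier decisions
groups = np.array([0, 0, 0, 1, 1, 1])   # protected-group membership
rates = per_capita_welfare(y_pred, groups, welfare=lambda y: y)
# Here both groups have acceptance rate 2/3, so the
# Demographic Parity instance of the constraint is satisfied.
```

Other members of the family correspond to other welfare functions; for instance, restricting attention to qualified individuals recovers Equal Opportunity.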
Convergence of Learning Dynamics in Information Retrieval Games
We consider a game-theoretic model of information retrieval with strategic
authors. We examine two utility schemes: authors who aim to maximize exposure
and authors who aim to maximize active selection of their content (i.e., the
number of clicks). We introduce the study of author learning
dynamics in such contexts. We prove that under the probability ranking
principle (PRP), which forms the basis of the current state of the art ranking
methods, any better-response learning dynamics converges to a pure Nash
equilibrium. We also show that other ranking methods induce strategic
environments under which such convergence may not occur.
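Better-response dynamics of the kind studied above can be sketched on a toy singleton congestion game, where convergence to a pure Nash equilibrium follows from a potential argument. The demand-sharing payoffs below are an assumption for illustration, not the paper's PRP-based ranking model.

```python
def payoffs(profile, demand):
    """Each author picks a topic; authors on the same topic split its demand."""
    counts = {t: profile.count(t) for t in set(profile)}
    return [demand[t] / counts[t] for t in profile]

def better_response_dynamics(profile, demand, topics, max_rounds=100):
    profile = list(profile)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(profile)):
            current = payoffs(profile, demand)[i]
            for t in topics:
                trial = profile[:i] + [t] + profile[i + 1:]
                if payoffs(trial, demand)[i] > current + 1e-12:
                    profile, improved = trial, True
                    break
        if not improved:
            return profile  # no player can improve: a pure Nash equilibrium
    return None  # did not converge within max_rounds

# Two authors start on the high-demand topic 'a'; one deviates to 'b',
# after which neither author has a profitable deviation.
equilibrium = better_response_dynamics(['a', 'a'], {'a': 1.0, 'b': 0.6},
                                       ['a', 'b'])
```

In games without such a potential structure, the same loop can cycle forever, which mirrors the non-convergence phenomenon for other ranking methods.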
Learning with Exposure Constraints in Recommendation Systems
Recommendation systems are dynamic economic systems that balance the needs of
multiple stakeholders. A recent line of work studies incentives from the
content providers' point of view. Content providers, e.g., vloggers and
bloggers, contribute fresh content and rely on user engagement to create
revenue and finance their operations. In this work, we propose a contextual
multi-armed bandit setting to model the dependency of content providers on
exposure. In our model, the system receives a user context in every round and
has to select one of the arms. Every arm is a content provider who must receive
a minimum number of pulls every fixed time period (e.g., a month) to remain
viable in later rounds; otherwise, the arm departs and is no longer available.
The system aims to maximize the welfare of the users (the content consumers). To that
end, it should learn which arms are vital and ensure they remain viable by
subsidizing arm pulls if needed. We develop algorithms with sub-linear regret,
as well as a lower bound that demonstrates that our algorithms are optimal up
to logarithmic factors.
Comment: Published in The Web Conference 2023 (WWW 23)
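The tension between exploiting high-reward arms and keeping low-reward arms viable can be sketched as follows. This is an illustrative heuristic under assumed names and a simplified quota check, not the regret-optimal algorithm from the paper.

```python
def select_arm(scores, pulls_this_period, min_pulls, rounds_left):
    """Pick an arm, subsidizing arms that would otherwise miss their
    per-period viability quota."""
    # How many pulls each arm still owes in the current period.
    deficits = {a: max(0, min_pulls - c)
                for a, c in pulls_this_period.items()}
    if sum(deficits.values()) >= rounds_left:
        # Forced subsidy: the remaining rounds barely cover the quotas,
        # so pull the arm with the largest outstanding deficit.
        return max(deficits, key=deficits.get)
    # Otherwise exploit the arm with the highest estimated score.
    return max(scores, key=scores.get)

# With 2 rounds left and arm 'b' still owing 2 pulls, 'b' must be subsidized:
forced = select_arm({'a': 0.9, 'b': 0.1}, {'a': 2, 'b': 0},
                    min_pulls=2, rounds_left=2)
# With slack (5 rounds left), the system exploits the best arm instead:
greedy = select_arm({'a': 0.9, 'b': 0.1}, {'a': 2, 'b': 0},
                    min_pulls=2, rounds_left=5)
```

The interesting algorithmic question, addressed in the paper, is deciding *which* arms are worth subsidizing at all while estimates of their value are still being learned.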
Principal-Agent Reward Shaping in MDPs
Principal-agent problems arise when one party acts on behalf of another,
leading to conflicts of interest. The economic literature has extensively
studied principal-agent problems, and recent work has extended this to more
complex scenarios such as Markov Decision Processes (MDPs). In this paper, we
further explore this line of research by investigating how reward shaping under
budget constraints can improve the principal's utility. We study a two-player
Stackelberg game where the principal and the agent have different reward
functions, and the agent chooses an MDP policy for both players. The principal
offers an additional reward to the agent, and the agent picks their policy
selfishly to maximize their reward, which is the sum of the original and the
offered reward. Our results establish the NP-hardness of the problem and offer
polynomial approximation algorithms for two classes of instances: stochastic
trees and deterministic decision processes with a finite horizon.
Comment: Full version of a paper accepted to AAAI'2