Calibrated Fairness in Bandits
We study fairness within the stochastic, \emph{multi-armed bandit} (MAB)
decision making framework. We adapt the fairness framework of "treating similar
individuals similarly" to this setting. Here, an `individual' corresponds to an
arm and two arms are `similar' if they have a similar quality distribution.
First, we adopt a {\em smoothness constraint} that if two arms have a similar
quality distribution then the probability of selecting each arm should be
similar. In addition, we define the {\em fairness regret}, which corresponds to
the degree to which an algorithm is not calibrated, where perfect calibration
requires that the probability of selecting an arm is equal to the probability
with which the arm has the best quality realization. We show that a variation
on Thompson sampling satisfies smooth fairness for total variation distance,
and give a bound on fairness regret. This complements
prior work, which protects an on-average better arm from being less favored. We
also explain how to extend our algorithm to the dueling bandit setting. Comment: To be presented at the FAT-ML'17 workshop.
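A minimal sketch of the calibration property described above, under the illustrative assumption of Bernoulli arms with Beta posteriors: standard Thompson sampling draws one mean from each arm's posterior and plays the argmax, so each arm is selected with (approximately) the posterior probability that it has the best quality realization. The paper's algorithm is a variation on this; the names and setup below are assumptions, not the authors' code.

```python
import random

def thompson_step(successes, failures, rng=random):
    """One round: sample a mean from each arm's Beta posterior, play the argmax.

    Beta(s + 1, f + 1) is the posterior under a uniform prior after
    s successes and f failures, so the argmax of one joint sample is
    distributed as the posterior probability of each arm being best.
    """
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run(true_means, rounds=5000, seed=0):
    """Simulate Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    succ, fail, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_step(succ, fail, rng)
        pulls[arm] += 1
        if rng.random() < true_means[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return pulls
```

With two similar arms and one clearly better arm, the better arm dominates the pull counts while the similar pair is treated (roughly) symmetrically, matching the smooth-fairness intuition.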
Distributed Learning in Multi-Armed Bandit with Multiple Players
We formulate and study a decentralized multi-armed bandit (MAB) problem.
There are M distributed players competing for N independent arms. Each arm,
when played, offers i.i.d. reward according to a distribution with an unknown
parameter. At each time, each player chooses one arm to play without exchanging
observations or any information with other players. Players choosing the same
arm collide, and, depending on the collision model, either no one receives
reward or the colliding players share the reward in an arbitrary way. We show
that the minimum system regret of the decentralized MAB grows with time at the
same logarithmic order as in the centralized counterpart where players act
collectively as a single entity by exchanging observations and making decisions
jointly. A decentralized policy is constructed to achieve this optimal order
while ensuring fairness among players and without assuming any pre-agreement or
information exchange among players. Based on a Time Division Fair Sharing
(TDFS) of the M best arms, the proposed policy is constructed and its order
optimality is proven under a general reward model. Furthermore, the basic
structure of the TDFS policy can be used with any order-optimal single-player
policy to achieve order optimality in the decentralized setting. We also
establish a lower bound on the system regret growth rate for a general class of
decentralized policies, to which the proposed policy belongs. This problem finds
potential applications in cognitive radio networks, multi-channel communication
systems, multi-agent systems, web search and advertising, and social networks. Comment: 31 pages, 8 figures, revised paper submitted to IEEE Transactions on Signal Processing, April 2010; the pre-agreement in the decentralized TDFS policy is eliminated to achieve a complete decentralization among players.
Fairness Incentives for Myopic Agents
We consider settings in which we wish to incentivize myopic agents (such as
Airbnb landlords, who may emphasize short-term profits and property safety) to
treat arriving clients fairly, in order to prevent overall discrimination
against individuals or groups. We model such settings in both classical and
contextual bandit models in which the myopic agents maximize rewards according
to current empirical averages, but are also amenable to exogenous payments that
may cause them to alter their choices. Our notion of fairness asks that more
qualified individuals are never (probabilistically) preferred over less
qualified ones [Joseph et al.].
We investigate whether it is possible to design inexpensive subsidy or
payment schemes for a principal to motivate myopic agents to play fairly in all
or almost all rounds. When the principal has full information about the state
of the myopic agents, we show it is possible to induce fair play on every round
with a subsidy scheme of bounded total cost, both in the classic multi-armed setting and in the linear contextual setting. If the principal has much more limited
information (as might often be the case for an external regulator or watchdog),
and only observes the number of rounds in which members from each of the
groups were selected, but not the empirical estimates maintained by the myopic
agent, the design of such a scheme becomes more complex. We show both positive
and negative results in the classic and linear bandit settings by upper and
lower bounding the cost of fair subsidy schemes.
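A minimal sketch of the full-information subsidy idea: the myopic agent plays the argmax of empirical mean plus any offered payment, so a principal who can see the agent's empirical estimates and knows which arm a fair policy requires can pay exactly the shortfall, making the fair arm the agent's myopic best response. The function names and the externally supplied `fair_arm` are illustrative assumptions, not the paper's construction, which also accounts for total cost over all rounds.

```python
def agent_choice(emp_means, payments):
    """Myopic agent: argmax of empirical mean plus offered payment."""
    scores = [m + p for m, p in zip(emp_means, payments)]
    return max(range(len(scores)), key=scores.__getitem__)

def subsidy_for(emp_means, fair_arm, eps=1e-9):
    """Smallest payment vector (up to eps) making fair_arm the agent's choice."""
    shortfall = max(emp_means) - emp_means[fair_arm]
    pay = [0.0] * len(emp_means)
    if shortfall > 0:
        # Pay just enough to close the empirical gap; eps breaks the tie.
        pay[fair_arm] = shortfall + eps
    return pay
```

When the fair arm already leads the empirical estimates, the required payment is zero, which is why the total cost of such a scheme can stay small once the agent's estimates become accurate.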
An ADMM Based Framework for AutoML Pipeline Configuration
We study the AutoML problem of automatically configuring machine learning
pipelines by jointly selecting algorithms and their appropriate
hyper-parameters for all steps in supervised learning pipelines. This black-box
(gradient-free) optimization with mixed integer & continuous variables is a
challenging problem. We propose a novel AutoML scheme by leveraging the
alternating direction method of multipliers (ADMM). The proposed framework is
able to (i) decompose the optimization problem into easier sub-problems that
have a reduced number of variables and circumvent the challenge of mixed
variable categories, and (ii) incorporate black-box constraints along-side the
black-box optimization objective. We empirically evaluate the flexibility (in
utilizing existing AutoML techniques), effectiveness (against open source
AutoML toolkits),and unique capability (of executing AutoML with practically
motivated black-box constraints) of our proposed scheme on a collection of
binary classification data sets from UCI ML& OpenML repositories. We observe
that on an average our framework provides significant gains in comparison to
other AutoML frameworks (Auto-sklearn & TPOT), highlighting the practical
advantages of this framework