Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
An important challenge for safety in machine learning and artificial
intelligence systems is a set of related failures involving specification
gaming, reward hacking, fragility to distributional shifts, and Goodhart's or
Campbell's law. This paper presents closely related failure modes that arise in
interactions within multi-agent systems. These multi-agent failure modes are
more complex, more problematic, and less well understood than the single-agent
case, and are already occurring, largely unnoticed. After
motivating the discussion with examples from poker-playing artificial
intelligence (AI), the paper explains why these failure modes are in some
senses unavoidable. Following this, the paper categorizes failure modes,
provides definitions, and cites examples for each of the modes: accidental
steering, coordination failures, adversarial misalignment, input spoofing and
filtering, and goal co-option or direct hacking. The paper then discusses how
extant literature on multi-agent AI fails to address these failure modes, and
identifies work that may be useful for mitigating these failure modes.
Comment: 12 pages. Re-submitted to Big Data and Cognitive Computing, Special Issue "Artificial Superintelligence: Coordination & Strategy".
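The Goodhart/Campbell dynamic the abstract opens with is easy to see in miniature. The following toy simulation is a hypothetical sketch, not code from the paper: an optimizer that hill-climbs on a proxy metric keeps improving the proxy long after the true goal has collapsed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Goodhart's-law illustration (hypothetical; not from the paper):
# the true goal and a proxy metric agree at first, but an optimizer
# that climbs only the proxy drifts into a regime where they diverge.

def true_goal(x):
    return x - 0.1 * x**2     # real value peaks at x = 5, then falls

def proxy(x):
    return x                  # the proxy keeps rewarding larger x forever

x = 0.0
for _ in range(100):
    # Hill-climb on the proxy: a crude stand-in for any optimizer.
    candidate = x + rng.normal(0.5, 0.2)
    if proxy(candidate) > proxy(x):
        x = candidate

print(f"x = {x:.1f}, proxy = {proxy(x):.1f}, true goal = {true_goal(x):.1f}")
# The proxy keeps climbing while the true goal collapses past x = 5:
# optimizing the measure destroys it as a measure of what we care about.
```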
Prediction without Preclusion: Recourse Verification with Reachable Sets
Machine learning models are often used to decide who will receive a loan, a
job interview, or a public benefit. Standard techniques to build these models
use features about people but overlook their actionability. In turn, models can
assign predictions that are fixed, meaning that consumers who are denied loans,
interviews, or benefits may be permanently locked out from access to credit,
employment, or assistance. In this work, we introduce a formal testing
procedure, which we call recourse verification, to flag models that assign
fixed predictions. We develop machinery to reliably determine whether a given model can
provide recourse to its decision subjects from a set of user-specified
actionability constraints. We demonstrate how our tools can ensure recourse and
adversarial robustness, and use them to study the infeasibility of recourse in
real-world lending datasets. Our results highlight
how models can inadvertently assign fixed predictions that permanently bar
access, and we provide tools to design algorithms that account for
actionability when developing models.
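To make the idea concrete, a recourse-verification check for a linear model can be sketched as a search over the reachable set: enumerate every feature change the actionability constraints allow and test whether any of them flips the decision. The model, feature names, and action grids below are illustrative assumptions, not the paper's interface.

```python
import itertools
import numpy as np

# Hypothetical sketch of recourse verification by reachable-set search.
# The feature names, the linear model, and the action grids are all
# illustrative assumptions, not the paper's actual interface.
w = np.array([0.8, 0.5, -0.3])       # weights for income, savings, age
b = -4.0
predict = lambda x: w @ x + b > 0    # True = approved

# Actionability constraints: the feasible changes to each feature
# (age is immutable, so the only allowed change is 0).
actions = {
    0: [0, 1, 2, 3],   # income can increase by up to 3 units
    1: [0, 1, 2],      # savings can increase by up to 2 units
    2: [0],            # age cannot be acted on
}

def has_recourse(x):
    """True if some reachable point flips a denial to an approval."""
    for deltas in itertools.product(*actions.values()):
        if predict(x + np.array(deltas)):
            return True
    return False

x = np.array([1.0, 1.0, 6.0])        # a denied applicant
print(predict(x))                     # False: denied
print(has_recourse(x))                # False: no feasible action helps,
                                      # so this prediction is fixed
```

Here no combination of allowed actions reaches the decision boundary, which is exactly the "fixed prediction" the abstract warns about.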
Manipulation-Proof Machine Learning
An increasing number of decisions are guided by machine learning algorithms.
In many settings, from consumer credit to criminal justice, those decisions are
made by applying an estimator to data on an individual's observed behavior. But
when consequential decisions are encoded in rules, individuals may
strategically alter their behavior to achieve desired outcomes. This paper
develops a new class of estimators that are stable under manipulation, even when
the decision rule is fully transparent. We explicitly model the costs of
manipulating different behaviors, and identify decision rules that are stable
in equilibrium. Through a large field experiment in Kenya, we show that
decision rules estimated with our strategy-robust method outperform those based
on standard supervised learning approaches.
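A toy simulation illustrates why stability under manipulation matters once a rule is transparent. Everything here (the cost structure, the "savviness" term, the data) is an illustrative assumption standing in for the paper's equilibrium model, not a reproduction of it or of the Kenya experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sketch of strategic manipulation under a transparent
# rule (cost structure, "savviness", and data are all assumptions).
n = 5000
quality = rng.normal(size=n)                  # latent creditworthiness
x = np.c_[quality + rng.normal(0, 1.0, n),    # behavior 0: costly to fake
          quality + rng.normal(0, 1.0, n)]    # behavior 1: cheap to fake
savvy = rng.exponential(2.0, n)               # gaming skill, unrelated to quality
cheapness = np.array([0.02, 3.0])             # how easily each behavior is faked

def manipulated(x, w):
    """Once the rule w is public, each agent inflates each behavior in
    proportion to its weight, their own savviness, and its cheapness."""
    return x + savvy[:, None] * w * cheapness

for w in (np.array([0.5, 0.5]),    # naive rule: trusts both behaviors
          np.array([1.0, 0.0])):   # manipulation-aware: costly behavior only
    score = manipulated(x, w) @ w
    approved = score > np.median(score)
    print(w, "mean quality of approved:", quality[approved].mean().round(2))
```

The naive rule leans on the cheap-to-fake behavior, so post-manipulation scores mostly reflect gaming skill; the rule restricted to the costly behavior selects higher-quality applicants in equilibrium.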
Actionable Recourse in Linear Classification
Machine learning models are increasingly used to automate decisions that
affect humans - deciding who should receive a loan, a job interview, or a
social service. In such applications, a person should have the ability to
change the decision of a model. When a person is denied a loan by a credit
scoring model, for example, they should be able to alter the model's input
variables in a way that guarantees approval. Otherwise, they will be denied the loan as long as
the model is deployed. More importantly, they will lack the ability to
influence a decision that affects their livelihood.
In this paper, we frame these issues in terms of recourse, which we define as
the ability of a person to change the decision of a model by altering
actionable input variables (e.g., income vs. age or marital status). We present
integer programming tools to ensure recourse in linear classification problems
without interfering in model development. We demonstrate how our tools can
inform stakeholders through experiments on credit scoring problems. Our results
show that recourse can be significantly affected by standard practices in model
development, and motivate the need to evaluate recourse in practice.
Comment: Extended version. ACM Conference on Fairness, Accountability, and Transparency (FAT* 2019).
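Where the previous paper asks whether recourse exists at all, this one asks for a cheapest feasible action. The sketch below substitutes a brute-force search over small action grids for the paper's integer-programming formulation; the model, costs, and grids are assumptions chosen for illustration.

```python
import itertools
import numpy as np

# Illustrative sketch of the recourse problem for a linear classifier:
# find the cheapest feasible change to actionable features that flips
# a denial into an approval. The paper formulates this as an integer
# program; this brute-force search over small action grids is a
# stand-in with the same structure. All numbers are assumptions.
w = np.array([1.0, 2.0, 0.5])        # weights for income, debt repaid, age
b = -6.0

grids = [range(0, 4), range(0, 3), range(0, 1)]  # age (last) is immutable
unit_cost = np.array([1.0, 2.5, 0.0])            # effort per unit of change

def cheapest_recourse(x):
    """Return (cost, action) for the cheapest approval-flipping action,
    or None if the prediction is fixed under these constraints."""
    best = None
    for a in itertools.product(*grids):
        a = np.array(a)
        if w @ (x + a) + b > 0 and (best is None or unit_cost @ a < best[0]):
            best = (unit_cost @ a, a)
    return best

x = np.array([2.0, 0.0, 4.0])        # denied: score = 2 + 0 + 2 - 6 = -2
print(cheapest_recourse(x))           # cheapest action that secures approval
```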
Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems
Many recommender systems are based on optimizing a linear weighting of
different user behaviors, such as clicks, likes, shares, etc. Though the choice
of weights can have a significant impact, there is little formal study or
guidance on how to choose them. We analyze the optimal choice of weights from
the perspectives of both users and content producers who strategically respond
to the weights. We consider three aspects of user behavior: value-faithfulness
(how well a behavior indicates whether the user values the content),
strategy-robustness (how hard it is for producers to manipulate the behavior),
and noisiness (how much estimation error there is in predicting the behavior).
Our theoretical results show that for users, upweighting more value-faithful
and less noisy behaviors leads to higher utility, while for producers,
upweighting more value-faithful and strategy-robust behaviors leads to higher
welfare (and the impact of noise is non-monotonic). Finally, we discuss how our
results can help system designers select weights in practice.
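A toy model makes the three properties concrete. In the sketch below (all parameters are illustrative assumptions, not the paper's model), each observed behavior mixes the item's true value, producer gaming that scales with the behavior's weight, and estimation noise; downweighting the gameable and noisy behaviors improves the value of what gets recommended.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model of the weight-selection problem (parameters are
# illustrative assumptions, not the paper's model).
n_items = 10_000
value = rng.normal(size=n_items)             # true value to users
gaming = rng.exponential(1.0, size=n_items)  # producer gaming effort

behaviors = {
    #         value-faithfulness  manipulability  noise_std
    "like":        (0.9,               0.2,          0.5),
    "click":       (0.4,               2.0,          0.3),
    "share":       (0.8,               0.1,          1.0),
}

def ranking_score(weights):
    """Producers direct their gaming at whatever the ranker rewards,
    so manipulation of each behavior scales with its weight."""
    score = np.zeros(n_items)
    for w, (faith, manip, noise) in zip(weights, behaviors.values()):
        observed = faith * value + manip * w * gaming \
                   + rng.normal(0, noise, n_items)
        score += w * observed
    return score

for weights in ([1.0, 1.0, 1.0],    # equal weights on all behaviors
                [1.0, 0.2, 0.6]):   # downweight gameable/noisy behaviors
    top = np.argsort(ranking_score(weights))[-100:]
    print(weights, "mean value of top-100:", value[top].mean().round(2))
```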
On the Actionability of Outcome Prediction
Predicting future outcomes is a prevalent application of machine learning in
social impact domains. Examples range from predicting student success in
education to predicting disease risk in healthcare. Practitioners recognize
that the ultimate goal is not just to predict but to act effectively.
Increasing evidence suggests that relying on outcome predictions for downstream
interventions may not have the desired results.
In most domains there exists a multitude of possible interventions for each
individual, making the challenge of taking effective action more acute. Even
when the causal mechanisms connecting the individual's latent states to outcomes are
well understood, in any given instance (a specific student or patient),
practitioners still need to infer -- from budgeted measurements of latent
states -- which of many possible interventions will be most effective for this
individual. With this in mind, we ask: when are accurate predictors of outcomes
helpful for identifying the most suitable intervention?
Through a simple model encompassing actions, latent states, and measurements,
we demonstrate that pure outcome prediction rarely results in the most
effective policy for taking actions, even when combined with other
measurements. We find that except in cases where there is a single decisive
action for improving the outcome, outcome prediction never maximizes "action
value", the utility of taking actions. Making measurements of actionable latent
states, where specific actions lead to desired outcomes, considerably enhances
the action value compared to outcome prediction, and the degree of improvement
depends on action costs and the outcome model. This analysis emphasizes the
need to go beyond generic outcome prediction in interventional settings by
incorporating knowledge of plausible actions and latent states.
Comment: 14 pages, 3 figures.
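The gap between predicting and acting can be reproduced in a few lines. In this hypothetical sketch (not the paper's formal model), two latent problems each have a matching corrective action; outcome prediction flags who is at risk but cannot say which action to take, while measuring the actionable latent state can.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy version of the actions/latent-states/measurements setup (a
# hypothetical sketch; the paper's formal model differs). Two latent
# problems each have a matching action that fixes only that problem.
n = 10_000
z = rng.random((n, 2)) < 0.3         # latent problems (e.g. two risk factors)
print("no-action bad-outcome rate:", z.any(axis=1).mean())

def bad_rate_after(policy):
    """policy(i) returns 0 or 1: which problem to treat for person i.
    Treating problem j removes z[i, j]; the budget is one action each."""
    fixed = z.copy()
    for i in range(n):
        fixed[i, policy(i)] = False
    return fixed.any(axis=1).mean()

# Outcome prediction alone flags who is at risk but says nothing about
# which problem they have, so the chosen action is a coin flip.
print("predict-outcome policy  :", bad_rate_after(lambda i: rng.integers(2)))

# Measuring the actionable latent state picks the matching action.
print("measure-latents policy  :", bad_rate_after(lambda i: int(z[i, 1])))
```

The latent-state policy only fails when both problems are present (the single-action budget binds), while the prediction-only policy also fails half the time on every single-problem case, mirroring the abstract's claim about "action value".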