A Regression Discontinuity Design for Ordinal Running Variables: Evaluating Central Bank Purchases of Corporate Bonds
Regression discontinuity (RD) is a widely used quasi-experimental design for
causal inference. In the standard RD, the assignment to treatment is determined
by a continuous pretreatment variable (i.e., running variable) falling above or
below a pre-fixed threshold. In the case of the corporate sector purchase
programme (CSPP) of the European Central Bank, which involves large-scale
purchases of securities issued by corporations in the euro area, such a
threshold can be defined in terms of an ordinal running variable. This feature
poses challenges to RD estimation due to the lack of a meaningful measure of
distance. To evaluate this programme, this paper proposes an RD approach for
ordinal running variables under the local randomization framework. The proposal
first estimates an ordered probit model for the ordinal running variable. The
estimated probability of being assigned to treatment is then adopted as a
latent continuous running variable and used to identify a covariate-balanced
subsample around the threshold. Assuming local unconfoundedness of the
treatment in the subsample, an estimate of the effect of the programme is
obtained by employing a weighted estimator of the average treatment effect. Two
weighting estimators---overlap weights and ATT weights---as well as their
augmented versions are considered. We apply the method to evaluate the causal
effect of the CSPP and find a statistically significant and negative effect on
corporate bond spreads at issuance.
Comment: Also available as Temi di discussione (Economic working papers) 1213,
Bank of Italy, Economic Research and International Relations Area.
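As a concrete illustration of the pipeline this abstract describes, below is a minimal sketch on synthetic data: an ordered probit is fit to the ordinal running variable, the implied probability of falling above the threshold serves as the latent continuous running variable, a window around the threshold stands in for the covariate-balanced subsample, and an overlap-weighted difference in means gives the effect estimate. The covariate names, window bounds, and data-generating process are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the proposed pipeline on synthetic data (illustrative only).
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
covariates = pd.DataFrame({"size": rng.normal(size=n), "leverage": rng.normal(size=n)})
# Hypothetical ordinal running variable (e.g. a rating notch 0..4); notch >= 3 is eligible.
latent = 0.8 * covariates["size"] - 0.5 * covariates["leverage"] + rng.normal(size=n)
rating = pd.cut(latent, bins=[-np.inf, -1, 0, 1, 2, np.inf], labels=False).to_numpy()
eligible = (rating >= 3).astype(int)
spread = -0.3 * eligible + 0.4 * covariates["size"] + rng.normal(size=n)  # outcome at issuance

# Step 1: ordered probit for the ordinal running variable.
op = OrderedModel(rating, covariates, distr="probit").fit(method="bfgs", disp=False)
cat_probs = np.asarray(op.predict(covariates))          # n x 5 category probabilities
p_above = cat_probs[:, 3:].sum(axis=1)                  # latent running variable: P(notch >= 3)

# Step 2: local-randomization window around the threshold on the latent score.
window = (p_above > 0.3) & (p_above < 0.7)
T = eligible[window]
Y = spread[window].to_numpy()
X = covariates[window]

# Step 3: overlap weights from a propensity model fitted inside the window.
e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
w = np.where(T == 1, 1.0 - e, e)

# Step 4: weighted difference in means as the local effect estimate.
tau = np.average(Y[T == 1], weights=w[T == 1]) - np.average(Y[T == 0], weights=w[T == 0])
print(f"overlap-weighted effect estimate: {tau:.3f}")
```

Swapping the overlap weights for ATT weights (weight 1 for treated units and e/(1-e) for controls) gives the second estimator mentioned in the abstract; the augmented versions additionally use an outcome regression.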
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting,
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures.
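To make the sample-based category concrete, here is a short sketch of importance weighting under covariate shift: a domain discriminator estimates the density ratio p_target(x)/p_source(x), and the estimated ratio is used as per-sample weights when fitting the source classifier. The data and model choices are illustrative assumptions; the review covers many other estimators for these weights.

```python
# Importance weighting for covariate shift via a domain discriminator (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_src = rng.normal(loc=0.0, size=(500, 2))
y_src = (X_src[:, 0] + 0.5 * X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(loc=1.0, size=(500, 2))          # unlabeled target sample (shifted)

# Domain discriminator: predicts whether a point comes from the target domain.
X_dom = np.vstack([X_src, X_tgt])
d_dom = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
disc = LogisticRegression().fit(X_dom, d_dom)
p_tgt = disc.predict_proba(X_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt)                     # density-ratio estimate p_tgt(x)/p_src(x)

# Weighted source classifier: emphasizes source points that look target-like.
clf = LogisticRegression().fit(X_src, y_src, sample_weight=weights)
print("mean importance weight:", round(float(weights.mean()), 2))
```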
Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector
Machine Learning (ML) systems are becoming instrumental in the public sector,
with applications spanning areas like criminal justice, social welfare,
financial fraud detection, and public health. While these systems offer great
potential benefits to institutional decision-making processes, such as improved
efficiency and reliability, they still face the challenge of aligning intricate
and nuanced policy objectives with the precise formalization requirements
necessitated by ML models. In this paper, we aim to bridge the gap between ML
and public sector decision-making by presenting a comprehensive overview of key
technical challenges where disjunctions between policy goals and ML models
commonly arise. We concentrate on pivotal points of the ML pipeline that
connect the model to its operational environment, delving into the significance
of representative training data and highlighting the importance of a model
setup that facilitates effective decision-making. Additionally, we link these
challenges with emerging methodological advancements, encompassing causal ML,
domain adaptation, uncertainty quantification, and multi-objective
optimization, illustrating the path forward for harmonizing ML and public
sector objectives.
Towards Robust Off-Policy Evaluation via Human Inputs
Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies
in high-stakes domains such as healthcare, where direct deployment is often
infeasible, unethical, or expensive. When deployment environments are expected
to undergo changes (that is, dataset shifts), it is important for OPE methods
to perform robust evaluation of the policies amidst such changes. Existing
approaches consider robustness against a large class of shifts that can
arbitrarily change any observable property of the environment. This often
results in highly pessimistic estimates of the utilities, thereby invalidating
policies that might have been useful in deployment. In this work, we address
the aforementioned problem by investigating how domain knowledge can help
provide more realistic estimates of the utilities of policies. We leverage
human inputs on which aspects of the environments may plausibly change, and
adapt the OPE methods to only consider shifts on these aspects. Specifically,
we propose a novel framework, Robust OPE (ROPE), which considers shifts on a
subset of covariates in the data based on user inputs, and estimates worst-case
utility under these shifts. We then develop computationally efficient
algorithms for OPE that are robust to the aforementioned shifts for contextual
bandits and Markov decision processes. We also theoretically analyze the sample
complexity of these algorithms. Extensive experimentation with synthetic and
real world datasets from the healthcare domain demonstrates that our approach
not only captures realistic dataset shifts accurately, but also results in less
pessimistic policy evaluations.
Comment: 10 pages, 5 figures, 1 table. Appeared at AIES '22: Proceedings of
the 2022 AAAI/ACM Conference on AI, Ethics, and Society. Expanded version of
arXiv:2103.1593
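The toy sketch below is not the ROPE algorithm itself; it only illustrates the core idea of restricting the adversarial shift to a user-specified covariate. A standard importance-sampling OPE estimate for a contextual bandit is recomputed under a small, bounded family of reweightings of one covariate's marginal (a hypothetical "age" variable here), and the worst case over that family is reported instead of the worst case over all observable properties.

```python
# Toy illustration: worst-case off-policy value when only a chosen covariate may shift.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
age = rng.uniform(20, 80, size=n)                      # covariate the user says may shift
x = rng.normal(size=n)                                 # covariate assumed stable
actions = rng.integers(0, 2, size=n)                   # logged actions, uniform behavior policy
rewards = (actions == (age > 50)).astype(float) + 0.1 * x

def target_policy(age_values):
    """Evaluated policy: take action 1 iff age > 45 (illustrative)."""
    return (age_values > 45).astype(int)

# Standard importance-sampling OPE under the logged distribution.
behavior_prob = 0.5
match = (target_policy(age) == actions).astype(float)
is_weights = match / behavior_prob
base_value = np.mean(is_weights * rewards)

# Worst case over a bounded family of shifts on the "age" marginal only:
# exponentially tilt the age distribution, renormalize, and scan a grid of tilts.
worst = base_value
for tilt in np.linspace(-0.05, 0.05, 21):
    shift_w = np.exp(tilt * (age - age.mean()))
    shift_w /= shift_w.mean()                          # keep it a proper reweighting
    worst = min(worst, np.mean(shift_w * is_weights * rewards))

print(f"IS estimate: {base_value:.3f}, worst case over age shifts: {worst:.3f}")
```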
Diagnosing Model Performance Under Distribution Shift
Prediction models can perform poorly when deployed to target distributions
different from the training distribution. To understand these operational
failure modes, we develop a method, called DIstribution Shift DEcomposition
(DISDE), to attribute a drop in performance to different types of distribution
shifts. Our approach decomposes the performance drop into terms for 1) an
increase in harder but frequently seen examples from training, 2) changes in
the relationship between features and outcomes, and 3) poor performance on
examples infrequent or unseen during training. These terms are defined by
fixing a distribution on $X$ while varying the conditional distribution of
$Y \mid X$ between training and target, or by fixing the conditional distribution
of $Y \mid X$ while varying the distribution on $X$. In order to do this, we
define a hypothetical distribution on $X$ consisting of values common in both
training and target, over which it is easy to compare $Y \mid X$ and thus
predictive performance. We estimate performance on this hypothetical
distribution via reweighting methods. Empirically, we show how our method can
1) inform potential modeling improvements across distribution shifts for
employment prediction on tabular census data, and 2) help to explain why
certain domain adaptation methods fail to improve model performance for
satellite image classification.
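The sketch below illustrates the reweighting idea on synthetic data. It uses a simple shared distribution proportional to the pointwise minimum of the estimated training and target densities, which need not match the paper's exact construction, and a domain discriminator for the density ratio; the three printed terms correspond to the decomposition described in the abstract.

```python
# Simplified three-term loss-gap decomposition via reweighting (illustrative, not DISDE's exact recipe).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
n = 4000
X_tr = rng.normal(0.0, 1.0, size=(n, 1))
y_tr = 1.0 * X_tr[:, 0] + rng.normal(scale=0.5, size=n)
X_te = rng.normal(1.0, 1.0, size=(n, 1))                       # covariate (X) shift
y_te = 1.3 * X_te[:, 0] + rng.normal(scale=0.5, size=n)        # plus a Y|X shift

model = LinearRegression().fit(X_tr, y_tr)
loss_tr = (model.predict(X_tr) - y_tr) ** 2
loss_te = (model.predict(X_te) - y_te) ** 2

# Density ratio r(x) = p_target(x) / p_train(x) from a domain discriminator.
disc = LogisticRegression().fit(np.vstack([X_tr, X_te]), np.r_[np.zeros(n), np.ones(n)])
r_tr = disc.predict_proba(X_tr)[:, 1] / disc.predict_proba(X_tr)[:, 0]
r_te = disc.predict_proba(X_te)[:, 1] / disc.predict_proba(X_te)[:, 0]

# Shared distribution S(x) ~ min(p_train, p_target); reweight both samples onto S.
w_tr = np.minimum(1.0, r_tr)            # proportional to p_S / p_train
w_te = np.minimum(1.0 / r_te, 1.0)      # proportional to p_S / p_target
L_train = loss_tr.mean()
L_S_trainY = np.average(loss_tr, weights=w_tr)   # shared X, training's Y|X
L_S_targetY = np.average(loss_te, weights=w_te)  # shared X, target's Y|X
L_target = loss_te.mean()

print("total gap:", round(L_target - L_train, 3))
print("  1) harder but seen X values  :", round(L_S_trainY - L_train, 3))
print("  2) Y|X shift on shared X     :", round(L_S_targetY - L_S_trainY, 3))
print("  3) X rare/unseen in training :", round(L_target - L_S_targetY, 3))
```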