Estimating individual treatment effect: generalization bounds and algorithms
There is intense interest in applying machine learning to problems of causal
inference in fields such as healthcare, economics and education. In particular,
individual-level causal inference has important applications such as precision
medicine. We give a new theoretical analysis and family of algorithms for
predicting individual treatment effect (ITE) from observational data, under the
assumption known as strong ignorability. The algorithms learn a "balanced"
representation such that the induced treated and control distributions look
similar. We give a novel, simple and intuitive generalization-error bound
showing that the expected ITE estimation error of a representation is bounded
by a sum of the standard generalization-error of that representation and the
distance between the treated and control distributions induced by the
representation. We use Integral Probability Metrics to measure distances
between distributions, deriving explicit bounds for the Wasserstein and Maximum
Mean Discrepancy (MMD) distances. Experiments on real and simulated data show
the new algorithms match or outperform the state of the art. Comment: Added name "TARNet" to refer to version with alpha = 0. Removed sup
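The balancing idea above penalizes the distance between the treated and control distributions of the learned representation. The sketch below uses one of the two distances the abstract names, MMD, and assumes a Gaussian kernel, representations given as plain lists of floats, and the biased (V-statistic) estimate of squared MMD as the penalty. `balanced_objective` is a hypothetical helper, not the authors' code. Setting `alpha = 0` removes the balancing term, which matches the comment's description of TARNet.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

def balanced_objective(factual_loss, phi_treated, phi_control, alpha, sigma=1.0):
    # Hypothetical objective: factual prediction loss plus alpha times the distance
    # between treated and control representations (alpha = 0 drops the penalty).
    return factual_loss + alpha * mmd2_biased(phi_treated, phi_control, sigma)

phi_t = [[0.0, 0.0], [0.1, 0.2]]   # toy representations of treated units
phi_c = [[1.0, 1.0], [1.2, 0.9]]   # toy representations of control units
print(balanced_objective(0.3, phi_t, phi_c, alpha=1.0))
```

In a real model the representation and the outcome heads would be trained jointly by gradient descent on this kind of objective; here the representations are fixed numbers so the penalty can be inspected directly.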
A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification.
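The transformation described above, a Gaussian RBF kernel applied to the metric induced by a graph kernel, follows from the identity d(G, G')^2 = k(G, G) + k(G', G') - 2 k(G, G'). A minimal sketch, assuming the graph kernel is already available as a precomputed matrix; the toy kernel here (dot products of hypothetical vertex-label count vectors) and the choice of `gamma` are illustrative assumptions.

```python
import math

def rbf_on_kernel_metric(K, gamma=1.0):
    """Turn a graph kernel matrix K into a Gaussian RBF kernel on its induced metric.

    d(G_i, G_j)^2 = K[i][i] + K[j][j] - 2 K[i][j]
    K_rbf[i][j]   = exp(-gamma * d(G_i, G_j)^2)
    """
    n = len(K)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = K[i][i] + K[j][j] - 2.0 * K[i][j]
            out[i][j] = math.exp(-gamma * max(d2, 0.0))  # clip tiny negatives from rounding
    return out

# Hypothetical vertex-label counts for three graphs (labels A, B, C).
hists = [[2, 1, 0], [1, 1, 1], [0, 0, 3]]
K = [[sum(a * b for a, b in zip(u, v)) for v in hists] for u in hists]
K_rbf = rbf_on_kernel_metric(K, gamma=0.5)
```

The resulting matrix has ones on its diagonal and decays with the kernel-induced distance, so it can be passed to any classifier that accepts a precomputed kernel.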
Support and Invertibility in Domain-Invariant Representations
Learning domain-invariant representations has become a popular approach to
unsupervised domain adaptation and is often justified by invoking a particular
suite of theoretical results. We argue that there are two significant flaws in
such arguments. First, the results in question hold only for a fixed
representation and do not account for information lost in non-invertible
transformations. Second, domain invariance is often a far too strict
requirement and does not always lead to consistent estimation, even under
strong and favorable assumptions. In this work, we give generalization bounds
for unsupervised domain adaptation that hold for any representation function by
acknowledging the cost of non-invertibility. In addition, we show that
penalizing distance between densities is often wasteful and propose a bound
based on measuring the extent to which the support of the source domain covers
the target domain. We perform experiments on well-known benchmarks that
illustrate the shortcomings of current standard practice.
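The contrast above between matching densities and covering supports can be made concrete with a crude proxy. The sketch below is hypothetical and is not the bound proposed in the paper: it simply measures the fraction of target points that lie within an assumed `radius` of some source point. A target sample can be fully covered even when its density differs sharply from the source density.

```python
import math

def support_coverage(source, target, radius):
    """Fraction of target points with at least one source point within `radius`.

    A hypothetical, nearest-neighbour proxy for how well the source support
    covers the target support; it ignores how densely each region is sampled.
    """
    covered = 0
    for t in target:
        if any(math.dist(s, t) <= radius for s in source):
            covered += 1
    return covered / len(target)

source = [(0, 0), (1, 0)]
# Target reuses source locations with different weights: full coverage.
print(support_coverage(source, [(0, 0), (0, 0), (1, 0)], radius=0.1))
# One target point falls far outside the source support: half coverage.
print(support_coverage(source, [(0.1, 0), (5, 5)], radius=0.5))
```

In the first case the two empirical distributions differ, so a density-matching penalty would still be nonzero, yet every target point sits on the source support.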
Why Is My Classifier Discriminatory?
Recent attempts to achieve fairness in predictive models focus on the balance
between fairness and accuracy. In sensitive applications such as healthcare or
criminal justice, this trade-off is often undesirable as any increase in
prediction error could have devastating consequences. In this work, we argue
that the fairness of predictions should be evaluated in the context of the data,
and that unfairness induced by inadequate sample sizes or unmeasured
predictive variables should be addressed through data collection, rather than
by constraining the model. We decompose cost-based metrics of discrimination
into bias, variance, and noise, and propose actions aimed at estimating and
reducing each term. Finally, we perform case studies on prediction of income,
mortality, and review ratings, confirming the value of this analysis. We find
that data collection is often a means to reduce discrimination without
sacrificing accuracy. Comment: Appeared in Advances in Neural Information Processing Systems
(NeurIPS 2018); 3 figures, 8 pages, 6 page supplementar
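The decomposition above splits a difference in error between groups into bias, variance, and noise terms. The paper works with cost-based metrics; the sketch below swaps in squared error, where the decomposition at a single input is exact, and assumes that the true conditional mean and noise variance are known. Both helpers are hypothetical illustrations.

```python
from statistics import fmean, pvariance

def decompose_squared_error(preds, true_mean, noise_var):
    """Decompose expected squared error at one input x.

    preds: predictions at x from models trained on different (resampled) training sets.
    true_mean: E[Y | x]; noise_var: Var(Y | x). Both are assumed known here.
    """
    bias2 = (fmean(preds) - true_mean) ** 2
    variance = pvariance(preds)
    return {"bias2": bias2, "variance": variance, "noise": noise_var,
            "total": bias2 + variance + noise_var}

def group_gap(stats_a, stats_b):
    # Per-component difference between two groups; the components sum to the total gap.
    return {k: stats_a[k] - stats_b[k] for k in stats_a}

group_a = decompose_squared_error([0.4, 0.6, 0.5], true_mean=0.7, noise_var=0.05)
group_b = decompose_squared_error([0.69, 0.71], true_mean=0.7, noise_var=0.05)
print(group_gap(group_a, group_b))
```

In this toy case the gap is driven by bias and variance rather than noise, which is the kind of situation where the abstract argues that collecting more data, rather than constraining the model, can reduce discrimination.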
Estimation of Bounds on Potential Outcomes For Decision Making
Estimation of individual treatment effects is commonly used as the basis for
contextual decision making in fields such as healthcare, education, and
economics. However, it is often sufficient for the decision maker to have
estimates of upper and lower bounds on the potential outcomes of decision
alternatives to assess risks and benefits. We show that, in such cases, we can
improve sample efficiency by estimating simple functions that bound these
outcomes instead of estimating their conditional expectations, which may be
complex and hard to estimate. Our analysis highlights a trade-off between the
complexity of the learning task and the confidence with which the learned
bounds hold. Guided by these findings, we develop an algorithm for learning
upper and lower bounds on potential outcomes that optimize an objective
function defined by the decision maker, subject to a small probability that
the bounds are violated. Using a clinical dataset and a well-known causality
benchmark, we demonstrate that our algorithm outperforms baselines, providing
tighter, more reliable bounds.
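One standard way to learn an upper bound that is violated with small probability is quantile regression with the pinball loss. This is a swapped-in technique, not necessarily the algorithm proposed above, and the sketch only fits a constant bound (the empirical tau-quantile) rather than a function of the covariates. The outcome values and `tau = 0.9` are illustrative assumptions.

```python
import math

def pinball_loss(y, q, tau):
    """Quantile (pinball) loss; its minimizer over q is the tau-quantile of y."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)

def constant_upper_bound(ys, tau):
    # The empirical tau-quantile minimizes the average pinball loss over constants q.
    s = sorted(ys)
    idx = max(0, math.ceil(tau * len(s)) - 1)
    return s[idx]

def violation_rate(ys, bound):
    # Fraction of outcomes that exceed the learned upper bound.
    return sum(y > bound for y in ys) / len(ys)

outcomes = list(range(1, 11))  # hypothetical observed outcomes
upper = constant_upper_bound(outcomes, tau=0.9)
print(upper, violation_rate(outcomes, upper))
```

Raising `tau` lowers the violation rate at the cost of a looser bound, a simple instance of the confidence trade-off the abstract describes.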
Pure Exploration in Bandits with Linear Constraints
We address the problem of identifying the optimal policy with a fixed
confidence level in a multi-armed bandit setup when the arms are subject
to linear constraints. Unlike the well-studied standard best-arm identification
problem, the optimal policy in this case may not be deterministic
and could mix between several arms. This changes the geometry of the problem
which we characterize via an information-theoretic lower bound. We introduce
two asymptotically optimal algorithms for this setting, one based on the
Track-and-Stop method and the other based on a game-theoretic approach. Both
algorithms aim to track an optimal allocation that is derived from the lower bound
and computed by a weighted projection onto the boundary of a normal cone.
Finally, we provide empirical results that validate our bounds and visualize
how constraints change the hardness of the problem.
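The claim above that the optimal policy may mix arms can be seen in the oracle problem the bandit is trying to identify. The sketch assumes the arm means are already known and that there is a single linear constraint on expected cost; it is not Track-and-Stop or the game-theoretic method. With one constraint, the optimum of the linear program lies either at a single arm or where the constraint crosses the edge between two arms, so enumerating those points suffices.

```python
from itertools import combinations

def best_constrained_policy(means, costs, budget):
    """Maximize sum_k w_k * means[k] over the simplex s.t. sum_k w_k * costs[k] <= budget.

    Assumes known means and one linear constraint; returns None if infeasible.
    """
    k = len(means)
    candidates = []
    # Vertices of the simplex that satisfy the constraint (deterministic policies).
    for i in range(k):
        if costs[i] <= budget:
            w = [0.0] * k
            w[i] = 1.0
            candidates.append(w)
    # Points where the constraint boundary crosses the edge between arms i and j.
    for i, j in combinations(range(k), 2):
        if costs[i] == costs[j]:
            continue
        t = (budget - costs[j]) / (costs[i] - costs[j])  # weight on arm i
        if 0.0 < t < 1.0:
            w = [0.0] * k
            w[i], w[j] = t, 1.0 - t
            candidates.append(w)
    if not candidates:
        return None
    return max(candidates, key=lambda w: sum(wi * m for wi, m in zip(w, means)))

# The better arm is too expensive on its own, so the optimum mixes both arms.
print(best_constrained_policy([1.0, 0.5], [2.0, 0.0], budget=1.0))
```

With a looser budget the same call returns a deterministic policy, which shows how the constraint, not the means alone, determines whether mixing is needed.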
Learning to search efficiently for causally near-optimal treatments
Finding an effective medical treatment often requires a search by trial and
error. Making this search more efficient by minimizing the number of
unnecessary trials could lower both costs and patient suffering. We formalize
this problem as learning a policy for finding a near-optimal treatment in a
minimum number of trials using a causal inference framework. We give a
model-based dynamic programming algorithm which learns from observational data
while being robust to unmeasured confounding. To reduce time complexity, we
suggest a greedy algorithm which bounds the near-optimality constraint. The
methods are evaluated on synthetic and real-world healthcare data and compared
to model-free reinforcement learning. We find that our methods compare
favorably to the model-free baseline while offering a more transparent
trade-off between search time and treatment efficacy.
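The search problem above can be illustrated with a simple greedy stopping rule. This is a hypothetical sketch, not the authors' dynamic programming or greedy algorithm: it assumes each treatment comes with a valid upper bound on its outcome (for example, from a model fit to observational data), tries treatments in decreasing order of that bound, and stops once no untried treatment could beat the best observed outcome by more than `epsilon`. If the bounds are valid, the returned outcome is within `epsilon` of the best achievable one.

```python
def greedy_treatment_search(upper_bounds, outcomes, epsilon):
    """Hypothetical greedy search for an epsilon-near-optimal treatment.

    upper_bounds[a]: assumed upper bound on the outcome of treatment a.
    outcomes[a]: outcome observed when treatment a is actually tried.
    Returns the list of tried treatments and the best observed outcome.
    """
    order = sorted(range(len(upper_bounds)), key=lambda a: upper_bounds[a], reverse=True)
    tried = []
    best = float("-inf")
    for idx, a in enumerate(order):
        tried.append(a)
        best = max(best, outcomes[a])
        remaining = order[idx + 1:]
        # Stop once no untried treatment can exceed the best outcome by more than epsilon.
        if not remaining or best >= max(upper_bounds[r] for r in remaining) - epsilon:
            break
    return tried, best

print(greedy_treatment_search([0.9, 0.6, 0.8], [0.5, 0.55, 0.75], epsilon=0.1))
```

A larger `epsilon` ends the search sooner at the price of a weaker guarantee, mirroring the trade-off between search time and efficacy mentioned in the abstract.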