Generic machine learning inference on heterogeneous treatment effects in randomized experiments
We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into the estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. Our approach is agnostic and does not make unrealistic or hard-to-check assumptions; we don't require conditions for consistency of the ML methods. Estimation and inference rely on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals, resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting. The inference method could be of substantial independent interest in many machine learning applications. An empirical application to the impact of micro-credit on economic development illustrates the use of the approach in randomized experiments. An additional application to the impact of gender discrimination on wages illustrates the potential use of the approach in observational studies, where machine learning methods can be used to condition flexibly on very high-dimensional controls.

https://arxiv.org/abs/1712.04802
First author draft
Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments
We propose strategies to estimate and make inference on key features of
heterogeneous effects in randomized experiments. These key features include
best linear predictors of the effects using machine learning proxies, average
effects sorted by impact groups, and average characteristics of most and least
impacted units. The approach is valid in high dimensional settings, where the
effects are proxied by machine learning methods. We post-process these proxies
into the estimates of the key features. Our approach is generic: it can be used
in conjunction with penalized methods, deep and shallow neural networks,
canonical and new random forests, boosted trees, and ensemble methods. It does
not rely on strong assumptions. In particular, we don't require conditions for
consistency of the machine learning methods. Estimation and inference rely on
repeated data splitting to avoid overfitting and achieve validity. For
inference, we take medians of p-values and medians of confidence intervals,
resulting from many different data splits, and then adjust their nominal level
to guarantee uniform validity. This variational inference method is shown to be
uniformly valid and quantifies the uncertainty coming from both parameter
estimation and data splitting. We illustrate the use of the approach with two
randomized experiments in development on the effects of microcredit and nudges
to stimulate immunization demand.

Comment: 53 pages, 6 figures, 15 tables
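The split-and-aggregate inference the abstract describes can be sketched in a toy form. Everything below is an illustrative assumption rather than the authors' implementation: the "ML proxy" is a simple linear fit standing in for a forest or neural network, the helper names are invented, and the level adjustment is the simple "report twice the median p-value" aggregation across splits.

```python
# Toy sketch of repeated-data-splitting inference for effect
# heterogeneity. All names are illustrative, not the paper's code;
# the ML proxy is replaced by a linear fit, and the adjustment is
# the "2 x median p-value" aggregation across splits.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def heterogeneity_pvalue(y, t, proxy):
    """Two-sided p-value for the interaction of treatment with the
    de-meaned effect proxy in a linear regression of the outcome."""
    inter = t * (proxy - proxy.mean())
    X = np.column_stack([np.ones_like(y), t, inter])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
    z = abs(beta[2] / se)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # normal tail

# Simulated randomized experiment with heterogeneous effect 0.5 + 0.5x.
n = 2000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + t * (0.5 + 0.5 * x) + rng.normal(size=n)

pvals = []
for _ in range(50):                      # many different data splits
    idx = rng.permutation(n)
    aux, main = idx[: n // 2], idx[n // 2:]
    # Proxy fit on the auxiliary half (stand-in for a forest/net):
    # slope of y on x among treated units, applied to the main half.
    treated = aux[t[aux] == 1]
    slope = np.polyfit(x[treated], y[treated], 1)[0]
    pvals.append(heterogeneity_pvalue(y[main], t[main], slope * x[main]))

# Median aggregation across splits, with the nominal level adjusted
# (here: doubled) to account for split uncertainty.
p_adjusted = min(1.0, 2 * float(np.median(pvals)))
print(f"adjusted p-value for heterogeneity: {p_adjusted:.4f}")
```

The key mechanic is that the proxy is always fit on one half of the data and evaluated on the other, so no overfitting assumption on the proxy is needed; only the post-processing regression carries the inferential weight.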
Distributed Variance Reduction with Optimal Communication
We consider the problem of distributed variance reduction: several machines
each receive probabilistic estimates of an unknown true vector, and must
cooperate to find a common estimate of that vector with lower variance, while
minimizing communication.
Variance reduction is closely related to the well-studied problem of
distributed mean estimation, and is a key procedure in instances of distributed
optimization, such as data-parallel stochastic gradient descent. Previous work
typically assumes an upper bound on the norm of the input vectors, and achieves
an output variance bound in terms of this norm. However, in real applications,
the input vectors can be concentrated around the true vector, while the true
vector itself may have large norm. In this case, output variance bounds in
terms of the input norm perform poorly, and may even increase variance.
In this paper, we show that output variance need not depend on input norm. We
provide a method of quantization which allows variance reduction to be
performed with solution quality dependent only on input variance, not on input
norm, and show an analogous result for mean estimation. This method is
effective over a wide range of communication regimes, from sublinear to
superlinear in the dimension. We also provide lower bounds showing that in many
cases the communication to output variance trade-off is asymptotically optimal.
Further, we show experimentally that our method yields improvements for common
optimization tasks, when compared to prior approaches to distributed mean
estimation.

Comment: 28 pages, 14 figures
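The norm-versus-variance point can be illustrated with a toy two-round scheme. This is a hedged sketch, not the paper's construction: the dithered grid quantizer and the idea of broadcasting one machine's estimate and then quantizing only the small differences from it are illustrative stand-ins for a quantization method whose error depends on input variance rather than input norm.

```python
# Toy illustration: quantizing raw inputs ties error to their norm,
# while quantizing differences from a shared reference ties error to
# the input spread. Both schemes here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)

def quantize(v, step):
    """Randomized (dithered) rounding to a grid of spacing `step`;
    unbiased, with per-coordinate variance at most step**2 / 4."""
    low = np.floor(v / step) * step
    p = (v - low) / step
    return low + step * (rng.random(v.shape) < p)

d, n_machines = 100, 10
mu = rng.normal(size=d) * 100.0               # true vector: large norm
ests = mu + rng.normal(size=(n_machines, d))  # small input variance

# Naive: quantize each input directly. With a fixed bit budget the
# grid step must scale with the input norm, so the error does too.
step_naive = np.abs(ests).max() / 16
naive_avg = np.mean([quantize(e, step_naive) for e in ests], axis=0)

# Variance-aware: broadcast machine 0's estimate once as a reference,
# then quantize only the differences, whose scale is the input spread.
ref = ests[0]
step_diff = np.abs(ests - ref).max() / 16
diff_avg = ref + np.mean([quantize(e - ref, step_diff) for e in ests],
                         axis=0)

err_naive = np.linalg.norm(naive_avg - mu)
err_diff = np.linalg.norm(diff_avg - mu)
print(f"naive error: {err_naive:.2f}, difference-based: {err_diff:.2f}")
```

With the same number of grid levels per coordinate, the difference-based scheme recovers nearly the full averaging gain, because its quantization noise scales with the spread of the inputs around the reference, not with the norm of the true vector.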
Large-Scale Kernel Methods for Independence Testing
Representations of probability measures in reproducing kernel Hilbert spaces
provide a flexible framework for fully nonparametric hypothesis tests of
independence, which can capture any type of departure from independence,
including nonlinear associations and multivariate interactions. However, these
approaches come with an at least quadratic computational cost in the number of
observations, which can be prohibitive in many applications. Arguably, it is
exactly in such large-scale datasets that capturing any type of dependence is
of interest, so striking a favourable tradeoff between computational efficiency
and test performance for kernel independence tests would have a direct impact
on their applicability in practice. In this contribution, we provide an
extensive study of the use of large-scale kernel approximations in the context
of independence testing, contrasting block-based, Nyström and random Fourier
feature approaches. Through a variety of synthetic data experiments, it is
demonstrated that our novel large-scale methods give performance comparable to
existing methods whilst using significantly less computation time and memory.

Comment: 29 pages, 6 figures
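A minimal version of the random-Fourier-feature approach to independence testing can be sketched as follows. This is a generic HSIC-style statistic with permutation calibration, assuming a Gaussian kernel; it illustrates the spirit of the approach, not the paper's exact test procedures.

```python
# Sketch of an O(n)-per-statistic independence test via random
# Fourier features: the squared Frobenius norm of the empirical
# feature cross-covariance, calibrated by permutation. Generic
# illustration, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(2)

def rff(x, n_features, lengthscale=1.0):
    """Random Fourier features approximating a Gaussian kernel."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    W = rng.normal(size=(x.shape[1], n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

def cov_stat(fx, fy):
    """HSIC-style statistic: ||empirical cross-covariance||_F^2."""
    fxc = fx - fx.mean(axis=0)
    fyc = fy - fy.mean(axis=0)
    C = fxc.T @ fyc / len(fx)
    return float(np.sum(C ** 2))

# Nonlinear dependence with zero linear correlation: y = x^2 + noise.
n = 500
x = rng.normal(size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)

fx, fy = rff(x, 50), rff(y, 50)
stat = cov_stat(fx, fy)

# Permutation test: shuffling y's rows breaks the pairing, giving
# draws of the statistic under the independence null.
null = [cov_stat(fx, fy[rng.permutation(n)]) for _ in range(200)]
pval = float(np.mean([s >= stat for s in null]))
print(f"permutation p-value: {pval:.3f}")
```

The feature maps cost O(n) in the sample size for a fixed number of features, in contrast to the at-least-quadratic cost of exact kernel tests, which is the tradeoff the abstract highlights.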