
    Generic machine learning inference on heterogeneous treatment effects in randomized experiments

    We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into the estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. Our approach is agnostic and does not make unrealistic or hard-to-check assumptions; we do not require conditions for consistency of the ML methods. Estimation and inference rely on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting. The inference method could be of substantial independent interest in many machine learning applications. An empirical application to the impact of micro-credit on economic development illustrates the use of the approach in randomized experiments. An additional application to the impact of gender discrimination on wages illustrates the potential use of the approach in observational studies, where machine learning methods can be used to condition flexibly on very high-dimensional controls.
    https://arxiv.org/abs/1712.04802
    First author draft
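
    The split-aggregation step described above can be made concrete. Below is a minimal sketch, assuming per-split p-values and confidence-interval endpoints have already been computed; doubling the median p-value (and, dually, building per-split intervals at level 1 - alpha/2 before taking medians) is one standard nominal-level adjustment consistent with the abstract's description. Function names are illustrative.
```python
import numpy as np

def adjusted_median_pvalue(p_values):
    """Aggregate per-split p-values into one uniformly valid p-value.

    Take the median over data splits, then double it: doubling is a
    standard correction that keeps the median of arbitrarily dependent
    p-values valid at the nominal level.
    """
    return min(1.0, 2.0 * np.median(p_values))

def adjusted_median_ci(lowers, uppers):
    """Median lower/upper endpoints across splits.

    To retain coverage at level 1 - alpha, the per-split intervals
    should be constructed at the stricter level 1 - alpha/2.
    """
    return np.median(lowers), np.median(uppers)
```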

    Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments

    We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into the estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. It does not rely on strong assumptions. In particular, we don't require conditions for consistency of the machine learning methods. Estimation and inference rely on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting. We illustrate the use of the approach with two randomized experiments in development on the effects of microcredit and nudges to stimulate immunization demand.
    Comment: 53 pages, 6 figures, 15 tables
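
    As a complement to the aggregation sketch above, here is a minimal sketch of a single data split, assuming a randomized experiment with a constant, known treatment probability and a scikit-learn style learner; the proxy construction and the regression below are illustrative simplifications, not the paper's exact weighted specification.
```python
import numpy as np
from sklearn.linear_model import LinearRegression

def blp_one_split(X, d, y, learner, p=0.5, seed=0):
    """One repetition: fit an ML proxy on an auxiliary half, then
    estimate the best linear predictor (BLP) of the effect on the
    main half. Repeating over seeds yields the per-split p-values
    and intervals that the median aggregation combines."""
    rng = np.random.default_rng(seed)
    aux = rng.random(len(y)) < 0.5   # auxiliary half: fit the proxy
    main = ~aux                      # main half: estimate the BLP
    # Proxy for the treatment effect: difference of outcome
    # regressions fit separately in each arm.
    m1 = learner().fit(X[aux & (d == 1)], y[aux & (d == 1)])
    m0 = learner().fit(X[aux & (d == 0)], y[aux & (d == 0)])
    s = m1.predict(X[main]) - m0.predict(X[main])
    # Regress y on (d - p) and (d - p) * (s - mean(s)): the two
    # coefficients estimate the ATE and the heterogeneity loading.
    Z = np.column_stack([d[main] - p, (d[main] - p) * (s - s.mean())])
    return LinearRegression().fit(Z, y[main]).coef_
```
    With, for example, learner=lambda: RandomForestRegressor(), a second coefficient significantly different from zero is evidence of detectable effect heterogeneity, even when the forest itself is an inconsistent estimator of the true effects.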

    Distributed Variance Reduction with Optimal Communication

    We consider the problem of distributed variance reduction: n machines each receive probabilistic estimates of an unknown true vector Δ, and must cooperate to find a common estimate of Δ with lower variance, while minimizing communication. Variance reduction is closely related to the well-studied problem of distributed mean estimation, and is a key procedure in instances of distributed optimization, such as data-parallel stochastic gradient descent. Previous work typically assumes an upper bound on the norm of the input vectors and achieves an output variance bound in terms of this norm. However, in real applications the input vectors can be concentrated around the true vector Δ while Δ itself has large norm. In this case, output variance bounds in terms of input norm perform poorly and may even increase variance. In this paper, we show that output variance need not depend on input norm. We provide a method of quantization which allows variance reduction to be performed with solution quality dependent only on input variance, not on input norm, and show an analogous result for mean estimation. This method is effective over a wide range of communication regimes, from sublinear to superlinear in the dimension. We also provide lower bounds showing that in many cases the communication-to-output-variance trade-off is asymptotically optimal. Further, we show experimentally that our method yields improvements for common optimization tasks when compared to prior approaches to distributed mean estimation.
    Comment: 28 pages, 14 figures
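
    To illustrate why quantizing relative to a shared reference point can make the error track input variance rather than input norm, here is a minimal sketch; the reference-point scheme and all names below are illustrative assumptions, not the paper's exact protocol.
```python
import numpy as np

def stochastic_quantize(v, levels=16):
    """Unbiased stochastic quantization of v onto a uniform grid."""
    lo, hi = v.min(), v.max()
    scale = max(hi - lo, 1e-12) / (levels - 1)
    t = (v - lo) / scale
    # Round up with probability equal to the fractional part, so the
    # quantized value is unbiased for v.
    q = np.floor(t) + (np.random.random(v.shape) < (t - np.floor(t)))
    return lo + scale * q

def reduce_variance(estimates, levels=16):
    """Average n estimates, quantizing each against a shared reference."""
    ref = estimates[0]  # broadcast once as the common reference point
    # Each machine sends a quantized *difference* from the reference,
    # so the grid width (and hence the quantization error) scales with
    # the spread of the inputs, not with their absolute norm.
    diffs = [stochastic_quantize(x - ref, levels) for x in estimates]
    return ref + np.mean(diffs, axis=0)
```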

    Large-Scale Kernel Methods for Independence Testing

    Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with a computational cost that is at least quadratic in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable trade-off between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström, and random Fourier feature approaches. Through a variety of synthetic data experiments, we demonstrate that our novel large-scale methods give performance comparable to existing methods while using significantly less computation time and memory.
    Comment: 29 pages, 6 figures
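
    Of the three approximations contrasted above, the random Fourier feature route is the simplest to sketch. Below is a minimal RFF analogue of a kernel independence statistic (HSIC) with a permutation test; the bandwidths, feature counts, and test protocol are illustrative choices, not the paper's exact procedure. The cost is O(nD) in the sample size n and feature count D, rather than quadratic in n.
```python
import numpy as np

def rff(x, D, sigma, rng):
    """Random Fourier features approximating a Gaussian kernel."""
    W = rng.normal(scale=1.0 / sigma, size=(x.shape[1], D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

def rff_hsic(x, y, D=100, sigma_x=1.0, sigma_y=1.0, seed=0):
    """Squared Frobenius norm of the empirical feature
    cross-covariance: an O(nD) approximation to HSIC(x, y)."""
    rng = np.random.default_rng(seed)
    phi = rff(x, D, sigma_x, rng)
    psi = rff(y, D, sigma_y, rng)
    phi -= phi.mean(axis=0)
    psi -= psi.mean(axis=0)
    C = phi.T @ psi / len(x)   # D x D cross-covariance matrix
    return (C ** 2).sum()

def permutation_pvalue(x, y, stat=rff_hsic, B=200, seed=0):
    """Calibrate the statistic by permuting y to break dependence."""
    rng = np.random.default_rng(seed)
    t0 = stat(x, y)
    perm = [stat(x, y[rng.permutation(len(y))]) for _ in range(B)]
    return (1 + sum(t >= t0 for t in perm)) / (B + 1)
```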