Distributed Estimation and Inference with Statistical Guarantees


This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low-dimensional and high-dimensional settings, we address the important question of how to choose k as n grows large, providing a theoretical upper bound on k such that the information loss due to the divide and conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as a practically infeasible oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
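The aggregation idea can be illustrated with a minimal sketch. It assumes the simplest possible case, ordinary least squares with plain averaging of the k subsample estimates, and is not the paper's own test statistics or high-dimensional estimators. The function names (ols, divide_and_conquer_ols) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def ols(X, y):
    # Least-squares coefficients for one block of data.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def divide_and_conquer_ols(X, y, k):
    # Split the n rows into k subsamples of roughly n/k rows each,
    # fit each subsample separately, then average the k estimates.
    X_parts = np.array_split(X, k)
    y_parts = np.array_split(y, k)
    estimates = [ols(Xj, yj) for Xj, yj in zip(X_parts, y_parts)]
    return np.mean(estimates, axis=0)

# Simulated data (an assumption for illustration, not from the paper).
rng = np.random.default_rng(0)
n, d, k = 10_000, 5, 20
beta = np.arange(1.0, d + 1.0)
X = rng.standard_normal((n, d))
y = X @ beta + rng.standard_normal(n)

beta_full = ols(X, y)                      # "oracle" fit on the full sample
beta_dc = divide_and_conquer_ols(X, y, k)  # aggregated subsample fits

# Largest coordinate-wise gap between the two estimators.
print(np.max(np.abs(beta_dc - beta_full)))
```

With k small relative to n, the averaged estimator should stay close to the full-sample fit; increasing k while holding n fixed is a simple way to see the information loss that the bound on k is meant to control.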
