49 research outputs found

    A review of distributed statistical inference

    The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods, while also providing opportunities for researchers to develop novel algorithms. Inspired by the idea of divide-and-conquer, various distributed frameworks for statistical estimation and inference have been proposed to deal with large-scale statistical optimization problems. This paper provides a comprehensive review of the related literature, covering parametric models, nonparametric models, and other frequently used models, and summarizes their key ideas and theoretical properties. The trade-off between communication cost and estimation precision, together with other practical concerns, is also discussed.
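    The divide-and-conquer idea surveyed here is easiest to see in code. Below is a minimal sketch, not taken from the paper: the least-squares model, the function names, and the equal-sized splits are illustrative assumptions. Each machine fits its own estimate on its data partition, and only the low-dimensional local estimates are communicated and averaged.

```python
import numpy as np

def local_ols(X, y):
    """Ordinary least squares fit on one machine's data partition."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def divide_and_conquer_ols(X, y, n_machines):
    """Split the sample, fit OLS locally, and average the local estimates.

    One-shot averaging: each machine communicates only its d-dimensional
    coefficient vector, so communication cost is O(d) per machine.
    """
    X_parts = np.array_split(X, n_machines)
    y_parts = np.array_split(y, n_machines)
    local_estimates = [local_ols(Xk, yk) for Xk, yk in zip(X_parts, y_parts)]
    return np.mean(local_estimates, axis=0)

# Toy usage: linear model y = X @ beta + noise, split across 10 machines.
rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)
print(divide_and_conquer_ols(X, y, n_machines=10))
```

    The appeal is the communication cost: each machine sends only a d-dimensional vector, which is exactly the kind of communication/precision trade-off the review discusses.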

    Data-driven confidence bands for distributed nonparametric regression

    Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, their high computational complexity renders them inapplicable to modern massive datasets. To that end, a number of approximations have been suggested, some of them allowing for a distributed implementation. One of them is the divide-and-conquer approach: split the data into a number of partitions, obtain the local estimates, and finally average them. In this paper we suggest a novel, computationally efficient, fully data-driven algorithm that quantifies the uncertainty of this method, yielding frequentist $L_2$-confidence bands, and we rigorously demonstrate its validity. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds. (To appear at COLT 2020.)
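    A minimal sketch of the divide-and-conquer estimator described above (illustrative only, not the authors' code; the Gaussian kernel, bandwidth, and regularisation constant are placeholder choices, and the paper's data-driven confidence bands are not reproduced here):

```python
import numpy as np

def krr_fit(X, y, bandwidth=1.0, reg=1e-3):
    """Kernel ridge regression with a Gaussian kernel on one data partition."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2 * bandwidth ** 2))
    alpha = np.linalg.solve(K + reg * len(y) * np.eye(len(y)), y)
    return X, alpha

def krr_predict(model, X_new, bandwidth=1.0):
    """Evaluate a fitted local KRR estimator at new query points."""
    X_train, alpha = model
    sq_dists = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2)) @ alpha

def distributed_krr(X, y, n_partitions, X_new):
    """Divide-and-conquer KRR: fit local estimators on disjoint partitions
    and average their predictions at the query points."""
    models = [krr_fit(Xk, yk)
              for Xk, yk in zip(np.array_split(X, n_partitions),
                                np.array_split(y, n_partitions))]
    return np.mean([krr_predict(m, X_new) for m in models], axis=0)
```

    Averaging the local estimates keeps the cubic cost of the kernel solve confined to each partition; the paper's contribution is to wrap this averaged estimator with data-driven $L_2$-confidence bands.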

    On the optimality of misspecified spectral algorithms

    In the misspecified spectral algorithms problem, researchers usually assume that the underlying true function satisfies $f_{\rho}^{*} \in [\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$, for some $s \in (0,1)$. The existing minimax-optimal results require $\|f_{\rho}^{*}\|_{L^{\infty}} < \infty$, which implicitly requires $s > \alpha_{0}$, where $\alpha_{0} \in (0,1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether the spectral algorithms are optimal for all $s \in (0,1)$ has been an outstanding problem for years. In this paper, we show that spectral algorithms are minimax optimal for any $\alpha_{0} - \frac{1}{\beta} < s < 1$, where $\beta$ is the eigenvalue decay rate of $\mathcal{H}$. We also give several classes of RKHSs whose embedding index satisfies $\alpha_{0} = \frac{1}{\beta}$; on these RKHSs, the spectral algorithms are therefore minimax optimal for all $s \in (0,1)$.
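    For orientation, the benchmark rate at stake can be stated explicitly. Under the source condition $f_{\rho}^{*} \in [\mathcal{H}]^{s}$ and polynomial eigenvalue decay $\lambda_i \asymp i^{-\beta}$, the minimax rate for the excess $L_2$ risk that is usually targeted in this literature (stated here from the standard setting, not quoted from this abstract) is

```latex
\bigl\| \hat f - f_{\rho}^{*} \bigr\|_{L_2}^{2} \asymp n^{-\frac{s\beta}{s\beta + 1}},
```

    so the open question discussed above is for which $s \in (0,1)$ spectral algorithms actually attain this rate without additional boundedness assumptions on $f_{\rho}^{*}$.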

    Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up

    We analyse the learning performance of Distributed Gradient Descent in the context of multi-agent decentralised non-parametric regression with the square loss function when i.i.d. samples are assigned to agents. We show that, if agents hold sufficiently many samples with respect to the network size, then Distributed Gradient Descent achieves optimal statistical rates with a number of iterations that scales, up to a threshold, with the inverse of the spectral gap of the gossip matrix divided by the number of samples owned by each agent raised to a problem-dependent power. The threshold is statistical in origin: it encodes the existence of a "big data" regime in which the number of required iterations does not depend on the network topology. In this regime, Distributed Gradient Descent achieves optimal statistical rates with the same order of iterations as gradient descent run on all the samples in the network. Provided the communication delay is sufficiently small, the distributed protocol yields a linear speed-up in runtime compared to the single-machine protocol. This is in contrast to decentralised optimisation algorithms that do not exploit statistics and only yield a linear speed-up in graphs where the spectral gap is bounded away from zero. Our results exploit the statistical concentration of quantities held by agents and shed new light on the interplay between statistics and communication in decentralised methods. Bounds are given in the standard non-parametric setting with source/capacity assumptions.
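    A minimal sketch of the kind of protocol analysed above (not the authors' code; the linear square-loss model and Metropolis gossip weights are assumptions made for concreteness): each agent alternates a gossip-averaging step over the network with a gradient step on its local empirical risk.

```python
import numpy as np

def gossip_matrix(adjacency):
    """Doubly stochastic gossip matrix for an undirected graph (Metropolis weights)."""
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adjacency[i, j]:
                W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))
    return W

def distributed_gradient_descent(local_X, local_y, W, step, n_iters):
    """Each agent alternates a gossip step with a gradient step on its own
    square-loss objective (linear predictors, for simplicity)."""
    n_agents, d = len(local_X), local_X[0].shape[1]
    theta = np.zeros((n_agents, d))          # one parameter vector per agent
    for _ in range(n_iters):
        theta = W @ theta                    # consensus / gossip (communication round)
        for a in range(n_agents):
            X, y = local_X[a], local_y[a]
            grad = X.T @ (X @ theta[a] - y) / len(y)   # local square-loss gradient
            theta[a] -= step * grad
    return theta
```

    The `W @ theta` line is the communication round; how quickly it mixes is governed by the spectral gap of the gossip matrix, which is where the network topology enters the iteration count discussed in the abstract.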