Distributed data naturally arise in scenarios involving multiple sources of
observations, each stored at a different location. Pooling all the data
directly is often infeasible because of limited bandwidth and storage, or
prohibited by privacy protocols. This paper introduces a new robust distributed algorithm
for fitting linear regressions when data are subject to heavy-tailed and/or
asymmetric errors with finite second moments. The algorithm communicates only
gradient information at each iteration and is therefore
communication-efficient. Statistically, the resulting estimator achieves the
centralized nonasymptotic error bound as if all the data were pooled together
and came from a distribution with sub-Gaussian tails. Under a finite
(2+δ)-th moment condition, we derive a Berry-Esseen bound for the
distributed estimator, based on which we construct robust confidence intervals.
Numerical studies further confirm that, compared with existing distributed
methods, the proposed methods achieve near-optimal accuracy with low
variability, and better coverage with narrower confidence intervals.
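
To make the communication pattern concrete, the following is a minimal sketch of a gradient-only distributed fit, not the algorithm proposed in the paper. It assumes a Huber-type loss as the robustification device, which the abstract does not specify; the function names, the threshold `tau`, the step size, and the simulated Student-t errors are all illustrative assumptions. In each round every machine sends one d-dimensional gradient vector, and no raw observations leave their location.

```python
import numpy as np

def huber_score(residuals, tau):
    """Derivative of the Huber loss: identity on [-tau, tau], clipped outside."""
    return np.clip(residuals, -tau, tau)

def local_gradient(X, y, beta, tau):
    """Gradient of the average Huber loss on one machine's local data."""
    return -X.T @ huber_score(y - X @ beta, tau) / len(y)

def distributed_huber_gd(shards, tau, step=0.5, n_iter=200):
    """Gradient descent in which each round only aggregates local gradients."""
    d = shards[0][0].shape[1]
    total = sum(len(y) for _, y in shards)
    beta = np.zeros(d)
    for _ in range(n_iter):
        # Each machine contributes one gradient vector, weighted by its sample size.
        grad = sum(len(y) * local_gradient(X, y, beta, tau) for X, y in shards) / total
        beta -= step * grad
    return beta

# Simulated data (assumption): 5 machines, 400 observations each,
# Student-t errors with 3 degrees of freedom (heavy tails, finite variance).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
shards = []
for _ in range(5):
    X = rng.normal(size=(400, 3))
    noise = rng.standard_t(df=3, size=400)
    shards.append((X, X @ beta_true + noise))

beta_hat = distributed_huber_gd(shards, tau=3.0)
print(beta_hat)
```

The fixed threshold `tau=3.0` is chosen by hand here; how the robustification level should scale with the sample size and the error moments is a matter for the paper's theory, not this sketch.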