This paper proposes a Decentralized Stochastic Gradient Descent (DSGD)
algorithm to solve distributed machine-learning tasks over wirelessly-connected
systems, without the coordination of a base station. It combines local
stochastic gradient descent steps with a Non-Coherent Over-The-Air (NCOTA)
consensus scheme at the receivers, which enables concurrent transmissions by
leveraging the waveform superposition properties of the wireless channels. With
NCOTA, local optimization signals are mapped to a mixture of orthogonal
preamble sequences and transmitted concurrently over the wireless channel under
half-duplex constraints. Consensus is estimated by non-coherently combining the
received signals with the preamble sequences and mitigating the impact of noise
and fading via a consensus stepsize. NCOTA-DSGD operates without channel state
information (typically used in over-the-air computation schemes for channel
inversion) and leverages the channel pathloss to mix signals, without explicit
knowledge of the mixing weights (typically known in consensus-based optimization).
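As a rough, illustrative complement to the description above, the following toy Python simulation sketches how concurrently transmitted preamble mixtures might be combined non-coherently into a pathloss-weighted average of the local values, without channel state information. The two-preamble energy split, the Rayleigh fading and noise model, the averaging over slots, and all names and constants are assumptions made for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all values illustrative): 3 devices, one scalar model entry each,
# normalized to [0, 1], and two orthonormal preamble sequences of length 64.
x = np.array([0.2, 0.7, 0.5])          # local values to be mixed
gamma = np.array([1.0, 0.6, 0.3])      # average channel pathlosses (unknown to devices)
seq_len, sigma = 64, 0.05              # preamble length and noise standard deviation
Q, _ = np.linalg.qr(rng.standard_normal((seq_len, 2)))
p_plus, p_minus = Q[:, 0], Q[:, 1]

def tx_signal(xi):
    # Encoding: split the unit transmit energy between the two preambles
    # in proportion to xi and 1 - xi (an NCOTA-style mapping, sketched here).
    return np.sqrt(xi) * p_plus + np.sqrt(1.0 - xi) * p_minus

# Concurrent transmissions superimpose over the fading channel; the receiver
# accumulates the energies of the matched-filter outputs for the two preambles
# (non-coherent combining), here averaged over many independent fading slots.
e_plus = e_minus = 0.0
n_slots = 2000
for _ in range(n_slots):
    h = np.sqrt(gamma / 2) * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
    y = sum(h[i] * tx_signal(x[i]) for i in range(3))
    y = y + sigma * (rng.standard_normal(seq_len) + 1j * rng.standard_normal(seq_len)) / np.sqrt(2)
    e_plus += abs(y @ p_plus) ** 2 / n_slots
    e_minus += abs(y @ p_minus) ** 2 / n_slots

# After removing the noise floor, the energy ratio estimates a mixture of the
# x_i weighted by the (unknown) pathlosses gamma_i.
x_mix = (e_plus - sigma**2) / (e_plus + e_minus - 2 * sigma**2)
print(f"estimated mixture {x_mix:.3f} vs pathloss-weighted average {np.average(x, weights=gamma):.3f}")
```

Averaging over many slots here plays the role that the decreasing consensus stepsize plays in the abstract: it mitigates the per-slot impact of noise and fading on the mixture estimate.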
It is shown that, with a suitable tuning of decreasing consensus
and learning stepsizes, the error (measured as the Euclidean distance) between the
local and globally optimal models vanishes at rate O(k^{-1/4}) after k iterations.
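For concreteness, below is a minimal Python sketch of one iteration of an NCOTA-DSGD-style update at a single device, combining a consensus step toward a noisy over-the-air mixture estimate with a local stochastic gradient step, both with stepsizes decreasing in the iteration index k. The polynomial decay exponents, the helper ota_mix_fn, and the toy quadratic loss are hypothetical placeholders rather than the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ncota_dsgd_step(theta_i, k, grad_fn, ota_mix_fn, a_cons=0.75, a_learn=0.75):
    """One NCOTA-DSGD-style iteration at device i (illustrative sketch).

    theta_i    : current local model (np.ndarray)
    k          : iteration index, k >= 1
    grad_fn    : stochastic gradient of the local loss
    ota_mix_fn : noisy over-the-air estimate of the neighborhood mixture
                 (mixing weights set by the channel pathloss, unknown to i)
    """
    alpha_k = k ** (-a_cons)    # decreasing consensus stepsize (assumed schedule)
    eta_k = k ** (-a_learn)     # decreasing learning stepsize (assumed schedule)

    # Consensus step: move toward the noisy over-the-air neighborhood estimate;
    # the vanishing alpha_k averages out noise and fading over iterations.
    theta_half = theta_i + alpha_k * (ota_mix_fn(theta_i) - theta_i)

    # Local stochastic gradient step on the device's own data.
    return theta_half - eta_k * grad_fn(theta_half)

# Toy usage with a quadratic local loss and a noisy stand-in for the OTA mixture.
theta = np.zeros(2)
grad_fn = lambda th: 2.0 * (th - np.array([1.0, -1.0]))
ota_mix_fn = lambda th: th + 0.1 * rng.standard_normal(2)
for k in range(1, 501):
    theta = ncota_dsgd_step(theta, k, grad_fn, ota_mix_fn)
print(theta)   # approaches the minimizer [1, -1] as the stepsizes decay
```

In the abstract's terms, the first update is the consensus step whose vanishing stepsize mitigates channel noise and fading, and the second is the local stochastic gradient descent step.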
NCOTA-DSGD is evaluated numerically by solving an image classification task on the
MNIST dataset, cast as a regularized cross-entropy loss minimization. Numerical
results show faster convergence, in terms of running time, than implementations of
the classical DSGD algorithm over digital and analog orthogonal channels when the
number of learning devices is large and
under stringent delay constraints.