Federated learning (FL) is a decentralized learning framework wherein a
parameter server (PS) and a collection of clients collaboratively train a model
by minimizing a global objective. Communication bandwidth is a scarce
resource; in each round, the PS aggregates updates from only a subset of
clients. In this paper, we focus on non-convex minimization in the presence of
non-uniform and time-varying communication failures between the PS and the
clients. Specifically, in each round $t$, the link between the PS and client
$i$ is active with probability $p_i^t$, which is unknown to both the
PS and the clients. This setting arises when the channel conditions are
heterogeneous across clients and change over time.
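To fix ideas, the following minimal sketch shows one way such intermittent links could be simulated. The function name sample_active_links and the sinusoidal form of $p_i^t$ are illustrative assumptions only; in the setting above, the $p_i^t$'s are unknown to both the PS and the clients.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients = 10
# Client-specific channel offsets: a hypothetical source of heterogeneity.
phase = rng.random(n_clients)

def sample_active_links(t):
    """Draw Bernoulli link activations for round t.

    p_i^t is non-uniform across clients i and drifts with the round index t.
    The concrete functional form here is purely illustrative.
    """
    p = 0.2 + 0.7 * (0.5 + 0.5 * np.sin(0.1 * t + 2 * np.pi * phase))
    return rng.random(n_clients) < p  # active[i] == True: client i reachable
```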
We show that when the $p_i^t$'s are not uniform, Federated Averaging
(FedAvg) -- the most widely adopted FL algorithm -- fails to minimize the
global objective. Motivated by this observation, we propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. It differs from
FedAvg in that the PS postpones broadcasting the global model until the end of
each round. We show that FedPBC converges to a stationary point of the original
objective. The staleness introduced is mild and causes no noticeable
slowdown. Both theoretical analysis and numerical results are provided. On the
technical front, postponing the global model broadcasts enables implicit
gossiping among the clients with active links at round $t$. Although the
$p_i^t$'s are time-varying, we bound the perturbation of the global model
dynamics by controlling the gossip-type information-mixing errors.
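To make the postponed broadcast concrete, below is a minimal simulation sketch of FedPBC rounds on scalar quadratic objectives. The objectives, step size, link-probability process, and the choice to let every client take a local step each round are assumptions of this sketch, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clients, n_rounds, lr = 10, 500, 0.1

# Hypothetical local objectives f_i(x) = (x - a_i)^2 / 2; the global
# minimizer of their average is mean(a). Any toy problem would do here.
a = rng.normal(size=n_clients)
x_local = np.zeros(n_clients)  # every client keeps its own model copy

for t in range(n_rounds):
    # Non-uniform, time-varying link probabilities p_i^t (assumed form).
    p = np.clip(0.5 + 0.4 * np.sin(0.05 * t + np.arange(n_clients)), 0.1, 0.9)
    active = rng.random(n_clients) < p

    # Clients take a local gradient step from their CURRENT local models:
    # no fresh global model was broadcast at the start of the round.
    x_local -= lr * (x_local - a)

    if active.any():
        # The PS averages the models uploaded over active links and only
        # now, at the END of the round, broadcasts the average -- which
        # reaches exactly the clients whose links are active at round t.
        x_local[active] = x_local[active].mean()

print(f"consensus model {x_local.mean():.3f} vs. target {a.mean():.3f}")
```

Because the end-of-round broadcast reaches only the currently active clients, each round effectively performs one step of gossip-style averaging over the active set, which is the implicit gossiping mechanism described above.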