To lower the communication complexity of federated min-max learning, a
natural approach is to utilize the idea of infrequent communications (through
multiple local updates), as in conventional federated learning. However,
due to the more complicated inner-outer problem structure in federated min-max
learning, the theoretical understanding of communication complexity for federated
min-max learning with infrequent communications remains very limited in the
literature. This is particularly true for settings with non-i.i.d. datasets and
partial client participation. To address this challenge, we
propose a new algorithmic framework called stochastic sampling averaging
gradient descent ascent (SAGDA), which i) assembles stochastic gradient
estimators from randomly sampled clients as control variates and ii) leverages
two learning rates on both server and client sides. We show that SAGDA achieves
a linear speedup in terms of both the number of clients and local update steps,
which yields an $\mathcal{O}(\epsilon^{-2})$ communication complexity that is
orders of magnitude lower than the state of the art. Interestingly, by noting
that the standard federated stochastic gradient descent ascent (FSGDA) is in
fact a control-variate-free special case of SAGDA, we immediately arrive at
an $\mathcal{O}(\epsilon^{-2})$ communication complexity result for FSGDA.
Therefore, through the lens of SAGDA, we also advance the current understanding
of the communication complexity of the standard FSGDA method for federated min-max
learning.

Comment: Published as a conference paper at NeurIPS 202
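
As a rough illustration of the infrequent-communication scheme discussed in the abstract, the following is a minimal sketch of a control-variate-free federated gradient descent ascent loop with multiple local updates, partial client participation, and separate client-side and server-side learning rates. It is not the paper's SAGDA algorithm, whose control-variate construction is not specified here. The toy objective f_i(x, y) = 0.5 (x - p_i)^2 + x y - 0.5 (y - q_i)^2, the function name `fsgda_sketch`, and all parameter names and default values are assumptions made only for this example.

```python
import numpy as np


def fsgda_sketch(p, q, rounds=200, local_steps=5, num_sampled=None,
                 client_lr=0.05, server_lr=1.0, noise_std=0.0, seed=0):
    """Toy federated gradient descent ascent (hypothetical helper).

    Client i holds f_i(x, y) = 0.5*(x - p[i])**2 + x*y - 0.5*(y - q[i])**2,
    an assumed objective; the global objective is the average over clients.
    """
    rng = np.random.default_rng(seed)
    m = len(p)
    num_sampled = m if num_sampled is None else num_sampled
    x, y = 0.0, 0.0
    for _ in range(rounds):
        # Partial participation: sample a subset of clients each round.
        clients = rng.choice(m, size=num_sampled, replace=False)
        dx_sum, dy_sum = 0.0, 0.0
        for i in clients:
            xi, yi = x, y  # each client starts from the server model
            for _ in range(local_steps):
                # Stochastic gradients (exact when noise_std == 0).
                gx = (xi - p[i]) + yi + noise_std * rng.standard_normal()
                gy = xi - (yi - q[i]) + noise_std * rng.standard_normal()
                # Local descent on x and ascent on y with the client rate.
                xi, yi = xi - client_lr * gx, yi + client_lr * gy
            dx_sum += xi - x
            dy_sum += yi - y
        # One communication per round: the server averages the client
        # deltas and applies them with its own learning rate.
        x += server_lr * dx_sum / num_sampled
        y += server_lr * dy_sum / num_sampled
    return x, y


# Heterogeneous (non-i.i.d.) client data for the toy problem.
p = np.array([1.0, -2.0, 3.0, 0.5])
q = np.array([0.0, 1.0, -1.0, 2.0])
x_hat, y_hat = fsgda_sketch(p, q)
# Saddle point of the averaged toy objective.
x_star = (p.mean() - q.mean()) / 2
y_star = (p.mean() + q.mean()) / 2
```

With full participation and exact gradients, the iterate in this toy problem approaches the saddle point `(x_star, y_star)` of the averaged objective. Setting `num_sampled` below the number of clients or `noise_std` above zero introduces the sampling and gradient noise that the complexity analysis in the paper is concerned with.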