In this paper, we tackle a novel federated learning (FL) problem for
optimizing a family of X-risks, to which no existing FL algorithms are
applicable. In particular, the objective has the form of $\mathbb{E}_{z\sim S_1} f(\mathbb{E}_{z'\sim S_2} \ell(w; z, z'))$, where two sets of data $S_1, S_2$
are distributed over multiple machines, $\ell(\cdot)$ is a pairwise loss that
only depends on the prediction outputs of the input data pairs $(z, z')$, and
$f(\cdot)$ is possibly a non-linear non-convex function. This problem has
important applications in machine learning, e.g., AUROC maximization with a
pairwise loss, and partial AUROC maximization with a compositional loss. The
challenges for designing an FL algorithm lie in the non-decomposability of the
objective over multiple machines and the interdependency between different
machines. To address these challenges, we propose an active-passive decomposition
framework that decouples the gradient's components into two types, namely
active parts and passive parts: the active parts depend on local data
and are computed with the local model, whereas the passive parts depend on other
machines and are communicated/computed based on historical models and samples.
Under this framework, we develop two provable FL algorithms (FeDXL) for
handling linear and nonlinear f, respectively, based on federated averaging
and merging. We develop a novel theoretical analysis to combat the latency of
the passive parts and the interdependency between the local model parameters
and the involved data for computing local gradient estimators. We establish
both iteration and communication complexities and show that using the
historical samples and models for computing the passive parts does not degrade
the complexities. We conduct empirical studies of FeDXL for deep AUROC and
partial AUROC maximization, and demonstrate their performance compared with
several baselines.
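
To make the form of the objective concrete, the following is a minimal sketch of how an empirical X-risk could be evaluated on a single machine from prediction scores alone. It is not the FeDXL algorithm and involves no federated communication; the function names `x_risk` and `squared_hinge`, the margin value, the toy scores, and the choice of log(1 + g) as a nonlinear f are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def x_risk(scores_s1, scores_s2, pair_loss, f):
    """Empirical X-risk: average over z in S1 of f(average over z' in S2 of loss)."""
    s1 = np.asarray(scores_s1, dtype=float)[:, None]  # shape (n1, 1)
    s2 = np.asarray(scores_s2, dtype=float)[None, :]  # shape (1, n2)
    inner = pair_loss(s1, s2).mean(axis=1)            # inner expectation over S2
    return float(np.mean(f(inner)))                   # outer expectation over S1

def squared_hinge(a, b, margin=1.0):
    # Hypothetical pairwise surrogate: penalizes score a for not exceeding b by the margin.
    return np.maximum(0.0, margin - (a - b)) ** 2

# Toy prediction scores (assumed): S1 holds positives, S2 holds negatives.
pos_scores = [2.0, 0.5]
neg_scores = [0.0, 1.0]

# Linear f (identity) gives a plain pairwise objective, as in AUROC-style surrogates.
linear_risk = x_risk(pos_scores, neg_scores, squared_hinge, lambda g: g)

# A nonlinear f (here log(1 + g), chosen only for illustration) gives a compositional objective.
nonlinear_risk = x_risk(pos_scores, neg_scores, squared_hinge, np.log1p)

print(linear_risk, nonlinear_risk)
```

In the federated setting described above, the scores in `scores_s1` and `scores_s2` would live on different machines, so the inner average cannot be formed locally; that is the non-decomposability the active-passive framework is designed to handle.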