FeDXL: Provable Federated Learning for Deep X-Risk Optimization
In this paper, we tackle a novel federated learning (FL) problem for
optimizing a family of X-risks, to which no existing FL algorithms are
applicable. In particular, the objective has the form $\mathbb{E}_{z \in S_1}\big[f\big(\mathbb{E}_{z' \in S_2}\,\ell(w; z, z')\big)\big]$, where the two sets of data
$S_1$ and $S_2$ are distributed over multiple machines, $\ell(w; z, z')$ is a pairwise loss that
only depends on the prediction outputs of the input data pair $(z, z')$, and
$f(\cdot)$ is possibly a non-linear non-convex function. This problem has
important applications in machine learning, e.g., AUROC maximization with a
pairwise loss, and partial AUROC maximization with a compositional loss. The
challenges for designing an FL algorithm lie in the non-decomposability of the
objective over multiple machines and the interdependency between different
machines. To address the challenges, we propose an active-passive decomposition
framework that decouples the components of the gradient into two types, namely
active parts and passive parts, where the active parts depend on local data
and are computed with the local model, and the passive parts depend on other
machines and are communicated/computed based on historical models and samples.
Under this framework, we develop two provable FL algorithms (FeDXL) for
handling linear and nonlinear $f$, respectively, based on federated averaging
and merging. We develop a novel theoretical analysis to combat the latency of
the passive parts and the interdependency between the local model parameters
and the involved data for computing local gradient estimators. We establish
both iteration and communication complexities and show that using the
historical samples and models for computing the passive parts does not degrade
the complexities. We conduct empirical studies of FeDXL for deep AUROC and
partial AUROC maximization, and demonstrate its performance compared with
several baselines.
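As a concrete illustration of the objective family above, the sketch below evaluates a compositional X-risk on prediction scores; the function name, the squared-hinge choice of pairwise loss, and the default identity $f$ are my own illustrative assumptions, not FeDXL's prescribed implementation:

```python
import numpy as np

def x_risk(scores_1, scores_2, f=lambda g: g, margin=1.0):
    """Toy X-risk of the compositional form
        E_{z in S1} f( E_{z' in S2} loss(z, z') )
    on prediction outputs.  With f = identity and a squared-hinge
    pairwise loss this reduces to a standard AUROC surrogate; a
    non-linear f gives the general (possibly non-convex) case.
    """
    scores_1 = np.asarray(scores_1, dtype=float)
    scores_2 = np.asarray(scores_2, dtype=float)
    # pairwise squared-hinge loss over all (z, z') pairs
    diffs = scores_1[:, None] - scores_2[None, :]    # |S1| x |S2|
    pair_loss = np.maximum(0.0, margin - diffs) ** 2
    inner = pair_loss.mean(axis=1)                   # inner expectation per z
    return float(np.mean(f(inner)))                  # outer expectation of f
```

In a federated setting, the rows of `diffs` live on one machine and the columns on others, which is exactly the non-decomposability over machines that the active-passive framework addresses.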
Efficient Cross-Device Federated Learning Algorithms for Minimax Problems
In many machine learning applications where massive and privacy-sensitive
data are generated on numerous mobile or IoT devices, collecting data in a
centralized location may be prohibitive. Thus, it is increasingly attractive to
estimate parameters over mobile or IoT devices while keeping data localized.
Such a learning setting is known as cross-device federated learning. In this
paper, we propose the first theoretically guaranteed algorithms for general
minimax problems in the cross-device federated learning setting. Our algorithms
require only a fraction of devices in each round of training, which overcomes
the difficulty introduced by the low availability of devices. The communication
overhead is further reduced by performing multiple local update steps on
clients before communication with the server, and global gradient estimates are
leveraged to correct the bias in local update directions introduced by data
heterogeneity. By developing analyses based on novel potential functions, we
establish theoretical convergence guarantees for our algorithms. Experimental
results on AUC maximization, robust adversarial network training, and GAN
training tasks demonstrate the efficiency of our algorithms.
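A minimal simulation of the mechanism described above — partial device participation, multiple local steps, and a global gradient estimate correcting heterogeneity-induced drift — on a toy saddle-point problem; the quadratic model and all names are illustrative choices of mine, not the paper's algorithm verbatim:

```python
import numpy as np

rng = np.random.default_rng(0)
# Client i holds f_i(x, y) = 0.5*x^2 - 0.5*y^2 + c_i*x*y with heterogeneous
# couplings c_i; the averaged objective has its saddle point at (0, 0).
C = rng.uniform(0.5, 1.5, size=20)

def fed_minimax(rounds=50, clients_per_round=5, local_steps=5, lr=0.05):
    x, y = 3.0, -2.0
    for _ in range(rounds):
        # global gradient estimate at the server point (exact here for clarity)
        Gx, Gy = x + C.mean() * y, -y + C.mean() * x
        sel = rng.choice(len(C), clients_per_round, replace=False)
        xs, ys = [], []
        for i in sel:
            xi, yi = x, y
            gx0, gy0 = x + C[i] * y, -y + C[i] * x  # local grad at server point
            for _ in range(local_steps):
                gx, gy = xi + C[i] * yi, -yi + C[i] * xi
                # correct local drift with the global gradient estimate
                xi -= lr * (gx - gx0 + Gx)          # descent on the min player
                yi += lr * (gy - gy0 + Gy)          # ascent on the max player
            xs.append(xi)
            ys.append(yi)
        x, y = float(np.mean(xs)), float(np.mean(ys))
    return x, y
```

Running `fed_minimax()` drives the iterate toward the saddle point (0, 0) even though each round sees only 5 of the 20 clients.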
SAGDA: Achieving $\mathcal{O}(\epsilon^{-2})$ Communication Complexity in Federated Min-Max Learning
To lower the communication complexity of federated min-max learning, a
natural approach is to utilize the idea of infrequent communication (through
multiple local updates), as in conventional federated learning. However,
due to the more complicated inner-outer problem structure in federated min-max
learning, theoretical understandings of communication complexity for federated
min-max learning with infrequent communications remain very limited in the
literature. This is particularly true for settings with non-i.i.d. datasets and
partial client participation. To address this challenge, in this paper, we
propose a new algorithmic framework called stochastic sampling averaging
gradient descent ascent (SAGDA), which i) assembles stochastic gradient
estimators from randomly sampled clients as control variates and ii) leverages
two learning rates on both server and client sides. We show that SAGDA achieves
a linear speedup in terms of both the number of clients and local update steps,
which yields an $\mathcal{O}(\epsilon^{-2})$ communication complexity that is
orders of magnitude lower than the state of the art. Interestingly, by noting
that the standard federated stochastic gradient descent ascent (FSGDA) is in
fact a control-variate-free special version of SAGDA, we immediately arrive at
an $\mathcal{O}(\epsilon^{-2})$ communication complexity result for FSGDA.
Therefore, through the lens of SAGDA, we also advance the current understanding
on communication complexity of the standard FSGDA method for federated min-max
learning.
Comment: Published as a conference paper at NeurIPS 202
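The two ingredients named in the abstract — control variates assembled from sampled clients' gradient estimators and separate server/client learning rates — can be sketched as a single round of updates; the single-variable-per-player setup and all names below are simplifications of mine, not the authors' pseudocode:

```python
import numpy as np

def sagda_round(x, y, grad, c_x, c_y, clients,
                eta_c=0.05, eta_s=1.0, local_steps=5):
    """One SAGDA-style round (sketch).

    grad(i, x, y) -> (gx, gy): stochastic gradient oracle of client i.
    (c_x, c_y): server-side control variates assembled from previously
    sampled clients' gradient estimators; they debias local updates
    under non-i.i.d. data.  eta_c and eta_s are the two learning rates
    on the client and server sides.
    """
    dxs, dys = [], []
    for i in clients:
        xi, yi = x, y
        gx0, gy0 = grad(i, x, y)             # client variate at server point
        for _ in range(local_steps):
            gx, gy = grad(i, xi, yi)
            xi -= eta_c * (gx - gx0 + c_x)   # min player: descent
            yi += eta_c * (gy - gy0 + c_y)   # max player: ascent
        dxs.append(xi - x)
        dys.append(yi - y)
    # the server treats the averaged displacement as a pseudo-gradient
    return x + eta_s * np.mean(dxs), y + eta_s * np.mean(dys)
```

Setting the control variates to zero and `eta_s = 1` recovers the plain FSGDA scheme the abstract describes as the control-variate-free special case.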
Algorithmic Foundations of Empirical X-risk Minimization
This manuscript introduces a new optimization framework for machine learning
and AI, named {\bf empirical X-risk minimization (EXM)}. X-risk is a term
introduced to represent a family of compositional measures or objectives, in
which each data point is compared with a large number of items explicitly or
implicitly for defining a risk function. It includes surrogate objectives of
many widely used measures and non-decomposable losses, e.g., AUROC, AUPRC,
partial AUROC, NDCG, MAP, precision/recall at top positions, precision at a
certain recall level, listwise losses, p-norm push, top push, global
contrastive losses, etc. While these non-decomposable objectives and their
optimization algorithms have been studied in the literature of machine
learning, computer vision, and information retrieval, optimizing these
objectives has encountered some unique challenges for deep learning. In this
paper, we present recent rigorous efforts for EXM with a focus on its
algorithmic foundations and its applications. We introduce a class of
algorithmic techniques for solving EXM with smooth non-convex objectives. We
formulate EXM into three special families of non-convex optimization problems
belonging to non-convex compositional optimization, non-convex min-max
optimization and non-convex bilevel optimization, respectively. For each family
of problems, we present some strong baseline algorithms and their complexities,
which will motivate further research for improving the existing results.
Discussions about the presented results and future studies are given at the
end. Efficient algorithms for optimizing a variety of X-risks are implemented
in the LibAUC library at \url{www.libauc.org}.
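The three optimization families named above can be written generically as follows; the notation is chosen here for illustration, and the manuscript's own formulations carry additional structure:

```latex
% compositional: an inner expectation composed with a non-linear f
\min_{\mathbf{w}} \ \mathbb{E}_{z}\Big[ f\big( \mathbb{E}_{z'}\,\ell(\mathbf{w}; z, z') \big) \Big]
% min-max: an auxiliary dual variable \alpha turns the risk into a saddle-point problem
\min_{\mathbf{w}} \max_{\alpha} \ \mathbb{E}_{z}\big[ \phi(\mathbf{w}, \alpha; z) \big]
% bilevel: the outer risk depends on the solution of an inner problem
\min_{\mathbf{w}} \ F\big(\mathbf{w}, u^{*}(\mathbf{w})\big)
\quad \text{s.t.} \quad u^{*}(\mathbf{w}) = \arg\min_{u} G(\mathbf{w}, u)
```

Each family calls for different stochastic machinery, which is why the baseline algorithms and complexities are presented per family.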
Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence
Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common
metrics for evaluating classification performance for imbalanced problems.
Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced
datasets. While stochastic optimization of AUROC has been studied extensively,
principled stochastic optimization of AUPRC has been rarely explored. In this
work, we propose a principled technical method to optimize AUPRC for deep
learning. Our approach is based on maximizing the averaged precision (AP),
which is an unbiased point estimator of AUPRC. We cast the objective into a sum
of {\it dependent compositional functions} with inner functions dependent on
random variables of the outer level. We propose efficient adaptive and
non-adaptive stochastic algorithms named SOAP with {\it provable convergence
guarantee under mild conditions} by leveraging recent advances in stochastic
compositional optimization. Extensive experimental results on image and graph
datasets demonstrate that our proposed method outperforms prior methods on
imbalanced problems in terms of AUPRC. To the best of our knowledge, our work
represents the first attempt to optimize AUPRC with provable convergence. SOAP
has been implemented in the libAUC library at~\url{https://libauc.org/}.
Comment: 24 pages, 10 figure
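For reference, the averaged precision (AP) that SOAP maximizes can be computed exactly as below; SOAP itself replaces the 0/1 ranking indicators with a smooth surrogate so the objective admits stochastic optimization. The function is a plain evaluation sketch of mine, not the SOAP algorithm:

```python
import numpy as np

def average_precision(scores, labels):
    """Averaged precision (AP): for each positive example with score s,
        prec(s) = (# positives scored >= s) / (# examples scored >= s),
    averaged over the positives.  AP is the point estimator of AUPRC
    that SOAP's surrogate objective targets.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos_scores = scores[labels == 1]
    ap = 0.0
    for s in pos_scores:
        above = scores >= s                  # 0/1 ranking indicator
        ap += (labels[above] == 1).sum() / above.sum()
    return ap / len(pos_scores)
```

Replacing the indicator `scores >= s` with, e.g., a sigmoid of score differences yields the sum of dependent compositional functions described in the abstract: each inner average depends on the outer random variable (the positive example being ranked).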
Adaptive Federated Minimax Optimization with Lower Complexities
Federated learning is a popular distributed and privacy-preserving machine
learning paradigm. Meanwhile, minimax optimization, as an effective
hierarchical optimization, is widely applied in machine learning. Recently,
some federated optimization methods have been proposed to solve the distributed
minimax problems. However, these federated minimax methods still suffer from
high gradient and communication complexities. Meanwhile, few algorithms focus
on using adaptive learning rates to accelerate training. To fill this gap, in
this paper, we study a class of nonconvex minimax optimization problems, and propose an
efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to
solve these distributed minimax problems. Specifically, our AdaFGDA builds on
the momentum-based variance reduced and local-SGD techniques, and it can
flexibly incorporate various adaptive learning rates by using the unified
adaptive matrix. Theoretically, we provide a solid convergence analysis
framework for our AdaFGDA algorithm under the non-i.i.d. setting. Moreover, we
prove that our algorithms obtain a lower gradient (i.e., stochastic first-order
oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$ with a lower communication
complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary point
of the nonconvex minimax problems. Experimentally, we conduct experiments
on the deep AUC maximization and robust neural network training tasks to verify
the efficiency of our algorithms.
Comment: Submitted to AISTATS-202
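The two building blocks the abstract names — a momentum-based variance-reduced (STORM-style) gradient estimator and a unified diagonal adaptive matrix — can be combined in one update as sketched below; this is a generic single-player sketch with parameter names of my choosing, whereas AdaFGDA applies such updates to both minimax variables:

```python
import numpy as np

def adaptive_vr_step(w, v_prev, m2, grad_now, grad_prev,
                     beta=0.1, lr=0.01, rho=0.01, eps=1e-8):
    """One step of a momentum-based variance-reduced update with a
    diagonal adaptive matrix.

    grad_now:  stochastic gradient at the current iterate w
    grad_prev: stochastic gradient at the previous iterate on the SAME sample
    v_prev:    previous variance-reduced direction
    m2:        running second moment feeding the adaptive matrix
    """
    v = grad_now + (1.0 - beta) * (v_prev - grad_prev)  # STORM-style estimator
    m2 = (1.0 - rho) * m2 + rho * v * v                 # second-moment tracking
    A = np.sqrt(m2) + eps                               # diagonal adaptive matrix
    return w - lr * v / A, v, m2
```

With exact gradients of a simple quadratic, iterating this step drives the iterate toward the minimizer while the adaptive matrix rescales each coordinate by its gradient history.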