Accelerated Federated Learning with Decoupled Adaptive Optimization
The federated learning (FL) framework enables edge clients to collaboratively
learn a shared inference model while keeping the training data private on the
clients. Recently, many heuristic efforts have been made to generalize
centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad,
to federated settings to improve convergence and accuracy. However, there is
still a paucity of theoretical principles on where and how to design and
utilize adaptive optimization methods in federated settings. This work aims to
develop novel adaptive optimization methods for FL from the perspective of
dynamics of ordinary differential equations (ODEs). First, an analytic
framework is established to build a connection between federated optimization
methods and decompositions of ODEs of corresponding centralized optimizers.
Second, based on this analytic framework, a momentum-decoupling adaptive
optimization method, FedDA, is developed to fully utilize the global momentum
on each local iteration and accelerate the training convergence. Last but not
least, full-batch gradients are utilized to mimic centralized optimization at
the end of the training process to ensure convergence and overcome the
possible inconsistency caused by adaptive optimization methods.
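To make the momentum-decoupling idea concrete, here is a minimal Python sketch (a hypothetical illustration with made-up names, not the authors' FedDA implementation) of a client running several local steps that each reuse the server's global momentum buffer:

```python
import numpy as np

def local_steps_with_global_momentum(x, global_m, grad_fn, lr=0.1, beta=0.9, k=5):
    """Run k local client steps that each reuse the server's global momentum.

    Hypothetical sketch of momentum decoupling: instead of applying the
    accumulated global momentum `global_m` only once per communication
    round, every local update blends it with the fresh local gradient.
    """
    for _ in range(k):
        g = grad_fn(x)
        x = x - lr * (beta * global_m + (1.0 - beta) * g)
    return x

# Toy usage: quadratic loss f(x) = 0.5 * ||x||^2, whose gradient is x.
x0, m0 = np.ones(3), np.zeros(3)
print(local_steps_with_global_momentum(x0, m0, grad_fn=lambda x: x))
```

Standard federated SGDM would instead apply the momentum only at server-side aggregation; carrying it into each local step is the gap the decoupling targets.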
Adaptive Federated Minimax Optimization with Lower Complexities
Federated learning is a popular distributed and privacy-preserving machine
learning paradigm. Meanwhile, minimax optimization, as an effective
hierarchical optimization framework, is widely applied in machine learning. Recently,
some federated optimization methods have been proposed to solve the distributed
minimax problems. However, these federated minimax methods still suffer from
high gradient and communication complexities. Meanwhile, few algorithms focus
on using adaptive learning rates to accelerate convergence. To fill this gap,
in this paper, we study a class of nonconvex minimax optimization problems and
propose an efficient adaptive federated minimax optimization algorithm (i.e.,
AdaFGDA) to solve these distributed minimax problems. Specifically, our
AdaFGDA builds on momentum-based variance-reduced and local-SGD techniques,
and it can
flexibly incorporate various adaptive learning rates by using the unified
adaptive matrix. Theoretically, we provide a solid convergence analysis
framework for our AdaFGDA algorithm under the non-i.i.d. setting. Moreover, we
prove that our algorithms obtain a lower gradient (i.e., stochastic first-order
oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$ with a lower communication
complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary
point of the nonconvex minimax problems. Experimentally, we conduct experiments
on the deep AUC maximization and robust neural network training tasks to verify
the efficiency of our algorithms.
Comment: Submitted to AISTATS-202
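As a rough illustration of the "unified adaptive matrix" mentioned above, the sketch below (hypothetical names; it omits AdaFGDA's minimax structure and momentum-based variance reduction) preconditions a step with a diagonal matrix built from a running second-moment estimate, so that different choices of that estimate recover Adam- or AdaGrad-style rates:

```python
import numpy as np

def adaptive_matrix_step(x, grad, u, lr=0.01, alpha=0.999, rho=1e-4):
    """One preconditioned step x <- x - lr * A^{-1} grad, where
    A = diag(sqrt(u) + rho) is a diagonal adaptive matrix.

    Taking u as an exponential moving average of grad**2 (as here) gives
    an Adam-style rate; accumulating grad**2 instead would give AdaGrad.
    """
    u = alpha * u + (1 - alpha) * grad**2  # second-moment estimate
    x = x - lr * grad / (np.sqrt(u) + rho)
    return x, u

x, u = np.zeros(3), np.zeros(3)
x, u = adaptive_matrix_step(x, np.array([1.0, -2.0, 0.5]), u)
print(x)
```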
Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients
Federated optimization, an emerging paradigm which finds wide real-world
applications such as federated learning, enables multiple clients (e.g., edge
devices) to collaboratively optimize a global function. The clients do not
share their local datasets and typically only share their local gradients.
However, the gradient information is not available in many applications of
federated optimization, which hence gives rise to the paradigm of federated
zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from
the limitations of query and communication inefficiency, which can be
attributed to (a) their reliance on a substantial number of function queries
for gradient estimation and (b) the significant disparity between their
realized local updates and the intended global updates. To this end, we (a)
introduce trajectory-informed gradient surrogates, which are able to use the
history of function queries during optimization for accurate and
query-efficient gradient estimation, and (b) develop the technique of adaptive
gradient correction using these gradient surrogates to mitigate the
aforementioned disparity. Based on these, we propose the federated zeroth-order
optimization using trajectory-informed surrogate gradients (FZooS) algorithm
for query- and communication-efficient federated ZOO. Our FZooS achieves
theoretical improvements over the existing approaches, which are supported by
our real-world experiments such as federated black-box adversarial attack and
federated non-differentiable metric optimization.
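A minimal sketch of the trajectory-informed idea, under the simplifying assumption of a linear least-squares surrogate fitted over the query history (FZooS itself is described as using a more refined surrogate; names here are illustrative):

```python
import numpy as np

def surrogate_gradient(points, values):
    """Estimate a gradient from the query trajectory instead of spending
    fresh function queries.

    Hypothetical sketch: fit a local linear surrogate f(x) ~ w @ x + b by
    least squares over all past (point, value) pairs; the slope w serves
    as the gradient estimate.
    """
    X = np.hstack([points, np.ones((len(points), 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    return coef[:-1]  # drop the bias term

# Toy usage: queries of f(x) = 3*x0 - x1 recover the true gradient [3, -1].
pts = np.random.randn(20, 2)
vals = 3 * pts[:, 0] - pts[:, 1]
print(surrogate_gradient(pts, vals))
```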
Federated Online and Bandit Convex Optimization
We study the problems of distributed online and bandit convex optimization
against an adaptive adversary. We aim to minimize the average regret on $M$
machines working in parallel over $T$ rounds with $R$ intermittent
communications. Assuming the underlying cost functions are convex and can be
generated adaptively, our results show that collaboration is not beneficial
when the machines have access to the first-order gradient information at the
queried points. This is in contrast to the case for stochastic functions, where
each machine samples the cost functions from a fixed distribution. Furthermore,
we delve into the more challenging setting of federated online optimization
with bandit (zeroth-order) feedback, where the machines can only access values
of the cost functions at the queried points. The key finding here is
identifying the high-dimensional regime where collaboration is beneficial and
may even lead to a linear speedup in the number of machines. We further
illustrate our findings through federated adversarial linear bandits by
developing novel distributed single- and two-point feedback algorithms. Our work
is the first attempt towards a systematic understanding of federated online
optimization with limited feedback, and it attains tight regret bounds in the
intermittent communication setting for both first and zeroth-order feedback.
Our results thus bridge the gap between stochastic and adaptive settings in
federated online optimization.
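For reference, the classical two-point (zeroth-order) gradient estimator that underlies such bandit-feedback algorithms can be sketched as follows; this is the standard construction, not the paper's specific distributed algorithm:

```python
import numpy as np

def two_point_estimator(f, x, delta=1e-3):
    """Classic two-point bandit gradient estimate:
    g = (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    with u drawn uniformly from the unit sphere. It is an unbiased
    estimate of the gradient of a smoothed version of f and needs only
    two function evaluations per round.
    """
    d = x.shape[0]
    u = np.random.randn(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Toy usage: f(x) = x @ x has gradient 2x, so estimates average to 2*ones.
print(two_point_estimator(lambda z: z @ z, np.ones(4)))
```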
FedGrad: Optimisation in Decentralised Machine Learning
Federated Learning is a machine learning paradigm where we aim to train
machine learning models in a distributed fashion. Many clients/edge devices
collaborate with each other to train a single model on a central server.
Clients do not share their own datasets with each other, so computation and
data stay together on each device. In this paper, we propose yet another
adaptive federated
optimization method and some other ideas in the field of federated learning. We
also perform experiments using these methods and showcase the improvement in
the overall performance of federated learning.
Comment: 4 pages, 6 figures, submitting @ FL-AAAI Workshop
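As context for what an "adaptive federated optimization method" typically looks like, here is a generic FedOpt-style sketch (an illustration under assumed names, not necessarily FedGrad itself) in which the server treats the averaged client update as a pseudo-gradient for an Adam-like optimizer:

```python
import numpy as np

def server_adaptive_round(x, client_deltas, m, v, lr=0.1, b1=0.9, b2=0.99, eps=1e-8):
    """One server round in the generic FedOpt recipe: the averaged client
    update (negated, so it points uphill like a gradient) is fed as a
    pseudo-gradient to a server-side Adam-like optimizer.
    """
    pseudo_grad = -np.mean(client_deltas, axis=0)
    m = b1 * m + (1 - b1) * pseudo_grad
    v = b2 * v + (1 - b2) * pseudo_grad**2
    return x - lr * m / (np.sqrt(v) + eps), m, v

x, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
deltas = np.array([[-0.1, 0.2], [-0.3, 0.1]])  # toy per-client updates
x, m, v = server_adaptive_round(x, deltas, m, v)
print(x)
```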