Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
We suggest a general oracle-based framework that captures different parallel
stochastic optimization settings described by a dependency graph, and derive
generic lower bounds in terms of this graph. We then use the framework and
derive lower bounds for several specific parallel optimization settings,
including delayed updates and parallel processing with intermittent
communication. We highlight gaps between lower and upper bounds on the oracle
complexity, and cases where the "natural" algorithms are not known to be
optimal.
Adaptive Federated Minimax Optimization with Lower complexities
Federated learning is a popular distributed and privacy-preserving machine
learning paradigm. Meanwhile, minimax optimization, as an effective
hierarchical optimization, is widely applied in machine learning. Recently,
some federated optimization methods have been proposed to solve the distributed
minimax problems. However, these federated minimax methods still suffer from
high gradient and communication complexities. Meanwhile, few algorithms focus
on using adaptive learning rates to accelerate convergence. To fill this gap, in
this paper, we study a class of nonconvex minimax optimization problems, and propose an
efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to
solve these distributed minimax problems. Specifically, our AdaFGDA builds on
the momentum-based variance reduced and local-SGD techniques, and it can
flexibly incorporate various adaptive learning rates by using the unified
adaptive matrix. Theoretically, we provide a solid convergence analysis
framework for our AdaFGDA algorithm under the non-i.i.d. setting. Moreover, we
prove our algorithms obtain a lower gradient (i.e., stochastic first-order
oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$ with a lower communication
complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary point
of the nonconvex minimax problems. Experimentally, we conduct some experiments
on the deep AUC maximization and robust neural network training tasks to verify
the efficiency of our algorithms.
Comment: Submitted to AISTATS-202
Byzantine Stochastic Gradient Descent
This paper studies the problem of distributed stochastic optimization in an
adversarial setting where, out of the $m$ machines which allegedly compute
stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and
can behave arbitrarily and adversarially. Our main result is a variant of
stochastic gradient descent (SGD) which finds $\varepsilon$-approximate
minimizers of convex functions in $T = \tilde{O}\big(\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}\big)$ iterations. In contrast, traditional
mini-batch SGD needs $T = O\big(\frac{1}{\varepsilon^2 m}\big)$ iterations,
but cannot tolerate Byzantine failures. Further, we provide a lower bound
showing that, up to logarithmic factors, our algorithm is
information-theoretically optimal both in terms of sampling complexity and time
complexity.
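The contrast drawn in this abstract, that plain mini-batch averaging breaks down under Byzantine machines, can be illustrated with a small simulation. The sketch below is not the algorithm from the paper: it swaps in a coordinate-wise median as a simple, well-known robust aggregator, purely for illustration. The objective f(x) = 0.5 ||x||^2, the machine count `m`, the Byzantine fraction `alpha`, the step size, and the constant vector sent by the Byzantine machines are all assumptions chosen for the example.

```python
import numpy as np


def aggregate_mean(grads):
    """Plain mini-batch averaging over machines (no Byzantine tolerance)."""
    return grads.mean(axis=0)


def aggregate_median(grads):
    """Coordinate-wise median: an illustrative robust stand-in,
    not the aggregation rule of the paper."""
    return np.median(grads, axis=0)


def run_sgd(aggregate, m=20, alpha=0.2, steps=200, lr=0.1, seed=0):
    """Minimize f(x) = 0.5 * ||x||^2 (minimizer at 0) with m machines,
    of which an alpha-fraction report adversarial gradients.
    Returns the distance of the final iterate from the minimizer."""
    rng = np.random.default_rng(seed)
    x = np.full(5, 5.0)
    n_bad = int(alpha * m)
    for _ in range(steps):
        # Honest machines: true gradient x plus unit Gaussian noise.
        grads = x + rng.normal(scale=1.0, size=(m, x.size))
        # Byzantine machines: overwrite their rows with a large constant.
        grads[:n_bad] = 100.0
        x = x - lr * aggregate(grads)
    return float(np.linalg.norm(x))


if __name__ == "__main__":
    print("mean aggregation, final error:  ", run_sgd(aggregate_mean))
    print("median aggregation, final error:", run_sgd(aggregate_median))
```

In this toy setup the averaged update is dragged far from the minimizer by the corrupted rows, while the median-based update stays close to it, though with a small bias. It says nothing about the iteration bounds stated in the abstract.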