Fast Decentralized Gradient Tracking for Federated Minimax Optimization with Local Updates
Federated learning (FL) for minimax optimization has emerged as a powerful
paradigm for training models across distributed nodes/clients while preserving
data privacy and model robustness under data heterogeneity. In this work, we delve
into the decentralized implementation of federated minimax optimization by
proposing \texttt{K-GT-Minimax}, a novel decentralized minimax optimization
algorithm that combines local updates and gradient tracking techniques. Our
analysis showcases the algorithm's communication efficiency and convergence
rate for nonconvex-strongly-concave (NC-SC) minimax optimization, demonstrating
a superior convergence rate compared to existing methods.
\texttt{K-GT-Minimax}'s ability to handle data heterogeneity and ensure
robustness underscores its significance in advancing federated learning
research and applications.
A General Continuous-Time Formulation of Stochastic ADMM and Its Variants
Stochastic versions of the alternating direction method of multipliers (ADMM)
and its variants play a key role in many modern large-scale machine learning
problems. In this work, we introduce a unified algorithmic framework called
generalized stochastic ADMM and investigate its continuous-time analysis. The
generalized framework includes many stochastic ADMM variants, such as
standard, linearized and gradient-based ADMM. Our continuous-time analysis
provides us with new insights into stochastic ADMM and variants, and we
rigorously prove that under some proper scaling, the trajectory of stochastic
ADMM weakly converges to the solution of a stochastic differential equation
with small noise. Our analysis also provides a theoretical explanation of why
the relaxation parameter should be chosen between 0 and 2.
On the Global Convergence of Continuous-Time Stochastic Heavy-Ball Method for Nonconvex Optimization
We study the convergence behavior of the stochastic heavy-ball method with a
small stepsize. Under a change of time scale, we approximate the discrete
method by a stochastic differential equation that models small random
perturbations of a coupled system of nonlinear oscillators. We rigorously show
that the perturbed system converges to a local minimum in a logarithmic time.
This indicates that for the diffusion process that approximates the stochastic
heavy-ball method, it takes (up to a logarithmic factor) only a linear time of
the square root of the inverse stepsize to escape from all saddle points. This
result may suggest a fast convergence of its discrete-time counterpart. Our
theoretical results are validated by numerical experiments.
Comment: accepted at IEEE International Conference on Big Data in 201
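As a rough illustration of the discrete-time method this abstract refers to, the sketch below runs a stochastic heavy-ball iteration started exactly at a saddle point. The test function f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2, the additive Gaussian gradient noise, and every parameter value are assumptions chosen for illustration; none of them come from the paper, and the code does not reproduce its analysis.

```python
import random


def grad(x, y):
    """Gradient of the assumed test function f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2.

    f has a saddle point at (0, 0) and local minima at (+1, 0) and (-1, 0).
    """
    return x * (x * x - 1.0), y


def stochastic_heavy_ball(steps, step_size, momentum, noise_std, seed=0):
    """Heavy-ball iteration with noisy gradients, started at the saddle (0, 0)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0      # start exactly at the saddle point
    vx, vy = 0.0, 0.0    # velocity (momentum) terms
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Assumed noise model: independent Gaussian noise added to each coordinate.
        gx += noise_std * rng.gauss(0.0, 1.0)
        gy += noise_std * rng.gauss(0.0, 1.0)
        # Heavy-ball update: decay the velocity, then step along it.
        vx = momentum * vx - step_size * gx
        vy = momentum * vy - step_size * gy
        x += vx
        y += vy
    return x, y


final_x, final_y = stochastic_heavy_ball(
    steps=5000, step_size=0.01, momentum=0.9, noise_std=0.1
)
print(final_x, final_y)
```

Without the noise term the iterate would stay at the saddle indefinitely, since the gradient there is zero; in this sketch the random perturbations are what push it toward one of the two minima.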
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
In this paper, we propose to adopt the diffusion approximation tools to study
the dynamics of Oja's iteration which is an online stochastic gradient descent
method for the principal component analysis. Oja's iteration maintains a
running estimate of the true principal component from streaming data and enjoys
low time and space complexity. We show that Oja's iteration for
the top eigenvector generates a continuous-state discrete-time Markov chain
over the unit sphere. We characterize Oja's iteration in three phases using
diffusion approximation and weak convergence tools. Our three-phase analysis
further provides a finite-sample error bound for the running estimate, which
matches the minimax information lower bound for principal component analysis
under the additional assumption of bounded samples.
Comment: Appeared in NIPS 201
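For readers unfamiliar with Oja's iteration, here is a minimal sketch of the standard update for the top eigenvector: a stochastic gradient step followed by renormalization onto the unit sphere. The synthetic data stream (bounded uniform samples with the largest variance along the first coordinate), the step size, and the helper names `oja_top_eigenvector` and `make_stream` are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def oja_top_eigenvector(samples, step_size, seed=0):
    """Run Oja's iteration over a stream of samples and return a unit vector."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Random starting point, projected onto the unit sphere.
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in samples:
        # Stochastic gradient step: w <- w + step_size * (x . w) * x
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + step_size * proj * xi for wi, xi in zip(w, x)]
        # Renormalize so the running estimate stays on the unit sphere.
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def make_stream(n, seed=1):
    """Assumed synthetic stream: bounded samples, largest variance on axis 0."""
    rng = random.Random(seed)
    scales = [3.0, 1.0, 0.5]
    return [[s * rng.uniform(-1.0, 1.0) for s in scales] for _ in range(n)]


estimate = oja_top_eigenvector(make_stream(5000), step_size=0.01)
# The true top eigenvector of this stream's covariance is the first axis,
# so |estimate[0]| measures alignment (up to sign).
alignment = abs(estimate[0])
print(alignment)
```

Each update touches one sample and stores only the current estimate, which is the low time and space cost the abstract mentions.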