Distributed Stochastic Optimization with Gradient Tracking over Time-Varying Directed Networks
We study a distributed method called SAB-TV, which employs gradient tracking
to collaboratively minimize the sum of smooth and strongly-convex local cost
functions for networked agents communicating over a time-varying directed
graph. Each agent, assumed to have access to a stochastic first-order oracle
for obtaining an unbiased estimate of the gradient of its local cost function,
maintains an auxiliary variable to asymptotically track the stochastic gradient
of the global cost. The optimal decision and gradient tracking are updated over
time through limited information exchange with local neighbors using row- and
column-stochastic weights, guaranteeing both consensus and optimality. With a
sufficiently small constant step-size, we demonstrate that, in expectation,
SAB-TV converges linearly to a neighborhood of the optimal solution. Numerical
simulations illustrate the effectiveness of the proposed algorithm.
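The row- and column-stochastic gradient-tracking template described above (an AB/push-pull-style update) can be sketched on a static directed ring with quadratic local costs. All names, the graph, and the parameters below are illustrative assumptions for a minimal single-machine simulation, not the SAB-TV implementation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3  # number of agents, decision dimension

# Illustrative strongly convex local costs f_i(x) = 0.5 x^T Q_i x - b_i^T x.
Q = [np.diag(rng.uniform(1.0, 2.0, d)) for _ in range(n)]
b = [rng.normal(size=d) for _ in range(n)]

def oracle(i, x, noise=0.0):
    # Stochastic first-order oracle: unbiased gradient of f_i plus optional noise.
    return Q[i] @ x - b[i] + noise * rng.normal(size=d)

# Directed ring: agent i receives from agent i-1.
# A is row-stochastic (mixes decisions), B is column-stochastic (mixes trackers).
A = np.zeros((n, n))
B = np.zeros((n, n))
for i in range(n):
    A[i, i] = A[i, (i - 1) % n] = 0.5   # each row of A sums to 1
    B[i, i] = B[(i + 1) % n, i] = 0.5   # each column of B sums to 1

alpha = 0.05                            # small constant step-size
x = np.zeros((n, d))                    # row i = agent i's decision
g = np.array([oracle(i, x[i]) for i in range(n)])
y = g.copy()                            # tracker initialized at local gradients

for _ in range(5000):
    x = A @ x - alpha * y                              # consensus + descent
    g_new = np.array([oracle(i, x[i]) for i in range(n)])
    y = B @ y + g_new - g                              # track the global gradient
    g = g_new
```

With noise=0 the oracle is exact and every agent's decision converges linearly to the minimizer of the summed cost; with noise > 0 the iterates settle in a neighborhood of it, matching the neighborhood-convergence statement in the abstract.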
Robust Fully-Asynchronous Methods for Distributed Training over General Architecture
Perfect synchronization in distributed machine learning problems is
inefficient, and even impossible, due to latency, packet losses,
and stragglers. We propose a Robust Fully-Asynchronous Stochastic Gradient
Tracking method (R-FAST), where each device performs local computation and
communication at its own pace without any form of synchronization. Different
from existing asynchronous distributed algorithms, R-FAST can eliminate the
impact of data heterogeneity across devices and allow for packet losses by
employing a robust gradient tracking strategy that relies on properly designed
auxiliary variables for tracking and buffering the overall gradient vector.
More importantly, the proposed method utilizes two spanning-tree graphs for
communication so long as both share at least one common root, enabling flexible
designs in communication architectures. We show that R-FAST converges in
expectation to a neighborhood of the optimum with a geometric rate for smooth
and strongly convex objectives; and to a stationary point with a sublinear rate
for general non-convex settings. Extensive experiments demonstrate that R-FAST
runs 1.5-2 times faster than synchronous benchmark algorithms, such as
Ring-AllReduce and D-PSGD, while still achieving comparable accuracy, and
outperforms existing state-of-the-art asynchronous algorithms, such as AD-PSGD
and OSGP, especially in the presence of stragglers.
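The fully-asynchronous execution model described above, in which devices compute and communicate at their own pace and may work with stale information, can be illustrated with a toy single-machine simulation. This sketch shows only the stale-read flavor of asynchrony with made-up loss functions and rates; it deliberately omits R-FAST's gradient-tracking and buffering machinery and its spanning-tree communication design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three "devices" with heterogeneous quadratic losses 0.5 * ||w - t_i||^2.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]

def grad(i, w):
    # Gradient of device i's local loss at w.
    return w - targets[i]

w = np.zeros(2)                         # shared model state
stale = [w.copy() for _ in targets]     # each device's last successful read
lr = 0.05

for _ in range(3000):
    i = rng.integers(len(targets))      # a random device wakes up, at its own pace
    if rng.random() < 0.7:              # its read sometimes fails (packet loss),
        stale[i] = w.copy()             # so it may compute on a stale copy
    w = w - lr * grad(i, stale[i])      # push an update based on that copy

# Minimizer of the average loss across all devices:
w_star = np.mean(targets, axis=0)
```

With a constant step-size the iterates hover in a neighborhood of w_star rather than converging exactly, mirroring the neighborhood-convergence result stated in the abstract for the strongly convex case.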
Fully Distributed Nash Equilibrium Seeking in N-Cluster Games
Distributed optimization and Nash equilibrium (NE) seeking problems have
drawn much attention in the control community recently. This paper studies a
class of non-cooperative games, known as the N-cluster game, which subsumes
both cooperative and non-cooperative interactions among multiple agents:
within each cluster, the agents solve a distributed optimization problem,
while across the clusters they play a non-cooperative game. Moreover, we consider a
partial-decision information game setup, i.e., the agents do not have direct
access to other agents' decisions, and hence need to communicate with each
other through a directed graph whose associated adjacency matrix is assumed to
be non-doubly stochastic. To solve the N-cluster game problem, we propose a
fully distributed NE seeking algorithm by a synthesis of leader-following
consensus and gradient tracking, where the leader-following consensus protocol
is adopted to estimate the other agents' decisions and the gradient tracking
method is employed to track a weighted average of the gradients. Furthermore,
the algorithm is equipped with uncoordinated constant step-sizes, which allows
the agents to choose their own preferred step-sizes, instead of a uniform
coordinated step-size. We prove that all agents' decisions converge linearly to
their corresponding NE so long as the largest step-size and the heterogeneity
of the step-size are small. We verify the derived results through a numerical
example in a Cournot competition game.
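The Cournot setting mentioned at the end can be made concrete with a full-information gradient-play sketch: each firm ascends its own profit gradient, and the iterates converge to the game's unique NE. All numbers below are illustrative, and this centralized toy omits the paper's partial-decision communication and leader-following consensus machinery.

```python
import numpy as np

# Cournot competition: n firms pick quantities q_i; the market price is
# p = a - b * sum(q), and firm i's profit is q_i * p - c * q_i
# (identical linear production costs, for simplicity).
n, a, b_coef, c = 3, 10.0, 1.0, 1.0

def pseudo_gradient(q):
    # d(profit_i)/dq_i = a - b*sum(q) - b*q_i - c, stacked over all firms.
    return a - b_coef * q.sum() - b_coef * q - c

q = np.zeros(n)
step = 0.1
for _ in range(500):
    q = q + step * pseudo_gradient(q)   # simultaneous gradient play

# Symmetric closed-form NE of this game: q_i* = (a - c) / (b * (n + 1)).
q_star = (a - c) / (b_coef * (n + 1))
```

For this strongly monotone game, gradient play with a small constant step converges linearly to the NE, which is the kind of guarantee the abstract establishes in the distributed partial-information setting.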
Distributed randomized block stochastic gradient tracking methods: Rate analysis and numerical experiments
Distributed optimization has been a trending topic of research in the past few decades, driven mainly by recent advances in wireless sensor technology and by emerging applications in machine learning. Traditionally, optimization problems were addressed using centralized schemes, where all the data is assumed to be available in one place. The main reasons motivating distributed implementations include: (i) the unavailability of the collected data in a centralized location, (ii) the need to preserve the privacy of the agents' data, and (iii) the memory and computational power limitations of data processors. To address these challenges, distributed optimization provides a framework where agents (e.g., data processors, sensors) communicate their local information with each other over a network and seek to minimize a global objective function. In some applications, the data may have a huge sample size or a large number of attributes; problems involving this type of data are often known as big data problems.

In this thesis, our goal is to address such high-dimensional distributed optimization problems, where the computation of the local gradient mappings may become expensive. Recently, a distributed optimization algorithm called Distributed Stochastic Gradient Tracking (DSGT) has been developed for addressing possibly large-scale problems by considering stochasticity. We develop a novel iterative method called Distributed Randomized Block Stochastic Gradient Tracking (DRBSGT), a randomized block variant of the existing DSGT method. We derive new non-asymptotic convergence rates of order 1/k and 1/k^2 in terms of an optimality metric and a consensus violation metric, respectively.
Importantly, while block-coordinate schemes have been studied for distributed optimization problems before, the proposed algorithm appears to be the first randomized block-coordinate gradient tracking method equipped with the aforementioned convergence rate statements. We validate the performance of the proposed method on the MNIST data set and a synthetic data set under different network settings. A potential future research direction is to extend the results of this thesis to an asynchronous variant of the proposed method, which would allow for the consideration of communication delays.
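The randomized-block ingredient of the proposed method (updating only one sampled coordinate block per iteration) can be sketched in isolation on a single-agent least-squares problem. The problem data, block partition, and step-size below are illustrative assumptions, and the distributed and stochastic-tracking parts of the actual DRBSGT method are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Least-squares objective f(x) = 0.5 * ||A @ x - b||^2 on synthetic data.
m, d, n_blocks = 20, 8, 4
A = rng.normal(size=(m, d))
b = rng.normal(size=m)
blocks = np.array_split(np.arange(d), n_blocks)  # partition of the coordinates

x = np.zeros(d)
L = np.linalg.norm(A, 2) ** 2       # conservative global Lipschitz constant
for _ in range(5000):
    blk = blocks[rng.integers(n_blocks)]   # sample one block uniformly
    g = A.T @ (A @ x - b)                  # full gradient (only blk is applied;
    x[blk] -= g[blk] / L                   # real code computes just that block)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Because each step touches only a subset of coordinates, the per-iteration cost drops by roughly the number of blocks, which is the motivation for the block variant when local gradient mappings are expensive in high dimensions.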