Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction
There is growing interest in large-scale machine learning and optimization
over decentralized networks, e.g., in the context of multi-agent learning and
federated learning. Due to the pressing need to alleviate the communication
burden, the investigation of communication-efficient distributed optimization
algorithms - particularly for empirical risk minimization - has flourished in
recent years. A large fraction of these algorithms have been developed for the
master/slave setting, relying on a central parameter server that can
communicate with all agents. This paper focuses on distributed optimization
over networks, or decentralized optimization, where each agent is only allowed
to aggregate information from its neighbors. By properly adjusting the global
gradient estimate via local averaging in conjunction with proper correction, we
develop a communication-efficient approximate Newton-type method Network-DANE,
which generalizes DANE to decentralized scenarios. Our key ideas can be
applied in a systematic manner to obtain decentralized versions of other
master/slave distributed algorithms. A notable development is
Network-SVRG/SARAH, which employs variance reduction to further accelerate
local computation. We establish linear convergence of Network-DANE and
Network-SVRG for strongly convex losses, and of Network-SARAH for quadratic
losses. These results shed light on the impacts of data homogeneity, network
connectivity, and local averaging on the rate of convergence. We further
extend Network-DANE to composite optimization by allowing a nonsmooth penalty
term. Numerical evidence is provided to demonstrate the appealing performance
of our algorithms over competitive baselines, in terms of both communication
and computation efficiency. Our work suggests that performing a certain amount
of local communication and computation per iteration can substantially
improve the overall efficiency.
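As a concrete illustration of the "local averaging in conjunction with proper correction" idea, the following sketch implements the underlying gradient-tracking recursion on a toy decentralized least-squares problem. It deliberately omits the Newton-type local solves that distinguish Network-DANE; the complete-graph mixing matrix, synthetic data, and step size are illustrative assumptions.

```python
# A minimal gradient-tracking sketch: each agent mixes neighbors' iterates
# and corrects a running estimate of the network-average gradient.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5, 3, 10                       # agents, dimension, samples per agent
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]

def grad(i, x):                          # gradient of agent i's least-squares loss
    return A[i].T @ (A[i] @ x - b[i]) / m

W = np.full((n, n), 1.0 / n)             # doubly stochastic mixing (complete graph)
x = np.zeros((n, d))
y = np.array([grad(i, x[i]) for i in range(n)])   # gradient tracker
alpha = 0.1

for _ in range(300):
    x_new = W @ x - alpha * y            # local averaging + step along the tracker
    y = W @ y + np.array([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new                            # y keeps tracking the average gradient
```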
Influence Maximization over Markovian Graphs: A Stochastic Optimization Approach
This paper considers the problem of randomized influence maximization over a
Markovian graph process: given a fixed set of nodes whose connectivity graph is
evolving as a Markov chain, estimate the probability distribution (over this
fixed set of nodes) that samples a node which will initiate the largest
information cascade (in expectation). Further, it is assumed that the sampling
process affects the evolution of the graph, i.e., the sampling distribution and
the transition probability matrix are functionally dependent. In this setup,
recursive stochastic optimization algorithms are presented to estimate the
optimal sampling distribution for two cases: 1) the transition probabilities of
the graph are unknown, but the graph can be observed perfectly; 2) the
transition probabilities of the graph are known, but the graph is observed in
noise. These
algorithms consist of a neighborhood size estimation algorithm combined with a
variance reduction method, a Bayesian filter and a stochastic gradient
algorithm. Convergence of the algorithms is established theoretically, and
numerical results are provided to illustrate how the algorithms work.
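The paper's algorithms combine a neighborhood-size estimator, variance reduction, a Bayesian filter, and stochastic gradient steps; the toy sketch below isolates only the last ingredient: a score-function (REINFORCE-style) stochastic recursion that adapts a softmax-parameterized sampling distribution from noisy cascade-size observations. The reward model and step sizes are illustrative assumptions.

```python
# A bare stochastic-gradient recursion on a sampling distribution over nodes.
import numpy as np

rng = np.random.default_rng(1)
n = 4
cascade_mean = np.array([1.0, 2.0, 0.5, 1.5])   # hypothetical expected cascade sizes

theta = np.zeros(n)                       # softmax logits of the distribution
for k in range(1, 20001):
    p = np.exp(theta - theta.max()); p /= p.sum()
    i = rng.choice(n, p=p)                # sample a seed node
    reward = cascade_mean[i] + 0.1 * rng.standard_normal()   # noisy cascade size
    onehot = np.zeros(n); onehot[i] = 1.0
    # score-function estimate of the gradient of the expected cascade size
    theta += (1.0 / np.sqrt(k)) * reward * (onehot - p)

print(np.round(p, 3))                     # mass concentrates on node 1 (largest mean)
```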
A Robust Gradient Tracking Method for Distributed Optimization over Directed Networks
In this paper, we consider the problem of distributed consensus optimization
over multi-agent networks with directed network topology. Assuming each agent
has a local cost function that is smooth and strongly convex, the global
objective is to minimize the average of all the local cost functions. To solve
the problem, we introduce a robust gradient tracking method (R-Push-Pull)
adapted from the recently proposed Push-Pull/AB algorithm. R-Push-Pull inherits
the advantages of Push-Pull and enjoys linear convergence to the optimal
solution with exact communication. Under noisy information exchange,
R-Push-Pull is more robust than the existing gradient tracking based
algorithms; the solutions obtained by each agent reach a neighborhood of the
optimum in expectation exponentially fast under a constant stepsize policy. We
provide a numerical example that demonstrates the effectiveness of R-Push-Pull.
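For intuition, here is a minimal sketch of the Push-Pull/AB template that R-Push-Pull adapts: the iterates are mixed with a row-stochastic matrix (the "pull" step) while the gradient tracker is mixed with a column-stochastic matrix (the "push" step), which is what makes directed topologies tractable. The three-agent directed ring, quadratic losses, and step size are illustrative assumptions; the robustification to noisy exchanges is not reproduced here.

```python
# Push-pull gradient tracking on a directed ring with self-loops.
import numpy as np

n, d = 3, 2
Q = [np.eye(d) * (i + 1) for i in range(n)]     # local strongly convex quadratics
c = [np.ones(d) * i for i in range(n)]
grad = lambda i, x: Q[i] @ x - c[i]

R = np.array([[0.5, 0.0, 0.5],                  # row-stochastic: pull iterates
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
C = np.array([[0.5, 0.0, 0.5],                  # column-stochastic: push trackers
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])

x = np.zeros((n, d))
y = np.array([grad(i, x[i]) for i in range(n)]) # gradient tracker
alpha = 0.05
for _ in range(500):
    x_new = R @ x - alpha * y
    y = C @ y + np.array([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new                                   # all x[i] approach the optimum
```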
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Despite the success of single-agent reinforcement learning, multi-agent
reinforcement learning (MARL) remains challenging due to complex interactions
between agents. Motivated by decentralized applications such as sensor
networks, swarm robotics, and power grids, we study policy evaluation in MARL,
where agents with jointly observed state-action pairs and private local rewards
collaborate to learn the value of a given policy. In this paper, we propose a
double averaging scheme, where each agent iteratively performs averaging over
both space and time to incorporate neighboring gradient information and local
reward information, respectively. We prove that the proposed algorithm
converges to the optimal solution at a global geometric rate. In particular,
such an algorithm is built upon a primal-dual reformulation of the mean squared
projected Bellman error minimization problem, which gives rise to a
decentralized convex-concave saddle-point problem. To the best of our
knowledge, the proposed double averaging primal-dual optimization algorithm is
the first to achieve fast finite-time convergence on decentralized
convex-concave saddle-point problems.
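The sketch below illustrates the double averaging idea in isolation: averaging over space through a mixing matrix and over time through a running average of past local gradients. It runs on a toy least-squares problem rather than the paper's primal-dual projected Bellman error formulation; all data and step sizes are illustrative assumptions.

```python
# Space averaging (mix with neighbors) + time averaging (running gradient mean).
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 4, 3, 20
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]
W = np.full((n, n), 1.0 / n)              # mixing matrix: averaging over space

x = np.zeros((n, d))
g_avg = np.zeros((n, d))                  # running averages: averaging over time
alpha = 0.05
for t in range(1, 301):
    g = np.array([A[i].T @ (A[i] @ x[i] - b[i]) / m for i in range(n)])
    g_avg += (g - g_avg) / t              # incorporate new local gradient info
    x = W @ x - alpha * g_avg             # consensus step along the averaged direction
```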
A Distributed Algorithm for Training Augmented Complex Adaptive IIR Filters
In this paper we consider the problem of decentralized (distributed) adaptive
learning, where the aim of the network is to train the coefficients of a widely
linear autoregressive moving average (ARMA) model from measurements collected by
the nodes. Such a problem arises in many sensor network-based applications such
as target tracking, fast rerouting, data reduction and data aggregation. We
assume that each node of the network uses the augmented complex adaptive
infinite impulse response (ACAIIR) filter as the learning rule, and nodes
interact with each other under an incremental mode of cooperation. Since the
proposed incremental augmented complex adaptive IIR (IACA-IIR) algorithm
relies on augmented complex statistics, it can be used to model both types
of complex-valued signals (proper and improper signals). To evaluate the
performance of the proposed algorithm, we use both synthetic and real-world
complex signals in our simulations. The results exhibit superior performance of
the proposed algorithm over the non-cooperative ACAIIR algorithm.
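As a simplified illustration of the two main ingredients, the sketch below runs a widely linear (augmented) complex LMS update in the incremental mode of cooperation: the weight estimate circulates around the ring of nodes, and each node refines it with its own measurement. A true ACAIIR filter also has feedback (IIR) coefficients; this FIR version with synthetic data only demonstrates the augmented statistics and the incremental cooperation.

```python
# Incremental augmented complex LMS over a ring of nodes.
import numpy as np

rng = np.random.default_rng(3)
n_nodes, L, mu = 5, 4, 0.01
h_true = rng.standard_normal(L) + 1j * rng.standard_normal(L)
g_true = rng.standard_normal(L) + 1j * rng.standard_normal(L)

h = np.zeros(L, complex)                  # standard weights
g = np.zeros(L, complex)                  # conjugate (augmented) weights
for it in range(2000):
    for node in range(n_nodes):           # estimate visits each node in turn
        x = rng.standard_normal(L) + 1j * rng.standard_normal(L)
        d = h_true.conj() @ x + g_true.conj() @ x.conj()   # widely linear model
        e = d - (h.conj() @ x + g.conj() @ x.conj())       # a priori error
        h += mu * x * e.conj()            # augmented complex LMS updates
        g += mu * x.conj() * e.conj()
```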
Gradient tracking and variance reduction for decentralized optimization and machine learning
Decentralized methods to solve finite-sum minimization problems are important
in many signal processing and machine learning tasks where the data is
distributed over a network of nodes and raw data sharing is not permitted due
to privacy and/or resource constraints. In this article, we review
decentralized stochastic first-order methods and provide a unified algorithmic
framework that combines variance reduction with gradient tracking to achieve
both robust performance and fast convergence. We provide explicit theoretical
guarantees of the corresponding methods when the objective functions are smooth
and strongly convex, and show their applicability to non-convex problems via
numerical experiments. Throughout the article, we provide intuitive
illustrations of the main technical ideas by examining relevant tradeoffs and
comparisons among the methods of interest and by highlighting applications to
decentralized training of machine learning models.
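As a concrete rendering of the variance-reduction ingredient, the sketch below implements the classical SVRG gradient estimator on a toy single-node least-squares problem; the surveyed methods couple this kind of estimator with the mixing and tracking steps of decentralized algorithms. The data, epoch length, and step size are illustrative assumptions.

```python
# Single-node SVRG: periodic full-gradient snapshots correct sampled gradients.
import numpy as np

rng = np.random.default_rng(4)
m, d = 100, 5
A = rng.standard_normal((m, d)); b = rng.standard_normal(m)

def grad_i(x, i):                         # gradient of the i-th component loss
    return A[i] * (A[i] @ x - b[i])

x, alpha = np.zeros(d), 0.05
for epoch in range(30):
    x_snap = x.copy()
    full = A.T @ (A @ x_snap - b) / m     # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(m)
        v = grad_i(x, i) - grad_i(x_snap, i) + full   # variance-reduced estimator
        x -= alpha * v                    # estimator variance vanishes at the optimum
```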
Compressed Distributed Gradient Descent: Communication-Efficient Consensus over Networks
Network consensus optimization has received increasing attention in recent
years and has found important applications in many scientific and engineering
fields. To solve network consensus optimization problems, one of the most
well-known approaches is the distributed gradient descent method (DGD).
However, in networks with slow communication rates, DGD's performance is
unsatisfactory for solving high-dimensional network consensus problems due to
the communication bottleneck. This motivates us to design a
communication-efficient DGD-type algorithm based on compressed information
exchanges. Our contributions in this paper are three-fold: i) We develop a
communication-efficient algorithm called amplified-differential compression DGD
(ADC-DGD) and show that it converges under any unbiased compression
operator; ii) We rigorously prove the convergence performance of ADC-DGD and
show that it matches that of DGD without compression; iii) We reveal an
interesting phase transition phenomenon in the convergence speed of ADC-DGD.
Collectively, our findings advance the state-of-the-art of network consensus
optimization theory.
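For intuition, the sketch below runs a DGD-style iteration in which all exchanged iterates pass through an unbiased stochastic-rounding compressor. It illustrates only the role of an unbiased compression operator; the amplified-differential mechanism that gives ADC-DGD its name is not reproduced, and the quantizer grid, losses, and step-size schedule are illustrative assumptions.

```python
# DGD with unbiased compression of the exchanged messages.
import numpy as np

rng = np.random.default_rng(5)

def q(v, delta=0.1):                      # unbiased stochastic rounding to a grid
    low = np.floor(v / delta) * delta
    p = (v - low) / delta                 # P(round up) = p makes E[q(v)] = v
    return low + delta * (rng.random(v.shape) < p)

n, d = 4, 3
targets = rng.standard_normal((n, d))     # f_i(x) = ||x - targets[i]||^2 / 2
W = np.full((n, n), 1.0 / n)              # doubly stochastic mixing matrix
x = np.zeros((n, d))
for k in range(1, 501):
    msg = np.array([q(x[i]) for i in range(n)])   # compressed information exchange
    x = W @ msg - (1.0 / k) * (x - targets)       # consensus + diminishing-step descent
```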
A general framework for decentralized optimization with first-order methods
Decentralized optimization to minimize a finite sum of functions over a
network of nodes has been a significant focus within control and signal
processing research due to its natural relevance to optimal control and signal
estimation problems. More recently, the emergence of sophisticated computing
and the needs of large-scale data science have led to a resurgence of activity
in this
area. In this article, we discuss decentralized first-order gradient methods,
which have found tremendous success in control, signal processing, and machine
learning problems, where such methods, due to their simplicity, serve as the
first choice for many complex inference and training tasks. In
particular, we provide a general framework of decentralized first-order methods
that is applicable to undirected and directed communication networks alike, and
show that much of the existing work on optimization and consensus can be
related explicitly to this framework. We further extend the discussion to
decentralized stochastic first-order methods that rely on stochastic gradients
at each node and describe how local variance reduction schemes, previously
shown to have promise in centralized settings, are able to improve the
performance of decentralized methods when combined with what is known as
gradient tracking. We motivate and demonstrate the effectiveness of the
corresponding methods in the context of machine learning and signal processing
problems that arise in decentralized environments.
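As a baseline instance of such a framework, the following sketch implements decentralized stochastic gradient descent: each node mixes with its neighbors through a doubly stochastic matrix and then steps along a stochastic gradient of its local objective. The data, mixing matrix, and step-size schedule are illustrative assumptions.

```python
# Decentralized stochastic gradient descent (DSGD) on synthetic least squares.
import numpy as np

rng = np.random.default_rng(6)
n, d, m = 4, 3, 50
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]
W = np.full((n, n), 1.0 / n)              # doubly stochastic mixing matrix

x = np.zeros((n, d))
for k in range(1, 2001):
    g = np.empty((n, d))
    for i in range(n):
        j = rng.integers(m)               # one sampled data point per node
        g[i] = A[i][j] * (A[i][j] @ x[i] - b[i][j])
    x = W @ x - (0.5 / np.sqrt(k)) * g    # consensus step + stochastic gradient step
```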
Distributed stochastic gradient tracking algorithm with variance reduction for non-convex optimization
This paper proposes a distributed stochastic algorithm with variance
reduction for general smooth non-convex finite-sum optimization, which has wide
applications in the signal processing and machine learning communities. In the
distributed setting, a large number of samples is allocated to multiple agents
in the network. Each agent computes a local stochastic gradient and communicates
with its neighbors to seek the global optimum. In this paper, we develop a
modified variance reduction technique to deal with the variance introduced by
stochastic gradients. Combining gradient tracking and variance reduction
techniques, this paper proposes a distributed stochastic algorithm, GT-VR, to
solve large-scale non-convex finite-sum optimization over multi-agent networks.
A complete and rigorous proof shows that the GT-VR algorithm converges to
first-order stationary points at an explicit convergence rate. In addition, we
provide a complexity analysis of the proposed algorithm. Compared with some
existing first-order methods, the proposed algorithm has a lower gradient
complexity under mild conditions. By comparing state-of-the-art algorithms with
GT-VR in experimental simulations, we verify the efficiency of the proposed
algorithm.
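The sketch below combines the two ingredients named above, an SVRG-style variance-reduced local gradient estimator plugged into a gradient-tracking update, on a toy decentralized least-squares instance. It is a schematic rendering under illustrative parameters, not the exact GT-VR method.

```python
# Gradient tracking driven by variance-reduced local gradient estimators.
import numpy as np

rng = np.random.default_rng(7)
n, d, m = 4, 3, 40
A = [rng.standard_normal((m, d)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]
W = np.full((n, n), 1.0 / n)              # doubly stochastic mixing matrix

def full_grad(i, z):
    return A[i].T @ (A[i] @ z - b[i]) / m

def vr_grad(i, z, snap, mu):              # SVRG-style estimator at node i
    j = rng.integers(m)
    gj = lambda w: A[i][j] * (A[i][j] @ w - b[i][j])
    return gj(z) - gj(snap) + mu

x = np.zeros((n, d)); alpha = 0.02
snap = x.copy()
mu = np.array([full_grad(i, snap[i]) for i in range(n)])
v_old = np.array([vr_grad(i, x[i], snap[i], mu[i]) for i in range(n)])
y = v_old.copy()                          # gradient tracker
for k in range(1, 601):
    x = W @ x - alpha * y
    if k % m == 0:                        # periodically refresh the snapshots
        snap = x.copy()
        mu = np.array([full_grad(i, snap[i]) for i in range(n)])
    v_new = np.array([vr_grad(i, x[i], snap[i], mu[i]) for i in range(n)])
    y = W @ y + v_new - v_old             # track the average of the estimators
    v_old = v_new
```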
Big Learning with Bayesian Methods
Explosive growth in data and availability of cheap computing resources have
sparked increasing interest in Big learning, an emerging subfield that studies
scalable machine learning algorithms, systems, and applications with Big Data.
Bayesian methods represent one important class of statistical methods for machine
learning, with substantial recent developments on adaptive, flexible and
scalable Bayesian learning. This article provides a survey of the recent
advances in Big learning with Bayesian methods, termed Big Bayesian Learning,
including nonparametric Bayesian methods for adaptively inferring model
complexity, regularized Bayesian inference for improving the flexibility via
posterior regularization, and scalable algorithms and systems based on
stochastic subsampling and distributed computing for dealing with large-scale
applications.
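As one well-known member of the stochastic-subsampling family the survey covers, the sketch below runs stochastic gradient Langevin dynamics (SGLD, Welling & Teh, 2011) on a toy Gaussian mean-estimation problem: minibatch gradients of the log posterior plus injected Gaussian noise. The model and step size are illustrative assumptions.

```python
# SGLD: scalable Bayesian posterior sampling via stochastic subsampling.
import numpy as np

rng = np.random.default_rng(8)
N, batch = 10_000, 100
data = rng.normal(1.5, 1.0, size=N)       # observations from N(theta, 1)

theta, eps = 0.0, 1e-5                    # state and step size
samples = []
for t in range(5000):
    idx = rng.integers(N, size=batch)
    # minibatch estimate of the gradient of the log posterior (flat prior)
    grad = (N / batch) * np.sum(data[idx] - theta)
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal()
    samples.append(theta)

print(np.mean(samples[1000:]))            # close to the posterior mean, about 1.5
```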