2,472 research outputs found
Supervised Learning Under Distributed Features
This work studies the problem of learning under both large datasets and
large-dimensional feature space scenarios. The feature information is assumed
to be spread across agents in a network, where each agent observes some of the
features. Through local cooperation, the agents are supposed to interact with
each other to solve an inference problem and converge towards the global
minimizer of an empirical risk. We study this problem exclusively in the primal
domain, and propose new and effective distributed solutions with guaranteed
convergence to the minimizer with linear rate under strong convexity. This is
achieved by combining a dynamic diffusion construction, a pipeline strategy,
and variance-reduced techniques. Simulation results illustrate the conclusions
Distributed Big-Data Optimization via Block-Iterative Convexification and Averaging
In this paper, we study distributed big-data nonconvex optimization in
multi-agent networks. We consider the (constrained) minimization of the sum of
a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a
convex (possibly) nonsmooth regularizer. Our interest is in big-data problems
wherein there is a large number of variables to optimize. If treated by means
of standard distributed optimization algorithms, these large-scale problems may
be intractable, due to the prohibitive local computation and communication
burden at each node. We propose a novel distributed solution method whereby at
each iteration agents optimize and then communicate (in an uncoordinated
fashion) only a subset of their decision variables. To deal with non-convexity
of the cost function, the novel scheme hinges on Successive Convex
Approximation (SCA) techniques coupled with i) a tracking mechanism
instrumental to locally estimate gradient averages; and ii) a novel block-wise
consensus-based protocol to perform local block-averaging operations and
gradient tacking. Asymptotic convergence to stationary solutions of the
nonconvex problem is established. Finally, numerical results show the
effectiveness of the proposed algorithm and highlight how the block dimension
impacts on the communication overhead and practical convergence speed
Dynamic Average Diffusion with randomized Coordinate Updates
This work derives and analyzes an online learning strategy for tracking the
average of time-varying distributed signals by relying on randomized
coordinate-descent updates. During each iteration, each agent selects or
observes a random entry of the observation vector, and different agents may
select different entries of their observations before engaging in a
consultation step. Careful coordination of the interactions among agents is
necessary to avoid bias and ensure convergence. We provide a convergence
analysis for the proposed methods, and illustrate the results by means of
simulations
Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling
A new amortized variance-reduced gradient (AVRG) algorithm was developed in
\cite{ying2017convergence}, which has constant storage requirement in
comparison to SAGA and balanced gradient computations in comparison to SVRG.
One key advantage of the AVRG strategy is its amenability to decentralized
implementations. In this work, we show how AVRG can be extended to the network
case where multiple learning agents are assumed to be connected by a graph
topology. In this scenario, each agent observes data that is spatially
distributed and all agents are only allowed to communicate with direct
neighbors. Moreover, the amount of data observed by the individual agents may
differ drastically. For such situations, the balanced gradient computation
property of AVRG becomes a real advantage in reducing idle time caused by
unbalanced local data storage requirements, which is characteristic of other
reduced-variance gradient algorithms. The resulting diffusion-AVRG algorithm is
shown to have linear convergence to the exact solution, and is much more memory
efficient than other alternative algorithms. In addition, we propose a
mini-batch strategy to balance the communication and computation efficiency for
diffusion-AVRG. When a proper batch size is employed, it is observed in
simulations that diffusion-AVRG is more computationally efficient than exact
diffusion or EXTRA while maintaining almost the same communication efficiency.Comment: 23 pages, 12 figures, submitted for publicatio
Robust and Efficient Aggregation for Distributed Learning
Distributed learning paradigms, such as federated and decentralized learning,
allow for the coordination of models across a collection of agents, and without
the need to exchange raw data. Instead, agents compute model updates locally
based on their available data, and subsequently share the update model with a
parameter server or their peers. This is followed by an aggregation step, which
traditionally takes the form of a (weighted) average. Distributed learning
schemes based on averaging are known to be susceptible to outliers. A single
malicious agent is able to drive an averaging-based distributed learning
algorithm to an arbitrarily poor model. This has motivated the development of
robust aggregation schemes, which are based on variations of the median and
trimmed mean. While such procedures ensure robustness to outliers and malicious
behavior, they come at the cost of significantly reduced sample efficiency.
This means that current robust aggregation schemes require significantly higher
agent participation rates to achieve a given level of performance than their
mean-based counterparts in non-contaminated settings. In this work we remedy
this drawback by developing statistically efficient and robust aggregation
schemes for distributed learning
A Distributed Asynchronous Method of Multipliers for Constrained Nonconvex Optimization
This paper presents a fully asynchronous and distributed approach for
tackling optimization problems in which both the objective function and the
constraints may be nonconvex. In the considered network setting each node is
active upon triggering of a local timer and has access only to a portion of the
objective function and to a subset of the constraints. In the proposed
technique, based on the method of multipliers, each node performs, when it
wakes up, either a descent step on a local augmented Lagrangian or an ascent
step on the local multiplier vector. Nodes realize when to switch from the
descent step to the ascent one through an asynchronous distributed logic-AND,
which detects when all the nodes have reached a predefined tolerance in the
minimization of the augmented Lagrangian. It is shown that the resulting
distributed algorithm is equivalent to a block coordinate descent for the
minimization of the global augmented Lagrangian. This allows one to extend the
properties of the centralized method of multipliers to the considered
distributed framework. Two application examples are presented to validate the
proposed approach: a distributed source localization problem and the parameter
estimation of a neural network.Comment: arXiv admin note: substantial text overlap with arXiv:1803.0648
- …