Performance Limits of Stochastic Sub-Gradient Learning, Part II: Multi-Agent Case
The analysis in Part I revealed interesting properties for subgradient
learning algorithms in the context of stochastic optimization when gradient
noise is present. These algorithms are used when the risk functions are
non-smooth and involve non-differentiable components. They have long been
recognized as slowly converging methods. However, it was revealed in Part I
that the rate of convergence becomes linear for stochastic optimization
problems, with the error iterate converging at an exponential rate O(α^i)
to within an O(μ)-neighborhood of the optimizer, for some α ∈ (0,1) and small
step-size μ. The conclusion was established under weaker
assumptions than the prior literature and, moreover, several important problems
(such as LASSO, SVM, and Total Variation) were shown to satisfy these weaker
assumptions automatically (but not the previously used conditions from the
literature). These results revealed that sub-gradient learning methods have
more favorable behavior than originally thought when used to enable continuous
adaptation and learning. The results of Part I were exclusive to single-agent
adaptation. The purpose of the current Part II is to examine the implications
of these discoveries when a collection of networked agents employs subgradient
learning as their cooperative mechanism. The analysis will show that, despite
the coupled dynamics that arises in a networked scenario, the agents are still
able to attain linear convergence in the stochastic case; they are also able to
reach agreement within O(μ) of the optimizer.
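The constant step-size subgradient recursion underlying these results can be illustrated with a minimal, hypothetical sketch; the scalar absolute-loss model, noise level, and step-size below are assumptions for illustration, not the paper's setting:

```python
# Hypothetical illustration: stochastic subgradient learning with a
# constant step-size on a non-smooth risk E|w - d|, whose minimizer is
# the target theta. The iterate settles in a small neighborhood of the
# optimizer rather than converging exactly, mirroring the O(mu) result.
import random

def stochastic_subgradient(theta=1.0, mu=0.01, iters=5000, seed=0):
    """Minimize E|w - d| with d = theta + noise via sign subgradients."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iters):
        d = theta + rng.gauss(0.0, 0.1)   # noisy streaming observation
        g = 1.0 if w > d else -1.0        # subgradient of |w - d| at w
        w -= mu * g                       # constant step-size update
    return w

w = stochastic_subgradient()
# w fluctuates within a small step-size-dependent band around theta = 1.0
```

The constant step-size is what enables continuous adaptation: the iterate keeps tracking the optimizer if it drifts, at the price of a persistent O(μ)-sized fluctuation.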
Diffusion Adaptation Strategies for Distributed Optimization and Learning over Networks
We propose an adaptive diffusion mechanism to optimize a global cost function
in a distributed manner over a network of nodes. The cost function is assumed
to consist of a collection of individual components. Diffusion adaptation
allows the nodes to cooperate and diffuse information in real-time; it also
helps alleviate the effects of stochastic gradient noise and measurement noise
through a continuous learning process. We analyze the mean-square-error
performance of the algorithm in some detail, including its transient and
steady-state behavior. We also apply the diffusion algorithm to two problems:
distributed estimation with sparse parameters and distributed localization.
Compared to well-studied incremental methods, diffusion methods do not require
the use of a cyclic path over the nodes and are robust to node and link
failure. Diffusion methods also endow networks with adaptation abilities that
enable the individual nodes to continue learning even when the cost function
changes with time. Examples involving such dynamic cost functions with moving
targets are common in the context of biological networks.
Comment: 34 pages, 6 figures, to appear in IEEE Transactions on Signal Processing, 201
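A diffusion strategy of the adapt-then-combine (ATC) form can be sketched as follows; the ring topology, uniform combination weights, and scalar signal model here are illustrative assumptions, not the configuration analyzed in the paper:

```python
# Sketch of adapt-then-combine (ATC) diffusion LMS: each node first adapts
# using its local data, then combines the intermediate estimates of its
# neighbors. No cyclic path over the nodes is required.
import random

def atc_diffusion_lms(theta=2.0, mu=0.05, iters=2000, seed=1):
    rng = random.Random(seed)
    N = 4
    # ring topology; each node averages itself and its two neighbors
    neighbors = {k: [(k - 1) % N, k, (k + 1) % N] for k in range(N)}
    w = [0.0] * N                                # local estimates
    for _ in range(iters):
        psi = []
        for k in range(N):
            u = rng.gauss(0.0, 1.0)              # local regressor
            d = theta * u + rng.gauss(0.0, 0.1)  # noisy local measurement
            psi.append(w[k] + mu * u * (d - u * w[k]))  # adapt step
        # combine step with uniform weights over the neighborhood
        w = [sum(psi[l] for l in neighbors[k]) / 3.0 for k in range(N)]
    return w

w = atc_diffusion_lms()
```

Because every update uses only neighborhood communication, the scheme remains well defined if a node or link drops out: the failed entries simply disappear from the combine step.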
On the topology of free paratopological groups
The result often known as Joiner's lemma is fundamental in understanding the
topology of the free topological group F(X) on a Tychonoff space X. In this
paper, an analogue of Joiner's lemma for the free paratopological group
FP(X) on a space X is proved. Using this, it is shown that the
following conditions are equivalent for a space X: (1) X is T_1; (2)
FP(X) is T_1; (3) the subspace X of FP(X) is closed; (4) the subspace
X^{-1} of FP(X) is discrete; (5) the subspace X^{-1} is T_1; (6) the
subspace X^{-1} is closed; and (7) the subspace FP_n(X) is closed for all
n, where FP_n(X) denotes the subspace of FP(X) consisting of all
words of length at most n.
Comment: http://blms.oxfordjournals.org/cgi/content/abstract/bds031?ijkey=9Su2bYV9e19JMxf&keytype=re
Linear Convergence of Primal-Dual Gradient Methods and their Performance in Distributed Optimization
In this work, we revisit a classical incremental implementation of the
primal-descent dual-ascent gradient method used for the solution of equality
constrained optimization problems. We provide a short proof that establishes
the linear (exponential) convergence of the algorithm for smooth
strongly-convex cost functions and study its relation to the non-incremental
implementation. We also study the effect of the augmented Lagrangian penalty
term on the performance of distributed optimization algorithms for the
minimization of aggregate cost functions over multi-agent networks.
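A minimal sketch of the primal-descent dual-ascent iteration on a toy equality-constrained quadratic program, with the updated primal iterate used in the dual step; the problem data and step-size are assumptions for illustration, and the augmented-Lagrangian penalty term studied in the paper is omitted here:

```python
# Primal-descent / dual-ascent gradient iteration for
#   min 0.5*(x1^2 + x2^2)  subject to  x1 + x2 = 2,
# whose optimum is (x1, x2) = (1, 1) with multiplier lambda = -1.
# For a smooth strongly-convex cost, the iterates converge linearly
# (exponentially fast) to the saddle point.
def primal_dual(mu=0.1, iters=500):
    x1 = x2 = 0.0
    lam = 0.0
    for _ in range(iters):
        g1 = x1 + lam                          # d/dx1 of the Lagrangian
        g2 = x2 + lam                          # d/dx2 of the Lagrangian
        x1, x2 = x1 - mu * g1, x2 - mu * g2    # primal descent step
        lam += mu * (x1 + x2 - 2.0)            # dual ascent on the constraint
    return x1, x2, lam

x1, x2, lam = primal_dual()
```

Using the freshly updated primal variables inside the dual update is the incremental flavor of the method; the non-incremental variant would evaluate the constraint residual at the previous iterate instead.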
Sparse Distributed Learning Based on Diffusion Adaptation
This article proposes diffusion LMS strategies for distributed estimation
over adaptive networks that are able to exploit sparsity in the underlying
system model. The approach relies on convex regularization, common in
compressive sensing, to enhance the detection of sparsity via a diffusive
process over the network. The resulting algorithms endow networks with learning
abilities and allow them to learn the sparse structure from the incoming data
in real-time, and also to track variations in the sparsity of the model. We
provide convergence and mean-square performance analysis of the proposed method
and show under what conditions it outperforms the unregularized diffusion
version. We also show how to adaptively select the regularization parameter.
Simulation results illustrate the advantage of the proposed filters for sparse
data recovery.
Comment: to appear in IEEE Trans. on Signal Processing, 201
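One common way to realize convex sparsity regularization in an LMS-type recursion is a soft-threshold (shrinkage) correction after the adapt step; the sketch below is an illustrative variant under that assumption, with a hypothetical topology and signal model, not necessarily the exact algorithm of the article:

```python
# Illustrative sparse diffusion LMS: adapt with local data, shrink the
# intermediate estimate toward zero (l1 proximal step), then combine
# with the neighbors' estimates. The true model is sparse.
import random

def soft(x, t):
    """Soft-thresholding operator, the proximal map of t*|x|."""
    return max(abs(x) - t, 0.0) * (1.0 if x >= 0 else -1.0)

def sparse_diffusion_lms(mu=0.05, rho=0.002, iters=3000, seed=2):
    rng = random.Random(seed)
    theta = [1.0, 0.0, 0.0, -0.5]        # sparse true parameter vector
    N, M = 3, len(theta)                 # 3 fully connected nodes
    w = [[0.0] * M for _ in range(N)]
    for _ in range(iters):
        psi = []
        for k in range(N):
            u = [rng.gauss(0.0, 1.0) for _ in range(M)]          # regressor
            d = sum(ui * ti for ui, ti in zip(u, theta)) + rng.gauss(0.0, 0.1)
            e = d - sum(ui * wi for ui, wi in zip(u, w[k]))      # local error
            # adapt, then shrink each entry toward zero to exploit sparsity
            psi.append([soft(wi + mu * ui * e, mu * rho)
                        for wi, ui in zip(w[k], u)])
        # combine with uniform weights over all nodes
        w = [[sum(psi[l][m] for l in range(N)) / N for m in range(M)]
             for k in range(N)]
    return w[0]

w = sparse_diffusion_lms()
```

The threshold mu * rho plays the role of the regularization parameter: too small and the zero entries are not suppressed, too large and the nonzero entries are biased toward zero, which is why adapting it online matters.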
