The analysis in Part I revealed interesting properties for subgradient
learning algorithms in the context of stochastic optimization when gradient
noise is present. These algorithms are used when the risk functions are
non-smooth and involve non-differentiable components. They have long been
recognized as slowly converging methods. However, it was revealed in Part I
that the rate of convergence becomes linear for stochastic optimization
problems, with the error iterate converging at an exponential rate α^i
to within an O(μ)-neighborhood of the optimizer, for some α ∈ (0,1) and small step-size μ. The conclusion was established under weaker
assumptions than the prior literature and, moreover, several important problems
(such as LASSO, SVM, and Total Variation) were shown to satisfy these weaker
assumptions automatically (but not the previously used conditions from the
literature). These results revealed that subgradient learning methods have
more favorable behavior than originally thought when used to enable continuous
adaptation and learning. The results of Part I were exclusive to single-agent
adaptation. The purpose of the current Part II is to examine the implications
of these discoveries when a collection of networked agents employs subgradient
learning as their cooperative mechanism. The analysis will show that, despite
the coupled dynamics that arise in a networked scenario, the agents are still
able to attain linear convergence in the stochastic case; they are also able to
reach agreement within O(μ) of the optimizer.
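
The networked behavior described above can be illustrated with a minimal numerical sketch. It is not the algorithm analyzed in this work; it assumes an adapt-then-combine diffusion update, a ring topology with uniform 1/3 combination weights, and an l1-regularized least-squares (LASSO-type) risk with streaming Gaussian data. The function name, the parameter values, and the vector w_true are all hypothetical choices made for illustration. The error is measured against w_true, which for a small regularization weight rho lies close to the minimizer of the regularized risk.

```python
import numpy as np


def run_diffusion_subgradient(mu=0.02, num_iters=2000, rho=0.01, seed=0):
    """Simulate networked stochastic subgradient learning (illustrative only).

    Returns (initial_err, final_err, disagreement), where the errors are
    network-averaged squared distances to w_true and disagreement is the
    largest distance of any agent from the network average.
    """
    rng = np.random.default_rng(seed)
    num_agents, dim = 5, 4
    w_true = np.array([1.0, 0.0, -0.5, 0.0])  # hypothetical sparse model

    # Ring topology with self-loops and uniform weights (doubly stochastic).
    A = np.zeros((num_agents, num_agents))
    for k in range(num_agents):
        for l in (k - 1, k, k + 1):
            A[l % num_agents, k] = 1.0 / 3.0

    W = np.zeros((num_agents, dim))  # row k holds the iterate of agent k
    initial_err = np.mean(np.sum((W - w_true) ** 2, axis=1))

    for _ in range(num_iters):
        # Each agent receives one streaming sample (u_k, d_k).
        U = rng.standard_normal((num_agents, dim))
        d = U @ w_true + 0.1 * rng.standard_normal(num_agents)

        # Stochastic subgradient of 0.5*(d - u^T w)^2 + rho*||w||_1.
        residual = d - np.sum(U * W, axis=1)
        G = -residual[:, None] * U + rho * np.sign(W)

        # Adapt: local subgradient step with constant step-size mu.
        Psi = W - mu * G
        # Combine: w_k = sum_l a_{lk} psi_l over the neighbors of agent k.
        W = A.T @ Psi

    final_err = np.mean(np.sum((W - w_true) ** 2, axis=1))
    disagreement = np.max(np.linalg.norm(W - W.mean(axis=0), axis=1))
    return initial_err, final_err, disagreement


if __name__ == "__main__":
    e0, e1, gap = run_diffusion_subgradient()
    print(f"initial error {e0:.4f}, final error {e1:.6f}, disagreement {gap:.6f}")
```

Under these assumptions, the averaged error drops quickly and then settles at a small level that depends on the step-size, while the agents' iterates remain close to one another. Rerunning with a different value of mu is a simple way to observe how the size of the steady-state neighborhood changes.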