Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling
A new amortized variance-reduced gradient (AVRG) algorithm was developed in
\cite{ying2017convergence}; unlike SAGA, it has a constant storage requirement,
and unlike SVRG, it has balanced gradient computations.
One key advantage of the AVRG strategy is its amenability to decentralized
implementations. In this work, we show how AVRG can be extended to the network
case where multiple learning agents are assumed to be connected by a graph
topology. In this scenario, each agent observes data that is spatially
distributed and all agents are only allowed to communicate with direct
neighbors. Moreover, the amount of data observed by the individual agents may
differ drastically. For such situations, the balanced gradient computations of
AVRG become a real advantage in reducing the idle time, characteristic of other
variance-reduced gradient algorithms, that is caused by unbalanced local data
storage requirements. The resulting diffusion-AVRG algorithm is shown to
converge linearly to the exact solution and is much more memory efficient than
alternative algorithms. In addition, we propose a
mini-batch strategy to balance the communication and computation efficiency for
diffusion-AVRG. When a proper batch size is employed, it is observed in
simulations that diffusion-AVRG is more computationally efficient than exact
diffusion or EXTRA while maintaining almost the same communication efficiency. Comment: 23 pages, 12 figures, submitted for publication
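As a rough illustration of the AVRG idea, here is a minimal single-agent sketch on a least-squares problem, following our reading of the recursion: each step evaluates two gradients (one at the current iterate, one at the epoch's starting point), and the full-gradient estimate for the next epoch is accumulated along the way, so storage stays constant and no full pass over the data is needed. The function names, the zero initial estimate, the step size and the epoch count are assumptions; the networked diffusion-AVRG and mini-batch variants are not shown.

```python
import numpy as np


def grad_q(w, a, b):
    """Gradient of the per-sample loss Q(w; a, b) = 0.5 * (a @ w - b) ** 2."""
    return a * (a @ w - b)


def avrg(A, B, mu=0.003, epochs=300, seed=0):
    """Single-agent AVRG-style iteration under random reshuffling (sketch).

    Stores only the current iterate, the epoch's starting point and two
    full-gradient estimates, so memory does not grow with the number of samples.
    """
    rng = np.random.default_rng(seed)
    n_samples, dim = A.shape
    w = np.zeros(dim)
    g = np.zeros(dim)  # assumption: the first epoch starts with a zero estimate
    for _ in range(epochs):
        w0 = w.copy()               # starting point of this epoch
        g_next = np.zeros(dim)      # amortized estimate built during this epoch
        for n in rng.permutation(n_samples):  # each sample visited once per epoch
            grad_now = grad_q(w, A[n], B[n])
            # two gradient evaluations per step, wherever we are in the epoch
            u = grad_now - grad_q(w0, A[n], B[n]) + g
            g_next += grad_now / n_samples
            w = w - mu * u
        g = g_next                  # full-gradient estimate for the next epoch
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((100, 5))
    B = A @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)
    w_star = np.linalg.lstsq(A, B, rcond=None)[0]  # exact empirical-risk minimizer
    print(np.linalg.norm(avrg(A, B) - w_star))
```

Because every step does the same amount of work, there is no SVRG-style full pass at epoch boundaries, which is the balanced-computation property the abstract credits with reducing idle time across agents holding different amounts of data.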
D$^2$: Decentralized Training over Decentralized Data
While training a machine learning model using multiple workers, each of which
collects data from their own data sources, it would be most useful when the
data collected from different workers can be {\em unique} and {\em different}.
Ironically, recent analysis of decentralized parallel stochastic gradient
descent (D-PSGD) relies on the assumption that the data hosted on different
workers are {\em not too different}. In this paper, we ask the question: {\em
Can we design a decentralized parallel stochastic gradient descent algorithm
that is less sensitive to the data variance across workers?} To answer it, we
present D$^2$, a novel decentralized parallel stochastic gradient descent
algorithm designed for large data variance among workers (imprecisely,
"decentralized" data). The core of D$^2$ is a variance reduction extension of
the standard D-PSGD algorithm, which improves the convergence rate from
$O\left({\sigma \over \sqrt{nT}} + {(n\zeta^2)^{\frac{1}{3}} \over T^{2/3}}\right)$
to $O\left({\sigma \over \sqrt{nT}}\right)$, where $\zeta^{2}$
denotes the variance among data on different workers. As a result, D$^2$ is
robust to data variance among workers. We empirically evaluate D$^2$ on image
classification tasks where each worker has access to only the data of a limited
set of labels, and find that D$^2$ significantly outperforms D-PSGD.
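A minimal sketch of the kind of update a D$^2$-style method uses, as we understand it: each worker mixes 2x_t - x_{t-1} - gamma(g_t - g_{t-1}) with its neighbours instead of D-PSGD's x_t - gamma g_t. To keep it checkable, the sketch uses deterministic gradients of scalar quadratics whose minimizers c_i differ across four workers on a ring, as a stand-in for "decentralized" data. The mixing matrix, step size, iteration count and function names are our assumptions, and the paper's exact conditions on the mixing matrix and step size are not reproduced.

```python
import numpy as np


def lazy_ring_mixing(n):
    """Symmetric doubly stochastic mixing matrix for a ring of n >= 3 workers.

    Each worker averages itself and its two neighbours with weight 1/3; the
    result is made 'lazy' (0.5 * I + 0.5 * W) so all eigenvalues are positive.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return 0.5 * np.eye(n) + 0.5 * W


def local_grads(x, c):
    """Worker i holds f_i(x) = 0.5 * (x - c_i) ** 2, so its gradient is x_i - c_i."""
    return x - c


def d_psgd(W, c, gamma=0.5, iters=200):
    """D-PSGD-style recursion: local gradient step, then neighbour averaging."""
    x = np.zeros_like(c)
    for _ in range(iters):
        x = W @ (x - gamma * local_grads(x, c))
    return x


def d2(W, c, gamma=0.5, iters=200):
    """D^2-style recursion: x_next = W (2x - x_prev - gamma * (g - g_prev))."""
    x_prev = np.zeros_like(c)
    g_prev = local_grads(x_prev, c)
    x = W @ (x_prev - gamma * g_prev)  # the first step is a plain D-PSGD step
    for _ in range(iters):
        g = local_grads(x, c)
        x_next = W @ (2 * x - x_prev - gamma * (g - g_prev))
        x_prev, g_prev, x = x, g, x_next
    return x


if __name__ == "__main__":
    W = lazy_ring_mixing(4)
    c = np.array([0.0, 1.0, 2.0, 3.0])  # very different local optima per worker
    print("target:", c.mean())
    print("D-PSGD:", d_psgd(W, c))
    print("D^2:   ", d2(W, c))
```

With these heterogeneous c_i, the D-PSGD recursion with a constant step settles at worker-dependent points away from mean(c), while the D$^2$-style recursion reaches consensus on mean(c), the minimizer of the average loss.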
Distributed Deblurring of Large Images of Wide Field-Of-View
Image deblurring is an economical way to reduce certain degradations (blur and
noise) in acquired images. Thus, it has become an essential tool in
high-resolution imaging in many applications, e.g., astronomy, microscopy or
computational photography. In applications such as astronomy and satellite
imaging, the acquired images can be extremely large (up to gigapixels),
covering a wide field of view and suffering from shift-variant blur. Most
existing image deblurring techniques are designed and implemented to work
efficiently on a centralized computing system with multiple processors and a
shared memory. Thus, the largest image that can be handled is limited by the
size of the physical memory available on the system. In this paper, we propose
a distributed non-blind image deblurring algorithm in which several connected
processing nodes (with reasonable computational resources) simultaneously
process different portions of a large image while maintaining certain
coherency among them to finally obtain a single crisp image. Unlike the
existing centralized techniques, image deblurring in a distributed fashion
raises several issues. To tackle these issues, we consider certain
approximations that trade off the quality of the deblurred image against the
computational resources required to achieve it. The experimental results show
that our algorithm produces images of similar quality to those of the existing
centralized techniques while allowing distribution, and is thus cost-effective
for extremely large images. Comment: 16 pages, 10 figures, submitted to IEEE Trans. on Image Processing
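To make the idea of several nodes each processing a portion of a large image concrete, here is a simple overlap-and-crop sketch: the image is cut into tiles, each tile is restored together with a surrounding halo, and only the tile's core is kept. This is not the authors' algorithm, which maintains coherency between neighbouring portions and handles shift-variant blur; the single shift-invariant PSF, the per-tile Wiener filter, the constant k and all function names are our assumptions.

```python
from functools import partial

import numpy as np


def tile_slices(size, tile, halo):
    """Yield (core, padded) slice pairs that cover one image axis."""
    for start in range(0, size, tile):
        stop = min(start + tile, size)
        yield slice(start, stop), slice(max(start - halo, 0), min(stop + halo, size))


def wiener_deconvolve(tile, psf, k=1e-3):
    """Non-blind Wiener deconvolution of one tile, assuming circular boundaries."""
    padded = np.zeros(tile.shape)
    ph, pw = psf.shape
    padded[:ph, :pw] = psf
    # move the PSF centre to index (0, 0) so the output is not shifted
    padded = np.roll(padded, (-(ph // 2), -(pw // 2)), axis=(0, 1))
    H = np.fft.fft2(padded)
    G = np.conj(H) / (np.abs(H) ** 2 + k)  # k: hypothetical noise-to-signal constant
    return np.real(np.fft.ifft2(np.fft.fft2(tile) * G))


def deblur_distributed(image, psf, tile=64, halo=8, process=None):
    """Overlap-and-crop tiling: restore each tile with a halo, keep only its core.

    The loop runs sequentially here; in a distributed setting each
    (tile + halo) block would be sent to a different processing node.
    """
    if process is None:
        process = partial(wiener_deconvolve, psf=psf)
    out = np.empty(image.shape, dtype=float)
    for rows, rows_pad in tile_slices(image.shape[0], tile, halo):
        for cols, cols_pad in tile_slices(image.shape[1], tile, halo):
            restored = process(image[rows_pad, cols_pad])
            # position of the core region inside the padded tile
            r0 = rows.start - rows_pad.start
            c0 = cols.start - cols_pad.start
            out[rows, cols] = restored[r0:r0 + rows.stop - rows.start,
                                       c0:c0 + cols.stop - cols.start]
    return out
```

Passing `process=lambda t: t` makes the stitched output reproduce the input exactly, a quick check that the cores tile the image without gaps or overlaps. The halo is meant to keep the artifacts that the circular-boundary assumption introduces near tile edges out of the kept core.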