Stochastic Learning under Random Reshuffling with Constant Step-sizes
In empirical risk optimization, it has been observed that stochastic gradient
implementations that rely on random reshuffling of the data achieve better
performance than implementations that rely on sampling the data uniformly.
Recent works have pursued justifications for this behavior by examining the
convergence rate of the learning process under diminishing step-sizes. This
work focuses on the constant step-size case and strongly convex loss function.
In this case, convergence is guaranteed to a small neighborhood of the
optimizer albeit at a linear rate. The analysis establishes analytically that
random reshuffling outperforms uniform sampling by showing explicitly that
iterates approach a smaller neighborhood of size O(μ²) around the
minimizer rather than O(μ). Furthermore, we derive an analytical expression
for the steady-state mean-square-error performance of the algorithm, which
helps clarify in greater detail the differences between sampling with and
without replacement. We also explain the periodic behavior that is observed in
random reshuffling implementations.
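As a rough numerical companion to this abstract, the sketch below (my own illustration, not the paper's code; the least-squares problem, step size, and epoch count are arbitrary choices) compares constant step-size SGD under random reshuffling and under uniform sampling with replacement, measuring the steady-state squared distance to the empirical minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic strongly convex empirical risk: least squares over N samples.
N, d = 200, 5
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)
w_star = np.linalg.lstsq(A, b, rcond=None)[0]  # empirical risk minimizer

def grad(w, i):
    # Gradient of the i-th sample loss (a_i^T w - b_i)^2 / 2.
    return (A[i] @ w - b[i]) * A[i]

def sgd(mu, epochs, reshuffle):
    w = np.zeros(d)
    for _ in range(epochs):
        # Reshuffling: a fresh permutation each epoch (without replacement).
        # Uniform sampling: N independent draws with replacement.
        idx = rng.permutation(N) if reshuffle else rng.integers(0, N, N)
        for i in idx:
            w -= mu * grad(w, i)
    return float(np.sum((w - w_star) ** 2))  # squared distance to minimizer

mu = 0.01
mse_rr = np.mean([sgd(mu, 50, True) for _ in range(10)])
mse_us = np.mean([sgd(mu, 50, False) for _ in range(10)])
print(f"random reshuffling MSD: {mse_rr:.2e}")
print(f"uniform sampling   MSD: {mse_us:.2e}")
```

With a small constant step size, the reshuffled iterates settle into a visibly smaller neighborhood of the minimizer, consistent with the O(μ²)-versus-O(μ) distinction the abstract describes.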
Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling
A new amortized variance-reduced gradient (AVRG) algorithm was developed in
\cite{ying2017convergence}, which has constant storage requirement in
comparison to SAGA and balanced gradient computations in comparison to SVRG.
One key advantage of the AVRG strategy is its amenability to decentralized
implementations. In this work, we show how AVRG can be extended to the network
case where multiple learning agents are assumed to be connected by a graph
topology. In this scenario, each agent observes data that is spatially
distributed and all agents are only allowed to communicate with direct
neighbors. Moreover, the amount of data observed by the individual agents may
differ drastically. For such situations, the balanced gradient computation
property of AVRG becomes a real advantage in reducing idle time caused by
unbalanced local data storage requirements, which is characteristic of other
reduced-variance gradient algorithms. The resulting diffusion-AVRG algorithm is
shown to have linear convergence to the exact solution, and is much more memory
efficient than other alternative algorithms. In addition, we propose a
mini-batch strategy to balance the communication and computation efficiency for
diffusion-AVRG. When a proper batch size is employed, it is observed in
simulations that diffusion-AVRG is more computationally efficient than exact
diffusion or EXTRA while maintaining almost the same communication efficiency.
Comment: 23 pages, 12 figures, submitted for publication.
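The decentralized setting can be illustrated, stripped of the variance-reduction correction, by a minimal adapt-then-combine (ATC) diffusion sketch. This is an assumption-laden toy, not diffusion-AVRG itself: a ring topology, uniform combination weights, and full deterministic local gradients are my choices for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# K agents on a ring, each holding its own local least-squares data.
K, d, n_local = 4, 3, 50
A = [rng.standard_normal((n_local, d)) for _ in range(K)]
w_true = rng.standard_normal(d)
b = [A[k] @ w_true + 0.05 * rng.standard_normal(n_local) for k in range(K)]

# Doubly stochastic combination matrix for the ring (uniform weights).
C = np.zeros((K, K))
for k in range(K):
    C[k, k] = C[k, (k - 1) % K] = C[k, (k + 1) % K] = 1 / 3

def local_grad(k, w):
    # Gradient of agent k's local empirical risk.
    return A[k].T @ (A[k] @ w - b[k]) / n_local

mu = 0.1
W = np.zeros((K, d))  # row k holds agent k's iterate
for _ in range(500):
    # Adapt: each agent takes a local gradient step.
    Psi = np.stack([W[k] - mu * local_grad(k, W[k]) for k in range(K)])
    # Combine: each agent averages only with its direct neighbors.
    W = C @ Psi

w_global = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
err = max(np.linalg.norm(W[k] - w_global) for k in range(K))
print(f"max agent deviation from global solution: {err:.3e}")
```

All agents approach the global least-squares solution using only neighbor communication; algorithms such as exact diffusion, EXTRA, and the diffusion-AVRG scheme of this abstract additionally remove the small residual bias that a plain constant step-size diffusion iteration retains.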
Skyrmion Gas Manipulation for Probabilistic Computing
The topologically protected magnetic spin configurations known as skyrmions
offer promising applications due to their stability, mobility and localization.
In this work, we emphasize how to leverage the thermally driven dynamics of an
ensemble of such particles to perform computing tasks. We propose a device
employing a skyrmion gas to reshuffle a random signal into an uncorrelated copy
of itself. This is demonstrated by modelling the ensemble dynamics in a
collective coordinate approach where skyrmion-skyrmion and skyrmion-boundary
interactions are accounted for phenomenologically. Our numerical results are
used to develop a proof-of-concept for an energy-efficient
device with a low area imprint.
Whereas its immediate application to stochastic computing circuit designs will
be made apparent, we argue that its basic functionality, reminiscent of an
integrate-and-fire neuron, qualifies it as a novel bio-inspired building block.
Comment: 41 pages, 20 figures.
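The reshuffling functionality itself is easy to state numerically. The following toy (a statistical illustration only, not a model of the skyrmion device) shows that permuting a stochastic-computing bitstream preserves the encoded probability while producing a copy that is uncorrelated with the original:

```python
import numpy as np

rng = np.random.default_rng(2)

# A stochastic-computing bitstream encodes p as its fraction of 1s.
p, n = 0.3, 10_000
x = (rng.random(n) < p).astype(float)

# An ideal reshuffler emits a permuted copy: same bits, new order.
y = x[rng.permutation(n)]

mean_x, mean_y = x.mean(), y.mean()
# Pearson correlation between the stream and its reshuffled copy.
corr = np.corrcoef(x, y)[0, 1]
print(f"encoded value preserved: {mean_x:.3f} vs {mean_y:.3f}")
print(f"sample correlation: {corr:+.4f}")
```

Because the output is a permutation of the input, the encoded value is preserved exactly, while the sample correlation between the two streams is driven to roughly zero, which is the property stochastic computing circuits need from a reshuffler.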