653 research outputs found

    Stochastic Learning under Random Reshuffling with Constant Step-sizes

    In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that sample the data uniformly. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. This work focuses on the constant step-size case with strongly convex loss functions. In this case, convergence is guaranteed only to a small neighborhood of the optimizer, albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms uniform sampling by showing explicitly that the iterates approach a smaller neighborhood of size $O(\mu^2)$ around the minimizer, rather than $O(\mu)$. Furthermore, we derive an analytical expression for the steady-state mean-square-error performance of the algorithm, which helps clarify in greater detail the differences between sampling with and without replacement. We also explain the periodic behavior that is observed in random reshuffling implementations.
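
    The $O(\mu)$ versus $O(\mu^2)$ gap is easy to probe numerically. The following sketch is not taken from the paper: the least-squares problem, its dimensions, and the step-size $\mu$ are arbitrary illustrative choices. It runs constant step-size SGD with both samplers on a toy strongly convex problem and compares the squared distance of the final iterate to the minimizer; random reshuffling would typically land noticeably closer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy strongly convex problem: least squares with N samples (illustrative sizes).
N, d, mu = 200, 5, 0.01            # mu is the constant step-size
A = rng.normal(size=(N, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=N)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]   # minimizer of the empirical risk

def grad(w, i):                    # gradient of the i-th loss (a_i @ w - b_i)^2 / 2
    return A[i] * (A[i] @ w - b[i])

def run(sampler, epochs=500):
    w = np.zeros(d)
    for _ in range(epochs):
        for i in sampler():        # one pass of N stochastic-gradient steps
            w -= mu * grad(w, i)
    return np.sum((w - x_star) ** 2)   # squared distance to the minimizer

uniform   = lambda: rng.integers(N, size=N)   # sampling with replacement
reshuffle = lambda: rng.permutation(N)        # a fresh permutation per pass

print("uniform sampling   MSD:", run(uniform))
print("random reshuffling MSD:", run(reshuffle))
```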

    Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling

    A new amortized variance-reduced gradient (AVRG) algorithm was developed in \cite{ying2017convergence}, which has a constant storage requirement in comparison to SAGA and balanced gradient computations in comparison to SVRG. One key advantage of the AVRG strategy is its amenability to decentralized implementations. In this work, we show how AVRG can be extended to the network case, where multiple learning agents are assumed to be connected by a graph topology. In this scenario, each agent observes data that is spatially distributed, and all agents are only allowed to communicate with direct neighbors. Moreover, the amount of data observed by the individual agents may differ drastically. In such situations, the balanced gradient computation property of AVRG becomes a real advantage in reducing the idle time caused by the unbalanced local data storage requirements that are characteristic of other variance-reduced gradient algorithms. The resulting diffusion-AVRG algorithm is shown to converge linearly to the exact solution and to be much more memory efficient than alternative algorithms. In addition, we propose a mini-batch strategy to balance the communication and computation efficiency of diffusion-AVRG. When a proper batch size is employed, simulations show that diffusion-AVRG is more computationally efficient than exact diffusion or EXTRA while maintaining almost the same communication efficiency. Comment: 23 pages, 12 figures, submitted for publication.
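
    To make the decentralized setting concrete, the sketch below illustrates only the adapt/combine structure that diffusion-style algorithms build on: each agent runs a reshuffled pass over its own (deliberately unbalanced) local data and then averages its iterate with its ring-graph neighbors through a doubly stochastic combination matrix. A plain local stochastic gradient stands in for the AVRG variance-reduced correction, so this is a structural illustration under those assumptions, not the diffusion-AVRG recursion itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# K agents on a ring, each holding a local least-squares data set of a different size.
K, d, mu = 4, 3, 0.02
sizes = [10, 25, 40, 80]                      # deliberately unbalanced local data
data = [(rng.normal(size=(n, d)), rng.normal(size=n)) for n in sizes]

# Doubly stochastic combination matrix for the ring (uniform weights over self + 2 neighbors).
C = np.zeros((K, K))
for k in range(K):
    for j in (k - 1, k, k + 1):
        C[k, j % K] = 1.0 / 3.0

def local_grad(k, w, i):                      # gradient of agent k's i-th local loss
    A, b = data[k]
    return A[i] * (A[i] @ w - b[i])

W = np.zeros((K, d))                          # one iterate per agent
for epoch in range(200):
    # Adapt: each agent performs one reshuffled pass over its own data.
    for k in range(K):
        for i in rng.permutation(sizes[k]):
            W[k] -= mu * local_grad(k, W[k], i)
    # Combine: agents average their iterates with direct neighbors only.
    W = C @ W

print("max disagreement between agents:", np.max(np.abs(W - W.mean(axis=0))))
```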

    Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality

    We study the random reshuffling (RR) method for smooth nonconvex optimization problems with a finite-sum structure. Although this method is widely used in practice, for instance in the training of neural networks, its convergence behavior is only understood in several limited settings. In this paper, under the well-known Kurdyka-Łojasiewicz (KL) inequality, we establish strong limit-point convergence results for RR with appropriate diminishing step sizes; namely, the whole sequence of iterates generated by RR converges to a single stationary point in an almost sure sense. In addition, we derive the corresponding rate of convergence, which depends on the KL exponent and the suitably selected diminishing step sizes. When the KL exponent lies in $[0,\frac{1}{2}]$, the convergence is at a rate of $\mathcal{O}(t^{-1})$, with $t$ counting the iteration number. When the KL exponent belongs to $(\frac{1}{2},1)$, our derived convergence rate is of the form $\mathcal{O}(t^{-q})$ with $q\in(0,1)$ depending on the KL exponent. The standard KL inequality-based convergence analysis framework only applies to algorithms with a certain descent property. We conduct a novel convergence analysis for the non-descent RR method with diminishing step sizes based on the KL inequality, which generalizes the standard KL framework. We summarize our main steps and core ideas in an informal analysis framework, which is of independent interest. As a direct application of this framework, we also establish similar strong limit-point convergence results for the reshuffled proximal point method. Comment: 23 pages.
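
    As a purely illustrative companion to the diminishing step-size regime analyzed here, the sketch below runs RR with a step size proportional to $t^{-3/4}$ on a small smooth nonconvex finite sum and reports the full-gradient norm at the final iterate. The loss, the dimensions, and the schedule are assumptions made for the example, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Smooth nonconvex finite sum: f(w) = (1/N) * sum_i log(1 + (a_i @ w - b_i)^2).
N, d = 100, 4
A = rng.normal(size=(N, d))
b = rng.normal(size=N)

def grad_i(w, i):
    r = A[i] @ w - b[i]
    return A[i] * (2 * r / (1 + r * r))

w = rng.normal(size=d)
for t in range(1, 1001):
    alpha = 0.5 / t ** 0.75          # diminishing steps: sum alpha_t diverges, sum alpha_t^2 converges
    for i in rng.permutation(N):     # one reshuffled pass over all N components
        w -= alpha * grad_i(w, i)

full_grad = np.mean([grad_i(w, i) for i in range(N)], axis=0)
print("full gradient norm at final iterate:", np.linalg.norm(full_grad))
```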

    Global stability of first-order methods for coercive tame functions

    We consider first-order methods with a constant step size for minimizing locally Lipschitz coercive functions that are tame in an o-minimal structure on the real field. We prove that if the method is approximated by subgradient trajectories, then the iterates eventually remain in a neighborhood of a connected component of the set of critical points. Under suitable method-dependent regularity assumptions, this result applies to the subgradient method with momentum, the stochastic subgradient method with random reshuffling and momentum, and the random-permutations cyclic coordinate descent method. Comment: 30 pages, 1 figure.
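
    For concreteness, here is a small sketch of one of the methods covered, the subgradient method with heavy-ball momentum and a constant step size, applied to a simple coercive, locally Lipschitz (and tame) function. The objective and the parameters are illustrative assumptions; the point is only that the iterates settle in a neighborhood of a critical point rather than escape to infinity.

```python
import numpy as np

# Coercive nonsmooth example: f(w) = ||w||_1 + 0.5 * ||w - c||^2.
c = np.array([1.5, -0.3, 0.7])

def subgrad(w):
    return np.sign(w) + (w - c)         # one valid subgradient of f at w

w = np.array([5.0, -4.0, 3.0])          # arbitrary starting point
v = np.zeros_like(w)
alpha, beta = 0.05, 0.9                 # constant step size and momentum parameter
for _ in range(2000):
    v = beta * v - alpha * subgrad(w)   # heavy-ball style momentum update
    w = w + v

print("final iterate:", w)              # remains near a critical point of f
```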

    Distributed stochastic proximal algorithm with random reshuffling for non-smooth finite-sum optimization

    Non-smooth finite-sum minimization is a fundamental problem in machine learning. This paper develops a distributed stochastic proximal-gradient algorithm with random reshuffling to solve finite-sum minimization over time-varying multi-agent networks. The objective function is a sum of differentiable convex functions plus a non-smooth regularization term. Each agent in the network updates its local variables with a constant step-size using local information and cooperates with its neighbors to seek an optimal solution. We prove that the local variable estimates generated by the proposed algorithm achieve consensus and are attracted to a neighborhood of the optimal solution in expectation at an $\mathcal{O}(\frac{1}{T}+\frac{1}{\sqrt{T}})$ convergence rate, where $T$ is the total number of iterations. Finally, comparative simulations are provided to verify the convergence performance of the proposed algorithm. Comment: 15 pages, 7 figures.
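
    The local building block behind such a scheme is a proximal stochastic-gradient step applied along a reshuffled pass over the data. The sketch below shows this single-agent step with an $\ell_1$ regularizer, so the proximal operator is soft-thresholding; the network/consensus part of the proposed algorithm is omitted, and the problem data, regularization weight, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Finite sum of smooth convex losses plus a non-smooth l1 regularizer (lasso-style example).
N, d, lam, mu = 150, 6, 0.1, 0.01
A = rng.normal(size=(N, d))
b = A @ rng.normal(size=d) + 0.05 * rng.normal(size=N)

def grad_i(w, i):                       # gradient of the smooth part (a_i @ w - b_i)^2 / 2
    return A[i] * (A[i] @ w - b[i])

def prox_l1(w, t):                      # prox of t * lam * ||.||_1, i.e. soft-thresholding
    return np.sign(w) * np.maximum(np.abs(w) - t * lam, 0.0)

w = np.zeros(d)
for epoch in range(300):
    for i in rng.permutation(N):        # random reshuffling: one fresh permutation per pass
        w = prox_l1(w - mu * grad_i(w, i), mu)

print("estimate:", np.round(w, 3))
```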

    Skyrmion Gas Manipulation for Probabilistic Computing

    The topologically protected magnetic spin configurations known as skyrmions offer promising applications due to their stability, mobility, and localization. In this work, we emphasize how to leverage the thermally driven dynamics of an ensemble of such particles to perform computing tasks. We propose a device that employs a skyrmion gas to reshuffle a random signal into an uncorrelated copy of itself. This is demonstrated by modelling the ensemble dynamics in a collective coordinate approach, where skyrmion-skyrmion and skyrmion-boundary interactions are accounted for phenomenologically. Our numerical results are used to develop a proof of concept for an energy-efficient ($\sim\mu\mathrm{W}$) device with a low area imprint ($\sim\mu\mathrm{m}^2$). While its immediate application to stochastic computing circuit designs will be made apparent, we argue that its basic functionality, reminiscent of an integrate-and-fire neuron, qualifies it as a novel bio-inspired building block. Comment: 41 pages, 20 figures.
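
    Functionally, the proposed device takes a random bitstream and emits a statistically equivalent but temporally decorrelated copy, the "reshuffler" primitive used in stochastic computing. The sketch below mimics only that functional behavior with a hypothetical finite buffer that releases stored bits in random order; it says nothing about skyrmion dynamics or the device physics themselves.

```python
import numpy as np

rng = np.random.default_rng(4)

def reshuffle_stream(bits, buffer_size=64):
    """Emit the input bits in a locally randomized order via a finite buffer,
    preserving the bit statistics while reducing temporal correlations."""
    buf, out = [], []
    for b in bits:
        buf.append(b)
        if len(buf) >= buffer_size:
            out.append(buf.pop(rng.integers(len(buf))))   # release a random stored bit
    while buf:                                            # flush the remaining buffer
        out.append(buf.pop(rng.integers(len(buf))))
    return np.array(out)

# A correlated input bitstream (long runs), as might feed a stochastic-computing circuit.
x = np.repeat(rng.integers(0, 2, size=200), 8)
y = reshuffle_stream(x)

print("mean preserved:", x.mean(), "->", y.mean())
print("lag-1 autocorrelation:", np.corrcoef(x[:-1], x[1:])[0, 1],
      "->", np.corrcoef(y[:-1], y[1:])[0, 1])
```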