
    Stochastic Learning under Random Reshuffling with Constant Step-sizes

    In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. This work focuses on the constant step-size case with strongly convex loss functions. In this case, convergence is guaranteed only to a small neighborhood of the optimizer, albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms uniform sampling by showing explicitly that the iterates approach a smaller neighborhood of size $O(\mu^2)$ around the minimizer, rather than $O(\mu)$. Furthermore, we derive an analytical expression for the steady-state mean-square-error performance of the algorithm, which helps clarify in greater detail the differences between sampling with and without replacement. We also explain the periodic behavior that is observed in random reshuffling implementations.
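
    The $O(\mu)$ versus $O(\mu^2)$ neighborhoods are easy to probe numerically. The following is a minimal illustrative sketch, not taken from the paper: it runs constant step-size SGD on a synthetic least-squares problem under both sampling schemes and reports the squared distance of the final iterate from the empirical minimizer. The data model, problem sizes, and the step-size mu = 0.01 are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, mu = 200, 5, 0.01                        # samples, dimension, constant step-size
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]  # empirical-risk minimizer

def final_sq_error(sampler, epochs=500):
    """Run constant step-size SGD and return the squared distance to w_star."""
    w = np.zeros(d)
    for _ in range(epochs):
        for i in sampler():
            w -= mu * (X[i] @ w - y[i]) * X[i]  # per-sample least-squares gradient
    return np.sum((w - w_star) ** 2)

uniform   = lambda: rng.integers(0, N, size=N)  # sampling with replacement
reshuffle = lambda: rng.permutation(N)          # random reshuffling (without replacement)

print("uniform sampling  :", final_sq_error(uniform))    # neighborhood of size O(mu)
print("random reshuffling:", final_sq_error(reshuffle))  # smaller neighborhood, O(mu^2)
```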

    On the performance of random reshuffling in stochastic learning

    In empirical risk optimization, it has been observed that gradient descent implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data randomly and independently of each other. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions are dependent on the sample size, which is problematic for large datasets. This work focuses on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the optimizer, albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms independent sampling by showing that the iterate at the end of each run approaches a smaller neighborhood of size $O(\mu^2)$ around the minimizer, rather than $O(\mu)$. Simulation results illustrate the theoretical findings.
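
    A complementary way to see the claimed scaling is to halve the step-size and watch how the end-of-run error shrinks: roughly 2x under independent sampling ($O(\mu)$) and roughly 4x under random reshuffling ($O(\mu^2)$). The sketch below is illustrative only and not from the paper; the synthetic least-squares data, the step-size pair 0.02/0.01, and the run lengths are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 5
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]   # empirical-risk minimizer

def mean_sq_error(mu, reshuffle, epochs=300, repeats=10):
    """Average squared end-of-run error over a few independent runs."""
    errs = []
    for _ in range(repeats):
        w = np.zeros(d)
        for _ in range(epochs):
            order = rng.permutation(N) if reshuffle else rng.integers(0, N, size=N)
            for i in order:
                w -= mu * (X[i] @ w - y[i]) * X[i]
        errs.append(np.sum((w - w_star) ** 2))
    return np.mean(errs)

for reshuffle in (False, True):
    ratio = mean_sq_error(0.02, reshuffle) / mean_sq_error(0.01, reshuffle)
    label = "random reshuffling " if reshuffle else "independent sampling"
    # Expect a ratio near 2 for independent sampling and near 4 for reshuffling.
    print(label, "error ratio for mu=0.02 vs mu=0.01:", ratio)
```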

    EXPERIMENTAL EVALUATION OF ITERATIVE METHODS FOR GAMES

    Min-max optimization problems are a class of problems commonly encountered in game theory, machine learning, deep learning, and adversarial training. Deterministic gradient methods, such as gradient descent ascent (GDA), Extragradient (EG), and Hamiltonian Gradient Descent (HGD), are usually employed to solve these problems. In the large-scale setting, stochastic variants of these gradient methods are preferred because of their cheap per-iteration cost. To further increase optimization efficiency, various improvements to deterministic and stochastic gradient methods have been proposed, such as acceleration, variance reduction, and random reshuffling. In this work, we explore advanced iterative methods for solving min-max optimization problems, including deterministic gradient methods combined with acceleration and stochastic gradient methods combined with variance reduction and random reshuffling. We evaluate the performance of the classical and advanced iterative methods experimentally on both bilinear and quadratic games. With this experimental approach, we show that the most advanced iterative methods, in both the deterministic and stochastic settings, improve on the iteration complexity of the classical ones.
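
    As a concrete illustration of the deterministic methods named above, the sketch below (not taken from the thesis) runs GDA and Extragradient on the bilinear game min_x max_y x^T A y, whose unique equilibrium is the origin. The orthonormal choice of A, the step size, and the iteration count are arbitrary assumptions for the example; on bilinear games GDA is known to spiral away from the equilibrium while EG converges, which the printed distances make visible.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eta, steps = 5, 0.2, 500
A, _ = np.linalg.qr(rng.normal(size=(d, d)))   # orthonormal A keeps the game well conditioned
x0, y0 = rng.normal(size=d), rng.normal(size=d)

def gda_step(x, y):
    """Simultaneous gradient descent ascent on f(x, y) = x^T A y."""
    return x - eta * A @ y, y + eta * A.T @ x

def eg_step(x, y):
    """Extragradient: take a look-ahead step, then update with the look-ahead gradients."""
    xh, yh = x - eta * A @ y, y + eta * A.T @ x
    return x - eta * A @ yh, y + eta * A.T @ xh

for name, step in (("GDA", gda_step), ("EG ", eg_step)):
    x, y = x0.copy(), y0.copy()
    for _ in range(steps):
        x, y = step(x, y)
    print(name, "distance to the (0, 0) equilibrium:",
          np.linalg.norm(np.concatenate([x, y])))
```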

    Variance-Reduced Stochastic Learning by Networked Agents under Random Reshuffling

    A new amortized variance-reduced gradient (AVRG) algorithm was developed in \cite{ying2017convergence}, which has a constant storage requirement in comparison to SAGA and balanced gradient computations in comparison to SVRG. One key advantage of the AVRG strategy is its amenability to decentralized implementations. In this work, we show how AVRG can be extended to the network case, where multiple learning agents are assumed to be connected by a graph topology. In this scenario, each agent observes data that is spatially distributed, and all agents are only allowed to communicate with direct neighbors. Moreover, the amount of data observed by the individual agents may differ drastically. For such situations, the balanced gradient computation property of AVRG becomes a real advantage in reducing idle time caused by unbalanced local data storage requirements, which is characteristic of other reduced-variance gradient algorithms. The resulting diffusion-AVRG algorithm is shown to converge linearly to the exact solution and is much more memory efficient than alternative algorithms. In addition, we propose a mini-batch strategy to balance the communication and computation efficiency of diffusion-AVRG. When a proper batch size is employed, simulations show that diffusion-AVRG is more computationally efficient than exact diffusion or EXTRA while maintaining almost the same communication efficiency. Comment: 23 pages, 12 figures, submitted for publication.
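
    To make the decentralized structure concrete, here is a rough single-machine simulation of the adapt-then-combine diffusion pattern over a ring of agents. It is a sketch under simplifying assumptions, not the paper's diffusion-AVRG pseudocode: a plain per-sample gradient stands in for the AVRG variance-reduced estimate, and the ring topology, combination weights, data model, and step size are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, N_local, mu = 4, 3, 50, 0.02             # agents, dimension, samples per agent, step-size

# Ring topology: each agent combines with itself and its two direct neighbors
# using doubly stochastic weights.
C = np.zeros((K, K))
for k in range(K):
    for j in (k - 1, k, k + 1):
        C[k, j % K] = 1.0 / 3.0

# Local least-squares data; a shared w_true makes the network-wide minimizer meaningful.
w_true = rng.normal(size=d)
X = [rng.normal(size=(N_local, d)) for _ in range(K)]
y = [Xk @ w_true + 0.1 * rng.normal(size=N_local) for Xk in X]
w_star = np.linalg.lstsq(np.vstack(X), np.concatenate(y), rcond=None)[0]

W = np.zeros((K, d))                           # one iterate per agent
for epoch in range(200):
    orders = [rng.permutation(N_local) for _ in range(K)]
    for i in range(N_local):
        for k in range(K):                     # adapt: local stochastic-gradient step
            n = orders[k][i]
            W[k] -= mu * (X[k][n] @ W[k] - y[k][n]) * X[k][n]
        W = C @ W                              # combine: average with direct neighbors only

print("network-average distance to the global minimizer:",
      np.linalg.norm(W.mean(axis=0) - w_star))
```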

    Skyrmion Gas Manipulation for Probabilistic Computing

    The topologically protected magnetic spin configurations known as skyrmions offer promising applications due to their stability, mobility and localization. In this work, we emphasize how to leverage the thermally driven dynamics of an ensemble of such particles to perform computing tasks. We propose a device employing a skyrmion gas to reshuffle a random signal into an uncorrelated copy of itself. This is demonstrated by modelling the ensemble dynamics in a collective coordinate approach, where skyrmion-skyrmion and skyrmion-boundary interactions are accounted for phenomenologically. Our numerical results are used to develop a proof-of-concept for an energy-efficient ($\sim\mu\mathrm{W}$) device with a low area imprint ($\sim\mu\mathrm{m}^2$). While its immediate application to stochastic computing circuit designs is made apparent, we argue that its basic functionality, reminiscent of an integrate-and-fire neuron, qualifies it as a novel bio-inspired building block. Comment: 41 pages, 20 figures.
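
    The role such a reshuffler plays in a stochastic computing circuit can be illustrated functionally, independent of the skyrmion physics. The sketch below is a purely behavioral stand-in, not the collective-coordinate model from the paper: it encodes a probability as a random bit-stream and shows that AND-based multiplication fails with a correlated copy of the stream but works once the copy is reshuffled into an uncorrelated one. The stream length and the value p are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
L, p = 100_000, 0.3                             # stream length and encoded probability

a = (rng.random(L) < p).astype(int)             # bit-stream encoding the value p
b_correlated = a.copy()                         # a perfectly correlated copy of the signal
b_reshuffled = rng.permutation(a)               # same bits, new order: an uncorrelated copy

# In stochastic computing, AND-ing two independent streams multiplies their values.
print("target p*p               :", p * p)
print("AND with correlated copy :", np.mean(a & b_correlated))   # gives ~p, not p*p
print("AND with reshuffled copy :", np.mean(a & b_reshuffled))   # close to p*p
```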