90 research outputs found

    Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders

    Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random or single shuffling is always faster than, or at least as good as, classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.
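    For concreteness, the data orderings compared in this abstract (with-replacement sampling, Random Reshuffling, and Single Shuffle) differ only in how the component index sequence is generated. The sketch below is a minimal illustration, not code from the paper; the component-gradient callback `grad`, the step size, and the epoch count are placeholders.

    ```python
    import numpy as np

    def shuffled_sgd(grad, x0, n, epochs, lr, ordering="rr", seed=0):
        """Run SGD for `epochs` passes over n component functions f_1, ..., f_n.

        ordering: "replacement" -- classical SGD, sample a component uniformly each step
                  "rr"          -- Random Reshuffling, a fresh permutation every epoch
                  "ss"          -- Single Shuffle, one permutation drawn once and reused
        grad(x, i) should return the gradient of the i-th component at x.
        """
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        fixed_perm = rng.permutation(n)              # used only by "ss"
        for _ in range(epochs):
            if ordering == "replacement":
                order = rng.integers(0, n, size=n)   # i.i.d. indices, with replacement
            elif ordering == "rr":
                order = rng.permutation(n)           # new permutation each epoch
            else:
                order = fixed_perm                   # the same permutation every epoch
            for i in order:
                x = x - lr * grad(x, i)              # same update rule in all variants
        return x
    ```

    For a finite-sum objective f(x) = (1/n) Σ_i f_i(x), all three variants use the same per-step update x ← x − η∇f_i(x); only the index sequence changes, which is exactly the aspect the paper's analysis varies.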

    On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms

    The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD, which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings. Our analysis employs more relaxed non-convex assumptions than the previous literature. Nevertheless, we maintain the desired computational complexity that shuffling SGD has achieved in the general convex setting. Comment: The 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
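    One common way to make "over-parameterized" concrete is the interpolation regime, where the model has enough parameters for a global solution with zero training loss to exist; whether this matches the paper's exact assumptions is not stated in the abstract. The toy example below only illustrates that regime: shuffling SGD on an over-parameterized least-squares problem (more parameters than data points) driven to near-zero training loss. All problem sizes and the step size are illustrative choices.

    ```python
    import numpy as np

    # Toy over-parameterized least-squares problem: d parameters, n < d data points,
    # so an interpolating global solution with zero training loss exists.
    rng = np.random.default_rng(0)
    n, d = 20, 100
    A = rng.normal(size=(n, d))
    b = rng.normal(size=n)

    x = np.zeros(d)
    lr = 0.005
    for epoch in range(200):
        for i in rng.permutation(n):               # shuffling: new permutation per epoch
            x -= lr * (A[i] @ x - b[i]) * A[i]     # gradient of 0.5 * (A[i] @ x - b[i])**2
    print("final training loss:", 0.5 * np.mean((A @ x - b) ** 2))
    ```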

    Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond

    We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results focusing on final iterate lower bounds in terms of the number of components n and the number of epochs K, we seek bounds for arbitrary weighted average iterates that are tight in all factors, including the condition number κ. For SGD with Random Reshuffling, we present lower bounds that have tighter κ dependencies than existing bounds. Our results are the first to perfectly close the gap between lower and upper bounds for weighted average iterates in both the strongly-convex and convex cases. We also prove weighted average iterate lower bounds for arbitrary permutation-based SGD, which apply to all variants that carefully choose the best permutation. Our bounds improve the existing bounds in factors of n and κ and thereby match the upper bounds shown for a recently proposed algorithm called GraB. Comment: 58 pages.
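    As a point of reference for the "weighted average iterate" these lower bounds concern, the object itself is just a weighted combination of the SGD iterates. The sketch below only shows how such an average is formed; the particular weights for which the paper's bounds are tight are not given in the abstract and are not reproduced here.

    ```python
    import numpy as np

    def weighted_average_iterate(iterates, weights=None):
        """Combine iterates x_0, ..., x_T of (shuffling) SGD into one output point.

        iterates: array of shape (T + 1, d) holding the iterates.
        weights : nonnegative weights w_0, ..., w_T; uniform averaging if None.
        Returns sum_t w_t * x_t / sum_t w_t.
        """
        X = np.asarray(iterates, dtype=float)
        w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
        return (w[:, None] * X).sum(axis=0) / w.sum()
    ```

    A final-iterate bound corresponds to putting all weight on x_T; the bounds described above instead hold for arbitrary choices of the weights.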