Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders
Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing
neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being
popular choices for cycling through random or single permutations of the
training data. However, the convergence properties of these algorithms in the
non-convex case are not fully understood. Existing results suggest that, in
realistic training scenarios where the number of epochs is smaller than the
training set size, RR may perform worse than SGD.
In this paper, we analyze a general SGD algorithm that allows for arbitrary
data orderings and show improved convergence rates for non-convex functions.
Specifically, our analysis reveals that SGD with random or single shuffling is
always at least as fast as classical SGD with replacement, regardless
of the number of iterations. Overall, our study highlights the benefits of
using SGD with random/single shuffling and provides new insights into its
convergence properties for non-convex optimization.
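The three data orderings compared in the abstract can be sketched on a toy least-squares problem. This is a minimal illustration, not the paper's analysis; the problem, step size, and epoch count are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum objective: f(w) = (1/n) * sum_i (x_i . w - y_i)^2,
# constructed to be consistent so all schemes can reach w_true exactly.
n, d = 32, 4
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad_i(w, i):
    # Gradient of the i-th component function.
    return 2.0 * (X[i] @ w - y[i]) * X[i]

def run_sgd(order_fn, epochs=50, lr=0.05):
    w = np.zeros(d)
    for epoch in range(epochs):
        for i in order_fn(epoch):
            w -= lr * grad_i(w, i)
    return w

# Classical SGD with replacement: indices sampled i.i.d. at every step.
with_replacement = lambda epoch: rng.integers(0, n, size=n)
# Random Reshuffling (RR): a fresh permutation of the data each epoch.
random_reshuffle = lambda epoch: rng.permutation(n)
# Single Shuffle (SS): one fixed permutation reused in every epoch.
fixed_perm = rng.permutation(n)
single_shuffle = lambda epoch: fixed_perm

for name, order in [("SGD", with_replacement),
                    ("RR", random_reshuffle),
                    ("SS", single_shuffle)]:
    w = run_sgd(order)
    print(name, np.linalg.norm(w - w_true))
```

All three variants converge here; the paper's contribution is a convergence-rate analysis showing the shuffled orderings are never slower than with-replacement sampling in the non-convex setting.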
Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization
Min-max optimization is emerging as a key framework for analyzing problems of
robustness to strategically and adversarially generated data. We propose a
random reshuffling-based, gradient-free Optimistic Gradient Descent-Ascent
algorithm for solving convex-concave min-max problems with finite sum
structure. We prove that the algorithm enjoys the same convergence rate as that
of zeroth-order algorithms for convex minimization problems. We further
specialize the algorithm to solve distributionally robust, decision-dependent
learning problems, where gradient information is not readily available. Through
illustrative simulations, we observe that our proposed approach learns models
that are simultaneously robust against adversarial distribution shifts and
strategic decisions from the data sources, and outperforms existing methods
from the strategic classification literature.Comment: 32 pages, 5 figure