Stochastic gradient descent (SGD) algorithm is the method of choice in many
machine learning tasks thanks to its scalability and efficiency in dealing with
large-scale problems. In this paper, we focus on the shuffling version of SGD
which matches the mainstream practical heuristics. We show the convergence to a
global solution of shuffling SGD for a class of non-convex functions under
over-parameterized settings. Our analysis employs more relaxed non-convex
assumptions than previous literature. Nevertheless, we maintain the desired
computational complexity as shuffling SGD has achieved in the general convex
setting.Comment: The 37th Conference on Neural Information Processing Systems (NeurIPS
2023