We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently
attracted considerable interest due to its improved performance over more
classical variants of stochastic gradient descent. Our main contribution is the
derivation of continuous-time models (in the form of SDEs) for SAM and two of
its variants, for both the full-batch and mini-batch settings. We demonstrate
that these SDEs are rigorous approximations of the real discrete-time
algorithms (in the weak sense, with an approximation error that scales linearly
with the learning rate). Using
these models, we then offer an explanation of why SAM prefers flat minima over
sharp ones, by showing that it minimizes an implicitly regularized loss with
a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to
saddle points under some realistic conditions. Our theoretical results are
supported by detailed experiments.
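
For concreteness, the discrete-time update that SAM performs (and that the continuous-time SDE models approximate) can be sketched as below. This is a minimal illustration of the standard full-batch SAM step from the literature, not the paper's SDE analysis; the names sam_step, grad_fn, lr, and rho are chosen here for illustration only.

    import numpy as np

    def sam_step(w, grad_fn, lr=0.05, rho=0.1, eps=1e-12):
        # Ascent step: perturb the weights by radius rho along the normalized gradient.
        g = grad_fn(w)
        w_adv = w + rho * g / (np.linalg.norm(g) + eps)
        # Descent step: apply the gradient evaluated at the perturbed point.
        return w - lr * grad_fn(w_adv)

    # Toy usage: quadratic loss L(w) = 0.5 * w^T A w with one flat and one sharp direction.
    A = np.diag([1.0, 10.0])
    grad = lambda w: A @ w
    w = np.array([1.0, 1.0])
    for _ in range(200):
        w = sam_step(w, grad)

In the mini-batch setting, grad_fn would be replaced by a stochastic gradient estimate, which is where the Hessian-dependent noise structure discussed above enters.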