Recent experiments have shown that, often, when training a neural network
with gradient descent (GD) using step size η, the operator norm of the
Hessian of the loss grows until it approximately reaches 2/η, after which
it fluctuates around this value. The quantity 2/η has been called the
"edge of stability" based on consideration of a local quadratic approximation
of the loss. We perform a similar calculation to arrive at an "edge of
stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has
been shown to improve its generalization. Unlike the case for GD, the resulting
SAM-edge depends on the norm of the gradient. Using three deep learning
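For context, in its commonly used full-batch form (as proposed by Foret et al.), the SAM update with perturbation radius ρ (a SAM hyperparameter not defined above) is
\[
  w_{t+1} \;=\; w_t - \eta\, \nabla L\!\left( w_t + \rho\, \frac{\nabla L(w_t)}{\|\nabla L(w_t)\|} \right),
\]
and because the inner perturbation is scaled by $\rho / \|\nabla L(w_t)\|$, repeating the quadratic-approximation calculation yields a stability threshold that involves $\|\nabla L(w_t)\|$ rather than the gradient-independent value 2/η.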
training tasks, we see empirically that SAM operates on the edge of stability
identified by this analysis.