Domain generalization (DG) seeks to learn robust models that generalize well
under unknown distribution shifts. As a critical aspect of DG, optimizer
selection has not been explored in depth. Currently, most DG methods follow the
widely used benchmark, DomainBed, and utilize Adam as the default optimizer for
all datasets. However, we reveal that Adam is not necessarily the optimal
choice for the majority of current DG methods and datasets. From the
perspective of loss landscape flatness, we propose a novel approach,
Flatness-Aware Minimization for Domain Generalization (FAD), which can
efficiently optimize both zeroth-order and first-order flatness simultaneously
for DG. We provide theoretical analyses of FAD's out-of-distribution (OOD)
generalization error and convergence. Our experimental results demonstrate the
superiority of FAD on various DG datasets. Additionally, we confirm that FAD is
capable of discovering flatter optima in comparison to other zeroth-order and
first-order flatness-aware optimization methods.
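
Below is a minimal, hypothetical PyTorch sketch of one flatness-aware update in this spirit: a SAM-style worst-case weight perturbation approximates zeroth-order flatness, and a gradient-norm penalty evaluated at the perturbed point serves as a first-order flatness surrogate. The function name `flatness_aware_step` and the hyperparameters `rho` (perturbation radius) and `lam` (first-order weight) are illustrative assumptions, not the authors' released FAD implementation.

```python
import torch

def flatness_aware_step(model, loss_fn, x, y, optimizer, rho=0.05, lam=0.1):
    # Hypothetical sketch of a combined zeroth-/first-order flatness update.
    optimizer.zero_grad()

    # 1) Clean gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    # 2) Ascend to an approximate worst-case point inside an L2 ball of
    #    radius rho (zeroth-order, SAM-style flatness).
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads])) + 1e-12
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            eps.append(e)
    optimizer.zero_grad()

    # 3) At the perturbed point, minimize the loss plus a first-order
    #    flatness surrogate: the squared gradient norm.
    perturbed_loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    g = torch.autograd.grad(perturbed_loss, params, create_graph=True,
                            allow_unused=True)
    gnorm_sq = sum((gi ** 2).sum() for gi in g if gi is not None)
    (perturbed_loss + lam * gnorm_sq).backward()

    # 4) Undo the perturbation and descend with the flatness-aware gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    return loss.item()
```

In practice, such a step would be looped over mini-batches pooled from the source domains, and the same interface works with a standard SGD or Adam base optimizer; the sketch is only meant to illustrate how zeroth-order and first-order flatness terms can be optimized within a single update.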