43 research outputs found
Enhancing Sharpness-Aware Optimization Through Variance Suppression
Sharpness-aware minimization (SAM) has well-documented merits in enhancing
generalization of deep neural networks, even without sizable data augmentation.
Embracing the geometry of the loss function, where neighborhoods of 'flat
minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing
the maximum loss caused by an adversary perturbing parameters within the
neighborhood. Although it is critical to account for the sharpness of the loss
function, such an 'over-friendly adversary' can curtail the utmost level of
generalization. The novel approach of this contribution fosters stabilization
of adversaries through variance suppression (VaSSO) to avoid such friendliness.
VaSSO's provable stability safeguards its numerical improvement over SAM in
model-agnostic tasks, including image classification and machine translation.
In addition, experiments confirm that VaSSO endows SAM with robustness against
high levels of label noise.
Comment: Accepted to NeurIPS 202
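A minimal sketch of the idea (not the authors' implementation): a SAM-style
two-step update in which the adversarial perturbation is formed from an
exponential moving average of stochastic gradients, one way to read "variance
suppression". The toy least-squares loss and the hyperparameter names rho and
theta are assumptions made for illustration.

```python
# Hedged sketch: SAM-style update on a toy least-squares loss, with an
# exponential moving average of gradients stabilizing the adversarial
# perturbation direction (plain SAM would use the raw gradient g itself).
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)

def loss_grad(w, batch):
    Ab, bb = A[batch], b[batch]
    return Ab.T @ (Ab @ w - bb) / len(batch)

w = np.zeros(10)
d = np.zeros(10)                      # smoothed perturbation direction
rho, theta, lr = 0.05, 0.2, 0.1       # assumed hyperparameters

for _ in range(200):
    batch = rng.choice(100, size=16, replace=False)
    g = loss_grad(w, batch)
    # variance suppression: smooth the stochastic gradient before
    # forming the adversarial perturbation
    d = (1 - theta) * d + theta * g
    eps = rho * d / (np.linalg.norm(d) + 1e-12)
    g_adv = loss_grad(w + eps, batch)  # gradient at the perturbed point
    w -= lr * g_adv                    # descend using the SAM-style gradient
```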
Conic Descent Redux for Memory-Efficient Optimization
Conic programming has well-documented merits in a gamut of signal processing
and machine learning tasks. This contribution revisits a recently developed
first-order conic descent (CD) solver, and advances it in three aspects:
intuition, theory, and algorithmic implementation. It is found that CD can
afford an intuitive geometric derivation that originates from the dual problem.
This opens the door to novel algorithmic designs, exemplified here by a momentum
variant of CD termed momentum conic descent (MOCO). Diving deeper into the dual
behavior of CD and MOCO reveals: i) an analytically justified stopping criterion;
and ii) the potential to design preconditioners to speed up dual convergence.
Lastly, to scale semidefinite programming (SDP), especially for low-rank
solutions, a memory-efficient MOCO variant is developed and numerically
validated.
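As a hedged illustration of the memory-efficient idea (not the paper's CD/MOCO
algorithm), the sketch below runs a conditional-gradient-style solver over the
PSD cone with a momentum-smoothed gradient, storing the iterate only through a
low-rank factor. The toy objective, trace bound tau, and momentum weight beta
are assumptions.

```python
# Hedged sketch: momentum-smoothed conditional gradient over the PSD cone
# {X >= 0, tr X <= tau}, with the iterate kept as a low-rank factor V (X = V V^T)
# so the full n-by-n matrix is never needed when the solution rank stays small.
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 60
A = rng.normal(size=(m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2          # symmetric measurement matrices
x = rng.normal(size=n)
X_true = np.outer(x, x)                     # rank-one ground truth
b = np.einsum('mij,ij->m', A, X_true)

tau, beta = np.trace(X_true), 0.8           # assumed trace bound and momentum
V = np.zeros((n, 0))                        # low-rank factor of the iterate
G_momentum = np.zeros((n, n))

for t in range(100):
    X = V @ V.T
    resid = np.einsum('mij,ij->m', A, X) - b
    grad = np.einsum('m,mij->ij', resid, A) / m
    G_momentum = beta * G_momentum + (1 - beta) * grad
    # linear minimization oracle: rank-one atom from the most negative
    # eigenvector of the smoothed gradient (or the zero atom)
    evals, evecs = np.linalg.eigh(G_momentum)
    v = evecs[:, 0] * np.sqrt(tau) if evals[0] < 0 else np.zeros(n)
    gamma = 2.0 / (t + 2)                   # standard conditional-gradient step
    V = np.hstack([np.sqrt(1 - gamma) * V, np.sqrt(gamma) * v[:, None]])
```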
Meta-Learning with Versatile Loss Geometries for Fast Adaptation Using Mirror Descent
Utilizing task-invariant prior knowledge extracted from related tasks,
meta-learning is a principled framework that empowers learning a new task,
especially when data records are limited. A fundamental challenge in
meta-learning is how to quickly "adapt" the extracted prior in order to train a
task-specific model within a few optimization steps. Existing approaches deal
with this challenge using a preconditioner that enhances convergence of the
per-task training process. Though effective at locally representing a quadratic
training loss, such simple linear preconditioners can hardly capture complex
loss geometries. The present contribution addresses this limitation by learning
a nonlinear mirror map, which induces a versatile distance metric to enable
capturing and optimizing a wide range of loss geometries, hence facilitating
the per-task training. Numerical tests on few-shot learning datasets
demonstrate the superior expressiveness and convergence of the advocated
approach.
Comment: Accepted by the 2024 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP-24)
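A hedged sketch of the inner-loop idea (not the authors' implementation): task
adaptation carried out as mirror descent under a nonlinear mirror map. Here
psi(w) = sum_i |w_i|^p / p stands in for a learned map, since its gradient and
inverse gradient have closed forms; the geometry parameter p, the step size,
and the toy regression task are assumptions.

```python
# Hedged sketch: few-step task adaptation as mirror descent with a
# nonlinear (power-law) mirror map instead of a plain gradient step.
import numpy as np

def mirror_map(w, p):          # nabla psi, mapping primal -> dual space
    return np.sign(w) * np.abs(w) ** (p - 1)

def mirror_map_inv(u, p):      # nabla psi^*, the inverse map back to primal
    return np.sign(u) * np.abs(u) ** (1.0 / (p - 1))

def task_grad(w, X, y):        # gradient of a per-task least-squares loss
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(2)
X, w_star = rng.normal(size=(20, 5)), rng.normal(size=5)
y = X @ w_star

w_prior = np.zeros(5)          # meta-learned initialization (prior)
p, lr = 3.0, 0.5               # assumed geometry parameter and step size

w = w_prior.copy()
for _ in range(5):             # few optimization steps per task
    g = task_grad(w, X, y)
    # mirror descent: take the step in dual space, then map back
    w = mirror_map_inv(mirror_map(w, p) - lr * g, p)
```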
Scalable Bayesian Meta-Learning through Generalized Implicit Gradients
Meta-learning offers unique effectiveness and swiftness in tackling emerging
tasks with limited data. Its broad applicability is revealed by viewing it as a
bi-level optimization problem. The resultant algorithmic viewpoint, however,
faces scalability issues when the inner-level optimization relies on
gradient-based iterations. Implicit differentiation has been considered to
alleviate this challenge, but it is restricted to an isotropic Gaussian prior,
and only favors deterministic meta-learning approaches. This work markedly
mitigates the scalability bottleneck by cross-fertilizing the benefits of
implicit differentiation to probabilistic Bayesian meta-learning. The novel
implicit Bayesian meta-learning (iBaML) method not only broadens the scope of
learnable priors, but also quantifies the associated uncertainty. Furthermore,
the ultimate complexity is well controlled regardless of the inner-level
optimization trajectory. Analytical error bounds are established to demonstrate
the precision and efficiency of the generalized implicit gradient over the
explicit one. Extensive numerical tests are also carried out to empirically
validate the performance of the proposed method.
Comment: Accepted as a poster paper in the main track of the Proceedings of the
37th AAAI Conference on Artificial Intelligence (AAAI-23)
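A hedged, deterministic toy sketch of the implicit-gradient idea (the
Bayesian/uncertainty machinery of iBaML is omitted): when the inner loss is
regularized least squares, the meta-gradient of the validation loss with
respect to the prior mean follows from the implicit function theorem and costs
one linear solve, independent of how many inner-loop iterations were run. The
specific losses and the weight lambda are assumptions.

```python
# Hedged sketch: implicit meta-gradient for a bi-level problem with inner loss
# L_in(w; phi) = 0.5*||X w - y||^2 + 0.5*lam*||w - phi||^2, where phi is the
# prior mean (the meta-parameter). No unrolling of inner iterations is needed.
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 40, 8, 0.5
X, Xv = rng.normal(size=(n, d)), rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y, yv = X @ w_true, Xv @ w_true

phi = np.zeros(d)                               # prior mean (meta-parameter)

# inner solution argmin_w L_in(w; phi), available in closed form for this toy
H = X.T @ X + lam * np.eye(d)                   # inner-loss Hessian
w_star = np.linalg.solve(H, X.T @ y + lam * phi)

# outer (validation) gradient evaluated at the adapted parameters
g_val = Xv.T @ (Xv @ w_star - yv) / n

# implicit function theorem: dw*/dphi = lam * H^{-1}, hence
# dL_val/dphi = lam * H^{-1} g_val -- a single linear solve
meta_grad = lam * np.linalg.solve(H, g_val)
```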