Preconditioned Federated Learning
Federated Learning (FL) is a distributed machine learning approach that
enables model training in a communication-efficient and privacy-preserving
manner. The standard optimization method in FL is Federated Averaging (FedAvg),
which performs multiple local SGD steps between communication rounds. FedAvg
has been considered to lack algorithmic adaptivity compared to modern first-order
adaptive optimizers. In this paper, we propose new communication-efficient
FL algorithms based on two adaptive frameworks: local adaptivity (PreFed) and
server-side adaptivity (PreFedOp). The proposed methods achieve adaptivity by using a
novel covariance matrix preconditioner. Theoretically, we provide convergence
guarantees for our algorithms. Empirically, our methods
achieve state-of-the-art performance in both i.i.d. and non-i.i.d. settings.
Comment: preprin
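For readers unfamiliar with the baseline, the FedAvg loop mentioned in the abstract (several local SGD steps per client, then a weighted average on the server) can be sketched as follows. This is a minimal illustration of plain FedAvg only, not of PreFed or PreFedOp, whose covariance matrix preconditioner is not specified in the abstract. The toy 1-D least-squares problem, the function names `local_sgd` and `fedavg`, and all hyperparameters are assumptions made for the example.

```python
import random


def local_sgd(w, data, lr, steps, rng):
    """Run `steps` SGD steps on the 1-D loss 0.5 * (w * x - y)^2."""
    for _ in range(steps):
        x, y = rng.choice(data)      # sample one local example
        grad = (w * x - y) * x       # gradient of the squared error w.r.t. w
        w -= lr * grad
    return w


def fedavg(client_data, rounds=50, local_steps=5, lr=0.1, seed=0):
    """Plain FedAvg: local SGD on each client, then a size-weighted average."""
    rng = random.Random(seed)
    w_global = 0.0
    total = sum(len(d) for d in client_data)
    for _ in range(rounds):
        # Each client starts from the current global model (one communication round).
        local_models = [
            local_sgd(w_global, d, lr, local_steps, rng) for d in client_data
        ]
        # Server step: average the client models, weighted by local dataset size.
        w_global = sum(
            len(d) / total * w for d, w in zip(client_data, local_models)
        )
    return w_global


# Hypothetical toy data: two clients holding noiseless samples of y = 3x.
clients = [
    [(x, 3.0 * x) for x in (0.5, 1.0)],
    [(x, 3.0 * x) for x in (1.0, 1.5, 2.0)],
]
w = fedavg(clients)
print(w)
```

On this toy problem the averaged model approaches the shared optimum w = 3. With heterogeneous (non-i.i.d.) client data the local optima differ, which is the setting where the choice of local and server-side update rules matters most.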
When Do Flat Minima Optimizers Work?
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: (1) Stochastic Weight Averaging (SWA) and (2) Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap here by comparing the loss surfaces of the models trained with each method and through broad benchmarking across computer vision, natural language processing, and graph representation learning tasks. We discover several surprising findings from these results, which we hope will help researchers further improve deep learning optimizers, and practitioners identify the right optimizer for their problem.
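As a rough illustration of the two methods named in the abstract, the sketch below applies a SAM-style update (take the gradient at a point perturbed by `rho` along the normalized gradient direction) and an SWA-style average (mean of the late iterates) to a toy quadratic. It is not the benchmark setup of the paper; the loss, the helper names `sam_step` and `swa`, and the values of `lr` and `rho` are assumptions chosen for the example.

```python
import math


def loss(w):
    """Toy convex loss f(w) = 0.5 * (w0^2 + 4 * w1^2)."""
    return 0.5 * (w[0] ** 2 + 4.0 * w[1] ** 2)


def grad(w):
    """Gradient of the toy loss."""
    return [w[0], 4.0 * w[1]]


def sam_step(w, lr=0.05, rho=0.05):
    """One SAM-style step: descend using the gradient at a perturbed point."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Move a distance rho along the normalized gradient (first-order ascent step).
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    # Apply the perturbed-point gradient to the original weights.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def swa(iterates):
    """SWA-style average: coordinate-wise mean of the collected weights."""
    n = len(iterates)
    return [sum(w[i] for w in iterates) / n for i in range(len(iterates[0]))]


start = [1.0, 1.0]
w = list(start)
tail = []
for step in range(200):
    w = sam_step(w)
    if step >= 150:          # only average the late part of the trajectory
        tail.append(w)
w_swa = swa(tail)
print(loss(start), loss(w), loss(w_swa))
```

In practice SWA is usually combined with a base optimizer such as SGD and a dedicated learning-rate schedule; pairing it with SAM steps here is only to keep the example short.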