PAC-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime
While the decision-theoretic optimality of the Bayesian formalism under
correct model specification is well-known (Berger 2013), the Bayesian case
becomes less clear under model misspecification (Grunwald 2017; Ramamoorthi
2015; Fushiki 2005). To formally understand the consequences of Bayesian
misspecification, this work examines the posterior predictive risk and its
sensitivity to the correctness of the model assumptions, i.e., the choice of
likelihood and prior. We present the multisample PAC-Bayes risk. This
risk is justified by theoretical analysis based on PAC-Bayes as well as
empirical study on a number of toy problems. The PAC-Bayes risk is
appealing in that it entails direct minimization of the Monte-Carlo
approximated posterior predictive risk yet recovers both the Bayesian formalism
and the MLE in its limits. Our work is heavily influenced by Masegosa
(2019); our contributions are to align training and generalization risks while
offering a tighter bound which empirically performs at least as well and
sometimes much better.
Comment: Submitted to ICML 202
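To make the multisample idea concrete, the objective amounts to a Monte-Carlo estimate of the posterior-predictive negative log-likelihood in which the average over parameter samples is taken inside the logarithm. Below is a minimal numpy sketch under our own assumptions, with a hypothetical Gaussian likelihood standing in for the model; it is an illustration of the general idea, not the paper's implementation.

```python
import numpy as np

def multisample_predictive_nll(y, x, theta_samples, log_lik):
    """Monte-Carlo posterior-predictive negative log-likelihood for one datum.

    Averages the predictive density over the parameter samples *inside* the
    log (rather than averaging log-densities) and is minimized directly.
    With a single sample this is an ordinary log loss; with many samples it
    approaches the posterior-predictive risk.
    """
    log_liks = np.array([log_lik(y, x, th) for th in theta_samples])
    # -log( (1/m) * sum_j p(y | x, theta_j) ), computed in log-space for stability
    return -(np.logaddexp.reduce(log_liks) - np.log(len(theta_samples)))

# Toy illustration (hypothetical model): Gaussian likelihood y ~ N(theta * x, 1).
def gaussian_log_lik(y, x, theta):
    return -0.5 * (y - theta * x) ** 2 - 0.5 * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
theta_samples = rng.normal(loc=1.0, scale=0.3, size=8)  # stand-in posterior samples
print(multisample_predictive_nll(2.1, 2.0, theta_samples, gaussian_log_lik))
```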
Towards Federated Learning Under Resource Constraints via Layer-wise Training and Depth Dropout
Large machine learning models trained on diverse data have recently seen
unprecedented success. Federated learning enables training on private data that
may otherwise be inaccessible, such as domain-specific datasets decentralized
across many clients. However, federated learning can be difficult to scale to
large models when clients have limited resources. This challenge often results
in a trade-off between model size and access to diverse data. To mitigate this
issue and facilitate training of large models on edge devices, we introduce a
simple yet effective strategy, Federated Layer-wise Learning, to simultaneously
reduce per-client memory, computation, and communication costs. Clients train
just a single layer each round, reducing resource costs considerably with
minimal performance degradation. We also introduce Federated Depth Dropout, a
complementary technique that randomly drops frozen layers during training, to
further reduce resource usage. Coupling these two techniques enables us to
effectively train significantly larger models on edge devices. Specifically, we
reduce training memory usage by 5x or more in federated self-supervised
representation learning and demonstrate that performance in downstream tasks is
comparable to conventional federated self-supervised learning.
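As a rough illustration of how the two techniques might compose on a single client, the sketch below freezes every layer except the one scheduled for the current round and randomly skips frozen layers. The round schedule, drop probability, and which frozen layers are eligible for dropping are our assumptions, not the paper's exact design.

```python
import random
import torch.nn as nn

def client_round_model(layers, active_idx, depth_drop_p=0.5):
    """Build the model one client uses for a single federated round (sketch).

    Federated Layer-wise Learning: only layers[active_idx] receives gradients
    this round; all other layers are frozen.  Federated Depth Dropout: each
    frozen layer is independently skipped with probability depth_drop_p,
    shrinking the network the client must hold in memory.
    """
    kept = []
    for i, layer in enumerate(layers):
        if i != active_idx and random.random() < depth_drop_p:
            continue  # drop this frozen layer for the current round
        for p in layer.parameters():
            p.requires_grad = (i == active_idx)  # freeze everything but the active layer
        kept.append(layer)
    return nn.Sequential(*kept)

# Hypothetical schedule: round r trains layer r % depth; only that layer's
# updated weights need to be communicated back to the server.
blocks = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(6)]
round_idx = 3
model = client_round_model(blocks, active_idx=round_idx % len(blocks))
trainable = [p for p in model.parameters() if p.requires_grad]
```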
Federated Training of Dual Encoding Models on Small Non-IID Client Datasets
Dual encoding models that encode a pair of inputs are widely used for
representation learning. Many approaches train dual encoding models by
maximizing agreement between pairs of encodings on centralized training data.
However, in many scenarios, datasets are inherently decentralized across many
clients (user devices or organizations) due to privacy concerns, motivating
federated learning. In this work, we focus on federated training of dual
encoding models on decentralized data composed of many small, non-IID (not
independent and identically distributed) client datasets. We show that
existing approaches that work well in centralized settings perform poorly when
naively adapted to this setting using federated averaging. We observe that we
can simulate large-batch loss computation on individual clients for loss
functions that are based on encoding statistics. Based on this insight, we
propose a novel federated training approach, Distributed Cross Correlation
Optimization (DCCO), which trains dual encoding models using encoding
statistics aggregated across clients, without sharing individual data samples.
Our experimental results on two datasets demonstrate that the proposed DCCO
approach outperforms federated variants of existing approaches by a large
margin.
Comment: ICLR 2023 Workshop on Pitfalls of Limited Data and Computation for Trustworthy M
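The insight about losses based on encoding statistics can be illustrated with a cross-correlation objective: each client ships only sums and cross-products of its encodings, and the server recombines them into the same normalized cross-correlation matrix that a single pooled large batch would produce. The specific statistics and loss below are an illustrative assumption, not necessarily the exact quantities DCCO aggregates.

```python
import numpy as np

def client_statistics(z_a, z_b):
    """Per-client sufficient statistics for a cross-correlation style loss.

    z_a, z_b: (n_i, d) encodings of two views on client i.  Only these
    aggregates leave the client; individual samples are never shared.
    """
    return {
        "n": z_a.shape[0],
        "sum_a": z_a.sum(0), "sum_b": z_b.sum(0),
        "sumsq_a": (z_a ** 2).sum(0), "sumsq_b": (z_b ** 2).sum(0),
        "cross": z_a.T @ z_b,
    }

def aggregate_cross_correlation(stats_list, eps=1e-12):
    """Server side: recombine client statistics into the normalized
    cross-correlation matrix a single pooled (large-batch) computation
    would have produced."""
    n = sum(s["n"] for s in stats_list)
    mean_a = sum(s["sum_a"] for s in stats_list) / n
    mean_b = sum(s["sum_b"] for s in stats_list) / n
    var_a = sum(s["sumsq_a"] for s in stats_list) / n - mean_a ** 2
    var_b = sum(s["sumsq_b"] for s in stats_list) / n - mean_b ** 2
    cov = sum(s["cross"] for s in stats_list) / n - np.outer(mean_a, mean_b)
    return cov / np.sqrt(np.outer(var_a, var_b) + eps)

# Ten clients with 5 samples each behave like one 50-sample batch.
rng = np.random.default_rng(0)
stats = [client_statistics(rng.normal(size=(5, 8)), rng.normal(size=(5, 8)))
         for _ in range(10)]
C = aggregate_cross_correlation(stats)  # (8, 8) cross-correlation matrix
```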
Weighted Ensemble Self-Supervised Learning
Ensembling has proven to be a powerful technique for boosting model
performance, uncertainty estimation, and robustness in supervised learning.
Advances in self-supervised learning (SSL) enable leveraging large unlabeled
corpora for state-of-the-art few-shot and supervised learning performance. In
this paper, we explore how ensemble methods can improve recent SSL techniques
by developing a framework that permits data-dependent weighted cross-entropy
losses. We refrain from ensembling the representation backbone; this choice
yields an efficient ensemble method that incurs a small training cost and
requires no architectural changes or computational overhead to downstream
evaluation. The effectiveness of our method is demonstrated with two
state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al.,
2022). Our method outperforms both in multiple evaluation metrics on
ImageNet-1K, particularly in the few-shot setting. We explore several weighting
schemes and find that those which increase the diversity of ensemble heads lead
to better downstream evaluation results. Thorough experiments yield improved
prior-art baselines, which our method still surpasses; e.g., our overall
improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
Comment: Accepted by ICLR 202
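A minimal sketch of a data-dependent weighted cross-entropy over ensemble heads sharing one backbone is given below; the shapes, the softmax weighting, and the function names are our own assumptions rather than the paper's formulation.

```python
import torch
import torch.nn.functional as F

def weighted_head_ensemble_loss(student_logits, teacher_probs, weights):
    """Data-dependent weighted cross-entropy over ensemble heads (sketch).

    student_logits: list of H tensors of shape (B, K), one per projection head;
                    all heads share a single backbone (the backbone itself is
                    not ensembled).
    teacher_probs:  list of H tensors of shape (B, K) with target distributions.
    weights:        (B, H) per-sample, per-head weights (rows summing to 1);
                    choosing these weights is where weighting schemes differ.
    """
    per_head = torch.stack(
        [torch.sum(-t * F.log_softmax(s, dim=-1), dim=-1)  # per-sample cross-entropy
         for s, t in zip(student_logits, teacher_probs)],
        dim=-1)                                            # shape (B, H)
    return (weights * per_head).sum(dim=-1).mean()

# Illustration only: H = 3 heads, random targets and weights.
B, K, H = 4, 16, 3
student = [torch.randn(B, K) for _ in range(H)]
teacher = [F.softmax(torch.randn(B, K), dim=-1) for _ in range(H)]
weights = F.softmax(torch.randn(B, H), dim=-1)  # data-dependent head weights
loss = weighted_head_ensemble_loss(student, teacher, weights)
```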