Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective
Existing Out-of-Distribution (OoD) detection methods aim to distinguish OoD
samples from In-Distribution (InD) data mainly by exploring differences in
features, logits, and gradients in Deep Neural Networks (DNNs). In this work,
we propose a new perspective based on the loss landscape and mode ensembles to
investigate OoD detection. In the optimization of DNNs, there exist many local
optima in the parameter space, known as modes. Interestingly, we observe that
these independent modes, while all reaching low-loss regions on InD data (both
training and test data), yield significantly different loss landscapes on OoD data.
This observation provides a novel view of OoD detection through the loss
landscape and further suggests that detection performance fluctuates
significantly across modes. For instance, the FPR of the RankFeat method
ranges from 46.58% to 84.70% across 5 modes, revealing unstable performance
evaluations across independent modes. Motivated by this diversity of OoD loss
landscapes across modes, we revisit the deep ensemble method for OoD detection
through mode ensembles, which improves performance and reduces the variance of
OoD detectors. Extensive experiments covering varied OoD detectors and network
structures illustrate the high variance across modes and validate the
superiority of mode ensembles in boosting OoD detection. We hope this work
draws attention to independent modes in the OoD loss landscape and to more
reliable evaluations of OoD detectors.
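The mode-ensemble recipe this abstract describes amounts to averaging per-mode detection scores over independently trained networks. A minimal PyTorch sketch under that reading, assuming maximum-softmax-probability as the base detector (the paper also covers others such as RankFeat) and a hypothetical load_checkpoint helper:

```python
import torch
import torch.nn.functional as F

def msp_score(model, x):
    """Max-softmax-probability OoD score for a single mode (higher = more InD)."""
    model.eval()
    with torch.no_grad():
        logits = model(x)
    return F.softmax(logits, dim=1).max(dim=1).values

def mode_ensemble_score(models, x):
    """Average per-mode scores across independently trained modes; averaging
    damps the mode-to-mode variance the abstract reports."""
    scores = torch.stack([msp_score(m, x) for m in models])  # (n_modes, batch)
    return scores.mean(dim=0)

# Hypothetical usage: each mode trained from a different random seed.
# models = [load_checkpoint(f"mode_{i}.pt") for i in range(5)]
# ood_scores = mode_ensemble_score(models, batch)
```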
Efficient and Transferable Adversarial Examples from Bayesian Neural Networks
An established way to improve the transferability of black-box evasion
attacks is to craft the adversarial examples on a surrogate ensemble model to
increase diversity. We argue that transferability is fundamentally related to
epistemic uncertainty. Based on a state-of-the-art Bayesian Deep Learning
technique, we propose a new method to efficiently build a surrogate by sampling
approximately from the posterior distribution of neural network weights, which
represents the belief about the value of each parameter. Our extensive
experiments on ImageNet and CIFAR-10 show that our approach improves the
transfer rates of four state-of-the-art attacks significantly (up to 62.1
percentage points), in both intra-architecture and inter-architecture cases. On
ImageNet, our approach can reach a 94% transfer rate while reducing training
computations from 11.6 to 2.4 exaflops, compared to an ensemble of
independently trained DNNs. Our vanilla surrogate achieves higher
transferability than 3 test-time techniques designed for this purpose in 87.5%
of cases. Our work demonstrates that the way a surrogate is trained has been
overlooked, although it is an important element of transfer-based attacks. We are,
therefore, the first to review the effectiveness of several training methods in
increasing transferability. We provide new directions to better understand the
transferability phenomenon and offer a simple but strong baseline for future
work.
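One way to realize the posterior-sampling surrogate described above is to materialize a handful of approximate posterior samples as model copies (e.g., cyclical SGLD checkpoints) and average the attack loss over them at each step. A minimal PGD sketch under that assumption (the sampler and the samples list are assumptions; the paper's exact attack and Bayesian technique may differ):

```python
import torch

def pgd_on_posterior(samples, x, y, eps=8/255, alpha=2/255, steps=10):
    """PGD whose gradient is averaged over weight samples drawn approximately
    from the surrogate posterior, rather than a single point estimate."""
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Epistemic uncertainty enters via the average over posterior samples.
        loss = sum(loss_fn(m(x_adv), y) for m in samples) / len(samples)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # untargeted ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to L-inf ball
            x_adv = x_adv.clamp(0, 1)                 # keep a valid image
    return x_adv.detach()
```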
Layerwise Linear Mode Connectivity
In the federated setup, separate local models are aggregated multiple times
during training to obtain a stronger global model; most often, aggregation is
a simple averaging of the parameters. Understanding when
and why averaging works in a non-convex setup, such as federated deep learning,
is an open challenge that hinders obtaining highly performant global models. On
i.i.d. datasets, federated deep learning with frequent averaging is successful.
The common understanding, however, is that during independent training the
models drift away from each other, so averaging may no longer work after many
local parameter updates. The problem can be seen from the
perspective of the loss surface: for points on a non-convex surface the average
can become arbitrarily bad. The assumption of local convexity, often used to
explain the success of federated averaging, contradicts the empirical evidence
showing that high loss barriers exist between models from the very beginning
of learning, even when training on the same data. Based on the
observation that the learning process evolves differently in different layers,
we investigate the barrier between models in a layerwise fashion. Our
conjecture is that the barriers preventing successful federated training are
caused by a particular layer or group of layers.

Comment: HLD 2023: 1st Workshop on High-dimensional Learning Dynamics, ICML
2023, Hawaii, US
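The layerwise investigation the abstract proposes can be probed by interpolating only one layer's parameters between two local models and tracking the loss along the path. A minimal sketch, assuming the remaining layers are held at model_a's weights and an eval_loss(model) helper that returns the loss on a fixed batch (both are assumptions; the workshop paper's exact protocol may differ):

```python
import copy
import torch

def layerwise_barrier(model_a, model_b, layer_prefix, eval_loss, n_points=11):
    """Loss barrier along a linear path that interpolates ONLY the parameters
    whose names start with layer_prefix; all other layers stay at model_a's
    values. A large barrier implicates this layer in the failure of averaging."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for t in torch.linspace(0.0, 1.0, n_points):
        sd = {}
        for k in sd_a:
            if k.startswith(layer_prefix) and sd_a[k].is_floating_point():
                sd[k] = (1 - t) * sd_a[k] + t * sd_b[k]
            else:
                sd[k] = sd_a[k].clone()  # integer buffers etc. left untouched
        probe.load_state_dict(sd)
        losses.append(eval_loss(probe))
    # Barrier height: peak loss along the path above the higher endpoint.
    return max(losses) - max(losses[0], losses[-1])
```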