62 research outputs found
Greedy Bayesian Posterior Approximation with Deep Ensembles
Ensembles of independently trained neural networks are a state-of-the-art
approach to estimate predictive uncertainty in Deep Learning, and can be
interpreted as an approximation of the posterior distribution via a mixture of
delta functions. The training of ensembles relies on non-convexity of the loss
landscape and random initialization of their individual members, making the
resulting posterior approximation uncontrolled. This paper proposes a novel and
principled method to tackle this limitation, minimizing an f-divergence
between the true posterior and a kernel density estimator in a function space.
We analyze this objective from a combinatorial point of view, and show that it
is submodular with respect to mixture components for any f. Subsequently, we
consider the problem of ensemble construction, and from the marginal gain of
the total objective, we derive a novel diversity term for training ensembles
greedily. The performance of our approach is demonstrated on computer vision
out-of-distribution detection benchmarks in a range of architectures trained on
multiple datasets. The source code of our method is publicly available at
https://github.com/MIPT-Oulu/greedy_ensembles_training
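The mixture-of-deltas view above has a simple operational reading: each independently trained member contributes one posterior sample, and the ensemble's predictive distribution is the uniform mixture of the members' outputs. A minimal sketch (function names and the entropy-based uncertainty score are illustrative, not the paper's greedy training objective):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_per_member):
    """Uniform mixture of per-member predictive distributions.

    logits_per_member has shape (M, N, C): M members, N inputs, C classes.
    """
    probs = softmax(np.asarray(logits_per_member))
    return probs.mean(axis=0)  # (N, C)

def predictive_entropy(probs, eps=1e-12):
    # Higher entropy -> higher predictive uncertainty; a common score for
    # out-of-distribution detection with deep ensembles.
    return -(probs * np.log(probs + eps)).sum(axis=-1)
```

When members disagree, the mixture flattens and the entropy rises, which is what OOD detection benchmarks exploit.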
Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels
IoU losses are surrogates that directly optimize the Jaccard index. In
semantic segmentation, leveraging IoU losses as part of the loss function is
shown to perform better with respect to the Jaccard index measure than
optimizing pixel-wise losses such as the cross-entropy loss alone. The most
notable IoU losses are the soft Jaccard loss and the Lovász-Softmax loss.
However, these losses are incompatible with soft labels which are ubiquitous in
machine learning. In this paper, we propose Jaccard metric losses (JMLs), which
are identical to the soft Jaccard loss in a standard setting with hard labels,
but are compatible with soft labels. With JMLs, we study two of the most
popular use cases of soft labels: label smoothing and knowledge distillation.
With a variety of architectures, our experiments show significant improvements
over the cross-entropy loss on three semantic segmentation datasets
(Cityscapes, PASCAL VOC and DeepGlobe Land), and our simple approach
outperforms state-of-the-art knowledge distillation methods by a large margin.
Code is available at: https://github.com/zifuwanggg/JDTLosses. (Comment:
Submitted to ICML 2023.)
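The soft-label incompatibility is easy to see numerically: the classic soft Jaccard loss is not zero when the prediction exactly equals a soft target. The sketch below contrasts it with an L1-based symmetric form that reduces to the Jaccard index on hard labels but is minimized whenever prediction equals the (possibly soft) target; this follows the general shape of the losses in the paper, though the exact JML formulations there may differ:

```python
import numpy as np

def soft_jaccard_loss(pred, target):
    # Classic soft Jaccard surrogate: 1 - <p, t> / (|p| + |t| - <p, t>).
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return 1.0 - inter / union

def jaccard_metric_loss(pred, target):
    # L1-based symmetric variant: on hard labels, (s - d)/(s + d) equals
    # |intersection| / |union|, so this matches the soft Jaccard loss; on
    # soft labels it attains 0 exactly when pred == target.
    s = np.abs(pred + target).sum()
    d = np.abs(pred - target).sum()
    return 1.0 - (s - d) / (s + d)
```

With hard labels both losses agree; with a soft target equal to the prediction, only the symmetric form reaches zero, which is what makes label smoothing and distillation targets usable.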
Surrogate Model Extension (SME): A Fast and Accurate Weight Update Attack on Federated Learning
In Federated Learning (FL) and many other distributed training frameworks,
collaborators can hold their private data locally and only share the network
weights trained with the local data after multiple iterations. Gradient
inversion is a family of privacy attacks that recovers data from its generated
gradients. Seemingly, FL can provide a degree of protection against gradient
inversion attacks on weight updates, since the gradient of a single step is
concealed by the accumulation of gradients over multiple local iterations. In
this work, we propose a principled way to extend gradient inversion attacks to
weight updates in FL, thereby better exposing weaknesses in the presumed
privacy protection inherent in FL. In particular, we propose a surrogate model
method based on the characteristic of two-dimensional gradient flow and
low-rank property of local updates. Our method largely boosts the ability of
gradient inversion attacks on weight updates containing many iterations and
achieves state-of-the-art (SOTA) performance. Additionally, our method runs
substantially faster than the SOTA baseline in the common FL scenario. Our
work re-evaluates and highlights the privacy risk of sharing network weights.
Our code is available at
https://github.com/JunyiZhu-AI/surrogate_model_extension. (Comment: Accepted
at ICML 2023.)
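The low-rank property mentioned above can be demonstrated in a toy linear setting (this is only the observation the attack builds on, not the SME method itself; the model and names are illustrative): a single-sample SGD step on a linear layer changes the weights by an outer product, so the private input direction can be read off the weight difference with an SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Client side: one SGD step on a linear model with a single private sample.
W = rng.normal(size=(4, 8))      # shared model weights
x = rng.normal(size=8)           # private input
t = rng.normal(size=4)           # private target
lr = 0.1
grad_W = np.outer(W @ x - t, x)  # dL/dW for L = 0.5 * ||W x - t||^2
W_new = W - lr * grad_W

# Attacker side: only W and W_new are observed. The update is rank-1
# (an outer product), so the input direction is recovered (up to sign
# and scale) as the top right singular vector of the weight difference.
delta = W_new - W
_, _, vt = np.linalg.svd(delta)
x_hat = vt[0]
cos = abs(x_hat @ x) / (np.linalg.norm(x_hat) * np.linalg.norm(x))
```

With multiple local iterations the update is no longer exactly rank-1, which is the gap the surrogate-model construction in the paper addresses.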
- …