The Deep Weight Prior
Bayesian inference is known to provide a general framework for incorporating
prior knowledge or specific properties into machine learning models via
carefully choosing a prior distribution. In this work, we propose a new type of
prior distribution for convolutional neural networks, the deep weight prior (DWP),
which exploits generative models to encourage a specific structure in trained
convolutional filters, e.g., spatial correlations of weights. We define DWP in
the form of an implicit distribution and propose a method for variational
inference with this type of implicit prior. In experiments, we show that DWP
improves the performance of Bayesian neural networks when training data are
limited, and that initializing weights with samples from DWP accelerates
training of conventional convolutional neural networks.
Comment: TL;DR: The deep weight prior learns a generative model for kernels of
convolutional neural networks that acts as a prior distribution while
training on a new dataset
Generalized Separable Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique
for nonnegative data with applications such as image analysis, text mining,
audio source separation and hyperspectral unmixing. Given a data matrix $M$ and
a factorization rank $r$, NMF looks for a nonnegative matrix $W$ with $r$
columns and a nonnegative matrix $H$ with $r$ rows such that $M \approx WH$.
NMF is NP-hard to solve in general. However, it can be computed efficiently
under the separability assumption, which requires that the basis vectors appear
as data points, that is, that there exists an index set $\mathcal{K}$ such that
$W = M(:,\mathcal{K})$. In this paper, we generalize the separability
assumption: We only require that for each rank-one factor $W(:,k)H(k,:)$ for
$k = 1, 2, \dots, r$, either $W(:,k) = M(:,j)$ for some $j$ or $H(k,:) = M(i,:)$ for
some $i$. We refer to the corresponding problem as generalized separable NMF
(GS-NMF). We discuss some properties of GS-NMF and propose a convex
optimization model which we solve using a fast gradient method. We also propose
a heuristic algorithm inspired by the successive projection algorithm. To
verify the effectiveness of our methods, we compare them with several
state-of-the-art separable NMF algorithms on synthetic, document and image data
sets.
Comment: 31 pages, 12 figures, 4 tables. We have added discussions about the
identifiability of the model, we have modified the first synthetic
experiment, we have clarified some aspects of the contributio
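As a rough illustration of the separability assumption described in this abstract, the sketch below implements the classical successive projection algorithm (SPA) for ordinary separable NMF. It is not the GS-NMF convex model or the heuristic proposed in the paper, only the standard baseline that the heuristic is said to be inspired by. The function name `spa`, the list-of-rows matrix layout, and the toy matrix `M` are illustrative assumptions.

```python
import math


def spa(M, r):
    """Pick up to r column indices of M (a list of rows) with the classical SPA.

    At each step the column with the largest residual norm is selected, and
    all columns are then projected onto the orthogonal complement of it.
    """
    m, n = len(M), len(M[0])
    # Residual columns, copied so that the input matrix is left untouched.
    cols = [[float(M[i][j]) for i in range(m)] for j in range(n)]
    selected = []
    for _ in range(r):
        sq_norms = [sum(x * x for x in c) for c in cols]
        j = max(range(n), key=sq_norms.__getitem__)
        if sq_norms[j] <= 1e-12:
            break  # nothing left to extract
        selected.append(j)
        norm = math.sqrt(sq_norms[j])
        u = [x / norm for x in cols[j]]
        for c in range(n):
            dot = sum(a * b for a, b in zip(u, cols[c]))
            cols[c] = [b - dot * a for a, b in zip(u, cols[c])]
    return selected


# Toy separable matrix (hypothetical data): columns 1, 2 and 4 are the basis
# vectors; column 0 is the average of columns 1 and 2, and column 3 is the
# average of columns 1, 2 and 4.
M = [
    [1.5, 3.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 2.0 / 3.0, 0.0],
    [0.0, 0.0, 0.0, 1.0 / 3.0, 1.0],
]
indices = spa(M, 3)
```

On this toy input the selected indices are the three basis columns, so the basis matrix can be read off directly from the data, which is what makes the separable case tractable.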
Weighted Ensemble Self-Supervised Learning
Ensembling has proven to be a powerful technique for boosting model
performance, uncertainty estimation, and robustness in supervised learning.
Advances in self-supervised learning (SSL) enable leveraging large unlabeled
corpora for state-of-the-art few-shot and supervised learning performance. In
this paper, we explore how ensemble methods can improve recent SSL techniques
by developing a framework that permits data-dependent weighted cross-entropy
losses. We refrain from ensembling the representation backbone; this choice
yields an efficient ensemble method that incurs a small training cost and
requires no architectural changes or computational overhead to downstream
evaluation. The effectiveness of our method is demonstrated with two
state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al.,
2022). Our method outperforms both in multiple evaluation metrics on
ImageNet-1K, particularly in the few-shot setting. We explore several weighting
schemes and find that those which increase the diversity of ensemble heads lead
to better downstream evaluation results. Thorough experiments yield improved
prior art baselines which our method still surpasses; e.g., our overall
improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
Comment: Accepted by ICLR 202
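To make the idea of a data-dependent weighted cross-entropy loss over ensemble heads more concrete, here is a minimal sketch for a single input. The abstract does not specify the weighting schemes, so the entropy-based weights below are a hypothetical choice and should not be read as the method from the paper. All function names and the `temperature` parameter are assumptions made for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p, eps=1e-12):
    return -sum(q * math.log(q + eps) for q in p)


def cross_entropy(target, pred, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def weighted_ensemble_loss(teacher_probs, student_probs, temperature=1.0):
    """Weighted sum of per-head cross-entropies for one input.

    teacher_probs and student_probs each hold one probability vector per
    head. Hypothetical weighting: heads whose teacher prediction has lower
    entropy for this input receive a larger weight.
    """
    weights = softmax([-entropy(t) / temperature for t in teacher_probs])
    losses = [cross_entropy(t, s) for t, s in zip(teacher_probs, student_probs)]
    loss = sum(w * l for w, l in zip(weights, losses))
    return loss, weights


# Two heads, two classes (made-up numbers).
teacher = [[0.9, 0.1], [0.5, 0.5]]
student = [[0.8, 0.2], [0.6, 0.4]]
loss, weights = weighted_ensemble_loss(teacher, student)
```

Because the weights depend on the teacher outputs for each input, different inputs can emphasize different heads. A very large `temperature` flattens the weights toward a uniform average over heads.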