
The Deep Weight Prior

Authors: Ashukha Arsenii, Atanov Andrei, Struminsky Kirill, Vetrov Dmitry, Welling Max
Publication date: 18/02/2019

Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via carefully choosing a prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), which exploits generative models to encourage a specific structure of trained convolutional filters, e.g., spatial correlations of weights. We define DWP in the form of an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that DWP improves the performance of Bayesian neural networks when training data are limited, and that initializing weights with samples from DWP accelerates training of conventional convolutional neural networks.
Comment: TL;DR: The deep weight prior learns a generative model for kernels of convolutional neural networks that acts as a prior distribution while training on a new dataset.

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications
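Because DWP is an implicit prior, the KL term in the usual variational objective has no closed form. One standard way to handle this, assumed here rather than taken from the abstract above, writes the prior as a latent-variable model over kernels $w$ with latent code $z$ and lower-bounds $\log p(w)$ with an auxiliary distribution $r(z \mid w)$; $q_\theta(w)$ is the variational posterior and $\mathcal{D}$ the training data (notation chosen for this sketch):

```latex
\begin{align}
p(w) &= \int p(w \mid z)\, p(z)\, dz, \\
\log p(w) &\ge \mathbb{E}_{r(z \mid w)}\big[\log p(w \mid z) + \log p(z) - \log r(z \mid w)\big], \\
\mathcal{L}(\theta) &= \mathbb{E}_{q_\theta(w)}\big[\log p(\mathcal{D} \mid w)\big] - \mathrm{KL}\big(q_\theta(w) \,\|\, p(w)\big) \\
&\ge \mathbb{E}_{q_\theta(w)}\big[\log p(\mathcal{D} \mid w)\big] + \mathcal{H}\big[q_\theta(w)\big]
 + \mathbb{E}_{q_\theta(w)}\,\mathbb{E}_{r(z \mid w)}\big[\log p(w \mid z) + \log p(z) - \log r(z \mid w)\big].
\end{align}
```

The last line uses $-\mathrm{KL}(q_\theta \,\|\, p) = \mathbb{E}_{q_\theta}[\log p(w)] + \mathcal{H}[q_\theta]$ and substitutes the bound on $\log p(w)$, so every term can be estimated by sampling $w \sim q_\theta$ and $z \sim r(z \mid w)$.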

Generalized Separable Nonnegative Matrix Factorization

Authors: Gillis Nicolas, Pan Junjun
Publication date: 01/01/2019

Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data with applications such as image analysis, text mining, audio source separation and hyperspectral unmixing. Given a data matrix $M$ and a factorization rank $r$, NMF looks for a nonnegative matrix $W$ with $r$ columns and a nonnegative matrix $H$ with $r$ rows such that $M \approx WH$. NMF is NP-hard to solve in general. However, it can be computed efficiently under the separability assumption, which requires that the basis vectors appear as data points, that is, that there exists an index set $\mathcal{K}$ such that $W = M(:,\mathcal{K})$. In this paper, we generalize the separability assumption: we only require that for each rank-one factor $W(:,k)H(k,:)$ for $k=1,2,\dots,r$, either $W(:,k) = M(:,j)$ for some $j$ or $H(k,:) = M(i,:)$ for some $i$. We refer to the corresponding problem as generalized separable NMF (GS-NMF). We discuss some properties of GS-NMF and propose a convex optimization model, which we solve using a fast gradient method. We also propose a heuristic algorithm inspired by the successive projection algorithm. To verify the effectiveness of our methods, we compare them with several state-of-the-art separable NMF algorithms on synthetic, document and image data sets.
Comment: 31 pages, 12 figures, 4 tables. We have added discussions about the identifiability of the model, we have modified the first synthetic experiment, we have clarified some aspects of the contributio

arXiv.org e-Print Archive

Crossref
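Two small NumPy sketches make the GS-NMF definitions concrete. `gs_anchors` is a hypothetical helper that checks, for a given factorization, which rank-one factors satisfy the generalized separability condition exactly as stated above (reporting a column anchor first if both hold); its toy matrices are made up so that the first factor is anchored on a column of $M$ and the second on a row, while the second column of $W$ is not a column of $M$. `spa` is the standard successive projection algorithm for plain separable NMF, included only because the paper's heuristic is described as inspired by it; it is not the paper's GS-NMF heuristic or its convex model. In the noiseless case SPA recovers $\mathcal{K}$ when $W$ has full column rank and the columns of $H$ sum to at most one, which the second toy example is built to satisfy.

```python
import numpy as np

def gs_anchors(M, W, H, atol=1e-9):
    """Hypothetical helper. For each rank-one factor W[:, k] H[k, :], return
    ("col", j) if W[:, k] equals column j of M, else ("row", i) if H[k, :]
    equals row i of M, else None (condition violated for that factor)."""
    anchors = []
    for k in range(W.shape[1]):
        cols = [j for j in range(M.shape[1]) if np.allclose(M[:, j], W[:, k], atol=atol)]
        rows = [i for i in range(M.shape[0]) if np.allclose(M[i, :], H[k, :], atol=atol)]
        if cols:
            anchors.append(("col", cols[0]))
        elif rows:
            anchors.append(("row", rows[0]))
        else:
            anchors.append(None)
    return anchors

def spa(M, r):
    """Standard successive projection algorithm for plain separable NMF.
    Assumes noiseless data, W with full column rank, and columns of H
    summing to at most one. Returns the picked column indices in order."""
    R = np.asarray(M, dtype=float).copy()
    K = []
    for _ in range(r):
        # Pick the column with the largest residual (squared) Euclidean norm.
        j = int(np.argmax((R ** 2).sum(axis=0)))
        K.append(j)
        # Project every column onto the orthogonal complement of the pick.
        u = R[:, j].copy()
        R -= np.outer(u, u @ R) / (u @ u)
    return K

# Toy GS-NMF instance: factor 1 is column-anchored, factor 2 row-anchored.
W1 = np.array([[1., 2.], [0., 1.], [3., 1.]])
H1 = np.array([[1., 1., 2.], [0., 1., 1.]])
M1 = W1 @ H1
print(gs_anchors(M1, W1, H1))  # [('col', 0), ('row', 1)]

# Toy plain separable instance: the first three columns of M2 are W2.
W2 = np.array([[3., 0., 1.], [0., 2., 1.], [1., 1., 0.], [0., 0., 2.]])
H2 = np.array([[1., 0., 0., 0.5, 0.2],
               [0., 1., 0., 0.5, 0.3],
               [0., 0., 1., 0.0, 0.5]])
M2 = W2 @ H2
K = spa(M2, 3)
print(sorted(K))  # [0, 1, 2]
```

Reading off `M2[:, K]` then gives the columns of `W2` in the order they were picked; $H$ can be recovered afterwards, for example by nonnegative least squares.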

Weighted Ensemble Self-Supervised Learning

Authors: Alemi Alexander A., Dillon Joshua V., Fischer Ian, Ioffe Sergey, Morningstar Warren, Ruan Yangjun, Singh Saurabh
Publication date: 09/04/2023

Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost and requires no architectural changes or computational overhead for downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both on multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior-art baselines, which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.
Comment: Accepted by ICLR 202

arXiv.org e-Print Archive
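The abstract describes a data-dependent weighted cross-entropy over ensemble heads that share one backbone. The NumPy sketch below shows one way such a loss could be assembled; the array layout (heads, batch, classes), the function names, and the entropy-based weighting are hypothetical illustrations, not the weighting schemes evaluated in the paper, and the method-specific details of DINO and MSN are omitted.

```python
import numpy as np

def entropy_weights(teacher_probs, tau=1.0, eps=1e-12):
    """HYPOTHETICAL weighting scheme: on each example, heads whose teacher
    distribution has lower entropy receive more weight.
    teacher_probs: array of shape (heads, batch, classes).
    Returns weights of shape (heads, batch) that sum to 1 over heads."""
    ent = -(teacher_probs * np.log(teacher_probs + eps)).sum(axis=-1)
    logits = -ent / tau
    logits = logits - logits.max(axis=0, keepdims=True)  # stabilize exp
    w = np.exp(logits)
    return w / w.sum(axis=0, keepdims=True)

def weighted_ensemble_ce(teacher_probs, student_probs, weights, eps=1e-12):
    """Cross-entropy H(teacher, student) per head and example, combined with
    per-example head weights and averaged over the batch."""
    ce = -(teacher_probs * np.log(student_probs + eps)).sum(axis=-1)  # (heads, batch)
    return float((weights * ce).sum(axis=0).mean())

# Two heads, one example, two classes (toy numbers).
teacher = np.array([[[0.9, 0.1]],
                    [[0.5, 0.5]]])
student = np.full((2, 1, 2), 0.5)

w = entropy_weights(teacher)
loss = weighted_ensemble_ce(teacher, student, w)
print(w[:, 0])  # head 0 (more confident teacher) gets the larger weight
print(loss)     # ~0.6931, i.e. log 2, because the student is uniform
```

Replacing `entropy_weights` with uniform weights (0.5 per head here) reduces the loss to a plain average of per-head cross-entropies, the natural baseline against which any data-dependent scheme would be compared.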