The Deep Weight Prior
Bayesian inference is known to provide a general framework for incorporating
prior knowledge or specific properties into machine learning models by
carefully choosing a prior distribution. In this work, we propose a new type of
prior distribution for convolutional neural networks, the deep weight prior (DWP),
which exploits generative models to encourage a specific structure in trained
convolutional filters, e.g., spatial correlations of weights. We define DWP in
the form of an implicit distribution and propose a method for variational
inference with this type of implicit prior. In experiments, we show that DWP
improves the performance of Bayesian neural networks when training data are
limited, and that initializing weights with samples from DWP accelerates the
training of conventional convolutional neural networks.

Comment: TL;DR: The deep weight prior learns a generative model for kernels of
convolutional neural networks that acts as a prior distribution while
training on a new dataset.
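
A minimal sketch of the initialization use-case, in PyTorch: a small decoder network stands in for the learned generative model over kernels, and its samples are used as the initial convolutional filters. The decoder architecture, the shapes, and the fact that the decoder is untrained here are all illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: initialize conv filters with samples from a
# generative "kernel prior". The decoder below is an untrained stand-in
# for the learned generative model described in the abstract.
import torch
import torch.nn as nn

latent_dim = 8
k = 3                      # kernel size (illustrative choice)
out_ch, in_ch = 16, 3

# Stand-in for a generative model over k x k filters (e.g. a VAE decoder).
decoder = nn.Sequential(
    nn.Linear(latent_dim, 32),
    nn.ReLU(),
    nn.Linear(32, k * k),
)

conv = nn.Conv2d(in_ch, out_ch, kernel_size=k)

with torch.no_grad():
    z = torch.randn(out_ch * in_ch, latent_dim)     # one latent per filter slice
    kernels = decoder(z).view(out_ch, in_ch, k, k)  # decode latents into filters
    conv.weight.copy_(kernels)                      # prior samples as initialization
```

In the paper's setting, the decoder would be trained on filters from networks fitted to related source datasets, so its samples carry the spatial structure the prior is meant to encourage.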
Comparison between Suitable Priors for Additive Bayesian Networks
Additive Bayesian networks (ABNs) are graphical models that extend the
usual Bayesian generalized linear model to multiple dependent variables through
the factorisation of the joint probability distribution of the underlying
variables. When fitting an ABN model, the choice of prior for the parameters
is of crucial importance. If an inadequate prior, such as one that is too weakly
informative, is used, data separation and data sparsity lead to issues in
the model selection process. In this work, a simulation study comparing two weakly
informative priors and one strongly informative prior is presented. As the first
weakly informative prior, we use a zero-mean Gaussian prior with a large variance,
as currently implemented in the R package abn. The second weakly informative prior
is a Student's t-distribution, specifically designed for logistic regression.
Finally, the strongly informative prior is again Gaussian, with mean equal to the
true parameter value and a small variance. We compare the impact of these priors
on the accuracy of the learned additive Bayesian network as a function of
different parameters. We design a simulation study to illustrate Lindley's paradox
arising from the prior choice. We conclude by highlighting the good performance of
the informative Student's t-prior and the limited impact of Lindley's paradox.
Finally, suggestions for further developments are provided.

Comment: 8 pages, 4 figures
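
A minimal sketch of the three priors compared above, written as log-density penalties on a single logistic-regression coefficient. The degrees of freedom, scales, and the assumed true parameter value are illustrative choices, not the ones used in the study.

```python
# Hypothetical sketch: the three priors from the abstract as log-densities
# evaluated at a single coefficient beta. Heavy tails (Student's t) penalize
# large coefficients far less, which matters under data separation.
import numpy as np
from scipy import stats

beta_true = 1.5   # assumed "true" value, used by the strongly informative prior

def log_prior_weak_gaussian(beta, sd=10.0):
    # zero-mean Gaussian with large variance (the weakly informative choice)
    return stats.norm.logpdf(beta, loc=0.0, scale=sd)

def log_prior_student_t(beta, df=7, scale=2.5):
    # heavy-tailed prior designed for logistic regression
    # (df and scale here are illustrative, not the paper's values)
    return stats.t.logpdf(beta, df=df, loc=0.0, scale=scale)

def log_prior_strong_gaussian(beta, sd=0.5):
    # Gaussian centred on the true parameter with small variance
    return stats.norm.logpdf(beta, loc=beta_true, scale=sd)

for b in (0.0, 1.5, 10.0):
    print(f"beta={b:5.1f}  weak={log_prior_weak_gaussian(b):8.3f}  "
          f"t={log_prior_student_t(b):8.3f}  strong={log_prior_strong_gaussian(b):8.3f}")
```

Comparing the printed log-densities at beta = 10 shows why a too-weak Gaussian barely constrains separated coefficients, while the Student's t and strong Gaussian priors differ sharply in how they treat values far from zero.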