Big Learning with Bayesian Methods
Explosive growth in data and availability of cheap computing resources have
sparked increasing interest in Big learning, an emerging subfield that studies
scalable machine learning algorithms, systems, and applications with Big Data.
Bayesian methods represent one important class of statistical methods for machine
learning, with substantial recent developments on adaptive, flexible and
scalable Bayesian learning. This article provides a survey of the recent
advances in Big learning with Bayesian methods, termed Big Bayesian Learning,
including nonparametric Bayesian methods for adaptively inferring model
complexity, regularized Bayesian inference for improving the flexibility via
posterior regularization, and scalable algorithms and systems based on
stochastic subsampling and distributed computing for dealing with large-scale
applications.
Comment: 21 pages, 6 figures
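The stochastic-subsampling strategy this abstract mentions can be illustrated with a minimal sketch: an unbiased minibatch estimate of the log-posterior gradient, obtained by rescaling the minibatch contribution by N/n. The Gaussian model, step size, and batch size below are illustrative assumptions, not details from the survey.

```python
import random

def grad_log_prior(theta, prior_var=100.0):
    # Gradient of log N(theta | 0, prior_var) with respect to theta.
    return -theta / prior_var

def grad_log_lik(theta, x):
    # Gradient of log N(x | theta, 1) with respect to theta.
    return x - theta

def stochastic_grad(theta, data, batch_size):
    # Unbiased estimate of the full-data log-posterior gradient:
    # the minibatch sum is rescaled by N / n.
    batch = random.sample(data, batch_size)
    scale = len(data) / batch_size
    return grad_log_prior(theta) + scale * sum(grad_log_lik(theta, x) for x in batch)

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(10_000)]

# Noisy gradient ascent on the log-posterior; adding Gaussian noise to
# each step would turn this into stochastic-gradient MCMC.
theta = 0.0
for _ in range(500):
    theta += 1e-5 * stochastic_grad(theta, data, batch_size=100)
print(round(theta, 2))  # close to the data mean, i.e. roughly 2.0
```

Each update touches only 100 of the 10,000 observations, which is the point: the cost per step is independent of the full dataset size.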
Overcoming Catastrophic Forgetting by Incremental Moment Matching
Catastrophic forgetting is the problem whereby a neural network loses the
information learned on a first task after being trained on a second task. Here,
we propose a method, incremental moment matching (IMM), to resolve this problem.
IMM incrementally matches the moments of the posterior distributions of the
neural networks trained on the first and the second task, respectively. To
make the search space of the posterior parameters smooth, the IMM procedure is
complemented by various transfer learning techniques, including weight transfer,
an L2-norm penalty between the old and the new parameters, and a variant of
dropout with the old parameters. We analyze our approach on a variety of
datasets, including the MNIST, CIFAR-10, Caltech-UCSD Birds, and Lifelog
datasets. The experimental results show that IMM achieves state-of-the-art
performance by balancing the information between an old and a new network.
Comment: Accepted for NIPS 201
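The moment-matching step can be sketched for diagonal-Gaussian approximations of the two task posteriors. The two merging rules below, a weighted average of the task parameters and a precision-weighted average, are one plausible realization of the moment matching the abstract describes; the variable names and the per-parameter precision lists (e.g. Fisher-information estimates) are illustrative assumptions.

```python
# Merge per-task parameter vectors, each with an approximate diagonal-
# Gaussian posterior: means mu_k and precisions prec_k, mixed with
# weights alphas (summing to 1).

def mean_imm(mus, alphas):
    # Weighted average of the task-specific parameters
    # (matches the mean of the mixture of posteriors).
    return [sum(a * mu[i] for a, mu in zip(alphas, mus))
            for i in range(len(mus[0]))]

def mode_imm(mus, precs, alphas):
    # Precision-weighted average: the mode of a product of Gaussians,
    # so parameters the old task was confident about move less.
    merged = []
    for i in range(len(mus[0])):
        num = sum(a * p[i] * mu[i] for a, p, mu in zip(alphas, precs, mus))
        den = sum(a * p[i] for a, p in zip(alphas, precs))
        merged.append(num / den)
    return merged

mu1, mu2 = [0.0, 1.0], [2.0, 1.0]
prec1, prec2 = [1.0, 1.0], [3.0, 1.0]
print(mean_imm([mu1, mu2], [0.5, 0.5]))                  # [1.0, 1.0]
print(mode_imm([mu1, mu2], [prec1, prec2], [0.5, 0.5]))  # [1.5, 1.0]
```

Note how the precision-weighted rule pulls the first coordinate toward the second task's value (precision 3 vs 1), while the plain average splits the difference.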
Reconciling meta-learning and continual learning with online mixtures of tasks
Learning-to-learn or meta-learning leverages data-driven inductive bias to
increase the efficiency of learning on a novel task. This approach encounters
difficulty when transfer is not advantageous, for instance, when tasks are
considerably dissimilar or change over time. We use the connection between
gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet
process mixture of hierarchical Bayesian models over the parameters of an
arbitrary parametric model such as a neural network. In contrast to
consolidating inductive biases into a single set of hyperparameters, our
approach of task-dependent hyperparameter selection better handles latent
distribution shift, as demonstrated on a set of evolving, image-based, few-shot
learning benchmarks.
Comment: updated experimental results
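The task-dependent grouping idea can be sketched with a Chinese-restaurant-process style assignment rule: a new task joins an existing component in proportion to the component's size times how well it explains the task, or opens a fresh component in proportion to a concentration parameter. Reducing each task to a scalar summary and taking greedy MAP assignments are simplifying assumptions for illustration; the paper's model places the mixture over hierarchical-Bayes hyperparameters of a neural network.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def assign_tasks(task_summaries, alpha=1.0, s2=0.5, s0=10.0):
    # Sequentially assign each task summary x to a mixture component.
    # Component k attracts x with weight n_k * N(x | mean_k, s2);
    # a new component is opened with weight alpha * N(x | 0, s0).
    clusters = []  # each cluster is the list of its member summaries
    labels = []
    for x in task_summaries:
        weights = [len(c) * normal_pdf(x, sum(c) / len(c), s2) for c in clusters]
        weights.append(alpha * normal_pdf(x, 0.0, s0))  # open a new cluster
        k = max(range(len(weights)), key=weights.__getitem__)  # greedy MAP
        if k == len(clusters):
            clusters.append([x])
        else:
            clusters[k].append(x)
        labels.append(k)
    return labels

# Two groups of similar tasks: dissimilar tasks are not forced to share
# a single set of hyperparameters.
print(assign_tasks([0.1, -0.2, 5.0, 5.2, 0.0]))  # [0, 0, 1, 1, 0]
```

The number of components is not fixed in advance, which is what lets the approach cope with task distributions that drift over time.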
Automatic Posterior Transformation for Likelihood-Free Inference
How can one perform Bayesian inference on stochastic simulators with
intractable likelihoods? A recent approach is to learn the posterior from
adaptively proposed simulations using neural network-based conditional density
estimators. However, existing methods are limited to a narrow range of proposal
distributions or require importance weighting that can limit performance in
practice. Here we present automatic posterior transformation (APT), a new
sequential neural posterior estimation method for simulation-based inference.
APT can modify the posterior estimate using arbitrary, dynamically updated
proposals, and is compatible with powerful flow-based density estimators. It is
more flexible, scalable and efficient than previous simulation-based inference
techniques. APT can operate directly on high-dimensional time series and image
data, opening up new applications for likelihood-free inference.
MaxEntropy Pursuit Variational Inference
One of the core problems in variational inference is the choice of the
approximate posterior distribution. It is crucial to trade off the efficiency of
inference with simple families, such as mean-field models, against the accuracy
of the approximation. We propose a variant of a greedy approximation of the
posterior distribution with tractable base learners. Using a Max-Entropy
approach, we obtain a well-defined optimization problem. We demonstrate the
ability of the method to capture complex multimodal posteriors in a continual
learning setting for neural networks.
Comment: 10 pages, 1 figure
Bayesian semiparametric modelling of contraceptive behavior in India via sequential logistic regressions
Family planning has been characterized by highly different strategic programs
in India, including method-specific contraceptive targets, coercive
sterilization, and more recent target-free approaches. These major changes in
family planning policies over time have motivated a considerable interest
towards assessing the effectiveness of the different programs, while
understanding which subsets of the population have not been properly addressed.
Current studies consider specific aspects of the above policies, including, for
example, the factors associated with the choice of alternative contraceptive
methods other than sterilization, for women using contraceptives. Although
these analyses produce relevant insights, they fail to provide a global
overview of the different family planning policies, and the determinants
underlying the contraceptive choices. Motivated by this consideration, we
propose a Bayesian semiparametric model relying on a reparameterization of the
multinomial probability mass function via a set of conditional Bernoulli
choices. The sequential binary structure is defined to be consistent with the
current family planning policies in India, and coherent with a reasonable
process characterizing the contraceptive choices. This combination of flexible
representations and careful reparameterizations allows a broader and
interpretable overview of the different policies and contraceptive preferences
in India, within a single model.
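The reparameterization described above can be sketched as a sequence of conditional Bernoulli choices: each binary step gets its own logit (in the full model, its own logistic regression), and multiplying the conditional probabilities down the decision tree recovers a valid multinomial mass function. The three-category example and the specific logit values below are hypothetical, chosen only to mirror a nested decision such as "use contraception?" followed by "sterilization or another method?".

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sequential_multinomial(logits):
    """Map K-1 sequential Bernoulli logits to K category probabilities."""
    probs, remaining = [], 1.0
    for z in logits:
        p = sigmoid(z)
        probs.append(remaining * p)   # stop at this branch of the tree
        remaining *= 1.0 - p          # continue to the next binary choice
    probs.append(remaining)           # final category takes the leftover mass
    return probs

probs = sequential_multinomial([0.4, -1.2])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # the probabilities sum to 1 by construction
```

Because each conditional choice is a separate Bernoulli model, covariate effects stay interpretable at the level of each policy-relevant decision rather than being entangled in a single multinomial fit.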
A Recurrent Latent Variable Model for Sequential Data
In this paper, we explore the inclusion of latent random variables into the
dynamic hidden state of a recurrent neural network (RNN) by combining elements
of the variational autoencoder. We argue that through the use of high-level
latent random variables, the variational RNN (VRNN) can model the kind of
variability observed in highly structured sequential data such as natural
speech. We empirically evaluate the proposed model against related sequential
models on four speech datasets and one handwriting dataset. Our results show
the important roles that latent random variables can play in the RNN dynamic
hidden state.
The Variational Gaussian Process
Variational inference is a powerful tool for approximate inference, and it
has been recently applied for representation learning with deep generative
models. We develop the variational Gaussian process (VGP), a Bayesian
nonparametric variational family, which adapts its shape to match complex
posterior distributions. The VGP generates approximate posterior samples by
generating latent inputs and warping them through random non-linear mappings;
the distribution over random mappings is learned during inference, enabling the
transformed outputs to adapt to varying complexity. We prove a universal
approximation theorem for the VGP, demonstrating its representative power for
learning any model. For inference we present a variational objective inspired
by auto-encoders and perform black box inference over a wide class of models.
The VGP achieves new state-of-the-art results for unsupervised learning,
inferring models such as the deep latent Gaussian model and the recently
proposed DRAW.
Comment: Appears in International Conference on Learning Representations, 201
Bayesian sequential parameter estimation with a Laplace type approximation
A method for sequential inference of the fixed parameters of dynamic latent
Gaussian models is proposed and evaluated, based on the iterated Laplace
approximation. The method provides a useful trade-off between computational
performance and the accuracy of the approximation to the true posterior
distribution. Approximation corrections are shown to improve the accuracy of
the approximation in simulation studies. A population-based approach is also
shown to provide a more robust inference method.
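The Laplace-type approximation at the heart of such methods can be sketched in one dimension: locate the posterior mode, then match a Gaussian whose variance is the negative inverse Hessian of the log-posterior at the mode. Newton's method with numerical derivatives is an illustrative choice here, not the paper's iterated scheme; in a sequential setting the resulting Gaussian would serve as the prior for the next batch of data.

```python
def laplace_approx(log_post, x0=0.0, h=1e-4, iters=50):
    # Newton's method on the log-posterior with central finite differences.
    x = x0
    for _ in range(iters):
        g = (log_post(x + h) - log_post(x - h)) / (2 * h)
        H = (log_post(x + h) - 2 * log_post(x) + log_post(x - h)) / h**2
        x -= g / H
    # Gaussian approximation: mean at the mode, variance = -1 / Hessian.
    return x, -1.0 / H

# Sanity check on an exactly Gaussian log-posterior with mean 1.0 and
# variance 0.25, which the approximation should recover exactly.
mean, var = laplace_approx(lambda t: -(t - 1.0) ** 2 / (2 * 0.25))
print(round(mean, 3), round(var, 3))  # 1.0 0.25
```

On a genuinely non-Gaussian posterior the approximation is only local, which is why the abstract's correction terms and population-based variant matter for accuracy and robustness.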
Automated Machine Learning on Big Data using Stochastic Algorithm Tuning
We introduce a means of automating machine learning (ML) for big data tasks,
by performing scalable stochastic Bayesian optimisation of ML algorithm
parameters and hyper-parameters. More often than not, the critical tuning of ML
algorithm parameters has relied on expert domain knowledge, along with
laborious hand-tuning, brute-force search, or lengthy sampling runs. Against this
background, Bayesian optimisation is finding increasing use in automating
parameter tuning, making ML algorithms accessible even to non-experts. However,
the state of the art in Bayesian optimisation is incapable of scaling to the
large number of evaluations of algorithm performance required to fit realistic
models to complex, big data. We here describe a stochastic, sparse, Bayesian
optimisation strategy to solve this problem, using many thousands of noisy
evaluations of algorithm performance on subsets of data in order to effectively
train algorithms for big data. We provide a comprehensive benchmarking of
possible sparsification strategies for Bayesian optimisation, concluding that a
Nyström approximation offers the best scaling and performance for real tasks.
Our proposed algorithm demonstrates substantial improvement over the state of
the art in tuning the parameters of a Gaussian Process time series prediction
task on real, big data.