Big Learning with Bayesian Methods
Explosive growth in data and availability of cheap computing resources have
sparked increasing interest in Big learning, an emerging subfield that studies
scalable machine learning algorithms, systems, and applications with Big Data.
Bayesian methods represent one important class of statistical methods for machine
learning, with substantial recent developments on adaptive, flexible and
scalable Bayesian learning. This article provides a survey of the recent
advances in Big learning with Bayesian methods, termed Big Bayesian Learning,
including nonparametric Bayesian methods for adaptively inferring model
complexity, regularized Bayesian inference for improving the flexibility via
posterior regularization, and scalable algorithms and systems based on
stochastic subsampling and distributed computing for dealing with large-scale
applications.
Comment: 21 pages, 6 figures
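The stochastic-subsampling strategy this abstract mentions can be illustrated with a minimal sketch: an unbiased minibatch estimate of the log-posterior gradient, obtained by rescaling the minibatch contribution by N/n. The Gaussian model, step size, and batch size below are illustrative assumptions, not details from the survey.

```python
import random

def grad_log_prior(theta, prior_var=100.0):
    # Gradient of log N(theta | 0, prior_var) with respect to theta.
    return -theta / prior_var

def grad_log_lik(theta, x):
    # Gradient of log N(x | theta, 1) with respect to theta.
    return x - theta

def stochastic_grad(theta, data, batch_size):
    # Unbiased estimate of the full-data log-posterior gradient:
    # the minibatch sum is rescaled by N / n.
    batch = random.sample(data, batch_size)
    scale = len(data) / batch_size
    return grad_log_prior(theta) + scale * sum(grad_log_lik(theta, x) for x in batch)

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(10_000)]

# Noisy gradient ascent on the log-posterior; adding Gaussian noise to
# each step would turn this into stochastic-gradient MCMC.
theta = 0.0
for _ in range(500):
    theta += 1e-5 * stochastic_grad(theta, data, batch_size=100)
print(round(theta, 2))  # close to the data mean, i.e. roughly 2.0
```

Each update touches only 100 of the 10,000 observations, which is the point: the cost per step is independent of the full dataset size.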
Overcoming Catastrophic Forgetting by Incremental Moment Matching
Catastrophic forgetting is the problem whereby a neural network loses the
information learned on a first task after being trained on a second task. Here,
we propose a method, incremental moment matching (IMM), to resolve this problem.
IMM incrementally matches the moments of the posterior distributions of the
neural networks trained on the first and the second task, respectively. To
make the search space of the posterior parameters smooth, the IMM procedure is
complemented by various transfer learning techniques, including weight transfer,
an L2-norm penalty between the old and the new parameters, and a variant of
dropout with the old parameters. We analyze our approach on a variety of
datasets, including the MNIST, CIFAR-10, Caltech-UCSD Birds, and Lifelog
datasets. The experimental results show that IMM achieves state-of-the-art
performance by balancing the information between an old and a new network.
Comment: Accepted for NIPS 201
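The moment-matching step can be sketched for diagonal-Gaussian approximations of the two task posteriors. The two merging rules below, a weighted average of the task parameters and a precision-weighted average, are one plausible realization of the moment matching the abstract describes; the variable names and the per-parameter precision lists (e.g. Fisher-information estimates) are illustrative assumptions.

```python
# Merge per-task parameter vectors, each with an approximate diagonal-
# Gaussian posterior: means mu_k and precisions prec_k, mixed with
# weights alphas (summing to 1).

def mean_imm(mus, alphas):
    # Weighted average of the task-specific parameters
    # (matches the mean of the mixture of posteriors).
    return [sum(a * mu[i] for a, mu in zip(alphas, mus))
            for i in range(len(mus[0]))]

def mode_imm(mus, precs, alphas):
    # Precision-weighted average: the mode of a product of Gaussians,
    # so parameters the old task was confident about move less.
    merged = []
    for i in range(len(mus[0])):
        num = sum(a * p[i] * mu[i] for a, p, mu in zip(alphas, precs, mus))
        den = sum(a * p[i] for a, p in zip(alphas, precs))
        merged.append(num / den)
    return merged

mu1, mu2 = [0.0, 1.0], [2.0, 1.0]
prec1, prec2 = [1.0, 1.0], [3.0, 1.0]
print(mean_imm([mu1, mu2], [0.5, 0.5]))                  # [1.0, 1.0]
print(mode_imm([mu1, mu2], [prec1, prec2], [0.5, 0.5]))  # [1.5, 1.0]
```

Note how the precision-weighted rule pulls the first coordinate toward the second task's value (precision 3 vs 1), while the plain average splits the difference.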
Reconciling meta-learning and continual learning with online mixtures of tasks
Learning-to-learn or meta-learning leverages data-driven inductive bias to
increase the efficiency of learning on a novel task. This approach encounters
difficulty when transfer is not advantageous, for instance, when tasks are
considerably dissimilar or change over time. We use the connection between
gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet
process mixture of hierarchical Bayesian models over the parameters of an
arbitrary parametric model such as a neural network. In contrast to
consolidating inductive biases into a single set of hyperparameters, our
approach of task-dependent hyperparameter selection better handles latent
distribution shift, as demonstrated on a set of evolving, image-based, few-shot
learning benchmarks.
Comment: updated experimental results
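The task-dependent grouping idea can be sketched with a Chinese-restaurant-process style assignment rule: a new task joins an existing component in proportion to the component's size times how well it explains the task, or opens a fresh component in proportion to a concentration parameter. Reducing each task to a scalar summary and taking greedy MAP assignments are simplifying assumptions for illustration; the paper's model places the mixture over hierarchical-Bayes hyperparameters of a neural network.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def assign_tasks(task_summaries, alpha=1.0, s2=0.5, s0=10.0):
    # Sequentially assign each task summary x to a mixture component.
    # Component k attracts x with weight n_k * N(x | mean_k, s2);
    # a new component is opened with weight alpha * N(x | 0, s0).
    clusters = []  # each cluster is the list of its member summaries
    labels = []
    for x in task_summaries:
        weights = [len(c) * normal_pdf(x, sum(c) / len(c), s2) for c in clusters]
        weights.append(alpha * normal_pdf(x, 0.0, s0))  # open a new cluster
        k = max(range(len(weights)), key=weights.__getitem__)  # greedy MAP
        if k == len(clusters):
            clusters.append([x])
        else:
            clusters[k].append(x)
        labels.append(k)
    return labels

# Two groups of similar tasks: dissimilar tasks are not forced to share
# a single set of hyperparameters.
print(assign_tasks([0.1, -0.2, 5.0, 5.2, 0.0]))  # [0, 0, 1, 1, 0]
```

The number of components is not fixed in advance, which is what lets the approach cope with task distributions that drift over time.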
Automatic Posterior Transformation for Likelihood-Free Inference
How can one perform Bayesian inference on stochastic simulators with
intractable likelihoods? A recent approach is to learn the posterior from
adaptively proposed simulations using neural network-based conditional density
estimators. However, existing methods are limited to a narrow range of proposal
distributions or require importance weighting that can limit performance in
practice. Here we present automatic posterior transformation (APT), a new
sequential neural posterior estimation method for simulation-based inference.
APT can modify the posterior estimate using arbitrary, dynamically updated
proposals, and is compatible with powerful flow-based density estimators. It is
more flexible, scalable and efficient than previous simulation-based inference
techniques. APT can operate directly on high-dimensional time series and image
data, opening up new applications for likelihood-free inference.
MaxEntropy Pursuit Variational Inference
One of the core problems in variational inference is the choice of the
approximate posterior distribution. It is crucial to trade off the efficiency of
inference with simple families, such as mean-field models, against the accuracy
of the approximation. We propose a variant of a greedy approximation of the
posterior distribution with tractable base learners. Using a Max-Entropy
approach, we obtain a well-defined optimization problem. We demonstrate the
ability of the method to capture complex multimodal posteriors in a continual
learning setting for neural networks.
Comment: 10 pages, 1 figure
Bayesian semiparametric modelling of contraceptive behavior in India via sequential logistic regressions
Family planning has been characterized by highly different strategic programs
in India, including method-specific contraceptive targets, coercive
sterilization, and more recent target-free approaches. These major changes in
family planning policies over time have motivated a considerable interest
towards assessing the effectiveness of the different programs, while
understanding which subsets of the population have not been properly addressed.
Current studies consider specific aspects of the above policies, including, for
example, the factors associated with the choice of alternative contraceptive
methods other than sterilization, for women using contraceptives. Although
these analyses produce relevant insights, they fail to provide a global
overview of the different family planning policies, and the determinants
underlying the contraceptive choices. Motivated by this consideration, we
propose a Bayesian semiparametric model relying on a reparameterization of the
multinomial probability mass function via a set of conditional Bernoulli
choices. The sequential binary structure is defined to be consistent with the
current family planning policies in India, and coherent with a reasonable
process characterizing the contraceptive choices. This combination of flexible
representations and careful reparameterizations allows a broader and
interpretable overview of the different policies and contraceptive preferences
in India, within a single model.
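The reparameterization described above can be sketched as a sequence of conditional Bernoulli choices: each binary step gets its own logit (in the full model, its own logistic regression), and multiplying the conditional probabilities down the decision tree recovers a valid multinomial mass function. The three-category example and the specific logit values below are hypothetical, chosen only to mirror a nested decision such as "use contraception?" followed by "sterilization or another method?".

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sequential_multinomial(logits):
    """Map K-1 sequential Bernoulli logits to K category probabilities."""
    probs, remaining = [], 1.0
    for z in logits:
        p = sigmoid(z)
        probs.append(remaining * p)   # stop at this branch of the tree
        remaining *= 1.0 - p          # continue to the next binary choice
    probs.append(remaining)           # final category takes the leftover mass
    return probs

probs = sequential_multinomial([0.4, -1.2])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # the probabilities sum to 1 by construction
```

Because each conditional choice is a separate Bernoulli model, covariate effects stay interpretable at the level of each policy-relevant decision rather than being entangled in a single multinomial fit.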
A Recurrent Latent Variable Model for Sequential Data
In this paper, we explore the inclusion of latent random variables into the
dynamic hidden state of a recurrent neural network (RNN) by combining elements
of the variational autoencoder. We argue that through the use of high-level
latent random variables, the variational RNN (VRNN) can model the kind of
variability observed in highly structured sequential data such as natural
speech. We empirically evaluate the proposed model against related sequential
models on four speech datasets and one handwriting dataset. Our results show
the important roles that latent random variables can play in the RNN dynamic
hidden state.
The Variational Gaussian Process
Variational inference is a powerful tool for approximate inference, and it
has been recently applied for representation learning with deep generative
models. We develop the variational Gaussian process (VGP), a Bayesian
nonparametric variational family, which adapts its shape to match complex
posterior distributions. The VGP generates approximate posterior samples by
generating latent inputs and warping them through random non-linear mappings;
the distribution over random mappings is learned during inference, enabling the
transformed outputs to adapt to varying complexity. We prove a universal
approximation theorem for the VGP, demonstrating its representative power for
learning any model. For inference we present a variational objective inspired
by auto-encoders and perform black box inference over a wide class of models.
The VGP achieves new state-of-the-art results for unsupervised learning,
inferring models such as the deep latent Gaussian model and the recently
proposed DRAW.
Comment: Appears in International Conference on Learning Representations, 201
Bayesian sequential parameter estimation with a Laplace type approximation
A method for sequential inference of the fixed parameters of dynamic latent
Gaussian models is proposed and evaluated, based on the iterated Laplace
approximation. The method provides a useful trade-off between computational
performance and the accuracy of the approximation to the true posterior
distribution. Approximation corrections are shown to improve the accuracy of
the approximation in simulation studies. A population-based approach is also
shown to provide a more robust inference method.
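The Laplace-type approximation at the heart of such methods can be sketched in one dimension: locate the posterior mode, then match a Gaussian whose variance is the negative inverse Hessian of the log-posterior at the mode. Newton's method with numerical derivatives is an illustrative choice here, not the paper's iterated scheme; in a sequential setting the resulting Gaussian would serve as the prior for the next batch of data.

```python
def laplace_approx(log_post, x0=0.0, h=1e-4, iters=50):
    # Newton's method on the log-posterior with central finite differences.
    x = x0
    for _ in range(iters):
        g = (log_post(x + h) - log_post(x - h)) / (2 * h)
        H = (log_post(x + h) - 2 * log_post(x) + log_post(x - h)) / h**2
        x -= g / H
    # Gaussian approximation: mean at the mode, variance = -1 / Hessian.
    return x, -1.0 / H

# Sanity check on an exactly Gaussian log-posterior with mean 1.0 and
# variance 0.25, which the approximation should recover exactly.
mean, var = laplace_approx(lambda t: -(t - 1.0) ** 2 / (2 * 0.25))
print(round(mean, 3), round(var, 3))  # 1.0 0.25
```

On a genuinely non-Gaussian posterior the approximation is only local, which is why the abstract's correction terms and population-based variant matter for accuracy and robustness.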
Automated Machine Learning on Big Data using Stochastic Algorithm Tuning
We introduce a means of automating machine learning (ML) for big data tasks,
by performing scalable stochastic Bayesian optimisation of ML algorithm
parameters and hyper-parameters. More often than not, the critical tuning of ML
algorithm parameters has relied on expert domain knowledge, along with
laborious hand-tuning, brute-force search, or lengthy sampling runs. Against this
background, Bayesian optimisation is finding increasing use in automating
parameter tuning, making ML algorithms accessible even to non-experts. However,
the state of the art in Bayesian optimisation is incapable of scaling to the
large number of evaluations of algorithm performance required to fit realistic
models to complex, big data. We here describe a stochastic, sparse, Bayesian
optimisation strategy to solve this problem, using many thousands of noisy
evaluations of algorithm performance on subsets of data in order to effectively
train algorithms for big data. We provide a comprehensive benchmarking of
possible sparsification strategies for Bayesian optimisation, concluding that a
Nyström approximation offers the best scaling and performance for real tasks.
Our proposed algorithm demonstrates substantial improvement over the state of
the art in tuning the parameters of a Gaussian Process time series prediction
task on real, big data.