Semantic Information G Theory and Logical Bayesian Inference for Machine Learning
An important problem in machine learning is that when the number of labels n > 2, it is very difficult to construct and optimize a group of learning functions, and we wish that the optimized learning functions remain useful when the prior distribution P(x) (where x is an instance) changes. To resolve this problem, the semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms together form a systematic solution. A semantic channel in the G theory consists of a group of truth functions or membership functions. Compared with the likelihood functions, Bayesian posteriors, and logistic functions used by popular methods, membership functions can be used more conveniently as learning functions without the above problem. In LBI, every label's learning is independent. For multilabel learning, we can directly obtain a group of optimized membership functions from a large enough labeled sample, without preparing different samples for different labels. A group of Channel Matching (CM) algorithms is developed for machine learning. For the Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions on a two-dimensional feature space, 2-3 iterations can make the mutual information between the three classes and three labels surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved into the CM-EM algorithm, which can outperform the EM algorithm when mixture ratios are imbalanced or local convergence exists. The CM iteration algorithm needs to be combined with neural networks for MMI classification on high-dimensional feature spaces. LBI needs further study toward the unification of statistics and logic.
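As context for the CM-EM comparison above, the following is a minimal sketch of the standard EM baseline for a one-dimensional Gaussian mixture, run in the imbalanced-ratio regime the abstract highlights. The initialization, data, and constants here are our own illustrative assumptions, not the paper's CM-EM procedure.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """Standard EM for a 1-D Gaussian mixture (the baseline CM-EM modifies)."""
    n = len(x)
    w = np.full(k, 1.0 / k)                        # mixture ratios
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial means over the data
    var = np.full(k, np.var(x))
    for _ in range(iters):
        # E-step: responsibilities r[i, j] proportional to w_j * N(x_i | mu_j, var_j)
        d = x[:, None] - mu[None, :]
        logp = -0.5 * d**2 / var - 0.5 * np.log(2 * np.pi * var) + np.log(w)
        logp -= logp.max(axis=1, keepdims=True)    # stabilize before exponentiating
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate ratios, means, and variances from responsibilities
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / nk + 1e-9
    return w, mu, var

# Imbalanced mixture ratios (0.9 / 0.1): the regime where the abstract
# says plain EM can struggle and CM-EM is claimed to help.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(6.0, 1.0, 100)])
w, mu, var = em_gmm_1d(x, k=2)
```

With well-separated components this baseline recovers the two means and the 0.9/0.1 ratios; the abstract's claim concerns cases where it converges poorly.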
Bayesian neural networks via MCMC: a Python-based tutorial
Bayesian inference provides a methodology for parameter estimation and
uncertainty quantification in machine learning and deep learning methods.
Variational inference and Markov Chain Monte-Carlo (MCMC) sampling techniques
are used to implement Bayesian inference. In the past three decades, MCMC
methods have faced a number of challenges in being adapted to larger models
(such as in deep learning) and big data problems. Advanced proposals that
incorporate gradients, such as a Langevin proposal distribution, provide a
means to address some of the limitations of MCMC sampling for Bayesian neural
networks. Furthermore, MCMC methods have typically been confined to use by
statisticians and are still not prominent among deep learning researchers. We
present a tutorial for MCMC methods that covers simple Bayesian linear and
logistic models, and Bayesian neural networks. The aim of this tutorial is to
bridge the gap between theory and implementation via coding, given the general
scarcity of libraries and tutorials to this end. This tutorial provides code in
Python with data and instructions that enable their use and extension. We
provide results for some benchmark problems showing the strengths and
weaknesses of implementing the respective Bayesian models via MCMC. We
highlight the challenges in sampling multi-modal posterior distributions in
particular for the case of Bayesian neural networks, and the need for further
improvement of convergence diagnosis.
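A minimal sketch of the kind of sampler such a tutorial starts from: random-walk Metropolis for a Bayesian linear regression. The priors, step size, iteration counts, and synthetic data below are our own illustrative assumptions, not settings from the tutorial.

```python
import numpy as np

# Synthetic data for y = a*x + b + noise, with true a = 2, b = 1.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 50)

def log_post(theta):
    """Unnormalized log-posterior: Gaussian likelihood (sigma = 0.1)
    plus broad N(0, 10^2) priors on both parameters."""
    a, b = theta
    resid = y - (a * x + b)
    return -0.5 * np.sum(resid**2) / 0.1**2 - 0.5 * (a**2 + b**2) / 10.0**2

theta = np.zeros(2)
lp = log_post(theta)
samples = []
for i in range(8000):
    prop = theta + rng.normal(0.0, 0.05, 2)    # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        theta, lp = prop, lp_prop
    if i >= 3000:                              # discard burn-in
        samples.append(theta)

post_mean = np.mean(samples, axis=0)           # should sit near (2, 1)
```

Gradient-based proposals such as Langevin dynamics replace the blind random-walk step with one informed by the gradient of `log_post`, which is what makes MCMC viable for the larger models the abstract discusses.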
Exploiting the Statistics of Learning and Inference
When dealing with datasets containing a billion instances or with simulations
that require a supercomputer to execute, computational resources become part of
the equation. We can improve the efficiency of learning and inference by
exploiting their inherent statistical nature. We propose algorithms that
exploit the redundancy of data relative to a model by subsampling data-cases
for every update and reasoning about the uncertainty created in this process.
In the context of learning we propose to test for the probability that a
stochastically estimated gradient points more than 180 degrees in the wrong
direction. In the context of MCMC sampling we use stochastic gradients to
improve the efficiency of MCMC updates, and hypothesis tests based on adaptive
mini-batches to decide whether to accept or reject a proposed parameter update.
Finally, we argue that in the context of likelihood free MCMC one needs to
store all the information revealed by all simulations, for instance in a
Gaussian process. We conclude that Bayesian methods will continue to play a
crucial role in the era of big data and big simulations, but only if we
overcome a number of computational challenges.
Comment: Proceedings of the NIPS workshop on "Probabilistic Models for Big Data"
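The mini-batch accept/reject idea above can be sketched as follows: instead of summing the log-likelihood over all data, draw increasing mini-batches and stop as soon as a significance test is confident about the accept decision. The stopping rule, threshold, and toy model below are simplified illustrative assumptions, not the exact procedure from the work described.

```python
import math
import numpy as np

def approx_mh_accept(theta_new, theta_old, loglik_pt, data, u,
                     batch=100, eps=0.05, rng=None):
    """Approximate an MH accept decision from growing mini-batches.

    Stops early once a z-test is confident about where the running mean
    per-point log-likelihood difference sits relative to the threshold
    implied by the uniform draw u; otherwise falls back to all the data."""
    rng = rng or np.random.default_rng()
    n = len(data)
    mu0 = math.log(u) / n                # per-point acceptance threshold
    idx = rng.permutation(n)
    diffs = np.empty(0)
    seen = 0
    while seen < n:
        take = idx[seen:seen + batch]
        d = loglik_pt(theta_new, data[take]) - loglik_pt(theta_old, data[take])
        diffs = np.concatenate([diffs, d])
        seen += len(take)
        if seen >= n:
            break                        # exhausted the data: exact decision
        se = diffs.std(ddof=1) / math.sqrt(len(diffs)) + 1e-12
        z = (diffs.mean() - mu0) / se
        if math.erfc(abs(z) / math.sqrt(2)) < eps:
            break                        # confident before seeing all the data
    return diffs.mean() > mu0

# Toy model: unit-variance Gaussian with unknown mean theta.
data = np.random.default_rng(0).normal(1.0, 1.0, 10000)

def loglik(theta, d):
    return -0.5 * (d - theta)**2

# theta_new = 1.0 fits far better than theta_old = 0.0, so the test
# accepts after only a few mini-batches instead of scanning all 10,000 points.
accept = approx_mh_accept(1.0, 0.0, loglik, data, u=0.5,
                          rng=np.random.default_rng(1))
```

The design choice is the one the abstract argues for: treat the stochastically estimated quantity as data and reason about its uncertainty, rather than pretending the subsample is exact.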
Towards Gaussian Bayesian network fusion
Data sets are growing in complexity thanks to the increasing
facilities we have nowadays to both generate and store data. This poses
many challenges to machine learning that are leading to the proposal of
new methods and paradigms, in order to be able to deal with what is
nowadays referred to as Big Data. In this paper we propose a method
for the aggregation of different Bayesian network structures that have
been learned from separate data sets, as a first step towards mining data
sets that need to be partitioned in a horizontal way, i.e. with respect
to the instances, in order to be processed. Considerations that should be
taken into account when dealing with this situation are discussed. Scalable
learning of Bayesian networks is slowly emerging, and our method
constitutes one of the first insights into Gaussian Bayesian network aggregation
from different sources. Tested on synthetic data, it obtains good
results that surpass those from individual learning. Future research will
focus on expanding the method and testing on more diverse data sets.
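One simple form such structure aggregation could take is a majority vote over edges with a cycle check, sketched below. The voting rule, threshold, and tie-breaking are our own illustrative assumptions, not the paper's fusion method (which concerns Gaussian Bayesian networks and their parameters as well).

```python
import numpy as np

def has_cycle(adj):
    """Kahn's algorithm: the graph is acyclic iff every node can be
    topologically sorted."""
    d = adj.shape[0]
    indeg = adj.sum(axis=0).astype(int)
    stack = [i for i in range(d) if indeg[i] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for w in range(d):
            if adj[u, w]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    stack.append(w)
    return seen < d

def fuse_structures(adj_list, threshold=0.5):
    """Majority-vote fusion of DAG adjacency matrices learned on separate
    horizontal partitions: keep edges whose vote share reaches the
    threshold, adding them in descending vote order and skipping any edge
    that would create a cycle."""
    votes = np.mean(np.array(adj_list, dtype=float), axis=0)
    d = votes.shape[0]
    fused = np.zeros((d, d), dtype=int)
    cand = [(votes[i, j], i, j) for i in range(d) for j in range(d)
            if i != j and votes[i, j] >= threshold]
    for v, i, j in sorted(cand, reverse=True):
        fused[i, j] = 1
        if has_cycle(fused):
            fused[i, j] = 0          # this edge would break acyclicity
    return fused

# Three structures over nodes (A, B, C) learned on separate partitions:
g1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # A->B, B->C
g2 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # A->B, B->C
g3 = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]   # A->B, C->A
fused = fuse_structures([g1, g2, g3])    # keeps A->B and B->C only
```

The cycle check matters because each partition's structure is individually acyclic, but a naive union or vote over them need not be.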