On the Equivalence Between Deep NADE and Generative Stochastic Networks
Neural Autoregressive Distribution Estimators (NADEs) have recently been
shown to be successful alternatives for modeling high-dimensional multimodal
distributions. One issue associated with NADEs is that they rely on a
particular order of factorization for the joint distribution p(x). This issue has
recently been addressed by a variant of NADE called Orderless NADE and its deeper
version, Deep Orderless NADE. Orderless NADEs are trained based on a criterion
that stochastically maximizes p(x) over all possible orders of factorization.
factorizations. Unfortunately, ancestral sampling from deep NADE is very
expensive, corresponding to running through a neural net separately predicting
each of the visible variables given some others. This work makes a connection
between this criterion and the training criterion for Generative Stochastic
Networks (GSNs). It shows that training NADEs in this way also trains a GSN,
which defines a Markov chain associated with the NADE model. Based on this
connection, we show an alternative way to sample from a trained Orderless NADE
that allows a trade-off between computing time and sample quality: a 3- to
10-fold speedup (taking into account the waste due to correlations between
consecutive samples of the chain) can be obtained without noticeably reducing
the quality of the samples. This is achieved using a novel sampling procedure
for GSNs called annealed GSN sampling which, like tempering methods,
combines fast mixing (obtained thanks to steps at high noise levels) with
accurate samples (obtained thanks to steps at low noise levels).
Comment: ECML/PKDD 2014
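To make the proposed sampler concrete, here is a minimal Python sketch of the corrupt-and-denoise Markov chain behind annealed GSN sampling for a binary Orderless NADE. The nade_predict(x, mask) helper is an assumed interface standing in for the trained model, and the hidden variables are resampled independently given the observed ones, which simplifies the paper's actual procedure.

    import numpy as np

    def annealed_gsn_sample(nade_predict, x, schedule, rng=None):
        # One run of the corrupt-and-denoise chain for a binary Orderless NADE.
        # nade_predict(x, mask) is an assumed interface: it returns, per variable,
        # P(x_i = 1 | the variables where mask == 1).  `schedule` is a decreasing
        # sequence of corruption fractions (the annealed "noise levels").
        rng = np.random.default_rng() if rng is None else rng
        d = x.size
        for frac in schedule:
            hidden = rng.random(d) < frac            # corrupt: hide a random subset
            mask = (~hidden).astype(float)
            p = nade_predict(x * mask, mask)         # denoise: NADE conditionals
            # Simplification: hidden variables are resampled independently given
            # the observed ones, rather than sequentially through the NADE.
            x = np.where(hidden, (rng.random(d) < p).astype(float), x)
        return x

    # e.g. anneal from hiding 90% of the variables down to 5% over 30 steps:
    # sample = annealed_gsn_sample(model, x0, schedule=np.linspace(0.9, 0.05, 30))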
Bayesian Nonparametric Inverse Reinforcement Learning
Inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given the transition function and a set of observed demonstrations in the form of state-action pairs. Current IRL algorithms attempt to find a single reward function which explains the entire observation set. In practice, this leads to a computationally costly search over a large (typically infinite) space of complex reward functions. This paper proposes the notion that if the observations can be partitioned into smaller groups, a class of much simpler reward functions can be used to explain each group. The proposed method uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Experimental results are given for simple examples showing comparable performance to other IRL algorithms in nominal situations. Moreover, the proposed method handles cyclic tasks (where the agent begins and ends in the same state) that would break existing algorithms without modification. Finally, the new algorithm has a fundamentally different structure than previous methods, making it more computationally efficient in a real-world learning scenario where the state space is large but the demonstration set is small.
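The partitioning machinery can be sketched as a collapsed Gibbs sweep over a Chinese restaurant process mixture. The loglik callback below, which scores a demonstration pair under the simple reward function fit to a candidate cluster, is an assumption of this sketch and stands in for the paper's actual subgoal likelihood (which requires solving the MDP).

    import numpy as np

    def crp_gibbs_sweep(assignments, n_items, loglik, alpha, rng):
        # One collapsed-Gibbs sweep assigning demonstration pairs to subgoal
        # clusters under a Chinese restaurant process prior.  loglik(i, members)
        # is an assumed callback: the log probability of pair i under the simple
        # reward function fit to `members` (empty members = fresh cluster,
        # scored under the prior).
        for i in range(n_items):
            assignments[i] = -1                      # remove i from its cluster
            labels = sorted(c for c in set(assignments) if c >= 0)
            logp = []
            for c in labels:
                members = [j for j in range(n_items) if assignments[j] == c]
                logp.append(np.log(len(members)) + loglik(i, members))
            logp.append(np.log(alpha) + loglik(i, []))   # open a new cluster
            logp = np.array(logp)
            p = np.exp(logp - logp.max())
            p /= p.sum()
            k = rng.choice(len(p), p=p)              # rng = np.random.default_rng()
            assignments[i] = labels[k] if k < len(labels) else max(labels, default=-1) + 1
        return assignments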
Nonparametric Hierarchical Clustering of Functional Data
In this paper, we deal with the problem of curve clustering. We propose a
nonparametric method which partitions the curves into clusters and discretizes
the dimensions of the curve points into intervals. The cross-product of these
partitions forms a data-grid which is obtained using a Bayesian model selection
approach while making no assumptions regarding the curves. Finally, a
post-processing technique, aiming at reducing the number of clusters in order
to improve the interpretability of the clustering, is proposed. It consists in
optimally merging the clusters step by step, which corresponds to an
agglomerative hierarchical classification whose dissimilarity measure is the
variation of the criterion. Interestingly, this measure is none other than the
sum of the Kullback-Leibler divergences between the cluster distributions before
and after the merges. The practical interest of the approach for functional
data exploratory analysis is presented and compared with an alternative
approach on an artificial and a real-world data set.
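The post-processing step can be pictured as the following greedy agglomerative loop, in which the dissimilarity between two clusters is the variation of the criterion caused by merging them; the criterion callback is a placeholder for the paper's Bayesian model selection criterion.

    from itertools import combinations

    def merge_clusters(clusters, criterion, n_final):
        # Greedy post-processing: repeatedly apply the merge that increases the
        # (to-be-minimized) model selection criterion the least.  Per the paper,
        # this increase equals a sum of KL divergences between the cluster
        # distributions before and after the merge.  `clusters` is a list of
        # sets of curve indices; `criterion(clusters)` is an assumed callback.
        while len(clusters) > n_final:
            best_val, best_clusters = None, None
            for a, b in combinations(range(len(clusters)), 2):
                merged = [c for k, c in enumerate(clusters) if k not in (a, b)]
                merged.append(clusters[a] | clusters[b])
                val = criterion(merged)
                if best_val is None or val < best_val:
                    best_val, best_clusters = val, merged
            clusters = best_clusters
        return clusters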
Structurama: Bayesian Inference of Population Structure
Structurama is a program for inferring population structure. Specifically, the program calculates the posterior probability of assigning individuals to different populations. The program takes as input a file containing the allelic information at some number of loci sampled from a collection of individuals. After reading a data file into computer memory, Structurama uses a Gibbs algorithm to sample assignments of individuals to populations. The program implements four different models: the number of populations can be considered fixed or a random variable with a Dirichlet process prior; moreover, the genotypes of the individuals in the analysis can be considered to come from a single population (no admixture) or from several different populations (admixture). The output is a file of partitions of individuals to populations that were sampled by the Markov chain Monte Carlo algorithm. The partitions are sampled in proportion to their posterior probabilities. The program implements a number of ways to summarize the sampled partitions, including calculation of the ‘mean’ partition: a partition of the individuals to populations that minimizes the squared distance to the sampled partitions.
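One common way to realise such a ‘mean’ partition (whether it matches Structurama's exact algorithm is an assumption of this sketch) is to build the posterior co-assignment matrix from the sampled partitions and return the sampled partition closest to it in squared distance:

    import numpy as np

    def mean_partition(samples):
        # `samples` is an (S, n) integer array: S sampled assignments of n
        # individuals to populations.  Build the posterior co-assignment matrix,
        # then return the sampled partition with the smallest squared distance
        # to it.
        S, n = samples.shape
        co = np.zeros((n, n))
        for z in samples:
            co += (z[:, None] == z[None, :])
        co /= S                                  # P(i and j share a population)
        best, best_d = None, np.inf
        for z in samples:
            m = (z[:, None] == z[None, :]).astype(float)
            d = ((m - co) ** 2).sum()
            if d < best_d:
                best, best_d = z, d
        return best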
Gaussian Processes in Machine Learning
We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood. We explain the practical advantages of Gaussian Processes and end with conclusions and a look at current trends in GP work.
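The "simple equations" mentioned above are the standard GP regression posterior, mean = Ks^T (K + s^2 I)^{-1} y and cov = Kss - Ks^T (K + s^2 I)^{-1} Ks. A self-contained NumPy sketch with a squared-exponential kernel:

    import numpy as np

    def rbf(A, B, ell=1.0, sf2=1.0):
        # Squared-exponential covariance between the row vectors of A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf2 * np.exp(-0.5 * d2 / ell ** 2)

    def gp_posterior(X, y, Xs, noise_var=0.1, kernel=rbf):
        # Textbook GP regression posterior, computed stably via Cholesky:
        #   mean = Ks^T (K + s^2 I)^{-1} y
        #   cov  = Kss - Ks^T (K + s^2 I)^{-1} Ks
        K = kernel(X, X) + noise_var * np.eye(len(X))
        Ks, Kss = kernel(X, Xs), kernel(Xs, Xs)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        v = np.linalg.solve(L, Ks)
        return Ks.T @ alpha, Kss - v.T @ v

    # The hyperparameters (ell, sf2, noise_var) are learned by maximising the
    # log marginal likelihood: -0.5 y^T alpha - sum(log diag(L)) - 0.5 n log(2 pi).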
Bayesian nonparametric models for name disambiguation and supervised learning
This thesis presents new Bayesian nonparametric models and approaches for their development,
for the problems of name disambiguation and supervised learning. Bayesian
nonparametric methods form an increasingly popular approach for solving problems
that demand a high amount of model flexibility. However, this field is relatively new,
and there are many areas that need further investigation. Previous work on Bayesian
nonparametrics has neither fully explored the problems of entity disambiguation and
supervised learning nor the advantages of nested hierarchical models. Entity disambiguation
is a widely encountered problem where different references need to be linked
to a real underlying entity. This problem is often unsupervised as there is no previously
known information about the entities. Further to this, effective use of Bayesian
nonparametrics offers a new approach to tackling supervised problems, which are
frequently encountered.
The main original contribution of this thesis is a set of new structured Dirichlet process
mixture models for name disambiguation and supervised learning that can also
have a wide range of applications. These models use techniques from Bayesian statistics,
including hierarchical and nested Dirichlet processes, generalised linear models,
Markov chain Monte Carlo methods and optimisation techniques such as BFGS. The
new models have tangible advantages over existing methods in the field as shown with
experiments on real-world datasets including citation databases and classification and
regression datasets.
I develop the unsupervised author-topic space model for author disambiguation, which,
unlike traditional author disambiguation approaches, uses free text to perform disambiguation.
The model incorporates a name variant model based on a nonparametric
Dirichlet language model. It handles novel, unseen name variants and
can model the unknown authors of the text of the documents. Through this, the model
can disambiguate authors with no prior knowledge of the number of true authors in the
dataset. In addition, it can do this when the authors have identical names.
I use a model for nesting Dirichlet processes named the hybrid NDP-HDP. This
model allows Dirichlet processes to be clustered together and adds an additional level of
structure to the hierarchical Dirichlet process. I also develop a new hierarchical extension
to the hybrid NDP-HDP. I develop this model into the grouped author-topic model
for the entity disambiguation task. The grouped author-topic model uses clusters to model the co-occurrence of entities in documents, which can be interpreted as research
groups. Since this model does not require entities to be linked to specific words in a
document, it overcomes the problems of some existing author-topic models. The model
incorporates a new method for modelling name variants, so that domain-specific name
variant models can be used.
Lastly, I develop extensions to supervised latent Dirichlet allocation, a type of supervised
topic model. The keyword-supervised LDA model predicts document responses
more accurately by modelling the effect of individual words and their contexts directly.
The supervised HDP model has more model flexibility by using Bayesian nonparametrics
for supervised learning. These models are evaluated on a number of classification
and regression problems, and the results show that they outperform existing supervised
topic modelling approaches. The models can also be extended to use similar information
to the previous models, incorporating additional information such as entities and
document titles to improve prediction.
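For concreteness, the response layer that supervised-LDA-style models share is a generalised linear model on a document's empirical topic proportions. The sketch below follows the standard supervised LDA formulation of Blei and McAuliffe; the keyword-supervised and HDP variants described above extend it in ways not shown here.

    import numpy as np

    def slda_response(zbar, eta):
        # Supervised LDA response model (regression case): the document
        # response is y ~ Normal(eta . zbar, sigma^2), where zbar holds the
        # empirical frequencies of the topics assigned to the document's words.
        return float(eta @ zbar)

    def slda_class_probs(zbar, eta_classes):
        # Classification variant: softmax over per-class weight vectors
        # (eta_classes has shape (n_classes, n_topics)).
        s = eta_classes @ zbar
        e = np.exp(s - s.max())
        return e / e.sum()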
Applying Bayesian Neural Networks to Separate Neutrino Events from Backgrounds in Reactor Neutrino Experiments
In this paper, a toy detector is designed to simulate the central detectors of
reactor neutrino experiments. Samples of neutrino events and of the three major
backgrounds are generated in the signal region from a Monte Carlo simulation of
the toy detector. Bayesian neural networks (BNNs) are applied to separate the
neutrino events from the backgrounds. As a result, most of the neutrino events
and uncorrelated background events in the signal region can be identified with
the BNN, and a fraction of the fast-neutron and ⁸He/⁹Li background events in the
signal region can also be identified. The signal-to-noise ratio in the signal
region is thereby enhanced. The neutrino discrimination improves as the neutrino
rate in the training sample increases; conversely, the background
discriminations degrade as the background rates in the training sample decrease.
Comment: 9 pages, 1 figure, 1 table
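As a rough illustration of the discrimination setup: the quantity used to accept or reject events is the posterior-averaged network output. The sketch below substitutes a crude random-walk Metropolis sampler for the Neal-style hybrid Monte Carlo such analyses typically use, and the detector features are hypothetical inputs.

    import numpy as np

    rng = np.random.default_rng(0)
    H = 8                                        # hidden units

    def mlp_prob(w, X):
        # Tiny one-hidden-layer network returning P(signal | x).
        d = X.shape[1]
        W1 = w[:d * H].reshape(d, H)
        b1 = w[d * H:d * H + H]
        W2 = w[d * H + H:d * H + 2 * H]
        b2 = w[-1]
        a = np.tanh(X @ W1 + b1)
        return 1.0 / (1.0 + np.exp(-(a @ W2 + b2)))

    def log_post(w, X, y, prior_var=1.0):
        # Bernoulli likelihood plus a Gaussian prior on all weights.
        p = np.clip(mlp_prob(w, X), 1e-9, 1 - 1e-9)
        return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) - 0.5 * (w @ w) / prior_var

    def bnn_predict(X_tr, y_tr, X_te, n_steps=5000, step=0.02):
        # Random-walk Metropolis over the weights; the BNN output is the
        # posterior-averaged probability, which is then thresholded to accept
        # neutrino candidates.
        d = X_tr.shape[1]
        w = rng.normal(0.0, 0.1, size=d * H + 2 * H + 1)
        lp = log_post(w, X_tr, y_tr)
        preds = []
        for t in range(n_steps):
            w_new = w + step * rng.normal(size=w.size)
            lp_new = log_post(w_new, X_tr, y_tr)
            if np.log(rng.random()) < lp_new - lp:
                w, lp = w_new, lp_new
            if t >= n_steps // 2:                # discard burn-in
                preds.append(mlp_prob(w, X_te))
        return np.mean(preds, axis=0)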
Bayesian solutions to the label switching problem
The label switching problem, the unidentifiability of the permutation of clusters or, more generally, of latent variables, makes interpretation of results computed with MCMC sampling difficult. We introduce a fully Bayesian treatment of the permutations which performs better than alternatives. The method can be used to compute summaries of the posterior samples even for nonparametric Bayesian methods, for which no good solutions have existed so far. Although approximate in this case, the results are very promising. The summaries are intuitively appealing: a summarized cluster is defined as a set of points for which the likelihood of being in the same cluster is maximized.
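The permutation-invariant quantity that such summaries build on is the posterior probability that two points share a cluster. The paper's fully Bayesian treatment of the permutations is more involved than this; the sketch below shows only the co-clustering summary idea.

    import numpy as np

    def coclustering_summary(samples, threshold=0.5):
        # `samples` is an (S, n) array of cluster labels from an MCMC run.
        # The probability that points i and j share a cluster is invariant to
        # label permutations, so it survives label switching; points are then
        # greedily grouped wherever that probability exceeds `threshold`.
        S, n = samples.shape
        P = np.zeros((n, n))
        for z in samples:
            P += (z[:, None] == z[None, :])
        P /= S
        summary = -np.ones(n, dtype=int)
        k = 0
        for i in range(n):
            if summary[i] >= 0:
                continue
            summary[i] = k
            for j in range(i + 1, n):
                if summary[j] < 0 and P[i, j] > threshold:
                    summary[j] = k
            k += 1
        return summary, P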
Improving Application of Bayesian Neural Networks to Discriminate Neutrino Events from Backgrounds in Reactor Neutrino Experiments
The application of Bayesian neural networks (BNNs) to discriminate neutrino
events from backgrounds in reactor neutrino experiments was described in
Ref. [1]. In this paper, BNNs are again used to identify neutrino events in
reactor neutrino experiments, but the numbers of photoelectrons received by the
PMTs, rather than the reconstructed energy and position of events, are used as
inputs to the BNN. Samples of neutrino events and of the three major backgrounds
are generated in the signal region from a Monte Carlo simulation of a toy
detector. Compared to the BNN method of Ref. [1], more ⁸He/⁹Li background and
uncorrelated background in the signal region can be rejected by the BNN method
of this paper, but more fast-neutron background events in the signal region
remain unidentified. The uncorrelated-background-to-signal ratio and the
⁸He/⁹Li-background-to-signal ratio are significantly improved with the BNN
method of this paper in comparison with that of Ref. [1], but the
fast-neutron-background-to-signal ratio in the signal region is slightly larger
than the one in Ref. [1].
Comment: 9 pages, 1 figure and 1 table; accepted by the Journal of Instrumentation