Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling
Conventional methods of estimating latent behaviour generally use attitudinal
questions, which are subjective, and these survey questions may not always be
available. We hypothesize that an alternative approach can be used for latent
variable estimation through undirected graphical models, for instance
non-parametric artificial neural networks. In this study, we explore the use of
generative non-parametric modelling methods to estimate latent variables from
prior choice distribution without the conventional use of measurement
indicators. A restricted Boltzmann machine is used to represent latent
behaviour factors by analyzing the relationship information between the
observed choices and explanatory variables. The algorithm is adapted for latent
behaviour analysis in discrete choice scenarios, and we use a graphical approach
to evaluate and understand the semantic meaning of the estimated parameter vector
values. We illustrate our methodology on a financial instrument choice dataset
and perform statistical analysis on parameter sensitivity and stability. Our
findings show, through non-parametric statistical tests, that machine learning
methods can extract useful information on the behaviour of latent constructs,
and that these constructs have a strong and significant influence on the choice
process. Furthermore, our modelling framework is robust to input variability,
as shown through sampling and validation.
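As a rough illustration of the building block named in this abstract, the sketch below performs one contrastive-divergence (CD-1) update for a plain binary restricted Boltzmann machine. It is not the discriminative conditional RBM of the paper; the function name `cd1_step`, the learning rate, and the array shapes are assumptions made only for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, rng, lr=0.1):
    """One CD-1 update for a binary RBM (illustrative, hypothetical helper).

    v0: (n, d) binary visible data; W: (d, k) weights;
    b: (d,) visible biases; c: (k,) hidden biases. Updates W, b, c in place.
    """
    ph0 = sigmoid(v0 @ W + c)                      # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0       # sample hidden units
    pv1 = sigmoid(h0 @ W.T + b)                    # p(v = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1) * 1.0       # reconstructed visibles
    ph1 = sigmoid(v1 @ W + c)                      # p(h = 1 | v1)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n        # data term minus model term
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return ph0                                     # hidden activations for v0
```

The returned hidden activations are the quantities one would inspect when interpreting hidden units as latent factors; how the paper conditions them on explanatory variables is not reproduced here.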
Approximate Message Passing with Restricted Boltzmann Machine Priors
Approximate Message Passing (AMP) has been shown to be an excellent
statistical approach to signal inference and compressed sensing problems. The
AMP framework provides modularity in the choice of signal prior; here we
propose a hierarchical form of the Gauss-Bernoulli prior which utilizes a
Restricted Boltzmann Machine (RBM) trained on the signal support to push
reconstruction performance beyond that of simple iid priors for signals whose
support can be well represented by a trained binary RBM. We present and analyze
two methods of RBM factorization and demonstrate how these affect signal
reconstruction performance within our proposed algorithm. Finally, using the
MNIST handwritten digit dataset, we show experimentally that using an RBM
allows AMP to approach oracle-support performance.
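To make the role of the signal prior concrete, the sketch below computes the posterior mean of one signal component under a Gauss-Bernoulli prior, given a noisy estimate `r = x + noise` with noise variance `tau`. This is the standard scalar denoiser for such a prior, not the paper's full algorithm. Passing a per-component array for `rho` (for example, support probabilities supplied by some trained model) is an assumption of this example, as are all the names.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def gb_denoise(r, tau, rho, mu=0.0, sigma2=1.0):
    """Posterior mean under the prior (1 - rho) * delta(x) + rho * N(mu, sigma2).

    r: noisy observation(s), modelled as x + N(0, tau).
    rho: prior probability that a component is non-zero (scalar or array).
    Returns (posterior mean, posterior probability of being non-zero).
    """
    on = rho * gauss_pdf(r, mu, sigma2 + tau)      # evidence if x is non-zero
    off = (1.0 - rho) * gauss_pdf(r, 0.0, tau)     # evidence if x is exactly zero
    pi = on / (on + off)                           # posterior support probability
    m = (r * sigma2 + mu * tau) / (sigma2 + tau)   # mean of x given it is non-zero
    return pi * m, pi
```

With an i.i.d. prior, `rho` is one constant; a structured prior instead lets it vary per component, which is the kind of information a support model can contribute.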
Optimal Renormalization Group Transformation from Information Theory
Recently a novel real-space RG algorithm was introduced, identifying the
relevant degrees of freedom of a system by maximizing an information-theoretic
quantity, the real-space mutual information (RSMI), with machine learning
methods. Motivated by this, we investigate the information theoretic properties
of coarse-graining procedures, for both translationally invariant and
disordered systems. We prove that a perfect RSMI coarse-graining does not
increase the range of interactions in the renormalized Hamiltonian, and, for
disordered systems, suppresses generation of correlations in the renormalized
disorder distribution, being in this sense optimal. We empirically verify decay
of those measures of complexity, as a function of information retained by the
RG, on the examples of arbitrary coarse-grainings of the clean and random Ising
chain. The results establish a direct and quantifiable connection between
properties of RG viewed as a compression scheme, and those of physical objects
i.e. Hamiltonians and disorder distributions. We also study the effect of
constraints on the number and type of coarse-grained degrees of freedom on a
generic RG procedure.
Comment: Updated manuscript with new results on disordered systems
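For orientation, the sketch below works through the textbook decimation of the clean Ising chain, a simple coarse-graining in which the renormalized Hamiltonian keeps only nearest-neighbour couplings. It is not the RSMI algorithm discussed in the abstract, and the function names are hypothetical. Summing out every second spin maps the coupling K to K' = (1/2) ln cosh(2K).

```python
import numpy as np

def decimate(K):
    """Nearest-neighbour coupling after summing out every second spin."""
    return 0.5 * np.log(np.cosh(2.0 * K))

def check_decimation(K):
    """Verify sum_{s2} exp(K*s1*s2 + K*s2*s3) == A * exp(K'*s1*s3) for all s1, s3."""
    K_new = decimate(K)
    A = 2.0 * np.sqrt(np.cosh(2.0 * K))            # spin-independent prefactor
    for s1 in (-1, 1):
        for s3 in (-1, 1):
            lhs = sum(np.exp(K * s2 * (s1 + s3)) for s2 in (-1, 1))
            rhs = A * np.exp(K_new * s1 * s3)
            if not np.isclose(lhs, rhs):
                return False
    return True
```

Because the identity holds exactly with a single coupling K', this particular coarse-graining generates no longer-range interactions in the chain.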
Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
We present a canonical way to turn any smooth parametric family of
probability distributions on an arbitrary search space $X$ into a
continuous-time black-box optimization method on $X$, the
\emph{information-geometric optimization} (IGO) method. Invariance as a design
principle minimizes the number of arbitrary choices. The resulting \emph{IGO
flow} conducts the natural gradient ascent of an adaptive, time-dependent,
quantile-based transformation of the objective function. It makes no
assumptions on the objective function to be optimized.
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way
to derive new ones. The cross-entropy method is recovered in a particular case,
and can be extended into a smoothed, parametrization-independent maximum
likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO
is related to natural evolution strategies (NES) and recovers a version of the
CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the
PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm
for optimization on $\{0,1\}^d$. All these algorithms are unified under a
single information-geometric optimization framework.
Thanks to its intrinsic formulation, the IGO method achieves invariance under
reparametrization of the search space $X$, under a change of parameters of the
probability distributions, and under increasing transformations of the
objective function.
Theory strongly suggests that IGO algorithms have minimal loss in diversity
during optimization, provided the initial diversity is high. First experiments
using restricted Boltzmann machines confirm this insight. Thus IGO seems to
provide, from information theory, an elegant way to spontaneously explore
several valleys of a fitness landscape in a single run.
Comment: Final published version
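The Bernoulli case mentioned in the abstract can be sketched as a PBIL-style update: sample bit strings, rank them by objective value, and move each Bernoulli parameter towards the mean of the best samples. The truncation weights, the step size `dt`, the clipping bounds, and the function name are assumptions of this example, not choices taken from the paper.

```python
import numpy as np

def igo_bernoulli(f, d, n_samples=50, n_best=10, dt=0.1, n_iter=200, seed=0):
    """Maximize f over {0,1}^d with a PBIL-style update (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.full(d, 0.5)                        # P(x_i = 1) for each bit
    for _ in range(n_iter):
        x = (rng.random((n_samples, d)) < theta).astype(float)
        scores = np.array([f(xi) for xi in x])
        best = np.argsort(-scores)[:n_best]        # selection uses ranks only
        # Natural-gradient direction for Bernoulli parameters is (x - theta);
        # equal weights on the n_best samples, zero weight on the rest.
        theta += dt * (x[best].mean(axis=0) - theta)
        theta = np.clip(theta, 0.05, 0.95)         # assumed safeguard on diversity
    return theta

# Usage: maximize the number of ones in a 10-bit string.
theta = igo_bernoulli(lambda x: x.sum(), d=10)
```

Since only the ranking of `scores` enters the update, applying a strictly increasing transformation to `f` should leave the selection unchanged, which mirrors the invariance property stated in the abstract.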
Psychophysical identity and free energy
An approach to implementing variational Bayesian inference in biological
systems is considered, under which the thermodynamic free energy of a system
directly encodes its variational free energy. In the case of the brain, this
assumption places constraints on the neuronal encoding of generative and
recognition densities, in particular requiring a stochastic population code.
The resulting relationship between thermodynamic and variational free energies
is prefigured in mind-brain identity theses in philosophy and in the Gestalt
hypothesis of psychophysical isomorphism.
Comment: 22 pages; published as a research article on 8/5/2020 in Journal of
the Royal Society Interface
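For readers unfamiliar with the term, the variational free energy of a recognition density q(z) for a fixed observation x is F(q) = sum_z q(z) [log q(z) - log p(x, z)], which equals KL(q || p(z|x)) - log p(x). The sketch below evaluates it for a discrete latent variable. The function name and the toy numbers in the usage lines are assumptions of this example.

```python
import numpy as np

def variational_free_energy(q, joint):
    """F(q) = sum_z q(z) * (log q(z) - log p(x, z)) for a fixed observation x.

    q: recognition density over discrete latent states (sums to 1).
    joint: p(x, z) evaluated at the observed x for each latent state z.
    """
    q = np.asarray(q, dtype=float)
    joint = np.asarray(joint, dtype=float)
    mask = q > 0                                   # treat 0 * log 0 as 0
    return float(np.sum(q[mask] * (np.log(q[mask]) - np.log(joint[mask]))))

# Usage with toy numbers: p(x, z) = [0.1, 0.3], so p(x) = 0.4.
joint = [0.1, 0.3]
posterior = [0.25, 0.75]                           # p(z | x) = joint / p(x)
f_posterior = variational_free_energy(posterior, joint)
f_uniform = variational_free_energy([0.5, 0.5], joint)
```

F is minimized when q equals the exact posterior, where it reduces to -log p(x); any other q gives a larger value.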
Learning Generative Models with Visual Attention
Attention has long been proposed by psychologists as important for
effectively dealing with the enormous sensory stimulus available in the
neocortex. Inspired by the visual attention models in computational
neuroscience and the need for object-centric data for generative models, we
describe a framework for generative learning using attentional mechanisms.
Attentional mechanisms can propagate signals from a region of interest in a scene
to an aligned canonical representation, where generative modeling takes place.
By ignoring background clutter, generative models can concentrate their
resources on the object of interest. Our model is a proper graphical model
where the 2D Similarity transformation is a part of the top-down process. A
ConvNet is employed to provide good initializations during posterior inference
which is based on Hamiltonian Monte Carlo. Upon learning images of faces, our
model can robustly attend to face regions of novel test subjects. More
importantly, our model can learn generative models of new faces from a novel
dataset of large images where the face locations are not known.
Comment: In the proceedings of Neural Information Processing Systems, 201
Network Plasticity as Bayesian Inference
General results from statistical learning theory suggest understanding not
only brain computations, but also brain plasticity, as probabilistic inference.
But a model for that has been missing. We propose that inherently stochastic
features of synaptic plasticity and spine motility enable cortical networks of
neurons to carry out probabilistic inference by sampling from a posterior
distribution of network configurations. This model provides a viable
alternative to existing models that propose convergence of parameters to
maximum likelihood values. It explains how priors on weight distributions and
connection probabilities can be merged optimally with learned experience, how
cortical networks can generalize learned information so well to novel
experiences, and how they can compensate continuously for unforeseen
disturbances of the network. The resulting new theory of network plasticity
explains from a functional perspective a number of experimental data on
stochastic aspects of synaptic plasticity that previously appeared to be quite
puzzling.
Comment: 33 pages, 5 figures, the supplement is available on the author's web
page http://www.igi.tugraz.at/kappe
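The contrast between sampling from a posterior and converging to a single maximum likelihood value can be illustrated with Langevin dynamics, a standard way to draw samples using only the gradient of the log posterior plus injected noise. Whether this matches the authors' plasticity rule in detail is an assumption here; the one-dimensional Gaussian target, the step size, and the function name are likewise chosen only for illustration.

```python
import numpy as np

def langevin_samples(grad_log_post, theta0, step=0.01, n_steps=20000, seed=0):
    """Euler-Maruyama simulation of d(theta) = grad log p(theta) dt + sqrt(2) dW."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    out = np.empty(n_steps)
    for t in range(n_steps):
        noise = rng.standard_normal()
        # Drift follows the posterior gradient; noise keeps the parameter exploring.
        theta += step * grad_log_post(theta) + np.sqrt(2.0 * step) * noise
        out[t] = theta
    return out

# Usage: toy Gaussian posterior with mean 1.0 and variance 0.25.
samples = langevin_samples(lambda th: -(th - 1.0) / 0.25, theta0=0.0)
```

Instead of settling at one value, the parameter keeps fluctuating, and its long-run histogram approximates the posterior, which is the behaviour the abstract attributes to stochastic plasticity.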