
    Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling

    Conventional methods of estimating latent behaviour generally use attitudinal questions, which are subjective, and these survey questions may not always be available. We hypothesize that latent variables can instead be estimated through undirected graphical models, for instance non-parametric artificial neural networks. In this study, we explore the use of generative non-parametric modelling methods to estimate latent variables from the prior choice distribution without the conventional use of measurement indicators. A restricted Boltzmann machine is used to represent latent behaviour factors by analyzing the relationship between the observed choices and explanatory variables. The algorithm is adapted for latent behaviour analysis in a discrete choice scenario, and we use a graphical approach to evaluate and understand the semantic meaning of the estimated parameter vector values. We illustrate our methodology on a financial instrument choice dataset and perform statistical analysis of parameter sensitivity and stability. Through non-parametric statistical tests, our findings show that machine learning methods can extract useful latent information on the behaviour of latent constructs, and that these constructs exert a strong and significant influence on the choice process. Furthermore, our modelling framework shows robustness to input variability through sampling and validation.
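    A minimal sketch of the kind of model described above (the class name, layer sizes, learning rate, and CD-1 training step are illustrative assumptions, not the paper's estimated model): a conditional RBM in which the one-hot choice vector is the visible layer, the explanatory variables condition the hidden units, and the hidden units play the role of latent behaviour factors.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class ConditionalRBM:
        # visible layer: one-hot choice; context: explanatory variables;
        # hidden layer: candidate latent behaviour factors
        def __init__(self, n_choices, n_context, n_hidden, lr=0.05):
            self.W = 0.01 * rng.standard_normal((n_choices, n_hidden))  # choice-hidden
            self.U = 0.01 * rng.standard_normal((n_context, n_hidden))  # context-hidden
            self.b = np.zeros(n_choices)  # visible (choice) biases
            self.c = np.zeros(n_hidden)   # hidden biases
            self.lr = lr

        def hidden_prob(self, v, x):
            return sigmoid(v @ self.W + x @ self.U + self.c)

        def choice_prob(self, h):
            logits = h @ self.W.T + self.b
            e = np.exp(logits - logits.max(axis=1, keepdims=True))
            return e / e.sum(axis=1, keepdims=True)  # softmax: one alternative is chosen

        def cd1_step(self, v0, x):
            # one contrastive-divergence step on a (choice, context) mini-batch
            h0 = self.hidden_prob(v0, x)
            h_samp = (rng.random(h0.shape) < h0).astype(float)
            v1 = self.choice_prob(h_samp)        # reconstructed choice distribution
            h1 = self.hidden_prob(v1, x)
            n = len(v0)
            self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
            self.U += self.lr * (x.T @ h0 - x.T @ h1) / n
            self.b += self.lr * (v0 - v1).mean(axis=0)
            self.c += self.lr * (h0 - h1).mean(axis=0)

    After training, the hidden activations hidden_prob(v, x) are the quantities one would inspect, in the spirit of the paper's graphical analysis, to interpret the latent constructs.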

    Approximate Message Passing with Restricted Boltzmann Machine Priors

    Approximate Message Passing (AMP) has been shown to be an excellent statistical approach to signal inference and compressed sensing problems. The AMP framework provides modularity in the choice of signal prior; here we propose a hierarchical form of the Gauss-Bernoulli prior which utilizes a Restricted Boltzmann Machine (RBM) trained on the signal support to push reconstruction performance beyond that of simple iid priors for signals whose support can be well represented by a trained binary RBM. We present and analyze two methods of RBM factorization and demonstrate how these affect signal reconstruction performance within our proposed algorithm. Finally, using the MNIST handwritten digit dataset, we show experimentally that using an RBM allows AMP to approach oracle-support performance.
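    A minimal AMP sketch for y = A x + noise (an illustration, not the paper's code), using soft thresholding as the denoiser. The relevant point is that the denoiser eta() is the modular slot: an iid Gauss-Bernoulli prior, or an RBM trained on the signal support as in the paper, can be substituted there in place of soft thresholding.

    import numpy as np

    def soft_threshold(u, t):
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    def amp(y, A, n_iter=30, alpha=1.0):
        m, n = A.shape
        x = np.zeros(n)
        z = y.copy()
        for _ in range(n_iter):
            tau = alpha * np.linalg.norm(z) / np.sqrt(m)  # threshold from residual energy
            u = x + A.T @ z                               # effective noisy observation of x
            x_new = soft_threshold(u, tau)                # <- modular denoiser slot
            onsager = (z / m) * np.count_nonzero(x_new)   # Onsager correction term
            z = y - A @ x_new + onsager
            x = x_new
        return x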

    Optimal Renormalization Group Transformation from Information Theory

    Recently a novel real-space RG algorithm was introduced, identifying the relevant degrees of freedom of a system by maximizing an information-theoretic quantity, the real-space mutual information (RSMI), with machine learning methods. Motivated by this, we investigate the information-theoretic properties of coarse-graining procedures, for both translationally invariant and disordered systems. We prove that a perfect RSMI coarse-graining does not increase the range of interactions in the renormalized Hamiltonian and, for disordered systems, suppresses the generation of correlations in the renormalized disorder distribution, being in this sense optimal. We empirically verify the decay of these measures of complexity, as a function of the information retained by the RG, on the examples of arbitrary coarse-grainings of the clean and random Ising chain. The results establish a direct and quantifiable connection between the properties of RG viewed as a compression scheme and those of physical objects, i.e. Hamiltonians and disorder distributions. We also study the effect of constraints on the number and type of coarse-grained degrees of freedom on a generic RG procedure.
    Comment: Updated manuscript with new results on disordered systems.
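    A small illustration of the central quantity (written for this listing, under assumed conventions: a deterministic majority-rule coarse-graining, whereas RSMI optimizes a stochastic one): the mutual information I(H; E) between a coarse-grained block variable H and a distant environment E, computed by exact enumeration on a short clean Ising chain.

    import itertools, math

    N, beta = 8, 0.7
    block, env = range(0, 2), range(4, 8)   # spins 2..3 act as a buffer, traced out

    def weight(s):
        # nearest-neighbour Ising chain with open boundaries, Boltzmann weight
        return math.exp(beta * sum(s[i] * s[i + 1] for i in range(N - 1)))

    joint, Z = {}, 0.0                      # unnormalized p(h, e) and partition sum
    for s in itertools.product([-1, 1], repeat=N):
        w = weight(s)
        Z += w
        h = 1 if sum(s[i] for i in block) >= 0 else -1   # majority-rule block variable
        e = tuple(s[i] for i in env)
        joint[(h, e)] = joint.get((h, e), 0.0) + w

    ph, pe = {}, {}                         # marginals
    for (h, e), w in joint.items():
        ph[h] = ph.get(h, 0.0) + w
        pe[e] = pe.get(e, 0.0) + w

    mi = sum((w / Z) * math.log(w * Z / (ph[h] * pe[e]))
             for (h, e), w in joint.items())
    print(f"I(H;E) = {mi:.4f} nats")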

    Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles

    We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space $X$ into a continuous-time black-box optimization method on $X$, the \emph{information-geometric optimization} (IGO) method. Invariance as a design principle minimizes the number of arbitrary choices. The resulting \emph{IGO flow} conducts the natural gradient ascent of an adaptive, time-dependent, quantile-based transformation of the objective function. It makes no assumptions on the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. It naturally recovers versions of known algorithms and offers a systematic way to derive new ones. The cross-entropy method is recovered in a particular case, and can be extended into a smoothed, parametrization-independent maximum likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO is related to natural evolution strategies (NES) and recovers a version of the CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm for optimization on $\{0,1\}^d$. All these algorithms are unified under a single information-geometric optimization framework. Thanks to its intrinsic formulation, the IGO method achieves invariance under reparametrization of the search space $X$, under a change of parameters of the probability distributions, and under increasing transformations of the objective function. Theory strongly suggests that IGO algorithms have minimal loss in diversity during optimization, provided the initial diversity is high. First experiments using restricted Boltzmann machines confirm this insight. Thus IGO seems to provide, from information theory, an elegant way to spontaneously explore several valleys of a fitness landscape in a single run.
    Comment: Final published version.
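    A minimal sketch of the Bernoulli case mentioned above (population size, selection weights, step size, and the toy objective are illustrative choices, not the paper's experiments): with quantile-based selection weights, the natural-gradient IGO step on the Bernoulli means reduces to a PBIL-style convex update.

    import numpy as np

    rng = np.random.default_rng(1)

    def onemax(x):                 # toy objective on {0,1}^d: count of ones
        return x.sum()

    def igo_bernoulli(f, d=20, pop=50, mu=12, dt=0.2, iters=200):
        theta = np.full(d, 0.5)                      # Bernoulli means
        for _ in range(iters):
            X = (rng.random((pop, d)) < theta).astype(float)
            order = np.argsort([-f(x) for x in X])   # rank samples (maximization)
            w = np.zeros(pop)
            w[order[:mu]] = 1.0 / mu                 # quantile-based weights: top mu share
            # natural gradient step in expectation parameters = weighted sample mean
            theta = (1 - dt) * theta + dt * (w @ X)
            theta = np.clip(theta, 1.0 / d, 1 - 1.0 / d)  # keep away from the boundary
        return theta

    print(np.round(igo_bernoulli(onemax), 2))   # means should drift toward 1 for onemax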

    Psychophysical identity and free energy

    An approach to implementing variational Bayesian inference in biological systems is considered, under which the thermodynamic free energy of a system directly encodes its variational free energy. In the case of the brain, this assumption places constraints on the neuronal encoding of generative and recognition densities, in particular requiring a stochastic population code. The resulting relationship between thermodynamic and variational free energies is prefigured in mind-brain identity theses in philosophy and in the Gestalt hypothesis of psychophysical isomorphism.
    Comment: 22 pages; published as a research article on 8/5/2020 in Journal of the Royal Society Interface.
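    For reference, the standard definition of the variational free energy that the thermodynamic free energy is here proposed to encode (notation added for this listing: $q$ the recognition density, $p$ the generative density, $x$ sensory data, $z$ hidden states):

    \[
      F[q] = \mathbb{E}_{q(z)}\bigl[\ln q(z) - \ln p(x, z)\bigr]
           = D_{\mathrm{KL}}\bigl(q(z) \,\Vert\, p(z \mid x)\bigr) - \ln p(x),
    \]

    so minimizing $F[q]$ drives $q$ toward the posterior $p(z \mid x)$ while bounding the surprise $-\ln p(x)$ from above.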

    Learning Generative Models with Visual Attention

    Attention has long been proposed by psychologists as important for effectively dealing with the enormous sensory stimulus available in the neocortex. Inspired by the visual attention models in computational neuroscience and the need for object-centric data in generative models, we describe a generative learning framework that uses attentional mechanisms. Attentional mechanisms can propagate signals from a region of interest in a scene to an aligned canonical representation, where generative modeling takes place. By ignoring background clutter, generative models can concentrate their resources on the object of interest. Our model is a proper graphical model in which the 2D similarity transformation is part of the top-down process. A ConvNet is employed to provide good initializations during posterior inference, which is based on Hamiltonian Monte Carlo. Upon learning images of faces, our model can robustly attend to the face regions of novel test subjects. More importantly, our model can learn generative models of new faces from a novel dataset of large images in which the face locations are not known.
    Comment: In the proceedings of Neural Information Processing Systems, 2014.
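    A sketch of the attention step as a 2D similarity transformation (an illustration, not the paper's code; window size, parameterization, and the clamped bilinear interpolation are assumptions): given scale, rotation, and translation parameters, a canonical window is sampled from the image by inverse warping.

    import numpy as np

    def attend(image, s, a, tx, ty, w=24):
        # sample a canonical w x w window under scale s, rotation a, translation (tx, ty)
        H, W = image.shape
        # canonical grid, centred at the origin
        ys, xs = np.meshgrid(np.arange(w) - w / 2, np.arange(w) - w / 2, indexing="ij")
        # map canonical coordinates into image coordinates (inverse warp)
        u = s * (np.cos(a) * xs - np.sin(a) * ys) + tx   # column coordinate
        v = s * (np.sin(a) * xs + np.cos(a) * ys) + ty   # row coordinate
        u0 = np.clip(np.floor(u).astype(int), 0, W - 2)
        v0 = np.clip(np.floor(v).astype(int), 0, H - 2)
        du = np.clip(u - u0, 0.0, 1.0)
        dv = np.clip(v - v0, 0.0, 1.0)
        # bilinear interpolation between the four neighbouring pixels
        return ((1 - du) * (1 - dv) * image[v0, u0]
                + du * (1 - dv) * image[v0, u0 + 1]
                + (1 - du) * dv * image[v0 + 1, u0]
                + du * dv * image[v0 + 1, u0 + 1])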

    Network Plasticity as Bayesian Inference

    General results from statistical learning theory suggest understanding not only brain computations, but also brain plasticity, as probabilistic inference. But a model for that has been missing. We propose that inherently stochastic features of synaptic plasticity and spine motility enable cortical networks of neurons to carry out probabilistic inference by sampling from a posterior distribution of network configurations. This model provides a viable alternative to existing models that propose convergence of parameters to maximum likelihood values. It explains how priors on weight distributions and connection probabilities can be merged optimally with learned experience, how cortical networks can generalize learned information so well to novel experiences, and how they can compensate continuously for unforeseen disturbances of the network. The resulting new theory of network plasticity explains, from a functional perspective, a number of experimental data on stochastic aspects of synaptic plasticity that previously appeared quite puzzling.
    Comment: 33 pages, 5 figures; the supplement is available on the author's web page http://www.igi.tugraz.at/kappe
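    A minimal sketch of the sampling idea (a generic Langevin dynamics on a toy linear-Gaussian model, not the authors' network model): the parameter vector drifts up the gradient of the log posterior while injected noise makes its stationary distribution the posterior itself, rather than a maximum likelihood point estimate.

    import numpy as np

    rng = np.random.default_rng(2)

    def grad_log_prior(theta, sigma_prior=1.0):
        return -theta / sigma_prior**2          # Gaussian prior on the weights

    def grad_log_lik(theta, x, y):
        # toy likelihood: linear-Gaussian "network", y ~ N(x @ theta, 1)
        return x.T @ (y - x @ theta)

    def synaptic_sampling(x, y, d, eta=1e-3, T=1.0, steps=5000):
        theta = rng.standard_normal(d)
        samples = []
        for _ in range(steps):
            drift = grad_log_prior(theta) + grad_log_lik(theta, x, y)
            noise = np.sqrt(2 * eta * T) * rng.standard_normal(d)
            theta = theta + eta * drift + noise  # Langevin step: samples the posterior
            samples.append(theta.copy())
        return np.array(samples)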