    A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

    Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice for dealing with multimodal data, such as in image annotation tasks. Another popular approach to modeling multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. First, we propose SupDocNADE, a supervised extension of DocNADE that increases the discriminative power of the learned hidden topic features, and show how to employ it to learn a joint representation from image visual words, annotation words and class label information. We test our model on the LabelMe and UIUC-Sports data sets and show that it compares favorably to other topic models. Second, we propose a deep extension of our model and provide an efficient way of training the deep model. Experimental results show that our deep model outperforms its shallow version and reaches state-of-the-art performance on the Multimedia Information Retrieval (MIR) Flickr data set. Comment: 24 pages, 10 figures. A version has been accepted by TPAMI on Aug 4th, 2015. Add footnote about how to train the model in practice in Section 5.1. arXiv admin note: substantial text overlap with arXiv:1305.530

    Learning Generative ConvNets via Multi-grid Modeling and Sampling

    This paper proposes a multi-grid method for learning energy-based generative ConvNet models of images. For each grid, we learn an energy-based probabilistic model where the energy function is defined by a bottom-up convolutional neural network (ConvNet or CNN). Learning such a model requires generating synthesized examples from the model. Within each iteration of our learning algorithm, for each observed training image, we generate synthesized images at multiple grids by initializing the finite-step MCMC sampling from a minimal 1 x 1 version of the training image. The synthesized image at each subsequent grid is obtained by a finite-step MCMC initialized from the synthesized image generated at the previous coarser grid. After obtaining the synthesized examples, the parameters of the models at multiple grids are updated separately and simultaneously based on the differences between synthesized and observed examples. We show that this multi-grid method can learn realistic energy-based generative ConvNet models, and that it outperforms the original contrastive divergence (CD) and persistent CD. Comment: CVPR 201
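The coarse-to-fine sampling described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the grid sizes simply double at each level (2 x 2, 4 x 4, ...), Langevin dynamics stands in for the unspecified finite-step MCMC, and a toy quadratic energy replaces the ConvNet energy. The parameter update step is not shown.

```python
import random

def downsample_to_1x1(image):
    """Average all pixels into a single 1 x 1 image (the minimal version)."""
    pixels = [p for row in image for p in row]
    return [[sum(pixels) / len(pixels)]]

def upsample_2x(image):
    """Nearest-neighbour upsampling: each pixel becomes a 2 x 2 block."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def langevin(image, grad_energy, n_steps, step, rng):
    """Finite-step Langevin dynamics: x <- x - (step^2 / 2) * dE/dx + step * noise."""
    x = [row[:] for row in image]
    for _ in range(n_steps):
        g = grad_energy(x)
        x = [[x[i][j] - 0.5 * step ** 2 * g[i][j] + step * rng.gauss(0.0, 1.0)
              for j in range(len(x[i]))]
             for i in range(len(x))]
    return x

def multigrid_sample(observed, grad_energies, n_steps=30, step=0.1, seed=0):
    """Synthesize one image per grid, each initialized from the coarser grid's result."""
    rng = random.Random(seed)
    x = downsample_to_1x1(observed)
    synthesized = []
    for grad_energy in grad_energies:  # ordered from coarsest to finest grid
        x = upsample_2x(x)             # initialize from the previous, coarser sample
        x = langevin(x, grad_energy, n_steps, step, rng)
        synthesized.append(x)
    return synthesized

def quadratic_grad(mu):
    """Gradient of the toy energy E(x) = 0.5 * sum((x - mu)^2); an assumption, not a ConvNet."""
    return lambda x: [[p - mu for p in row] for row in x]

observed = [[1.0] * 4 for _ in range(4)]
samples = multigrid_sample(observed, [quadratic_grad(1.0), quadratic_grad(1.0)])
print([len(s) for s in samples])
```

In a learning loop, the synthesized images at each grid would be compared with the observed image downscaled to the same grid to update that grid's model; that step depends on the energy parameterization and is left out here.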

    SERKET: An Architecture for Connecting Stochastic Models to Realize a Large-Scale Cognitive Model

    To realize human-like robot intelligence, a large-scale cognitive architecture is required for robots to understand the environment through the variety of sensors with which they are equipped. In this paper, we propose a novel framework named Serket that makes it easy to construct a large-scale generative model and perform inference on it by connecting sub-modules, allowing robots to acquire various capabilities through interaction with their environments and with others. We consider that large-scale cognitive models can be constructed by connecting smaller fundamental models hierarchically while maintaining their programmatic independence. Moreover, connected modules depend on each other, and their parameters must be optimized as a whole. Conventionally, the equations for parameter estimation have to be derived and implemented for each model, which becomes harder as the model grows larger. To solve these problems, we propose a method for parameter estimation that communicates a minimal set of parameters between modules while maintaining their programmatic independence. Serket therefore makes it easy to construct large-scale models and estimate their parameters via the connection of modules. Experimental results demonstrated that the model can be constructed by connecting modules, the parameters can be optimized as a whole, and they are comparable with the original models that we have proposed.

    Boosting Monte Carlo simulations of spin glasses using autoregressive neural networks

    Autoregressive neural networks are emerging as a powerful computational tool for solving relevant problems in classical and quantum mechanics. One of their appealing features is that, after they have learned a probability distribution from a dataset, they allow exact and efficient sampling of typical system configurations. Here we employ a neural autoregressive distribution estimator (NADE) to boost Markov chain Monte Carlo (MCMC) simulations of a paradigmatic classical model of spin-glass theory, namely the two-dimensional Edwards-Anderson Hamiltonian. We show that a NADE can be trained to accurately mimic the Boltzmann distribution using unsupervised learning from system configurations generated using standard MCMC algorithms. The trained NADE is then employed as a smart proposal distribution for the Metropolis-Hastings algorithm. This allows us to perform efficient MCMC simulations, which provide unbiased results even if the expectation value corresponding to the probability distribution learned by the NADE is not exact. Notably, we implement a sequential tempering procedure, whereby a NADE trained at a higher temperature is iteratively employed as the proposal distribution in an MCMC simulation run at a slightly lower temperature. This allows one to efficiently simulate the spin-glass model even in the low-temperature regime, avoiding the divergent correlation times that plague MCMC simulations driven by local-update algorithms. Furthermore, we show that the NADE-driven simulations quickly sample ground-state configurations, paving the way to their future use in tackling binary optimization problems. Comment: 13 pages, 14 figure
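The way a learned model can serve as a Metropolis-Hastings proposal can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: a factorized distribution with fixed per-spin probabilities (`probs_up`, hypothetical) stands in for the trained NADE, since all the sampler needs from the proposal is exact sampling and an exact log-probability. Each move is accepted with probability min(1, [p(y) q(x)] / [p(x) q(y)]), where p is the Boltzmann weight and q the proposal, so the chain targets p even when q is only an approximation.

```python
import math
import random

def energy(spins, couplings):
    """Spin-glass energy E = -sum over bonds (i, j) of J_ij * s_i * s_j."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())

def sample_proposal(probs_up, rng):
    """Draw each spin independently; probs_up[i] is P(s_i = +1), strictly in (0, 1)."""
    return [1 if rng.random() < p else -1 for p in probs_up]

def log_q(spins, probs_up):
    """Exact log-probability of a configuration under the stand-in proposal."""
    return sum(math.log(p if s == 1 else 1.0 - p) for s, p in zip(spins, probs_up))

def independence_mh(couplings, probs_up, beta, n_steps, seed=0):
    """Metropolis-Hastings with global proposals drawn independently of the current state."""
    rng = random.Random(seed)
    x = sample_proposal(probs_up, rng)
    # Log importance weight log[p(x) / q(x)], up to the unknown normalization of p.
    log_w_x = -beta * energy(x, couplings) - log_q(x, probs_up)
    samples, accepted = [], 0
    for _ in range(n_steps):
        y = sample_proposal(probs_up, rng)
        log_w_y = -beta * energy(y, couplings) - log_q(y, probs_up)
        # Accept with probability min(1, w(y) / w(x)).
        if rng.random() < math.exp(min(0.0, log_w_y - log_w_x)):
            x, log_w_x = y, log_w_y
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_steps

# Two spins joined by one ferromagnetic bond, with a uniform proposal.
couplings = {(0, 1): 1.0}
samples, rate = independence_mh(couplings, [0.5, 0.5], beta=1.0, n_steps=20000, seed=1)
aligned = sum(1 for s in samples if s[0] == s[1]) / len(samples)
print(f"acceptance rate {rate:.2f}, fraction aligned {aligned:.3f}")
```

For this two-spin example the exact Boltzmann probability of aligned spins is e / (e + 1/e), about 0.881, which the sampled fraction should approach. The closer the proposal is to the target distribution, the higher the acceptance rate, which is the motivation for training the proposal on configurations from a nearby temperature.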