
    Weighted Contrastive Divergence

    Learning algorithms for energy-based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically because of the exponential number of terms involved in computing the partition function. One therefore has to resort to approximation schemes for evaluating the gradient. This is the case for Restricted Boltzmann Machines (RBMs) and their learning algorithm, Contrastive Divergence (CD). It is well known that CD has a number of shortcomings and that its approximation to the gradient has several drawbacks. Overcoming these defects has been the basis of much research, and new algorithms such as persistent CD have been devised. In this manuscript we propose a new algorithm, which we call Weighted CD (WCD), built from small modifications of the negative phase in standard CD. However small these modifications may be, the experimental work reported in this paper suggests that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational cost.
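
    For reference, a minimal NumPy sketch of the standard CD-1 update that WCD starts from; the per-sample weighting of the negative phase indicated in the comments is only an illustrative placeholder, since the abstract does not spell out the paper's weighting rule. Function names, array shapes, and `neg_weights` are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_v, b_h, v0, lr=0.01, rng=None):
    """One CD-1 update for a binary RBM on a minibatch v0 of shape (batch, n_visible)."""
    if rng is None:
        rng = np.random.default_rng(0)

    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one Gibbs step back to the visible and hidden units.
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)

    # Standard CD weights every negative sample equally (1/batch each).
    # WCD modifies this negative phase; per-sample weights could enter here
    # (illustrative placeholder only, not the rule proposed in the paper).
    neg_weights = np.full(v0.shape[0], 1.0 / v0.shape[0])

    pos_grad = v0.T @ ph0 / v0.shape[0]
    neg_grad = (v1 * neg_weights[:, None]).T @ ph1

    W += lr * (pos_grad - neg_grad)
    b_v += lr * (v0.mean(0) - neg_weights @ v1)
    b_h += lr * (ph0.mean(0) - neg_weights @ ph1)
    return W, b_v, b_h
```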

    Deep Relational Model: A Joint Probabilistic Model with a Hierarchical Structure for Bidirectional Estimation of Image and Labels

    Two different types of representations, such as an image and its manually assigned labels, generally have complex and strong relationships to each other. In this paper, we represent such deep relationships between two different types of visible variables using an energy-based probabilistic model, called a deep relational model (DRM), to improve prediction accuracy. A DRM stacks several layers from one visible layer to another visible layer, sandwiching several hidden layers between them. As with restricted Boltzmann machines (RBMs) and deep Boltzmann machines (DBMs), all connections (weights) between two adjacent layers are undirected. During maximum-likelihood (ML) training, the network attempts to capture the latent complex relationships between the two visible variables with its deep architecture. Unlike deep neural networks (DNNs), 1) the DRM is a fully generative model that allows us to generate one set of visible variables given the other, and 2) its parameters can be optimized in a probabilistic manner. The DRM can also be fine-tuned using DNNs, as in deep belief net (DBN) or DBM pre-training. This paper presents experiments conducted to evaluate the performance of a DRM in image recognition and generation tasks using the MNIST data set. In the image recognition experiments, we observed that the DRM outperformed DNNs even without fine-tuning. In the image generation experiments, the images generated by the DRM were much more realistic than those from the other generative models.
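
    As a rough illustration of the sandwiched structure described above (one plausible reading, not the paper's exact parameterization), the energy of a minimal DRM with two visible layers and a single hidden layer in between could take the following form; the symbols W^{(1)}, W^{(2)} and the bias vectors a, b, c are placeholders.

```latex
% Illustrative energy for a minimal DRM: visible layers v^{(1)}, v^{(2)} with
% one hidden layer h sandwiched between them, all couplings undirected.
% The actual model stacks several hidden layers; this is a sketch only.
E\bigl(v^{(1)}, h, v^{(2)}\bigr)
  = - a^{\top} v^{(1)} - b^{\top} h - c^{\top} v^{(2)}
    - v^{(1)\top} W^{(1)} h - h^{\top} W^{(2)} v^{(2)}
```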

    Neighborhood-based stopping criterion for contrastive divergence

    Restricted Boltzmann Machines (RBMs) are general unsupervised learning devices for ascertaining generative models of data distributions. RBMs are often trained using the Contrastive Divergence (CD) learning algorithm, an approximation to the gradient of the data log-likelihood (logL). A simple reconstruction error is often used as a stopping criterion for CD, although several authors have raised doubts concerning the feasibility of this procedure. In many cases, the evolution curve of the reconstruction error is monotonic while the logL is not, indicating that the former is not a good estimator of the optimal stopping point for learning. However, not many alternatives to the reconstruction error have been discussed in the literature. An estimation of the logL of the training data based on annealed importance sampling is feasible but computationally very expensive. In this manuscript, we present a simple and cheap alternative, based on the inclusion of information contained in states neighboring the training set, as a stopping criterion for CD learning.
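
    For context, a minimal sketch of the reconstruction-error criterion discussed above; the neighborhood-based criterion itself is not reproduced because the abstract does not give its exact form (reading "neighboring states" as, for example, training vectors with a few bits flipped would only be an assumption). Function and variable names are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(W, b_v, b_h, data):
    """Mean squared reconstruction error after one deterministic up-down pass.

    Commonly monitored as a stopping criterion for CD training, but, as the
    paper notes, it often keeps decreasing even when the log-likelihood
    starts to degrade, so it can suggest the wrong stopping point.
    """
    ph = sigmoid(data @ W + b_h)      # mean-field hidden activations
    recon = sigmoid(ph @ W.T + b_v)   # mean-field reconstruction of the data
    return float(np.mean((data - recon) ** 2))
```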

    A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines

    Restricted Boltzmann machines (RBMs) are energy-based neural networks commonly used as the building blocks of deep neural architectures. In this work, we derive a deterministic framework for the training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer (TAP) mean-field approximation from spin-glass theory for widely connected systems with weak interactions. While the TAP approach has been extensively studied for fully visible binary spin systems, our construction generalizes to latent-variable models, as well as to arbitrarily distributed real-valued spin systems with bounded support. In our numerical experiments, we demonstrate the effective deterministic training of our proposed models and are able to show interesting features of unsupervised learning that could not be directly observed with sampling. Additionally, we demonstrate how to utilize our TAP-based framework to leverage trained RBMs as joint priors in denoising problems.
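
    For illustration, a sketch of the second-order (TAP) self-consistency iteration for a binary {0,1} RBM, the kind of deterministic fixed point this framework uses in place of Gibbs sampling; this is the commonly quoted binary-unit form and only approximates the paper's generalized construction for real-valued, bounded-support units. The damping constant and function names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tap_fixed_point(W, b_v, b_h, m_v, m_h, n_iter=50, damping=0.5):
    """Iterate TAP mean-field equations for a binary {0,1} RBM.

    m_v and m_h are magnetizations (expected unit activations). The quadratic
    terms are the Onsager corrections that distinguish TAP from naive mean
    field. Damped updates are used for numerical stability.
    """
    W2 = W ** 2
    for _ in range(n_iter):
        var_h = m_h - m_h ** 2
        new_m_v = sigmoid(b_v + m_h @ W.T - (m_v - 0.5) * (var_h @ W2.T))
        m_v = damping * m_v + (1 - damping) * new_m_v

        var_v = m_v - m_v ** 2
        new_m_h = sigmoid(b_h + m_v @ W - (m_h - 0.5) * (var_v @ W2))
        m_h = damping * m_h + (1 - damping) * new_m_h
    return m_v, m_h
```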

    Empirical Analysis of the Divergence of Gibbs Sampling Based Learning Algorithms for Restricted Boltzmann Machines

    Learning algorithms relying on Gibbs-sampling-based stochastic approximations of the log-likelihood gradient have become a common way to train Restricted Boltzmann Machines (RBMs). We study three of these methods: Contrastive Divergence (CD) and its refined variants Persistent CD (PCD) and Fast PCD (FPCD). As the approximations are biased, the maximum of the log-likelihood is not necessarily obtained. Recently, it has been shown that CD, PCD, and FPCD can even lead to a steady decrease of the log-likelihood during learning. Taking artificial data sets from the literature, we study these divergence effects in more detail. Our results indicate that the log-likelihood tends to diverge especially if the target distribution is difficult for the RBM to learn. The decrease of the likelihood cannot be detected by an increase of the reconstruction error, which has been proposed as a stopping criterion for CD learning. Weight decay with a carefully chosen weight-decay parameter can prevent divergence.
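
    To make the difference between the estimators studied here concrete, a minimal sketch of how CD-k and PCD obtain their negative samples (FPCD, which additionally keeps a rapidly decaying set of "fast" weights used only for sampling, is omitted); the function names and the block-Gibbs kernel shown are assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(W, b_v, b_h, v, rng):
    """One block-Gibbs step v -> h -> v for a binary RBM."""
    ph = sigmoid(v @ W + b_h)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ W.T + b_v)
    return (rng.random(pv.shape) < pv).astype(float)

def negative_samples_cd(W, b_v, b_h, batch, k, rng):
    """CD-k: the negative chain restarts at the current data batch."""
    v = batch.copy()
    for _ in range(k):
        v = gibbs_step(W, b_v, b_h, v, rng)
    return v

def negative_samples_pcd(W, b_v, b_h, chains, k, rng):
    """PCD: the chain persists across updates instead of restarting at data."""
    for _ in range(k):
        chains = gibbs_step(W, b_v, b_h, chains, rng)
    return chains  # caller stores `chains` and passes it back at the next update
```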