12 research outputs found

    Weighted Contrastive Divergence

    Learning algorithms for energy-based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically due to the exponential number of terms involved in computing the partition function. One therefore has to resort to approximation schemes for evaluating the gradient. This is the case for Restricted Boltzmann Machines (RBMs) and their learning algorithm, Contrastive Divergence (CD). It is well known that CD has a number of shortcomings, and its approximation to the gradient has several drawbacks. Overcoming these defects has been the basis of much research, and new algorithms have been devised, such as persistent CD. In this manuscript we propose a new algorithm, which we call Weighted CD (WCD), built from small modifications of the negative phase in standard CD. However small these modifications may be, the experimental work reported in this paper suggests that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational cost.
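As background for the negative-phase modification the abstract describes, here is a minimal sketch of one standard CD-1 update for a binary RBM. The function name and shapes are illustrative; WCD would replace the plain negative-phase average below with a reweighted one (the exact weighting scheme is given in the paper, not here).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradients(W, b, c, v0, rng):
    """One CD-1 step for a binary RBM with weights W, visible bias b,
    hidden bias c, and a data batch v0 of shape (n, n_visible).
    Returns approximate log-likelihood gradients (dW, db, dc)."""
    # Positive phase: hidden probabilities driven by the data.
    ph0 = sigmoid(v0 @ W + c)
    # One Gibbs step: sample hiddens, reconstruct visibles, recompute hiddens.
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    # Negative phase: a plain average over the reconstructions. WCD's small
    # modification reweights this term; standard CD weights samples equally.
    dW = (v0.T @ ph0 - v1.T @ ph1) / n
    db = (v0 - v1).mean(axis=0)
    dc = (ph0 - ph1).mean(axis=0)
    return dW, db, dc
```

A caller would then ascend the log-likelihood, e.g. `W += lr * dW`.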

    A comparative study of different gradient approximations for Restricted Boltzmann Machines

    This project consists of a theoretical study of Restricted Boltzmann Machines (RBMs) and focuses on gradient approximations for RBMs. Computing the exact gradient for RBMs is intractable, so accurate learning is a fundamental difficulty. Based on Contrastive Divergence (CD) and Markov Chain Monte Carlo (MCMC), CD-k, an efficient algorithm for approximating the gradients, was proposed and has become the mainstream method for training RBMs. In order to improve efficiency and mitigate the bias in the approximation, many CD-related algorithms have emerged, such as Persistent Contrastive Divergence (PCD) and Weighted Contrastive Divergence (WCD). This project presents a comprehensive comparison of the gradient approximation algorithms, mainly CD, PCD, and WCD. The experimental results indicate that, among all the algorithms tested, WCD shows the fastest and best convergence for parameter learning. Increasing the number of Gibbs sampling steps and adding a persistent chain to CD-related algorithms can enhance performance and alleviate the bias in the approximation, and taking advantage of Parallel Tempering can further improve the results. Moreover, the cosine similarity between the approximate gradients and the exact gradients is studied, showing that the CD-family and WCD-family algorithms are heterogeneous. The general conclusions of this project can serve as a reference when training RBMs.
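The cosine-similarity diagnostic mentioned above is straightforward to compute once both gradients are available; a minimal sketch (function name illustrative, gradients flattened so weight matrices and bias vectors can be compared alike):

```python
import numpy as np

def gradient_cosine(g_approx, g_exact):
    """Cosine similarity between a flattened approximate gradient and the
    exact gradient. Values near 1 mean the approximation points in nearly
    the same direction as the true gradient."""
    a = np.ravel(np.asarray(g_approx, dtype=float))
    b = np.ravel(np.asarray(g_exact, dtype=float))
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Tracking this quantity over training is one way to compare how CD-style and WCD-style updates diverge from the exact direction.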

    How to Center Deep Boltzmann Machines

    This work analyzes centered Restricted Boltzmann Machines (RBMs) and centered Deep Boltzmann Machines (DBMs), where centering is done by subtracting offset values from visible and hidden variables. We show analytically that (i) centered and normal Boltzmann Machines (BMs), and thus RBMs and DBMs, are different parameterizations of the same model class, such that any normal BM/RBM/DBM can be transformed into an equivalent centered BM/RBM/DBM and vice versa, and that this equivalence generalizes to artificial neural networks in general, (ii) the expected performance of centered binary BMs/RBMs/DBMs is invariant under a simultaneous flip of data and offsets, for any offset value in the range of zero to one, (iii) centering can be reformulated as a different update rule for normal BMs/RBMs/DBMs, and (iv) using the enhanced gradient is equivalent to setting the offset values to the average of the model and data means. Furthermore, we present numerical simulations suggesting that (i) optimal generative performance is achieved by subtracting mean values from visible as well as hidden variables, (ii) centered binary RBMs/DBMs reach significantly higher log-likelihood values than normal binary RBMs/DBMs, (iii) centering variants whose offsets depend on the model mean, like the enhanced gradient, suffer from severe divergence problems, (iv) learning is stabilized if an exponentially moving average over the batch means is used for the offset values instead of the current batch mean, which also prevents the enhanced gradient from severe divergence, (v) at a similar level of log-likelihood values, centered binary RBMs/DBMs have smaller weights and bigger bias parameters than normal binary RBMs/DBMs, (vi) centering leads to an update direction that is closer to the natural gradient, which is extremely efficient for training as we show for small binary RBMs, (vii) centering eliminates the need for greedy layer-wise pre-training of DBMs, which often even deteriorates the results independently of whether centering is used or not, and (viii) centering is also beneficial for autoencoders.
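The centered weight update the abstract describes amounts to subtracting the offsets from the visible and hidden statistics before taking outer products; a minimal sketch under stated assumptions (names are illustrative, the model-phase samples would come from Gibbs sampling, and the offsets `mu`/`lam` would typically be exponential moving averages of the batch means, which the work reports stabilizes learning):

```python
import numpy as np

def centered_weight_gradient(v_data, h_data, v_model, h_model, mu, lam):
    """Centered RBM weight gradient: offsets mu (visible) and lam (hidden)
    are subtracted from the variables before the data- and model-phase
    outer-product statistics are computed.

    v_*: (batch, n_visible) arrays, h_*: (batch, n_hidden) arrays.
    Returns an (n_visible, n_hidden) gradient for the weight matrix."""
    n_d, n_m = v_data.shape[0], v_model.shape[0]
    pos = (v_data - mu).T @ (h_data - lam) / n_d    # data phase
    neg = (v_model - mu).T @ (h_model - lam) / n_m  # model phase
    return pos - neg
```

Setting `mu = lam = 0` recovers the normal (uncentered) update, consistent with the paper's point that centering is a reparameterization of the same model class.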

    Neural networks and quantum many-body physics: exploring reciprocal benefits.

    One of the main reasons why the physics of quantum many-body systems is hard lies in the curse of dimensionality: the number of states of such systems increases exponentially with the number of degrees of freedom involved. As a result, computations for realistic systems become intractable, and even numerical methods are limited to comparably small system sizes. Many efforts in modern physics research are therefore concerned with finding efficient representations of quantum states and clever approximation schemes that would allow one to characterize physical systems of interest. Meanwhile, Deep Learning (DL) has solved many non-scientific problems that had been inaccessible to conventional methods for a similar reason. The concept underlying DL is to extract knowledge from data by identifying patterns and regularities. The remarkable success of DL has excited many physicists about the prospect of leveraging its power to solve intractable problems in physics. At the same time, DL turned out to be an interesting complex many-body problem in itself. In contrast to its widespread empirical applications, the theoretical foundation of DL is strongly underdeveloped. In particular, as long as its decision-making process and the interpretability of its results remain opaque, DL cannot claim the status of a scientific tool. In this thesis, I explore the interface between DL and quantum many-body physics, and investigate DL both as a tool and as a subject of study. The first project presented here is a theory-based study of a fundamental open question about the role of width and the number of parameters in deep neural networks. In this work, we consider a DL setup for the image recognition task on standard benchmarking datasets. We combine controlled experiments with a theoretical analysis, including analytical calculations for a toy model.
The other three works focus on the application of Restricted Boltzmann Machines (RBMs) as generative models for the task of wavefunction reconstruction from measurement data on a quantum many-body system. First, we implement this approach as a software package, making it available as a tool for experimentalists. Following the idea that physics problems can be used to characterize DL tools, we then use our extensive knowledge of this setup to conduct a systematic study of how the RBM complexity scales with the complexity of the physical system. Finally, in a follow-up study we focus on the effects of parameter pruning techniques on the RBM and its scaling behavior.

    Augmenting Quantum Mechanics with Artificial Intelligence

    The simulation of quantum matter with classical hardware plays a central role in the discovery and development of quantum many-body systems, with far-reaching implications in condensed matter physics and quantum technologies. In general, efficient and sophisticated algorithms are required to overcome the severe challenge posed by the exponential scaling of the Hilbert space of quantum systems. In contrast, hardware built from quantum bits of information is inherently capable of efficiently finding solutions to quantum many-body problems. While a universal and scalable quantum computer is still beyond the horizon, recent advances in qubit manufacturing and coherent control of synthetic quantum matter are leading to a new generation of intermediate-scale quantum hardware. The complexity underlying quantum many-body systems closely resembles the one encountered in many problems in the world of information and technology. In both contexts, the complexity stems from a large number of interacting degrees of freedom. A powerful strategy in the latter scenario is machine learning, a subfield of artificial intelligence where large amounts of data are used to extract relevant features and patterns. In particular, artificial neural networks have been demonstrated to be capable of discovering low-dimensional representations of complex objects from high-dimensional datasets, leading to the profound technological revolution we all witness in our daily lives. In this Thesis, we envision a new paradigm for scientific discovery in quantum physics. On the one hand, we have the essentially unlimited data generated with the increasing amount of highly controllable quantum hardware. On the other hand, we have a set of powerful algorithms that efficiently capture non-trivial correlations from high-dimensional data.
Therefore, we fully embrace this data-driven approach to quantum mechanics, and anticipate new exciting possibilities in the field of quantum many-body physics and quantum information science. We revive a powerful stochastic neural network called the restricted Boltzmann machine, which slowly moved out of fashion after playing a central role in the machine learning revolution of the early 2010s. We introduce a neural-network representation of quantum states based on this generative model. We propose a set of algorithms to reconstruct unknown quantum states from measurement data and numerically demonstrate their potential, with important implications for current experiments. These include the reconstruction of experimentally inaccessible properties, such as entanglement, and diagnostics to determine sources of noise. Furthermore, we introduce a machine learning framework for quantum error correction, where a neural network learns the best decoding strategy directly from data. We expect that the full integration between quantum hardware and artificial intelligence will become the gold standard, and will drive the world into the era of fault-tolerant quantum computing and large-scale quantum simulations.
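The RBM representation of a quantum state referred to above has a well-known closed form once the hidden units are traced out analytically. A minimal sketch, assuming real parameters for simplicity (complex parameters are used in practice to capture wavefunction phases), with illustrative names:

```python
import numpy as np

def rbm_amplitude(sigma, a, b, W):
    """Unnormalized RBM wavefunction amplitude for a spin configuration
    sigma in {-1, +1}^N. Summing out the binary hidden units analytically
    gives psi(sigma) = exp(a . sigma) * prod_j 2*cosh(b_j + sum_i W_ij sigma_i).

    a: (N,) visible biases, b: (M,) hidden biases, W: (N, M) couplings."""
    theta = b + sigma @ W        # effective field on each hidden unit
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(theta))
```

Reconstruction then amounts to tuning `a`, `b`, and `W` so that `|psi|**2` (suitably normalized) matches the measurement statistics.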