Hardware emulation of stochastic p-bits for invertible logic
The common feature of nearly all logic and memory devices is that they make
use of stable units to represent 0's and 1's. A completely different paradigm
is based on three-terminal stochastic units which could be called "p-bits",
where the output is a random telegraphic signal continuously fluctuating
between 0 and 1 with a tunable mean. p-bits can be interconnected to receive
weighted contributions from others in a network, and these weighted
contributions can be chosen to not only solve problems of optimization and
inference but also to implement precise Boolean functions in an inverted mode.
This inverted operation of Boolean gates is particularly striking: the gates
provide inputs consistent with a given output, along with unique outputs for a
given set of inputs. The existing demonstrations of accurate invertible logic are
intriguing, but will these striking properties observed in computer simulations
carry over to hardware implementations? This paper uses individual
microcontrollers to emulate p-bits, and we present results for a 4-bit ripple carry
adder with 48 p-bits and a 4-bit multiplier with 46 p-bits working in inverted
mode as a factorizer. Our results constitute a first step towards implementing
p-bits with nano devices, like stochastic Magnetic Tunnel Junctions.
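The inverted mode described in this abstract can be illustrated with a minimal software sketch of a single AND gate built from three p-bits. The abstract gives no equations, so everything here is an assumption: the update rule (each p-bit outputs +1 with probability (1 + tanh(I))/2, where I is a weighted sum of the other p-bits), the coupling matrix `J`, the bias vector `h`, and the scale `i0` are illustrative choices for which the four valid rows of the AND truth table are the most probable states. The names `pbit_update` and `run_inverted_and` are hypothetical.

```python
import math
import random

# Assumed couplings for p-bits [A, B, C] with C = A AND B, in bipolar form
# (-1 = logic 0, +1 = logic 1). Illustrative values, not from the abstract.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
h = [1, 1, -2]


def pbit_update(i, m, i0, rng):
    """Resample p-bit i given the current states m of the other p-bits."""
    drive = i0 * (h[i] + sum(J[i][j] * m[j] for j in range(len(m))))
    # The output is +1 with probability (1 + tanh(drive)) / 2.
    return 1 if rng.uniform(-1.0, 1.0) < math.tanh(drive) else -1


def run_inverted_and(c_value, sweeps=2000, i0=2.0, seed=0):
    """Clamp the output C and count how often each (A, B) pair is visited."""
    rng = random.Random(seed)
    m = [rng.choice([-1, 1]), rng.choice([-1, 1]), c_value]
    counts = {}
    for _ in range(sweeps):
        for i in (0, 1):  # only the unclamped inputs fluctuate
            m[i] = pbit_update(i, m, i0, rng)
        key = (m[0], m[1])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With the output clamped to +1, the inputs should settle almost entirely on (+1, +1); with the output clamped to -1, they should wander among the three input pairs consistent with a 0 output. This is a toy software model and does not reproduce the microcontroller setup or the adder and multiplier circuits reported in the paper.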
Training Restricted Boltzmann Machines on Word Observations
The restricted Boltzmann machine (RBM) is a flexible tool for modeling
complex data; however, there have been significant computational difficulties in
using RBMs to model high-dimensional multinomial observations. In natural
language processing applications, words are naturally modeled by K-ary discrete
distributions, where K is determined by the vocabulary size and can easily be
in the hundreds of thousands. The conventional approach to training RBMs on
word observations is limited because it requires sampling the states of K-way
softmax visible units during block Gibbs updates, an operation that takes time
linear in K. In this work, we address this issue by employing a more general
class of Markov chain Monte Carlo operators on the visible units, yielding
updates with computational complexity independent of K. We demonstrate the
success of our approach by training RBMs on hundreds of millions of word
n-grams using larger vocabularies than previously feasible and using the
learned features to improve performance on chunking and sentiment
classification tasks, achieving state-of-the-art results on the latter.
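The cost argument in this abstract can be made concrete with a small sketch. Exact sampling from a K-way softmax needs all K scores to normalize, whereas a Metropolis-Hastings update only compares the current state with one proposed state. The sketch below assumes a uniform proposal for simplicity; the abstract does not say which operators or proposal distributions the authors use, and `mh_step`, `sample_chain`, and `log_score` are hypothetical names. In an RBM, `log_score(k)` would be the unnormalized log-probability of word k given the hidden units.

```python
import math
import random


def mh_step(current, log_score, vocab_size, rng):
    """One Metropolis-Hastings update of a K-way visible unit.

    With a uniform proposal the acceptance ratio reduces to the ratio of
    unnormalized target probabilities, so only two scores are evaluated,
    regardless of vocab_size.
    """
    proposal = rng.randrange(vocab_size)
    log_ratio = log_score(proposal) - log_score(current)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current


def sample_chain(log_score, vocab_size, steps, seed=0):
    """Run the chain and return how often each state was visited."""
    rng = random.Random(seed)
    state = rng.randrange(vocab_size)
    visits = [0] * vocab_size
    for _ in range(steps):
        state = mh_step(state, log_score, vocab_size, rng)
        visits[state] += 1
    return visits
```

A uniform proposal keeps the example short but would mix slowly on a realistic vocabulary, where most words have negligible probability; a proposal closer to the target distribution would presumably be needed in practice.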
Network Plasticity as Bayesian Inference
General results from statistical learning theory suggest that not only brain
computations but also brain plasticity should be understood as probabilistic
inference. However, a model for this has been missing. We propose that inherently stochastic
features of synaptic plasticity and spine motility enable cortical networks of
neurons to carry out probabilistic inference by sampling from a posterior
distribution of network configurations. This model provides a viable
alternative to existing models that propose convergence of parameters to
maximum likelihood values. It explains how priors on weight distributions and
connection probabilities can be merged optimally with learned experience, how
cortical networks can generalize learned information so well to novel
experiences, and how they can compensate continuously for unforeseen
disturbances of the network. The resulting new theory of network plasticity
explains from a functional perspective a number of experimental data on
stochastic aspects of synaptic plasticity that previously appeared to be quite
puzzling.
Comment: 33 pages, 5 figures, the supplement is available on the author's web
page http://www.igi.tugraz.at/kappe
Weighted Contrastive Divergence
Learning algorithms for energy based Boltzmann architectures that rely on
gradient descent are in general computationally prohibitive, typically due to
the exponential number of terms involved in computing the partition function.
As a result, one has to resort to approximation schemes for the evaluation of
the gradient. This is the case for Restricted Boltzmann Machines (RBMs) and their
learning algorithm, Contrastive Divergence (CD). It is well known that CD has a
number of shortcomings, and its approximation to the gradient has several
drawbacks. Overcoming these defects has been the basis of much research and new
algorithms have been devised, such as persistent CD. In this manuscript we
propose a new algorithm that we call Weighted CD (WCD), built from small
modifications of the negative phase in standard CD. However small these
modifications may be, experimental work reported in this paper suggests that WCD
provides a significant improvement over standard CD and persistent CD at a
small additional computational cost.
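The abstract does not spell out how the negative phase is modified. One plausible reading, stated here as an assumption rather than as the authors' method, is that the plain average over negative samples is replaced by an average weighted by each sample's relative probability within the batch. For an RBM that relative probability can be computed from the free energy without the partition function, since the normalizing constant cancels. The names `free_energy` and `negative_phase_weights` are hypothetical.

```python
import math


def free_energy(v, W, b, c):
    """RBM free energy F(v) = -b.v - sum_j softplus(c_j + sum_i v_i W_ij)."""
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = 0.0
    for j in range(len(c)):
        x = c[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        # Numerically stable softplus: log(1 + exp(x)).
        hidden_term += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return -visible_term - hidden_term


def negative_phase_weights(samples, W, b, c):
    """Relative probabilities of the negative samples within the batch.

    Assumed weighting: w_i = exp(-F(v_i)) / sum_k exp(-F(v_k)).
    """
    neg_f = [-free_energy(v, W, b, c) for v in samples]
    top = max(neg_f)  # subtract the maximum before exponentiating
    unnorm = [math.exp(f - top) for f in neg_f]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Under this reading, the negative-phase statistics would be accumulated as a weighted sum using these weights instead of a uniform 1/N, which matches the abstract's description of a small change at a small additional cost.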
Distributed Training Large-Scale Deep Architectures
Scale of data and scale of computation infrastructures together enable the
current deep learning renaissance. However, training large-scale deep
architectures demands both algorithmic improvement and careful system
configuration. In this paper, we focus on employing the system approach to
speed up large-scale training. Via lessons learned from our routine
benchmarking effort, we first identify bottlenecks and overheads that hinder
data parallelism. We then devise guidelines that help practitioners to
configure an effective system and fine-tune parameters to achieve desired
speedup. Specifically, we develop a procedure for setting minibatch size and
choosing computation algorithms. We also derive lemmas for determining the
quantity of key components such as the number of GPUs and parameter servers.
Experiments and examples show that these guidelines help effectively speed up
large-scale deep learning training.
Efficient construction of linear models in materials modeling and applications to force constant expansions
Linear models, such as force constant (FC) and cluster expansions, play a key
role in physics and materials science. While they can in principle be
parametrized using regression and feature selection approaches, the convergence
behavior of these techniques, in particular with respect to thermodynamic
properties, is not well understood. Here, we therefore analyze the efficacy and
efficiency of several state-of-the-art regression and feature selection
methods, in particular in the context of FC extraction and the prediction of
different thermodynamic properties. Generic feature selection algorithms such
as recursive feature elimination with ordinary least-squares (OLS), automatic
relevance determination regression, and the adaptive least absolute shrinkage
and selection operator can yield physically sound models for systems with a
modest number of degrees of freedom. For large unit cells with low symmetry
and/or high-order expansions they come, however, with a non-negligible
computational cost that can be more than two orders of magnitude higher than
that of OLS. In such cases, OLS with cutoff selection provides a viable route
as demonstrated here for both second-order FCs in large low-symmetry unit cells
and high-order FCs in low-symmetry systems. While regression techniques are
thus very powerful, they require well-tuned protocols. Here, the present work
establishes guidelines for the design of protocols that are readily usable,
e.g., in high-throughput and materials discovery schemes. Since the underlying
algorithms are not specific to FC construction, the general conclusions drawn
here also have a bearing on the construction of other linear models in physics
and materials science.
Comment: 15 pages, 12 figures
Denoising Autoencoders for fast Combinatorial Black Box Optimization
Estimation of Distribution Algorithms (EDAs) require flexible probability
models that can be efficiently learned and sampled. Autoencoders (AE) are
generative stochastic networks with these desired properties. We integrate a
special type of AE, the Denoising Autoencoder (DAE), into an EDA and evaluate
the performance of DAE-EDA on several combinatorial optimization problems with
a single objective. We assess the number of fitness evaluations as well as the
required CPU times. We compare the results with the performance of the Bayesian
Optimization Algorithm (BOA) and RBM-EDA, another EDA based on a generative
neural network that has proven competitive with BOA. For the
considered problem instances, DAE-EDA is considerably faster than BOA and
RBM-EDA, sometimes by orders of magnitude. The number of fitness evaluations is
higher than for BOA, but competitive with RBM-EDA. These results show that DAEs
can be useful tools for problems with low but non-negligible fitness evaluation
costs.
Comment: corrected typos and small inconsistencies
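The sample, evaluate, and refit loop that every EDA shares can be sketched in a few lines. To stay short, this sketch swaps in a much simpler probability model than the one in the abstract: independent per-bit frequencies (a univariate marginal model) stand in for the denoising autoencoder. The OneMax objective, the population settings, and the names `onemax` and `run_eda` are illustrative assumptions, not the paper's benchmarks.

```python
import random


def onemax(bits):
    """Toy objective: number of ones in the bit string."""
    return sum(bits)


def run_eda(fitness, n_bits, pop_size=100, keep=0.3, generations=30, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # model parameters: P(bit i = 1)
    best_fit, best = float("-inf"), None
    evaluations = 0
    n_keep = max(1, int(keep * pop_size))
    for _ in range(generations):
        # 1. Sample candidates from the current model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # 2. Evaluate each candidate exactly once.
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        evaluations += pop_size
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        # 3. Refit the model on the selected candidates; clip the
        #    frequencies so that no bit becomes permanently fixed.
        selected = [ind for _, ind in scored[:n_keep]]
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in selected) / n_keep))
                 for i in range(n_bits)]
    return best, best_fit, evaluations
```

In DAE-EDA, step 3 would train a denoising autoencoder on the selected candidates and step 1 would sample from it. The evaluation counter corresponds to the number of fitness evaluations that the abstract compares across methods.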