Weighted Contrastive Divergence
Learning algorithms for energy-based Boltzmann architectures that rely on
gradient descent are in general computationally prohibitive, typically because
computing the partition function involves an exponential number of terms.
One must therefore resort to approximation schemes to evaluate the gradient.
This is the case for Restricted Boltzmann Machines (RBMs) and their
learning algorithm, Contrastive Divergence (CD). It is well known that CD has a
number of shortcomings and that its approximation to the gradient has several
drawbacks. Overcoming these defects has been the basis of much research, and new
algorithms have been devised, such as persistent CD. In this manuscript we
propose a new algorithm, which we call Weighted CD (WCD), built from small
modifications of the negative phase in standard CD. However small these
modifications may be, the experimental work reported in this paper suggests that
WCD provides a significant improvement over standard CD and persistent CD at a
small additional computational cost.
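To make the positive and negative phases concrete, here is a minimal sketch of standard CD-1 for a binary RBM, written in plain Python (no NumPy) so it runs anywhere. It assumes visible biases b, hidden biases c and a weight matrix W; all function names are hypothetical. It does not implement WCD, whose specific negative-phase modifications are not given in this abstract. The brute-force log-partition helper illustrates why the exact gradient is intractable: it enumerates all 2^n_v visible configurations (the hidden units are summed out analytically).

```python
import itertools
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + exp(z)) without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def hidden_probs(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    # Draw independent Bernoulli samples.
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One standard CD-1 step on a single visible vector v0 (hypothetical helper).

    Updates W, b, c in place and returns the reconstruction v1.
    """
    ph0 = hidden_probs(v0, W, c)               # positive phase (data-driven)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one Gibbs step: reconstruction
    ph1 = hidden_probs(v1, W, c)               # negative phase (model-driven)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return v1


def log_partition_bruteforce(W, b, c):
    """Exact log Z by enumerating all 2^n_v visible states (hidden units summed out)."""
    terms = []
    for v in itertools.product((0.0, 1.0), repeat=len(b)):
        # -F(v) = b.v + sum_j softplus(c_j + sum_i v_i W_ij)
        neg_free_energy = sum(bi * vi for bi, vi in zip(b, v)) + sum(
            softplus(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c)))
        terms.append(neg_free_energy)
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

With all parameters at zero, the brute-force helper returns log Z = log 2^(n_v + n_h), and its cost doubles with every extra visible unit. CD sidesteps this by replacing the model expectation in the negative phase with statistics from a one-step Gibbs reconstruction, which is exactly the approximation whose bias the abstract refers to.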
Natural evolution strategies and variational Monte Carlo
A notion of quantum natural evolution strategies is introduced, which
provides a geometric synthesis of a number of known quantum/classical
algorithms for performing classical black-box optimization. Recent work of
Gomes et al. [2019] on heuristic combinatorial optimization using neural
quantum states is pedagogically reviewed in this context, emphasizing the
connection with natural evolution strategies. The algorithmic framework is
illustrated for approximate combinatorial optimization problems, and a
systematic strategy is found for improving the approximation ratios. In
particular, it is found that natural evolution strategies can achieve
approximation ratios competitive with widely used heuristic algorithms for
Max-Cut, at the expense of increased computation time.
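The natural-evolution-strategies idea can be illustrated classically. Below is a minimal sketch, assuming a product-Bernoulli (mean-field) search distribution over cut assignments, with P(x_i = 1) = sigmoid(theta_i), rather than the neural quantum states used by Gomes et al. [2019]. It ascends the expected cut E[f(x)] along the natural gradient F^-1 grad E[f], estimating the gradient from samples; because the components are independent, the Fisher matrix F is diagonal. All function names, the damping term, the logit clipping and the default hyper-parameters are illustrative assumptions.

```python
import itertools
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cut_value(x, edges):
    # Number of edges whose endpoints lie on different sides of the partition.
    return sum(1 for i, j in edges if x[i] != x[j])


def brute_force_maxcut(n, edges):
    # Exact optimum by enumeration; only feasible for small n.
    return max(cut_value(x, edges) for x in itertools.product((0, 1), repeat=n))


def nes_maxcut(n, edges, iters=100, pop=32, lr=0.1, damping=1e-3, seed=0):
    """Natural-gradient ascent on E[cut] under a product-Bernoulli distribution (hypothetical)."""
    rng = random.Random(seed)
    theta = [0.0] * n  # logits: P(x_i = 1) = sigmoid(theta_i)
    best_x, best_cut = None, -1
    for _ in range(iters):
        p = [sigmoid(t) for t in theta]
        xs = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop)]
        fs = [cut_value(x, edges) for x in xs]
        for x, f in zip(xs, fs):
            if f > best_cut:
                best_x, best_cut = x, f
        baseline = sum(fs) / pop  # variance-reducing baseline
        for i in range(n):
            # Score-function estimate of dE[f]/dtheta_i; d log p(x)/dtheta_i = x_i - p_i.
            g = sum((f - baseline) * (x[i] - p[i]) for x, f in zip(xs, fs)) / pop
            # Fisher information of a Bernoulli in the logit parametrization is
            # p_i (1 - p_i); damping keeps the division finite near p_i in {0, 1}.
            nat_g = g / (p[i] * (1.0 - p[i]) + damping)
            # Clipping the logits is a practical safeguard, not part of NES proper.
            theta[i] = max(-20.0, min(20.0, theta[i] + lr * nat_g))
    return best_x, best_cut, [sigmoid(t) for t in theta]
```

For example, `nes_maxcut(4, [(0, 1), (1, 2), (2, 3), (3, 0)])` runs on a 4-cycle, whose optimum (from `brute_force_maxcut`) is 4; `best_cut / opt` is then the approximation ratio. Note that at theta = 0 the expected gradient vanishes by the flip symmetry of Max-Cut, so the earliest steps are driven by sampling noise.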
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks and in particular
for Deep Learning may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when one is allowed to
adjust many hyper-parameters. Overall, it describes elements of the practice
used to successfully and efficiently train and debug large-scale, often deep,
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures.