Inherent Weight Normalization in Stochastic Neural Networks
Multiplicative stochasticity such as Dropout improves the robustness and
generalizability of deep neural networks. Here, we further demonstrate that
always-on multiplicative stochasticity combined with simple threshold neurons
provides sufficient operations for deep neural networks. We call such models Neural
Sampling Machines (NSM). We find that the probability of activation of the NSM
exhibits a self-normalizing property that mirrors Weight Normalization, a
previously studied mechanism that fulfills many of the features of Batch
Normalization in an online fashion. The normalization of activities during
training speeds up convergence by preventing internal covariate shift caused by
changes in the input distribution. The always-on stochasticity of the NSM
confers the following advantages: the network is identical in the inference and
learning phases, making the NSM suitable for online learning; it can exploit
stochasticity inherent to a physical substrate such as analog non-volatile
memories for in-memory computing; and it is suitable for Monte Carlo sampling,
while requiring almost exclusively addition and comparison operations. We
demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and
event-based classification benchmarks (N-MNIST and DVS Gestures). Our results
show that NSMs perform comparably to or better than conventional artificial neural
networks with the same architecture.
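The self-normalizing property described in this abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation: it assumes Bernoulli multiplicative noise with keep-probability p, a zero threshold, and no bias term, and the names activation_probability and gaussian_estimate are hypothetical. Under these assumptions, rescaling the weight vector leaves the firing probability unchanged, which is the behaviour Weight Normalization enforces explicitly.

```python
import math
import numpy as np

def activation_probability(w, x, p=0.5, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(sum_j xi_j * w_j * x_j > 0), xi_j ~ Bernoulli(p).

    Assumptions (not from the source): Bernoulli noise, zero threshold, no bias.
    """
    rng = np.random.default_rng(seed)
    # One independent mask per sample and per synapse; the noise is always on.
    xi = rng.random((n_samples, w.size)) < p
    u = (xi * (w * x)).sum(axis=1)      # stochastic pre-activation
    return float((u > 0).mean())        # threshold neuron fires when u > 0

def gaussian_estimate(w, x, p=0.5):
    """Central-limit approximation: Phi(mean / std) of the pre-activation."""
    mean = p * np.dot(w, x)
    std = math.sqrt(p * (1.0 - p) * np.sum((w * x) ** 2))
    return 0.5 * (1.0 + math.erf(mean / (std * math.sqrt(2.0))))

rng = np.random.default_rng(1)
w = rng.normal(size=64)
x = rng.random(64)

p_base = activation_probability(w, x)
p_scaled = activation_probability(10.0 * w, x)   # same noise, weights scaled by 10
p_gauss = gaussian_estimate(w, x)
p_gauss_scaled = gaussian_estimate(10.0 * w, x)
```

In the Gaussian approximation the weight scale cancels between the mean and the standard deviation, so only the direction of the weight vector matters.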
Online Learning of a Memory for Learning Rates
The promise of learning to learn for robotics rests on the hope that by
extracting some information about the learning process itself we can speed up
subsequent similar learning tasks. Here, we introduce a computationally
efficient online meta-learning algorithm that builds and optimizes a memory
model of the optimal learning rate landscape from previously observed gradient
behaviors. While performing task specific optimization, this memory of learning
rates predicts how to scale currently observed gradients. After applying the
gradient scaling our meta-learner updates its internal memory based on the
observed effect its prediction had. Our meta-learner can be combined with any
gradient-based optimizer, learns on the fly and can be transferred to new
optimization tasks. In our evaluations we show that our meta-learning algorithm
speeds up learning of MNIST classification and a variety of learning control
tasks, either in batch or online learning settings. Comment: accepted to ICRA 2018; code available:
https://github.com/fmeier/online-meta-learning ; video pitch available:
https://youtu.be/9PzQ25FPPO
Neural Network Models of Learning and Memory: Leading Questions and an Emerging Framework
Office of Naval Research and the Defense Advanced Research Projects Agency (N00014-95-1-0409, N00014-1-95-0657); National Institutes of Health (NIH 20-316-4304-5
Maximum Likelihood Associative Memories
Associative memories are structures that store data in such a way that it can
later be retrieved given only a part of its content -- a sort-of
error/erasure-resilience property. They are used in applications ranging from
caches and memory management in CPUs to database engines. In this work we study
associative memories built on the maximum likelihood principle. First, we derive
minimum residual error rates when the data stored comes from a uniform binary
source. Second, we determine the minimum amount of memory required to store the
same data. Finally, we bound the computational complexity for message
retrieval. We then compare these bounds with two existing associative memory
architectures: the celebrated Hopfield neural networks and a neural network
architecture introduced more recently by Gripon and Berrou.
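As a minimal sketch of retrieval from partial content, the snippet below assumes an erasure model in which unknown bits are marked with -1. For a uniform binary source under that assumption, every stored message that agrees with the probe on the known positions is equally likely, so the maximum likelihood answer is the set of consistent messages. The function name ml_retrieve and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def ml_retrieve(stored, probe):
    """Return indices of stored messages consistent with a partially erased probe.

    stored: (M, n) array of 0/1 bits.
    probe:  length-n array of 0/1 bits, with -1 marking erased positions.
    """
    known = probe >= 0
    # A message is a candidate if it matches the probe on every known position.
    matches = (stored[:, known] == probe[known]).all(axis=1)
    return np.flatnonzero(matches)

stored = np.array([
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0],
])

probe = stored[1].copy()
probe[[0, 3, 6]] = -1                    # erase three of the eight bits
candidates = ml_retrieve(stored, probe)

fully_erased = np.full(8, -1)
all_candidates = ml_retrieve(stored, fully_erased)
```

When more than one candidate remains, any choice among them can be wrong, which is the source of a residual error even for an ideal retrieval rule.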
Dreaming neural networks: forgetting spurious memories and reinforcing pure ones
The standard Hopfield model for associative neural networks accounts for
biological Hebbian learning and acts as the harmonic oscillator for pattern
recognition; however, its maximal storage capacity is $\alpha \sim 0.14$, far
from the theoretical bound for symmetric networks, i.e. $\alpha = 1$. Inspired
by sleeping and dreaming mechanisms in mammal brains, we propose an extension
of this model displaying the standard on-line (awake) learning mechanism (that
allows the storage of external information in terms of patterns) and an
off-line (sleep) unlearning and consolidating mechanism (that allows
spurious-pattern removal and pure-pattern reinforcement): the resulting daily
prescription is able to saturate the theoretical bound $\alpha = 1$, remaining
also extremely robust against thermal noise. Both neural and synaptic features
are analyzed both analytically and numerically. In particular, beyond obtaining
a phase diagram for neural dynamics, we focus on synaptic plasticity and we
give explicit prescriptions on the temporal evolution of the synaptic matrix.
We analytically prove that our algorithm makes the Hebbian kernel converge with
high probability to the projection matrix built over the pure stored patterns.
Furthermore, we obtain a sharp and explicit estimate for the "sleep rate" in
order to ensure such a convergence. Finally, we run extensive numerical
simulations (mainly Monte Carlo sampling) to check the approximations
underlying the analytical investigations (e.g., we developed the whole theory
at the so-called replica-symmetric level, as standard in the
Amit-Gutfreund-Sompolinsky reference framework) and possible finite-size
effects, finding overall full agreement with the theory. Comment: 31 pages, 12 figures
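The convergence of the Hebbian kernel to the projection matrix can be checked numerically. The sketch below uses one interpolation that matches the two endpoints named in the abstract: J(t) = (1/N) xi^T (1+t)(I + tC)^{-1} xi, with C = xi xi^T / N and t a "sleep time". This form is an assumption made for illustration and is not necessarily the authors' exact prescription. The function name dreaming_kernel and the sizes N and P are hypothetical.

```python
import numpy as np

def dreaming_kernel(xi, t):
    """Coupling matrix after sleep time t; xi has shape (P, N) with entries +/-1.

    Assumed interpolation: Hebbian at t = 0, projector as t grows large.
    """
    P, N = xi.shape
    C = xi @ xi.T / N                                   # pattern correlations (P x P)
    M = (1.0 + t) * np.linalg.inv(np.eye(P) + t * C)
    return xi.T @ M @ xi / N

rng = np.random.default_rng(0)
N, P = 200, 20
xi = rng.choice([-1.0, 1.0], size=(P, N))               # random binary patterns

hebbian = xi.T @ xi / N                                 # standard Hebbian kernel
projector = xi.T @ np.linalg.inv(xi @ xi.T) @ xi        # projector onto pattern span

J0 = dreaming_kernel(xi, 0.0)
J_late = dreaming_kernel(xi, 1e6)

# With the projector as coupling, every stored pattern is a fixed point
# of the zero-temperature update s -> sign(J s).
recalled = np.sign(projector @ xi.T)
```

At t = 0 the matrix M is the identity, giving the Hebbian rule; for large t it approaches the inverse of C, which turns the kernel into the projector.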
An analog feedback associative memory
A method for the storage of analog vectors, i.e., vectors whose components are real-valued, is developed for the Hopfield continuous-time network. An important requirement is that each memory vector has to be an asymptotically stable (i.e. attractive) equilibrium of the network. Some of the limitations imposed by the continuous Hopfield model on the set of vectors that can be stored are pointed out. These limitations can be relieved by choosing a network containing visible as well as hidden units. An architecture consisting of several hidden layers and a visible layer, connected in a circular fashion, is considered. It is proved that the two-layer case is guaranteed to store any number of given analog vectors provided their number does not exceed one plus the number of neurons in the hidden layer. A learning algorithm that correctly adjusts the locations of the equilibria and guarantees their asymptotic stability is developed. Simulation results confirm the effectiveness of the approach.
Second Order Neural Networks.
In this dissertation, a feedback neural network model has been proposed. This network uses a second order method of convergence based on the Newton-Raphson method. This neural network has both discrete and continuous versions. When used as an associative memory, the proposed model has been called the polynomial neural network (PNN). The memories of this network can be located anywhere in an n-dimensional space rather than being confined to its corners. A method for storing memories has been proposed. This is a single-step method, unlike the currently known computationally intensive iterative methods. An energy function for the polynomial neural network has been suggested. Issues relating to the error-correcting ability of this network have been addressed. Additionally, it has been found that the attractor basins of the memories of this network reveal a curious fractal topology, thereby suggesting a highly complex and often unpredictable nature. The use of the second order neural network as a function optimizer has also been shown. While issues relating to the hardware realization of this network have only been addressed briefly, it has been indicated that such a network would require a large amount of hardware for its realization. This problem can be obviated by using a simplified model that has also been described. The performance of this simplified model is comparable to that of the basic model while requiring much less hardware for its realization.
Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization.
The key operation in stochastic neural networks, which have become the state-of-the-art approach for solving problems in machine learning, information theory, and statistics, is a stochastic dot-product. While there have been many demonstrations of dot-product circuits and, separately, of stochastic neurons, an efficient hardware implementation combining both functionalities is still missing. Here we report compact, fast, energy-efficient, and scalable stochastic dot-product circuits based on either passively integrated metal-oxide memristors or embedded floating-gate memories. The circuit's high performance is due to its mixed-signal implementation, while the efficient stochastic operation is achieved by utilizing the circuit's noise, intrinsic and/or extrinsic to the memory cell array. The dynamic scaling of weights, enabled by analog memory devices, allows for efficient realization of different annealing approaches to improve functionality. The proposed approach is experimentally verified for two representative applications, namely by implementing a neural network for solving a four-node graph-partitioning problem, and a Boltzmann machine with 10 input and 8 hidden neurons.
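A software analogue of a stochastic dot-product can clarify how weight scaling relates to annealing. The sketch below is not a model of the reported circuits: it assumes additive Gaussian noise on the dot product followed by a zero-threshold comparator, and the names stochastic_neuron, noise_std, and gain are hypothetical. Raising the gain, which stands in for scaling the weights, makes the neuron more deterministic and so acts like lowering an effective temperature.

```python
import numpy as np

def stochastic_neuron(w, x, noise_std, gain, rng, n_trials=20000):
    """Fraction of trials in which gain * (w . x) + noise exceeds zero.

    Assumption (not from the source): the noise is Gaussian with the given std.
    """
    noise = rng.normal(0.0, noise_std, size=n_trials)
    return float((gain * np.dot(w, x) + noise > 0).mean())

rng = np.random.default_rng(0)
w = np.array([0.5, -0.25, 0.75])
x = np.array([1.0, 1.0, 1.0 / 3.0])      # w . x = 0.5

# Small gain: the noise dominates and firing is close to a coin flip.
rate_hot = stochastic_neuron(w, x, noise_std=1.0, gain=0.5, rng=rng)
# Large gain: the signal dominates and firing is nearly deterministic.
rate_cold = stochastic_neuron(w, x, noise_std=1.0, gain=4.0, rng=rng)
```

Sweeping the gain from small to large values during a run gives a simple annealing schedule under these assumptions.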