A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication
This paper has two contributions. First, we propose a novel coded matrix
multiplication technique called Generalized PolyDot codes that advances on
existing methods for coded matrix multiplication under storage and
communication constraints. This technique uses "garbage alignment," i.e.,
aligning computations in coded computing that are not a part of the desired
output. Generalized PolyDot codes bridge between Polynomial codes and MatDot
codes, trading off between recovery threshold and communication costs. Second,
we demonstrate that Generalized PolyDot can be used for training large Deep
Neural Networks (DNNs) on unreliable nodes prone to soft-errors. This requires
us to address three additional challenges: (i) prohibitively large overhead of
coding the weight matrices in each layer of the DNN at each iteration; (ii)
nonlinear operations during training, which are incompatible with linear
coding; and (iii) not assuming presence of an error-free master node, requiring
us to architect a fully decentralized implementation without any "single point
of failure." We allow all primary DNN training steps, namely, matrix
multiplication, nonlinear activation, Hadamard product, and update steps as
well as the encoding/decoding to be error-prone. We consider the case of
mini-batch size B = 1, leveraging coded matrix-vector products, as well as
B > 1, leveraging coded matrix-matrix products. The problem of DNN training
under soft-errors also motivates an interesting, probabilistic error model
under which a real-number MDS code is shown to correct a greater number of
errors with high probability than under the more conventional, adversarial
error model. We also demonstrate that our
proposed strategy can provide unbounded gains in error tolerance over a
competing replication strategy and a preliminary MDS-code-based strategy for
both these error models.
Comment: Presented in part at the IEEE International Symposium on Information Theory 2018 (Submission Date: Jan 12 2018); Currently under review at the IEEE Transactions on Information Theory
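The coded matrix multiplication idea underlying this abstract can be illustrated with a minimal MatDot-style sketch (my own illustration of the general technique, not the paper's Generalized PolyDot construction; the 2-way split, evaluation points, and function names are assumptions). Each "worker" evaluates a matrix polynomial at one point; the product AB is the coefficient of x^1 and can be recovered from any 3 worker results, so up to one straggler or erasure is tolerated:

```python
import numpy as np

def matdot_encode(A, B, xs):
    """Worker tasks: evaluate pA(x) @ pB(x) at each point in xs."""
    n = A.shape[1] // 2
    A1, A2 = A[:, :n], A[:, n:]   # column blocks of A
    B1, B2 = B[:n, :], B[n:, :]   # row blocks of B
    # pA(x) = A1 + A2*x,  pB(x) = B1*x + B2
    return [(A1 + A2 * x) @ (B1 * x + B2) for x in xs]

def matdot_decode(results, xs):
    """Recover AB = A1@B1 + A2@B2, the x^1 coefficient of the
    degree-2 matrix polynomial pA(x)pB(x), by interpolation."""
    V = np.vander(np.array(xs, dtype=float), 3, increasing=True)  # rows [1, x, x^2]
    weights = np.linalg.inv(V)[1]  # row that extracts the x^1 coefficient
    return sum(w * R for w, R in zip(weights, results))
```

Three distinct evaluation points suffice here; using more points (more workers) adds redundancy against erroneous or missing results.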
Nearly extensive sequential memory lifetime achieved by coupled nonlinear neurons
Many cognitive processes rely on the ability of the brain to hold sequences
of events in short-term memory. Recent studies have revealed that such memory
can be read out from the transient dynamics of a network of neurons. However,
the memory performance of such a network in buffering past information has only
been rigorously estimated in networks of linear neurons. When signal gain is
kept low, so that neurons operate primarily in the linear part of their
response nonlinearity, the memory lifetime is bounded by the square root of the
network size. In this work, I demonstrate that it is possible to achieve a
memory lifetime almost proportional to the network size, "an extensive memory
lifetime", when the nonlinearity of neurons is appropriately utilized. The
analysis of neural activity revealed that nonlinear dynamics prevented the
accumulation of noise by partially removing noise in each time step. With this
error-correcting mechanism, I demonstrate that a nearly extensive memory
lifetime can be achieved.
Comment: 21 pages, 5 figures, the manuscript has been accepted for publication in Neural Computation
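The noise-removal mechanism described in this abstract can be caricatured with a toy relay chain (my sketch, not the paper's network model; the sign nonlinearity, parameters, and function name are assumptions). A saturating unit snaps its state back to ±1 at every step, so additive noise cannot accumulate the way it does in a linear chain:

```python
import numpy as np

def relay_chain(signal, steps, noise_std, nonlinear, rng):
    """Propagate a +/-1 signal through a chain of noisy units."""
    x = np.array(signal, dtype=float)
    for _ in range(steps):
        x = x + rng.normal(0.0, noise_std, size=x.shape)  # additive noise each step
        if nonlinear:
            x = np.sign(x)  # saturation snaps the state back to +/-1
    return x
```

With noise of standard deviation 0.1, flipping a saturated unit requires a ten-sigma fluctuation, so the binary trace survives many steps that would wash out in a purely linear chain, where the noise variance grows with the number of steps.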
Learning to Discriminate Through Long-Term Changes of Dynamical Synaptic Transmission
Short-term synaptic plasticity is modulated by long-term synaptic
changes. There is, however, no general agreement on the computational
role of this interaction. Here, we derive a learning rule for the release
probability and the maximal synaptic conductance in a circuit model
with combined recurrent and feedforward connections that allows learning
to discriminate among natural inputs. Short-term synaptic plasticity
thereby provides a nonlinear expansion of the input space of a linear
classifier, whereas the random recurrent network serves to decorrelate
the expanded input space. Computer simulations reveal that the twofold
increase in the number of input dimensions through short-term synaptic
plasticity improves the performance of a standard perceptron up to 100%.
The distributions of release probabilities and maximal synaptic conductances
at the capacity limit strongly depend on the balance between excitation
and inhibition. The model also suggests a new computational
interpretation of spikes evoked by stimuli outside the classical receptive
field. These neuronal activities may reflect decorrelation of the expanded
stimulus space by intracortical synaptic connections.
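The two quantities learned in this abstract's rule, release probability and maximal synaptic conductance, jointly set how efficacy evolves within a spike train. A standard Tsodyks-Markram-style depression sketch (my illustration of generic short-term plasticity dynamics; the parameter values and function name are assumptions, not the paper's circuit model) shows the kind of dynamics such a rule would shape:

```python
import math

def depressing_synapse(spike_times, U=0.5, tau_rec=0.8, g_max=1.0):
    """Response amplitude at each spike of a depressing synapse.

    U        -- release probability (fraction of resources released per spike)
    tau_rec  -- recovery time constant in seconds
    g_max    -- maximal synaptic conductance
    """
    R, t_prev, amps = 1.0, None, []
    for t in spike_times:
        if t_prev is not None:
            # available resources recover exponentially toward 1 between spikes
            R = 1.0 - (1.0 - R) * math.exp(-(t - t_prev) / tau_rec)
        amps.append(g_max * U * R)  # release a fraction U of available resources
        R -= U * R                  # deplete the released resources
        t_prev = t
    return amps
```

Because the amplitude sequence depends nonlinearly on inter-spike intervals, two inputs with the same mean rate but different temporal structure produce different response vectors, which is the kind of nonlinear input expansion the abstract attributes to short-term plasticity.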
Towards a learning-theoretic analysis of spike-timing dependent plasticity
This paper suggests a learning-theoretic perspective on how synaptic
plasticity benefits global brain functioning. We introduce a model, the
selectron, that (i) arises as the fast time constant limit of leaky
integrate-and-fire neurons equipped with spiking timing dependent plasticity
(STDP) and (ii) is amenable to theoretical analysis. We show that the selectron
encodes reward estimates into spikes and that an error bound on spikes is
controlled by a spiking margin and the sum of synaptic weights. Moreover, the
efficacy of spikes (their usefulness to other reward maximizing selectrons)
also depends on total synaptic strength. Finally, based on our analysis, we
propose a regularized version of STDP, and show the regularization improves the
robustness of neuronal learning when faced with multiple stimuli.
Comment: To appear in Adv. Neural Inf. Proc. Systems
Optimally adapted multi-state neural networks trained with noise
The principle of adaptation in a noisy retrieval environment is extended here
to a diluted attractor neural network of Q-state neurons trained with noisy
data. The network is adapted to an appropriate noisy training overlap and
training activity which are determined self-consistently by the optimized
retrieval attractor overlap and activity. The optimized storage capacity and
the corresponding retriever overlap are considerably enhanced by an adequate
threshold in the states. Explicit results for improved optimal performance and
new retriever phase diagrams are obtained for Q=3 and Q=4, with coexisting
phases over a wide range of thresholds. Most of the interesting results are
stable to replica-symmetry-breaking fluctuations.
Comment: 22 pages, 5 figures, accepted for publication in PR
Stability of the replica symmetric solution for the information conveyed by a neural network
The information that a pattern of firing in the output layer of a feedforward
network of threshold-linear neurons conveys about the network's inputs is
considered. A replica-symmetric solution is found to be stable for all but
small amounts of noise. The region of instability depends on the contribution
of the threshold and the sparseness: for distributed pattern distributions, the
unstable region extends to higher noise variances than for very sparse
distributions, for which it is almost nonexistent.
Comment: 19 pages, LaTeX, 5 figures. Also available at http://www.mrc-bbc.ox.ac.uk/~schultz/papers.html. Submitted to Phys. Rev. E; minor change