Critical Learning Periods in Deep Neural Networks
Similar to humans and animals, deep artificial neural networks exhibit
critical periods during which a temporary stimulus deficit can impair the
development of a skill. The extent of the impairment depends on the onset and
length of the deficit window, as in animal models, and on the size of the
neural network. Deficits that do not affect low-level statistics, such as
vertical flipping of the images, have no lasting effect on performance and can
be overcome with further training. To better understand this phenomenon, we use
the Fisher Information of the weights to measure the effective connectivity
between layers of a network during training. Counterintuitively, information
rises rapidly in the early phases of training, and then decreases, preventing
redistribution of information resources in a phenomenon we refer to as a loss
of "Information Plasticity". Our analysis suggests that the first few epochs
are critical for the creation of strong connections that are optimal relative
to the input data distribution. Once such strong connections are created, they
do not appear to change during additional training. These findings suggest that
the initial learning transient, under-scrutinized compared to asymptotic
behavior, plays a key role in determining the outcome of the training process.
Our findings, combined with recent theoretical results in the literature, also
suggest that forgetting (decrease of information in the weights) is critical to
achieving invariance and disentanglement in representation learning. Finally,
critical periods are not restricted to biological systems, but can emerge
naturally in learning systems, whether biological or artificial, due to
fundamental constraints arising from learning dynamics and information
processing.
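As a concrete illustration of the measurement described above, the sketch below tracks a diagonal (empirical) Fisher Information estimate of the weights during training. It assumes a generic PyTorch model, loss function, and data loader, and is only an illustration of the idea, not the authors' implementation.

```python
import torch

def fisher_trace(model, loss_fn, data_loader, n_batches=10):
    """Rough empirical (diagonal) Fisher estimate: average the squared
    per-parameter gradients of the loss over a few mini-batches."""
    totals = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
    seen = 0
    for x, y in data_loader:
        if seen >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                totals[name] += p.grad.detach() ** 2
        seen += 1
    return sum(t.sum().item() for t in totals.values()) / max(seen, 1)

# Logging fisher_trace(...) after every epoch would expose the rise-then-fall
# profile that the abstract associates with a loss of "Information Plasticity".
```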
A Telescopic Binary Learning Machine for Training Neural Networks
This paper proposes a new algorithm based on multi-scale stochastic local
search with binary representation for training neural networks.
In particular, we study the effects of neighborhood evaluation strategies,
the effect of the number of bits per weight and that of the maximum weight
range used for mapping binary strings to real values. Following this
preliminary investigation, we propose a telescopic multi-scale version of local
search where the number of bits is increased in an adaptive manner, leading to
a faster search and to local minima of better quality. An analysis related to
adapting the number of bits in a dynamic way is also presented. The control of
the number of bits, which happens in a natural manner in the proposed method,
is effective in increasing the generalization performance. Benchmark tasks
include a highly non-linear artificial problem, a control problem requiring
either feed-forward or recurrent architectures for feedback control, and
challenging real-world tasks in different application domains.
The results demonstrate the effectiveness of the proposed method.
Comment: Submitted to IEEE Transactions on Neural Networks and Learning
Systems, special issue on New Developments in Neural Network Structures for
Signal Processing, Autonomous Decision, and Adaptive Control
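To make the binary encoding concrete, here is a minimal sketch of mapping a fixed-length bit string to a real weight in [-w_max, w_max] and of a "telescopic" step that adds one bit of resolution per weight. The helper names and the simple bit-flip move are hypothetical and only gesture at the paper's multi-scale local search.

```python
import random

def bits_to_weight(bits, w_max):
    """Decode a list of 0/1 bits (least-significant first) to a real weight
    on a uniform grid over [-w_max, w_max]."""
    value = sum(b << i for i, b in enumerate(bits))
    return -w_max + 2.0 * w_max * value / (2 ** len(bits) - 1)

def bit_flip_move(genome, evaluate):
    """One stochastic local-search move: flip a random bit and keep the
    candidate if it does not worsen the objective."""
    i = random.randrange(len(genome))
    candidate = genome.copy()
    candidate[i] ^= 1
    return candidate if evaluate(candidate) <= evaluate(genome) else genome

def telescope(genome, bits_per_weight, n_weights):
    """Adaptive refinement: give every weight one extra least-significant bit,
    roughly preserving its decoded value while doubling the resolution."""
    refined = []
    for w in range(n_weights):
        chunk = genome[w * bits_per_weight:(w + 1) * bits_per_weight]
        refined.extend([0] + chunk)
    return refined, bits_per_weight + 1
```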
Single Flux Quantum Based Ultrahigh Speed Spiking Neuromorphic Processor Architecture
Artificial neural networks inspired by the operation of the brain offer the
possibility of solving complex problems more efficiently. Today's computing
hardware, on the other hand, is mainly based on von Neumann architecture and
CMOS technology, which is inefficient at implementing neural networks. For the
first time, we propose an ultrahigh speed, spiking neuromorphic processor
architecture built upon single flux quantum (SFQ) based artificial neurons
(JJ-Neuron). The proposed architecture has the potential to provide higher
performance and power efficiency than the state of the art, including CMOS,
memristor, and nanophotonic devices. The JJ-Neuron has ultrafast spiking
capability, trainability with commodity design software even after fabrication
and compatibility with commercial CMOS and SFQ foundry services. We
experimentally demonstrate the soma part of the JJ-Neuron for various
activation functions together with peripheral SFQ logic gates. The neural
network is then trained on the IRIS dataset and shows a 100% match with the
results of offline training, with performance and power efficiency of
1.2x synaptic operations per second (SOPS) and 8.57x SOPS/W, respectively.
In addition, we show scalability in SOPS and SOPS/W that is at least five
orders of magnitude more efficient than state-of-the-art CMOS circuits and one
order of magnitude more efficient than estimates for nanophotonics-based
architectures.
Time Series Prediction: Predicting Stock Price
Time series forecasting is widely used in a multitude of domains. In this
paper, we present four models to predict the stock price using the SPX index as
input time series data. The martingale and ordinary linear models, which we use
as baselines, require the strongest assumption, stationarity. The generalized
linear model requires weaker assumptions but is unable to outperform the
martingale. In empirical testing, the RNN model performs best among these
models, because it updates its input through an LSTM at each step, but it also
does not beat the martingale. In addition, we introduce an online-to-batch
algorithm and a discrepancy measure to acquaint readers with recent research on
time series prediction methods that require no stationarity or non-mixing
assumptions on the data. Finally, to put these forecasts into practice, we
introduce basic trading strategies that can create win-win and zero-sum
situations.
Comment: Under the advisement of Dr. Sang Kim, for his class CS542. Additional author unnamed
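For readers who want the baseline made explicit: the martingale model simply predicts that tomorrow's price equals today's. The sketch below, which uses synthetic placeholder data rather than the real SPX series, shows how any other model would be compared against that baseline.

```python
import numpy as np

def martingale_forecast(prices):
    """Martingale baseline: the forecast for each day is the previous price."""
    return prices[:-1]

def mse(pred, actual):
    return float(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2))

# Synthetic random-walk prices standing in for the SPX index.
rng = np.random.default_rng(0)
prices = 2000.0 + np.cumsum(rng.standard_normal(500))

baseline_error = mse(martingale_forecast(prices), prices[1:])
# A GLM or LSTM forecast would be evaluated with the same mse() and, per the
# abstract, struggles to beat baseline_error on this kind of data.
```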
Precision requirements for single-layer feedforward neural networks
This paper presents a mathematical analysis of the effect of limited precision analog hardware for weight adaptation to be used in on-chip learning feedforward neural networks. Easy-to-read equations and simple worst-case estimations for the maximum tolerable imprecision are presented. As an application of the analysis, a worst-case estimation of the minimum size of the weight storage capacitors is presented.
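As a rough illustration of why limited precision matters for on-chip learning, the sketch below quantizes a weight onto a uniform b-bit grid: any weight update smaller than about half a grid step is rounded away, which is the kind of worst-case effect the analysis bounds. The numbers and helper function are illustrative, not taken from the paper.

```python
def quantize(w, bits, w_max=1.0):
    """Round a weight to the nearest level of a uniform b-bit grid on
    [-w_max, w_max]."""
    step = 2.0 * w_max / (2 ** bits - 1)
    return round((w + w_max) / step) * step - w_max

# With 8 bits on [-1, 1] the grid step is ~0.0078, so a small gradient update
# applied to a stored weight is lost after re-quantization:
w = quantize(0.5, bits=8)
print(quantize(w + 0.003, bits=8) == w)  # True: the update vanished
```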
Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks
We present a formal measure-theoretical theory of neural networks (NN) built
on probability coupling theory. Our main contributions are summarized as
follows.
* Built on the formalism of probability coupling theory, we derive an
algorithm framework, named Hierarchical Measure Group and Approximate System
(HMGAS), nicknamed S-System, that is designed to learn the complex
hierarchical, statistical dependency in the physical world.
* We show that NNs are special cases of S-System when the probability kernels
assume certain exponential family distributions. Activation functions are
derived formally. We further endow NNs with geometry through information
geometry, show that intermediate feature spaces of NNs are stochastic
manifolds, and prove that "distance" between samples is contracted as layers
stack up.
* S-System shows that NNs are inherently stochastic and, under a set of realistic
boundedness and diversity conditions, enables us to prove that for large,
nonlinear deep NNs with a class of losses, including the hinge loss, all
local minima are global minima with zero loss, and regions around the
minima are flat basins where all eigenvalues of Hessians are concentrated
around zero, using tools and ideas from mean field theory, random matrix
theory, and nonlinear operator equations.
* S-System, the information-geometry structure, and the optimization behaviors
combined complete the analogy between the Renormalization Group (RG) and NNs.
It shows that a NN is a complex adaptive system that estimates the statistical
dependency of microscopic objects, e.g., pixels, at multiple scales. Unlike the
clear-cut physical quantities produced by RG in physics, e.g., temperature, NNs
renormalize/recompose manifolds that emerge through learning/optimization and
divide the sample space into highly semantically meaningful groups dictated by
supervised labels (in supervised NNs).
Learning to Support: Exploiting Structure Information in Support Sets for One-Shot Learning
Deep Learning shows very good performance when trained on large labeled data
sets. The problem of training a deep net on a few or one sample per class
requires a different learning approach which can generalize to unseen classes
using only a few representatives of these classes. This problem has previously
been approached by meta-learning. Here we propose a novel meta-learner which
shows state-of-the-art performance on common benchmarks for one/few-shot
classification. Our model features three novel components: First is a
feed-forward embedding that takes random class support samples (after a
customary CNN embedding) and transfers them to a better class representation in
terms of a classification problem. Second is a novel attention mechanism,
inspired by competitive learning, which causes class representatives to compete
with each other to become a temporary class prototype with respect to the query
point. This mechanism allows switching between representatives depending on the
position of the query point. Once a prototype is chosen for each class, the
predicted label is computed using a simple attention mechanism over prototypes
of all considered classes. The third feature is the ability of our meta-learner
to incorporate deeper CNN embedding, enabling larger capacity. Finally, to ease
the training procedure and reduce overfitting, we average the top models
(evaluated on the validation set) over the optimization trajectory. We show that
this approach can be viewed as an approximation to an ensemble, which saves a
factor in training and test times and in the storage of the final model.
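A minimal, simplified version of the prototype idea (closer to a plain prototypical-network baseline than to the paper's competitive attention over individual representatives) might look as follows; the function name and shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def classify_by_prototypes(query, support, support_labels, n_classes):
    """query: (d,) embedding; support: (n, d) embeddings; support_labels: (n,).
    Average the support embeddings per class into prototypes, then attend to
    them with a softmax over negative distances to score each class."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                                      # (n_classes, d)
    dists = torch.cdist(query.unsqueeze(0), prototypes).squeeze(0)
    return F.softmax(-dists, dim=0)                         # per-class scores

# Example with random embeddings standing in for CNN features (2 shots/class):
support = torch.randn(10, 64)
labels = torch.arange(5).repeat(2)
scores = classify_by_prototypes(torch.randn(64), support, labels, n_classes=5)
```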
Short-Term Plasticity and Long-Term Potentiation in Magnetic Tunnel Junctions: Towards Volatile Synapses
Synaptic memory is considered to be the main element responsible for learning
and cognition in humans. Although traditionally non-volatile long-term
plasticity changes have been implemented in nanoelectronic synapses for
neuromorphic applications, recent studies in neuroscience have revealed that
biological synapses undergo meta-stable volatile strengthening followed by a
long-term strengthening provided that the frequency of the input stimulus is
sufficiently high. Such "memory strengthening" and "memory decay"
functionalities can potentially lead to adaptive neuromorphic architectures. In
this paper, we demonstrate the close resemblance of the magnetization dynamics
of a Magnetic Tunnel Junction (MTJ) to short-term plasticity and long-term
potentiation observed in biological synapses. We illustrate that, in addition
to the magnitude and duration of the input stimulus, the frequency of the stimulus
plays a critical role in determining long-term potentiation of the MTJ. Such
MTJ synaptic memory arrays can be utilized to create compact, ultra-fast and
low power intelligent neural systems.
Comment: The article will appear in a future issue of Physical Review Applied
Differentiable programming and its applications to dynamical systems
Differentiable programming is the combination of classical neural network
modules with algorithmic ones in an end-to-end differentiable model. These new
models, that use automatic differentiation to calculate gradients, have new
learning capabilities (reasoning, attention and memory). In this tutorial,
aimed at researchers in nonlinear systems with prior knowledge of deep
learning, we present this new programming paradigm, describe some of its new
features such as attention mechanisms, and highlight the benefits they bring.
Then, we analyse the uses and limitations of traditional deep learning models
in the modeling and prediction of dynamical systems. Here, a dynamical system
is meant to be a set of state variables that evolve in time under general
internal and external interactions. Finally, we review the advantages and
applications of differentiable programming to dynamical systems.
Comment: 11 pages
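To ground the idea for dynamical systems, here is a small sketch (using PyTorch as one possible autodiff framework) that differentiates through an explicit Euler rollout of a toy linear system dx/dt = theta*x in order to fit theta to an observed trajectory. The system and hyperparameters are arbitrary illustrations, not examples taken from the tutorial.

```python
import torch

def rollout(theta, x0, n_steps, dt=0.1):
    """Explicit Euler integration of dx/dt = theta * x; every step is a
    differentiable op, so gradients flow through the whole trajectory."""
    xs, x = [], x0
    for _ in range(n_steps):
        x = x + dt * theta * x
        xs.append(x)
    return torch.stack(xs)

# Generate a reference trajectory with a "true" parameter, then recover it
# by gradient descent through the simulator itself.
observed = rollout(torch.tensor(-0.5), torch.tensor(1.0), 50)

theta = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([theta], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((rollout(theta, torch.tensor(1.0), 50) - observed) ** 2).mean()
    loss.backward()
    opt.step()
# theta now approaches -0.5.
```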
Solving Nonlinear and High-Dimensional Partial Differential Equations via Deep Learning
In this work we apply the Deep Galerkin Method (DGM) described in Sirignano
and Spiliopoulos (2018) to solve a number of partial differential equations
that arise in quantitative finance applications including option pricing,
optimal execution, mean field games, etc. The main idea behind DGM is to
represent the unknown function of interest using a deep neural network. A key
feature of this approach is the fact that, unlike other commonly used numerical
approaches such as finite difference methods, it is mesh-free. As such, it does
not suffer (as much as other numerical methods) from the curse of
dimensionality associated with high-dimensional PDEs and PDE systems. The main
goals of this paper are to elucidate the features, capabilities and limitations
of DGM by analyzing aspects of its implementation for a number of different
PDEs and PDE systems. Additionally, we present: (1) a brief overview of PDEs in
quantitative finance along with numerical methods for solving them; (2) a brief
overview of deep learning and, in particular, the notion of neural networks;
(3) a discussion of the theoretical foundations of DGM with a focus on the
justification of why this method is expected to perform well.
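To illustrate the core mechanics on something much simpler than the finance PDEs above, the sketch below trains a small network to satisfy u''(x) = -pi^2 sin(pi x) with u(0) = u(1) = 0 (exact solution sin(pi x)) by minimizing the squared PDE residual at randomly sampled collocation points plus a boundary penalty. The architecture and hyperparameters are illustrative, not those of Sirignano and Spiliopoulos.

```python
import math
import torch
import torch.nn as nn

# Mesh-free, DGM-style residual minimization for u''(x) = -pi^2 sin(pi x),
# u(0) = u(1) = 0, whose exact solution is sin(pi x).
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)   # random interior points, no mesh
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u + (math.pi ** 2) * torch.sin(math.pi * x)
    boundary = net(torch.zeros(1, 1)) ** 2 + net(torch.ones(1, 1)) ** 2
    loss = (residual ** 2).mean() + boundary.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# net(x) now approximates sin(pi x) on [0, 1].
```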