3,054 research outputs found

    Critical Learning Periods in Deep Neural Networks

    Similar to humans and animals, deep artificial neural networks exhibit critical periods during which a temporary stimulus deficit can impair the development of a skill. The extent of the impairment depends on the onset and length of the deficit window, as in animal models, and on the size of the neural network. Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training. To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. Counterintuitively, information rises rapidly in the early phases of training, and then decreases, preventing redistribution of information resources in a phenomenon we refer to as a loss of "Information Plasticity". Our analysis suggests that the first few epochs are critical for the creation of strong connections that are optimal relative to the input data distribution. Once such strong connections are created, they do not appear to change during additional training. These findings suggest that the initial learning transient, under-scrutinized compared to asymptotic behavior, plays a key role in determining the outcome of the training process. Our findings, combined with recent theoretical results in the literature, also suggest that forgetting (decrease of information in the weights) is critical to achieving invariance and disentanglement in representation learning. Finally, critical periods are not restricted to biological systems, but can emerge naturally in learning systems, whether biological or artificial, due to fundamental constraints arising from learning dynamics and information processing.
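
    The abstract's measure of effective connectivity is the Fisher Information of the weights; the sketch below shows one common way to estimate its diagonal during training, assuming a PyTorch classifier `model` that outputs logits and a data loader `loader` (both illustrative, not the authors' code).

```python
import torch
import torch.nn.functional as F

def diagonal_fisher(model, loader, device="cpu", n_batches=10):
    """Approximate the diagonal Fisher Information of the weights by
    averaging squared log-likelihood gradients over a few batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for i, (x, _) in enumerate(loader):
        if i >= n_batches:
            break
        x = x.to(device)
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=1)
        # Sample labels from the model's own predictive distribution
        # (the "true" Fisher); using the dataset labels instead gives
        # the empirical Fisher.
        sampled = torch.multinomial(log_probs.exp(), 1).squeeze(1)
        F.nll_loss(log_probs, sampled).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / n_batches for n, f in fisher.items()}
```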

    A Telescopic Binary Learning Machine for Training Neural Networks

    This paper proposes a new algorithm based on multi-scale stochastic local search with binary representation for training neural networks. In particular, we study the effects of neighborhood evaluation strategies, of the number of bits per weight, and of the maximum weight range used for mapping binary strings to real values. Following this preliminary investigation, we propose a telescopic multi-scale version of local search in which the number of bits is increased in an adaptive manner, leading to a faster search and to local minima of better quality. An analysis related to adapting the number of bits in a dynamic way is also presented. The control on the number of bits, which happens in a natural manner in the proposed method, is effective in increasing generalization performance. Benchmark tasks include a highly non-linear artificial problem, a control problem requiring either feed-forward or recurrent architectures for feedback control, and challenging real-world tasks in different application domains. The results demonstrate the effectiveness of the proposed method. Comment: Submitted to IEEE Transactions on Neural Networks and Learning Systems, special issue on New Developments in Neural Network Structures for Signal Processing, Autonomous Decision, and Adaptive Control.
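
    A minimal sketch, based only on the abstract, of the two ingredients it describes: mapping a binary string to a real weight within a range [-w_max, w_max] and a single bit-flip local-search move. Function names and the refinement rule are illustrative, not the paper's.

```python
import random

def bits_to_weight(bits, w_max):
    """Map a binary string (list of 0/1, index 0 = least significant bit)
    to a real weight in [-w_max, w_max]."""
    value = sum(b << i for i, b in enumerate(bits))
    return -w_max + 2.0 * w_max * value / (2 ** len(bits) - 1)

def flip_neighbor(bits):
    """One stochastic local-search move: flip a single randomly chosen bit."""
    i = random.randrange(len(bits))
    neighbor = list(bits)
    neighbor[i] ^= 1
    return neighbor

def add_resolution(bits):
    """Telescopic refinement (illustrative): insert a new least significant
    bit, roughly preserving the mapped weight while doubling resolution."""
    return [0] + bits
```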

    Single Flux Quantum Based Ultrahigh Speed Spiking Neuromorphic Processor Architecture

    Artificial neural networks inspired by brain operations can improve the possibilities of solving complex problems more efficiently. Today's computing hardware, on the other hand, is mainly based on the von Neumann architecture and CMOS technology, which is inefficient at implementing neural networks. For the first time, we propose an ultrahigh speed, spiking neuromorphic processor architecture built upon single flux quantum (SFQ) based artificial neurons (JJ-Neuron). The proposed architecture has the potential to provide higher performance and power efficiency than the state of the art, including CMOS, memristors and nanophotonic devices. The JJ-Neuron offers ultrafast spiking, trainability with commodity design software even after fabrication, and compatibility with commercial CMOS and SFQ foundry services. We experimentally demonstrate the soma part of the JJ-Neuron for various activation functions together with peripheral SFQ logic gates. The neural network is then trained on the Iris dataset, and we show a 100% match with the results of offline training, with 1.2x10^10 synaptic operations per second (SOPS) and 8.57x10^11 SOPS/W performance and power efficiency, respectively. In addition, scalability to 10^18 SOPS and 10^17 SOPS/W is shown, which is at least five orders of magnitude more efficient than state-of-the-art CMOS circuits and one order of magnitude more efficient than estimates for nanophotonics-based architectures.

    Time Series Prediction : Predicting Stock Price

    Time series forecasting is widely used in a multitude of domains. In this paper, we present four models to predict the stock price using the SPX index as the input time series data. The martingale and ordinary linear models, which we use as baselines, require the strongest assumption, stationarity. The generalized linear model requires weaker assumptions but is unable to outperform the martingale. In empirical testing, the RNN model performs best among the models compared, because its LSTM updates on new inputs immediately, but it also does not beat the martingale. In addition, we introduce an online-to-batch algorithm and a discrepancy measure to inform readers of recent research in time series prediction that does not require any stationarity or non-mixing assumptions on the time series data. Finally, to put these forecasts into practice, we introduce basic trading strategies that can create win-win and zero-sum situations. Comment: Under the advisement of Dr. Sang Kim, for his class CS542. Additional author unnamed.
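
    To make the baseline comparison concrete, here is a minimal sketch (not the paper's code) of the martingale baseline next to a one-step LSTM forecaster in PyTorch; the `prices` array is a synthetic stand-in for SPX closes.

```python
import numpy as np
import torch
import torch.nn as nn

def martingale_forecast(prices):
    """Martingale baseline: the best guess for tomorrow is today's price."""
    return prices[:-1]  # predictions aligned with prices[1:]

class LSTMForecaster(nn.Module):
    """One-step-ahead price predictor from a window of past prices."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, window):          # window: (batch, steps, 1)
        out, _ = self.lstm(window)
        return self.head(out[:, -1])    # (batch, 1)

# Toy usage: mean squared error of the martingale on a synthetic series,
# and an (untrained) forward pass of the LSTM on one 30-step window.
prices = np.cumsum(np.random.randn(500)) + 100.0
mse_baseline = np.mean((prices[1:] - martingale_forecast(prices)) ** 2)
window = torch.tensor(prices[:30], dtype=torch.float32).reshape(1, 30, 1)
next_price = LSTMForecaster()(window)   # shape (1, 1)
```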

    Precision requirements for single-layer feedforward neural networks

    This paper presents a mathematical analysis of the effect of limited-precision analog hardware for weight adaptation to be used in on-chip learning feedforward neural networks. Easy-to-read equations and simple worst-case estimations for the maximum tolerable imprecision are presented. As an application of the analysis, a worst-case estimation of the minimum size of the weight storage capacitors is presented.
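
    As a toy illustration of why such precision bounds matter (not the paper's analysis), the sketch below shows how a weight update smaller than the storage resolution of the analog hardware is simply lost; the function and parameter names are hypothetical.

```python
import numpy as np

def apply_update_with_limited_precision(w, delta_w, resolution):
    """Apply a weight update on hardware that can only change a stored
    weight in steps of `resolution` (e.g. set by the capacitor size):
    updates smaller than half a step are silently lost."""
    quantized = np.round(delta_w / resolution) * resolution
    return w + quantized

# A gradient step of 0.003 is lost when the resolution is 0.01.
print(apply_update_with_limited_precision(0.5, 0.003, resolution=0.01))  # 0.5
print(apply_update_with_limited_precision(0.5, 0.012, resolution=0.01))  # 0.51
```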

    Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks

    We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows. * Built on the formalism of probability coupling theory, we derive an algorithm framework, named Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, that is designed to learn the complex hierarchical, statistical dependency in the physical world. * We show that NNs are special cases of the S-System when the probability kernels assume certain exponential family distributions, and activation functions are derived formally. We further endow geometry on NNs through information geometry, show that intermediate feature spaces of NNs are stochastic manifolds, and prove that the "distance" between samples is contracted as layers stack up. * The S-System shows that NNs are inherently stochastic, and under a set of realistic boundedness and diversity conditions it enables us to prove that, for large nonlinear deep NNs with a class of losses including the hinge loss, all local minima are global minima with zero loss, and that regions around the minima are flat basins where all eigenvalues of the Hessians are concentrated around zero, using tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations. * The S-System, the information-geometry structure, and the optimization behavior combined complete the analogy between the Renormalization Group (RG) and NNs. It shows that a NN is a complex adaptive system that estimates the statistical dependency of microscopic objects, e.g., pixels, at multiple scales. Unlike the clear-cut physical quantities produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds emerging through learning/optimization that divide the sample space into highly semantically meaningful groups dictated by supervised labels (in supervised NNs).

    Learning to Support: Exploiting Structure Information in Support Sets for One-Shot Learning

    Deep learning shows very good performance when trained on large labeled data sets. The problem of training a deep net on a few samples, or one sample, per class requires a different learning approach that can generalize to unseen classes using only a few representatives of these classes. This problem has previously been approached by meta-learning. Here we propose a novel meta-learner that shows state-of-the-art performance on common benchmarks for one/few-shot classification. Our model features three novel components. First is a feed-forward embedding that takes random class support samples (after a customary CNN embedding) and maps them to a better class representation for the classification problem. Second is a novel attention mechanism, inspired by competitive learning, which causes class representatives to compete with each other to become a temporary class prototype with respect to the query point. This mechanism allows switching between representatives depending on the position of the query point. Once a prototype is chosen for each class, the predicted label is computed using a simple attention mechanism over the prototypes of all considered classes. The third feature is the ability of our meta-learner to incorporate deeper CNN embeddings, enabling larger capacity. Finally, to ease the training procedure and reduce overfitting, we average the top t models (evaluated on the validation set) over the optimization trajectory. We show that this approach can be viewed as an approximation to an ensemble, which saves a factor of t in training and test times and a factor of t in the storage of the final model.
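
    The final attention step over class prototypes can be pictured with a short sketch; the shapes and the softmax over negative distances below are illustrative, not the authors' exact mechanism.

```python
import torch
import torch.nn.functional as F

def classify_by_prototype_attention(query, prototypes):
    """query: (d,) embedding of the query point.
    prototypes: (num_classes, d), one chosen prototype per class.
    Returns a distribution over classes via softmax of negative distances."""
    dists = torch.cdist(query.unsqueeze(0), prototypes).squeeze(0)  # (num_classes,)
    return F.softmax(-dists, dim=0)

# Toy usage with 5 classes and 16-dimensional embeddings.
probs = classify_by_prototype_attention(torch.randn(16), torch.randn(5, 16))
```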

    Short-Term Plasticity and Long-Term Potentiation in Magnetic Tunnel Junctions: Towards Volatile Synapses

    Synaptic memory is considered to be the main element responsible for learning and cognition in humans. Although non-volatile long-term plasticity changes have traditionally been implemented in nanoelectronic synapses for neuromorphic applications, recent studies in neuroscience have revealed that biological synapses undergo meta-stable volatile strengthening followed by long-term strengthening, provided that the frequency of the input stimulus is sufficiently high. Such "memory strengthening" and "memory decay" functionalities can potentially lead to adaptive neuromorphic architectures. In this paper, we demonstrate the close resemblance of the magnetization dynamics of a Magnetic Tunnel Junction (MTJ) to the short-term plasticity and long-term potentiation observed in biological synapses. We illustrate that, in addition to the magnitude and duration of the input stimulus, the frequency of the stimulus plays a critical role in determining long-term potentiation of the MTJ. Such MTJ synaptic memory arrays can be utilized to create compact, ultra-fast and low power intelligent neural systems. Comment: The article will appear in a future issue of Physical Review Applied.
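
    The frequency dependence described here can be illustrated with a toy rate model (not the MTJ physics in the paper): volatile strength decays between pulses and is latched as long-term potentiation only when pulses arrive fast enough. All constants below are hypothetical.

```python
import numpy as np

def simulate_synapse(pulse_times, t_end, dt=0.001, tau=0.05,
                     jump=0.2, threshold=1.0):
    """Volatile strength decays with time constant `tau`; each input pulse
    adds `jump`. If frequent pulses outpace the decay, the strength crosses
    `threshold` and is latched as long-term potentiation."""
    n_steps = int(t_end / dt)
    pulses = set(np.round(np.array(pulse_times) / dt).astype(int))
    strength, potentiated, trace = 0.0, False, []
    for i in range(n_steps):
        strength *= np.exp(-dt / tau)          # short-term decay
        if i in pulses:
            strength += jump                   # stimulus-driven strengthening
        if strength >= threshold:
            potentiated = True                 # long-term potentiation latched
        trace.append(strength)
    return np.array(trace), potentiated

# High-frequency stimulation potentiates; sparse stimulation does not.
_, ltp_fast = simulate_synapse(np.arange(0.0, 0.2, 0.01), t_end=0.5)  # True
_, ltp_slow = simulate_synapse(np.arange(0.0, 0.5, 0.2), t_end=0.5)   # False
```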

    Differentiable programming and its applications to dynamical systems

    Differentiable programming is the combination of classical neural network modules with algorithmic ones in an end-to-end differentiable model. These new models, which use automatic differentiation to calculate gradients, have new learning capabilities (reasoning, attention and memory). In this tutorial, aimed at researchers in nonlinear systems with prior knowledge of deep learning, we present this new programming paradigm, describe some of its new features such as attention mechanisms, and highlight the benefits they bring. Then, we analyse the uses and limitations of traditional deep learning models in the modeling and prediction of dynamical systems. Here, a dynamical system is meant to be a set of state variables that evolve in time under general internal and external interactions. Finally, we review the advantages and applications of differentiable programming to dynamical systems. Comment: 11 pages
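
    The core idea of differentiating end-to-end through a dynamical system can be sketched in a few lines, assuming PyTorch; the linear-decay system and the parameter being fitted are illustrative, not taken from the tutorial.

```python
import torch

def simulate(x0, decay, steps=20, dt=0.1):
    """Unroll x_{k+1} = x_k - dt * decay * x_k; `decay` stays differentiable."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        x = x - dt * decay * x
        trajectory.append(x)
    return torch.stack(trajectory)

# Fit the decay rate so the simulated trajectory matches an observed one,
# by backpropagating through the entire unrolled simulation.
observed = simulate(torch.tensor(1.0), torch.tensor(0.7)).detach()
decay = torch.tensor(0.1, requires_grad=True)
opt = torch.optim.Adam([decay], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = ((simulate(torch.tensor(1.0), decay) - observed) ** 2).mean()
    loss.backward()
    opt.step()
```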

    Solving Nonlinear and High-Dimensional Partial Differential Equations via Deep Learning

    In this work we apply the Deep Galerkin Method (DGM) described in Sirignano and Spiliopoulos (2018) to solve a number of partial differential equations that arise in quantitative finance applications, including option pricing, optimal execution, and mean field games. The main idea behind DGM is to represent the unknown function of interest using a deep neural network. A key feature of this approach is the fact that, unlike other commonly used numerical approaches such as finite difference methods, it is mesh-free. As such, it does not suffer (as much as other numerical methods) from the curse of dimensionality associated with high-dimensional PDEs and PDE systems. The main goals of this paper are to elucidate the features, capabilities and limitations of DGM by analyzing aspects of its implementation for a number of different PDEs and PDE systems. Additionally, we present: (1) a brief overview of PDEs in quantitative finance along with numerical methods for solving them; (2) a brief overview of deep learning and, in particular, the notion of neural networks; (3) a discussion of the theoretical foundations of DGM with a focus on the justification of why this method is expected to perform well.
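
    A minimal sketch of the DGM idea (not the paper's implementation), assuming PyTorch: represent the unknown solution u(t, x) with a small network and minimize the squared residual of a simple PDE, here the heat equation u_t = u_xx, at randomly sampled, mesh-free collocation points. Boundary and initial-condition loss terms are omitted for brevity.

```python
import torch
import torch.nn as nn

# Network approximating u(t, x); input is the pair (t, x).
u = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                  nn.Linear(64, 64), nn.Tanh(),
                  nn.Linear(64, 1))
opt = torch.optim.Adam(u.parameters(), lr=1e-3)

for _ in range(1000):
    # Mesh-free sampling of interior collocation points (t, x) in [0, 1]^2.
    tx = torch.rand(256, 2, requires_grad=True)
    out = u(tx)
    grads = torch.autograd.grad(out.sum(), tx, create_graph=True)[0]
    u_t, u_x = grads[:, 0], grads[:, 1]
    u_xx = torch.autograd.grad(u_x.sum(), tx, create_graph=True)[0][:, 1]
    residual = u_t - u_xx                 # heat equation residual
    loss = (residual ** 2).mean()         # boundary/initial terms omitted
    opt.zero_grad()
    loss.backward()
    opt.step()
```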