
    Attentive Tensor Product Learning

    This paper proposes a new architecture, Attentive Tensor Product Learning (ATPL), for representing grammatical structures in deep learning models. ATPL bridges the gap between deep learning and explicit language structures and rules by exploiting Tensor Product Representations (TPR), a structured neural-symbolic model developed in cognitive science. The key ideas of ATPL are: 1) unsupervised learning of role-unbinding vectors of words via a TPR-based deep neural network; 2) employing attention modules to compute the TPR; and 3) integrating the TPR with typical deep learning architectures, including Long Short-Term Memory (LSTM) and feedforward neural networks (FFNN). The novelty of our approach lies in its ability to extract the grammatical structure of a sentence using role-unbinding vectors, which are obtained in an unsupervised manner. ATPL is applied to 1) image captioning, 2) part-of-speech (POS) tagging, and 3) constituency parsing of a sentence. Experimental results demonstrate the effectiveness of the proposed approach.
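
    The core TPR operation this abstract relies on can be illustrated with a short sketch. The NumPy code below is a minimal illustration of tensor-product binding and role-unbinding under the assumption of orthonormal role vectors; it shows the mechanism ATPL builds on, not the ATPL network itself, and all names and dimensions are illustrative.

        # Minimal sketch of TPR binding and unbinding (assumes orthonormal roles).
        import numpy as np

        rng = np.random.default_rng(0)
        d, n = 8, 3                                          # embedding size, number of words
        fillers = rng.normal(size=(n, d))                    # word (filler) vectors
        roles = np.linalg.qr(rng.normal(size=(d, n)))[0].T   # orthonormal role vectors

        # Binding: superpose the outer products of fillers with their roles.
        T = sum(np.outer(f, r) for f, r in zip(fillers, roles))

        # Unbinding: with orthonormal roles, the dual of a role is the role itself,
        # so multiplying the TPR by a role vector recovers the bound filler.
        recovered = T @ roles[1]
        print(np.allclose(recovered, fillers[1]))            # True (up to numerical error)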

    Learning, Memory, and the Role of Neural Network Architecture

    The performance of information processing systems, from artificial neural networks to natural neuronal ensembles, depends heavily on the underlying system architecture. In this study, we compare the performance of parallel and layered network architectures during sequential tasks that require both acquisition and retention of information, thereby identifying tradeoffs between learning and memory processes. During the task of supervised, sequential function approximation, networks produce and adapt representations of external information. Performance is evaluated by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time given to learn the information. We link performance to complexity in network architecture by characterizing local error landscape curvature. We find that variations in error landscape structure give rise to tradeoffs in performance; these include the ability of the network to maximize accuracy versus minimize inaccuracy and produce specific versus generalizable representations of information. Parallel networks generate smooth error landscapes with deep, narrow minima, enabling them to find highly specific representations given sufficient time. While accurate, however, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations. Although less accurate, these representations are easily adaptable. The presence of measurable performance tradeoffs in both layered and parallel networks has implications for understanding the behavior of a wide variety of natural and artificial learning systems.
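
    As a rough illustration of the comparison described above, the PyTorch sketch below trains a wide single-hidden-layer ("parallel") network and a deep, narrow ("layered") network on a toy function-approximation task and probes local error-landscape curvature with a finite difference along a random direction in weight space. The sizes, training budget, and curvature probe are illustrative assumptions, not the study's protocol.

        # Toy comparison of a "parallel" (wide, shallow) and a "layered" (deep,
        # narrow) network, plus a crude curvature probe of the error landscape.
        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        x = torch.linspace(-1, 1, 128).unsqueeze(1)
        y = torch.sin(3 * x)                          # external information to learn

        parallel = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
        layered = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 8), nn.Tanh(),
                                nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 1))

        def train(net, steps=500):
            opt = torch.optim.Adam(net.parameters(), lr=1e-2)
            for _ in range(steps):
                opt.zero_grad()
                loss = nn.functional.mse_loss(net(x), y)
                loss.backward()
                opt.step()
            return loss.item()

        def curvature(net, eps=1e-3):
            # Second difference of the error along a random unit direction in weight space.
            params = list(net.parameters())
            direction = [torch.randn_like(p) for p in params]
            norm = torch.sqrt(sum((d ** 2).sum() for d in direction))
            direction = [d / norm for d in direction]

            def loss_at(shift):
                with torch.no_grad():
                    for p, d in zip(params, direction):
                        p.add_(shift * d)
                    val = nn.functional.mse_loss(net(x), y).item()
                    for p, d in zip(params, direction):
                        p.sub_(shift * d)
                return val

            return (loss_at(eps) - 2 * loss_at(0.0) + loss_at(-eps)) / eps ** 2

        for name, net in [("parallel", parallel), ("layered", layered)]:
            err = train(net)
            print(name, "final error:", round(err, 5), "curvature:", round(curvature(net), 3))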

    Investigating Continual Learning Strategies in Neural Networks

    This paper explores the role of continual learning strategies when neural networks are confronted with learning tasks sequentially. We analyze the stability-plasticity dilemma with three factors in mind: the type of network architecture used, the continual learning scenario defined, and the continual learning strategy implemented. Our results show that complementary learning systems and neural volume significantly contribute towards memory retrieval and consolidation in neural networks. Finally, we demonstrate that regularization strategies such as elastic weight consolidation are better suited for larger neural networks, whereas rehearsal strategies such as gradient episodic memory are better suited for smaller neural networks.
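
    A minimal sketch of the elastic weight consolidation penalty mentioned above, written in PyTorch under the usual formulation (a diagonal Fisher estimate plus a quadratic anchor to the previous task's weights); the loaders, loss function, and lambda value are placeholder assumptions, not details from the paper.

        # Sketch of the EWC regularizer: penalize drift away from the weights
        # learned on a previous task, weighted by a diagonal Fisher estimate.
        import torch

        def fisher_diagonal(model, data_loader, loss_fn):
            # Average squared gradients over one pass of the previous task's data.
            fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
            for inputs, targets in data_loader:
                model.zero_grad()
                loss_fn(model(inputs), targets).backward()
                for n, p in model.named_parameters():
                    if p.grad is not None:
                        fisher[n] += p.grad.detach() ** 2
            return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

        def ewc_penalty(model, old_params, fisher, lam=100.0):
            # Quadratic penalty keeping parameters close to their old-task values.
            penalty = 0.0
            for n, p in model.named_parameters():
                penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
            return 0.5 * lam * penalty

        # Usage after training on task A, while training on task B (placeholders):
        # fisher_A = fisher_diagonal(model, loader_A, torch.nn.functional.cross_entropy)
        # params_A = {n: p.detach().clone() for n, p in model.named_parameters()}
        # total_loss = loss_on_task_B + ewc_penalty(model, params_A, fisher_A)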

    An FPGA-based Convolution IP Core for Deep Neural Networks Acceleration

    The development of machine learning has revolutionized applications such as object detection, image/video recognition, and semantic segmentation. Neural networks, a class of machine learning models, play a crucial role in this progress because of their remarkable improvement over traditional algorithms. However, neural networks are becoming deeper and require a significant amount of computation, so they usually perform poorly on edge devices with limited resources and performance. In this paper, we investigate a solution for accelerating the neural network inference phase on FPGA-based platforms. We analyze neural network models, their mathematical operations, and the inference phase on various platforms, and we profile the characteristics that affect inference performance. Based on this analysis, we propose an architecture that accelerates the convolution operation, which is used in most neural networks and accounts for most of their computation, by exploiting parallelism, data reuse, and memory management. We conduct experiments to validate the FPGA-based convolution core architecture and to compare its performance. Experimental results show that the core is platform-independent and outperforms a quad-core ARM processor running at 1.2 GHz and a 6-core Intel CPU, with speed-ups of up to 15.69× and 2.78×, respectively.
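
    For reference, the NumPy sketch below models the convolution loop nest such a core accelerates: the inner channel/kernel loops are the ones typically unrolled in hardware, and reusing the input window across output channels is the kind of data reuse the abstract refers to. This is an illustrative software model, not the paper's IP-core design.

        # Reference model of 2-D convolution: the loop structure an FPGA core
        # parallelizes (channel/kernel loops) and reuses data across (windows).
        import numpy as np

        def conv2d_reference(x, w):
            # x: (C_in, H, W) input feature map; w: (C_out, C_in, K, K) weights.
            c_in, h, wd = x.shape
            c_out, _, k, _ = w.shape
            out = np.zeros((c_out, h - k + 1, wd - k + 1))
            for oy in range(out.shape[1]):                # output rows
                for ox in range(out.shape[2]):            # output columns
                    window = x[:, oy:oy + k, ox:ox + k]   # loaded once, reused below
                    for oc in range(c_out):               # parallel across output channels
                        out[oc, oy, ox] = np.sum(window * w[oc])
            return out

        x = np.random.rand(3, 8, 8)
        w = np.random.rand(4, 3, 3, 3)
        print(conv2d_reference(x, w).shape)               # (4, 6, 6)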

    Neural Dynamics of Learning and Performance of Fixed Sequences: Latency Pattern Reorganizations and the N-STREAMS Model

    Fixed sequences performed from memory play a key role in human cultural behavior, especially in music and in rapid communication through speaking, handwriting, and typing. Upon first performance, fixed sequences are often produced slowly, but extensive practice leads to performance that is both fluid and as rapid as allowed by constraints inherent in the task or the performer. The experimental study of fixed-sequence learning and production has generated a large database with some challenging findings, including practice-related reorganizations of temporal properties of performance. In this paper, we analyze this literature and identify a coherent set of robust experimental effects. Among these are the sequence length effect on latency, a dependence of reaction time on sequence length, and the practice-dependent loss of this length effect. We then introduce a neural network architecture capable of explaining these effects. Called the N-STREAMS model, this multi-module architecture embodies the hypothesis that the brain uses several substrates for serial-order representation and learning. The theory describes three such substrates and how learning autonomously modifies their interaction over the course of practice. A key feature of the architecture is the co-operation of a 'competitive queuing' performance mechanism with both fundamentally parallel ('priority-tagged') and fundamentally sequential ('chain-like') representations of serial order. A neurobiological interpretation of the architecture suggests how different parts of the brain divide the labor for serial learning and performance. Rhodes (1999) presents a complete mathematical model as an implementation of the architecture and reports successful simulations of the major experimental effects. It also highlights how the network mechanisms incorporated in the architecture compare and contrast with earlier substrates proposed for competitive queuing, priority tagging, and response chaining. Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-92-J-1309, N00014-93-1-1364, N00014-95-1-0409); National Institute of Health (RO1 DC02852)
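
    The 'competitive queuing' mechanism at the heart of the architecture can be sketched in a few lines: a parallel plan layer holds graded ('priority-tagged') activations over the items, and on each step the most active item is performed and then suppressed, so the activation gradient unfolds into serial order. The code below is an illustrative toy, not the N-STREAMS implementation.

        # Toy competitive-queuing sketch: choose the most active item, perform
        # it, then self-inhibit it so the next-most-active item wins next.
        import numpy as np

        def competitive_queuing(items, activations):
            activations = np.asarray(activations, dtype=float).copy()
            produced = []
            for _ in items:
                winner = int(np.argmax(activations))   # competitive choice layer
                produced.append(items[winner])
                activations[winner] = -np.inf          # suppression after output
            return produced

        print(competitive_queuing(["A", "B", "C", "D"], [0.9, 0.7, 0.5, 0.3]))
        # ['A', 'B', 'C', 'D']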