
    Direct Feedback Alignment with Sparse Connections for Local Learning

    Recent advances in deep neural networks (DNNs) owe their success to training algorithms that use backpropagation and gradient descent. Backpropagation, while highly effective on von Neumann architectures, becomes inefficient when scaling to large networks. Commonly referred to as the weight transport problem, each neuron's dependence on the weights and errors located deeper in the network requires exhaustive data movement, which presents a key obstacle to enhancing the performance and energy efficiency of machine-learning hardware. In this work, we propose a bio-plausible alternative to backpropagation, drawing on advances in feedback alignment algorithms, in which the error computation at a single synapse reduces to the product of three scalar values. Using a sparse feedback matrix, we show that a neuron needs only a fraction of the information previously used by feedback alignment algorithms. Consequently, memory and compute can be partitioned and distributed in whichever way produces the most efficient forward pass, so long as a single error can be delivered to each neuron. Our results show orders-of-magnitude improvement in data movement and a $2\times$ improvement in multiply-and-accumulate operations over backpropagation. Like previous work, we observe that any variant of feedback alignment suffers significant losses in classification accuracy on deep convolutional neural networks. By transferring trained convolutional layers and training the fully connected layers using direct feedback alignment, we demonstrate that direct feedback alignment can obtain results competitive with backpropagation. Furthermore, we observe that using an extremely sparse feedback matrix, rather than a dense one, results in a small accuracy drop while yielding hardware advantages. All the code and results are available at https://github.com/bcrafton/ssdfa. Comment: 15 pages, 8 figures.
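    As a rough illustration of the idea, the sketch below trains a two-layer perceptron with direct feedback alignment: the output error is projected to the hidden layer through a fixed random feedback matrix instead of the transposed forward weights, and sparsifying that matrix means each hidden neuron receives only a few error scalars. The layer sizes, sparsity level, and squared-error loss are illustrative assumptions, not the configuration from the paper or the linked repository.

```python
import numpy as np

# Minimal direct feedback alignment (DFA) sketch on a two-layer MLP.
# Sizes, sparsity, and the MSE loss are illustrative assumptions.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 784, 256, 10

W1 = rng.normal(0.0, 0.05, (n_in, n_hid))
W2 = rng.normal(0.0, 0.05, (n_hid, n_out))

# Fixed random feedback matrix B replaces W2.T in the hidden-layer update.
# Making B sparse means each hidden neuron sees only a handful of error scalars.
B = rng.normal(0.0, 0.05, (n_out, n_hid))
B *= rng.random((n_out, n_hid)) < 0.1            # keep ~10% of feedback connections

def relu(x):
    return np.maximum(x, 0.0)

def dfa_step(x, y_onehot, lr=1e-2):
    """One DFA update: forward pass, then layer-local updates from the output error."""
    h_pre = x @ W1
    h = relu(h_pre)
    y_hat = h @ W2                               # linear readout, squared-error loss
    e = y_hat - y_onehot                         # output error, shape (batch, n_out)

    W2[...] -= lr * h.T @ e                      # output layer: ordinary delta rule
    delta_h = (e @ B) * (h_pre > 0)              # hidden error via fixed sparse feedback, not W2.T
    W1[...] -= lr * x.T @ delta_h

x = rng.normal(size=(32, n_in))
y = np.eye(n_out)[rng.integers(0, n_out, size=32)]
dfa_step(x, y)
```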

    A Cognitive Model of an Epistemic Community: Mapping the Dynamics of Shallow Lake Ecosystems

    We used fuzzy cognitive mapping (FCM) to develop a generic shallow lake ecosystem model by augmenting the individual cognitive maps drawn by 8 scientists working in the area of shallow lake ecology. We calculated graph-theoretical indices of the individual cognitive maps and of the collective cognitive map produced by augmentation. The graph-theoretical indices revealed internal cycles, indicating non-linear dynamics in the shallow lake ecosystem. The ecological processes were organized democratically, without a top-down hierarchical structure. The steady-state condition of the generic model was a characteristic turbid shallow lake ecosystem, since there were no dynamic environmental changes that could cause shifts between a turbid and a clearwater state, and the generic model indicated that only a dynamic disturbance regime could maintain the clearwater state. The model developed herein captured the empirical behavior of shallow lakes and contained the basic model of the Alternative Stable States Theory. In addition, our model extended the basic model by quantifying the relative effects of the connections. With the extended model we ran 4 simulations: harvesting submerged plants, nutrient reduction, fish removal without nutrient reduction, and biomanipulation. Only biomanipulation, which included fish removal and nutrient reduction, had the potential to shift the turbid state into a clearwater state. The structure and relationships in the generic model, as well as the outcomes of the management simulations, were supported by actual field studies in shallow lake ecosystems. Thus, the fuzzy cognitive mapping methodology enabled us to understand the complex structure of shallow lake ecosystems as a whole and to obtain a valid generic model based on the tacit knowledge of experts in the field. Comment: 24 pages, 5 figures.
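    For readers unfamiliar with the method, the following minimal sketch shows how an FCM is simulated: concepts form the nodes of a signed, weighted digraph, and concept activations are iterated through a sigmoid squashing function until they settle into a steady state, optionally clamping some concepts to represent a management scenario. The concepts, weights, and clamped scenario below are illustrative placeholders, not the augmented expert map from the study.

```python
import numpy as np

# Minimal fuzzy cognitive map (FCM) sketch with made-up concepts and weights.
concepts = ["nutrients", "phytoplankton", "water_clarity", "submerged_plants", "planktivorous_fish"]

# W[i, j] = signed causal influence of concept i on concept j, in [-1, 1].
W = np.array([
    #  nut   phyto  clar   plants fish
    [  0.0,  0.8,   0.0,   0.0,   0.3],   # nutrients feed phytoplankton, support fish
    [  0.0,  0.0,  -0.9,   0.0,   0.0],   # phytoplankton reduce clarity
    [  0.0,  0.0,   0.0,   0.7,   0.0],   # clarity promotes submerged plants
    [  0.0, -0.5,   0.4,   0.0,  -0.2],   # plants suppress phytoplankton, aid clarity
    [  0.0,  0.6,  -0.3,  -0.4,   0.0],   # fish boost phytoplankton, harm plants
])

def squash(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))        # keep activations in (0, 1)

def run_fcm(state, W, clamp=None, steps=200, tol=1e-6):
    """Iterate A(t+1) = squash(A(t) + A(t) @ W) until a steady state (or step limit)."""
    for _ in range(steps):
        new = squash(state + state @ W)
        if clamp:                                # hold scenario variables fixed
            for idx, val in clamp.items():
                new[idx] = val
        if np.max(np.abs(new - state)) < tol:
            break
        state = new
    return state

init = np.full(len(concepts), 0.5)
baseline = run_fcm(init, W)
biomanipulation = run_fcm(init, W, clamp={0: 0.1, 4: 0.1})   # nutrient reduction + fish removal
print(dict(zip(concepts, np.round(baseline, 2))))
print(dict(zip(concepts, np.round(biomanipulation, 2))))
```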

    Quantum Generative Adversarial Networks for Learning and Loading Random Distributions

    Quantum algorithms have the potential to outperform their classical counterparts in a variety of tasks. The realization of the advantage often requires the ability to load classical data efficiently into quantum states. However, the best known methods require $\mathcal{O}(2^n)$ gates to load an exact representation of a generic data structure into an $n$-qubit state. This scaling can easily dominate the complexity of a quantum algorithm and, thereby, impair potential quantum advantage. Our work presents a hybrid quantum-classical algorithm for efficient, approximate quantum state loading. More precisely, we use quantum Generative Adversarial Networks (qGANs) to facilitate efficient learning and loading of generic probability distributions -- implicitly given by data samples -- into quantum states. Through the interplay of a quantum channel, such as a variational quantum circuit, and a classical neural network, the qGAN can learn a representation of the probability distribution underlying the data samples and load it into a quantum state. The loading requires $\mathcal{O}(\mathrm{poly}(n))$ gates and can, thus, enable the use of potentially advantageous quantum algorithms, such as Quantum Amplitude Estimation. We implement the qGAN distribution learning and loading method with Qiskit and test it using a quantum simulation as well as actual quantum processors provided by the IBM Q Experience. Furthermore, we employ quantum simulation to demonstrate the use of the trained quantum channel in a quantum finance application. Comment: 14 pages, 13 figures.
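    The adversarial structure can be sketched classically. In the toy example below, the variational quantum circuit is stood in for by an explicit parameterized distribution over the $2^n$ basis states (a softmax over angles), and a small per-state logistic discriminator plays the classical network; training alternates discriminator and generator updates exactly as in a GAN. This is a conceptual sketch under those stand-in assumptions, not the paper's Qiskit implementation.

```python
import numpy as np

# Conceptual qGAN sketch: a softmax over the 2^n basis states stands in for the
# variational quantum circuit, and a per-state logistic discriminator is the
# classical adversary. Expectations are computed exactly for simplicity.
rng = np.random.default_rng(1)
n_qubits = 3
dim = 2 ** n_qubits

# Target distribution, implicitly given by data samples (truncated log-normal).
samples = np.clip(np.round(rng.lognormal(1.0, 0.5, 5000)), 0, dim - 1).astype(int)
p_real = np.bincount(samples, minlength=dim) / len(samples)

theta = rng.normal(0, 0.1, dim)   # "generator" parameters (stand-in for circuit angles)
phi = np.zeros(dim)               # discriminator logits, one per basis state

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for step in range(5000):
    p_gen = softmax(theta)
    D = 1.0 / (1.0 + np.exp(-phi))                       # D(x) = prob. that x is "real"

    # Discriminator ascent on E_real[log D] + E_gen[log(1 - D)].
    phi += lr * (p_real * (1.0 - D) - p_gen * D)

    # Generator descent on -E_gen[log D] (non-saturating GAN loss).
    D = 1.0 / (1.0 + np.exp(-phi))
    logD = np.log(D + 1e-12)
    theta -= lr * (-p_gen * (logD - np.sum(p_gen * logD)))

print("target:   ", np.round(p_real, 3))
print("generator:", np.round(softmax(theta), 3))
```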

    Information theory, complexity and neural networks

    Some of the main results in the mathematical evaluation of neural networks as information processing systems are discussed. The basic operation of feedback and feed-forward neural networks is described. Their memory capacity and computing power are considered. The concept of learning by example as it applies to neural networks is examined.
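    A classic concrete instance of the memory-capacity question for feedback networks is the Hopfield model, where on the order of 0.14N random patterns can be stored and recalled by N neurons. The short sketch below illustrates Hebbian storage and recall; the construction and capacity figure are standard results added here for illustration, not claims taken from the abstract above.

```python
import numpy as np

# Hopfield network sketch: store P random patterns with the Hebbian rule and
# recover one of them from a corrupted probe. Sizes are illustrative.
rng = np.random.default_rng(2)
N, P = 200, 20                                   # neurons, stored patterns (P well below 0.14 * N)
patterns = rng.choice([-1, 1], size=(P, N))

W = (patterns.T @ patterns) / N                  # Hebbian outer-product rule
np.fill_diagonal(W, 0.0)                         # no self-connections

def recall(x, steps=10):
    """Synchronous updates until the state stops changing (or the step limit)."""
    for _ in range(steps):
        new = np.sign(W @ x)
        new[new == 0] = 1
        if np.array_equal(new, x):
            break
        x = new
    return x

# Corrupt 10% of one stored pattern and check that the network restores it.
probe = patterns[0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
probe[flip] *= -1
overlap = recall(probe) @ patterns[0] / N
print(f"overlap with stored pattern after recall: {overlap:.2f}")
```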

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher-order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. Comment: 232 pages.
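    To make the tensor train (TT) format concrete, the sketch below implements a minimal TT-SVD: the tensor is repeatedly matricized and factored by truncated SVDs, yielding a chain of small 3-way cores whose total size can be far smaller than the dense tensor. This is a generic illustration of the TT decomposition under a simple truncation threshold, not code from the monograph.

```python
import numpy as np

# Minimal TT-SVD: factor a dense higher-order tensor into a chain of 3-way cores.
def tt_svd(tensor, eps=1e-10):
    dims = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = int(np.sum(S > eps * S[0])) or 1          # drop tiny singular values
        cores.append(U[:, :new_rank].reshape(rank, dims[k], new_rank))
        mat = (S[:new_rank, None] * Vt[:new_rank]).reshape(new_rank * dims[k + 1], -1)
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the cores back into a dense tensor (boundary ranks are 1)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# Low-rank test tensor: an outer product compresses to TT ranks of 1.
a, b, c = np.random.rand(4), np.random.rand(5), np.random.rand(6)
T = np.einsum('i,j,k->ijk', a, b, c)
cores = tt_svd(T)
print([core.shape for core in cores])            # [(1, 4, 1), (1, 5, 1), (1, 6, 1)]
print(np.allclose(tt_reconstruct(cores), T))     # True
```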
