
    Gated Linear Networks

    This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model nonlinear functions via the use of data-dependent gating in conjunction with online convex optimization. We show that this architecture gives rise to universal learning capabilities in the limit, with effective model capacity increasing as a function of network size in a manner comparable with deep ReLU networks. Furthermore, we demonstrate that the GLN learning mechanism possesses extraordinary resilience to catastrophic forgetting, performing comparably to an MLP with dropout and Elastic Weight Consolidation on standard benchmarks. These desirable theoretical and empirical properties position GLNs as a complementary technique to contemporary offline deep learning methods.
    Comment: arXiv admin note: substantial text overlap with arXiv:1712.0189
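    The abstract describes each GLN neuron as directly predicting the target, with data-dependent gating selecting which weights to apply and online convex optimization updating them. The numpy sketch below illustrates one such neuron; the class name, the halfspace-style gating (one common choice of data-dependent gating), and the learning rate are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

class GLNNeuron:
    """One gated geometric-mixing neuron (illustrative sketch, not the paper's code).

    A few random halfspace gates partition the side-information space; each
    region owns its own weight vector, and the active weights geometrically
    mix the incoming probabilities into a new prediction of the target.
    """
    def __init__(self, fan_in, side_dim, num_halfspaces=4, lr=0.01, rng=None):
        rng = rng or np.random.default_rng(0)
        self.lr = lr
        # Data-dependent gating: context id = bit pattern of halfspace sign tests.
        self.hyperplanes = rng.normal(size=(num_halfspaces, side_dim))
        self.weights = np.full((2 ** num_halfspaces, fan_in), 1.0 / fan_in)

    def _context(self, side_info):
        bits = (self.hyperplanes @ side_info > 0).astype(int)
        return int(bits @ (2 ** np.arange(len(bits))))

    def predict(self, in_probs, side_info):
        c = self._context(side_info)
        return sigmoid(self.weights[c] @ logit(in_probs)), c

    def update(self, in_probs, side_info, target):
        # Online convex optimization: one gradient step on the log loss,
        # applied only to the weight vector selected by the gate.
        p, c = self.predict(in_probs, side_info)
        self.weights[c] -= self.lr * (p - target) * logit(in_probs)
        return p
```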

    New architectures for very deep learning

    Artificial Neural Networks are increasingly being used in complex real-world applications because many-layered (i.e., deep) architectures can now be trained on large quantities of data. However, training even deeper, and therefore more powerful, networks has hit a barrier due to fundamental limitations in the design of existing networks. This thesis develops new architectures that, for the first time, allow very deep networks to be optimized efficiently and reliably. Specifically, it addresses two key issues that hamper credit assignment in neural networks: cross-pattern interference and vanishing gradients. Cross-pattern interference leads to oscillations of the network's weights that make training inefficient. The proposed Local Winner-Take-All networks reduce interference among computation units in the same layer through local competition. An in-depth analysis of locally competitive networks provides generalizable insights and reveals unifying properties that improve credit assignment. As network depth increases, vanishing gradients make a network's outputs increasingly insensitive to the weights close to the inputs, causing the failure of gradient-based training. To overcome this limitation, the proposed Highway networks regulate information flow across layers through additional skip connections which are modulated by learned computation units. Their beneficial properties are extended to the sequential domain with Recurrent Highway Networks that gain from increased depth and learn complex sequential transitions without requiring more parameters.
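    The abstract describes Highway networks as regulating information flow with learned gates on skip connections. A minimal numpy sketch of a single highway layer in the standard coupled-gate form y = T(x)·H(x) + (1 − T(x))·x follows; the dimensions, tanh nonlinearity, and gate bias are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class HighwayLayer:
    """Single highway layer: y = T(x) * H(x) + (1 - T(x)) * x  (sketch).

    The transform gate T decides, per unit, how much of the nonlinear
    transform H to use versus how much of the input to carry through
    unchanged, which keeps information and gradients flowing in deep stacks.
    """
    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        scale = 1.0 / np.sqrt(dim)
        self.W_h = rng.normal(scale=scale, size=(dim, dim))
        self.b_h = np.zeros(dim)
        self.W_t = rng.normal(scale=scale, size=(dim, dim))
        # Negative gate bias so layers start close to identity (carry) behaviour.
        self.b_t = np.full(dim, -2.0)

    def __call__(self, x):
        h = np.tanh(self.W_h @ x + self.b_h)   # candidate transform H(x)
        t = sigmoid(self.W_t @ x + self.b_t)   # transform gate T(x)
        return t * h + (1.0 - t) * x           # carry the rest of x through

# Stacking many layers stays usable because information can skip each block.
x = np.random.default_rng(1).normal(size=16)
for layer in [HighwayLayer(16) for _ in range(50)]:
    x = layer(x)
```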

    Persistence Pays off: Paying Attention to What the LSTM Gating Mechanism Persists

    Language Models (LMs) are important components in several Natural Language Processing systems. Recurrent Neural Network LMs composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results. However, these models still struggle to process long sequences, which are more likely to contain long-distance dependencies, because of information fading and a bias towards more recent information. In this paper we demonstrate an effective mechanism for retrieving information in a memory-augmented LSTM LM, based on attending to information in memory in proportion to the number of timesteps for which the LSTM gating mechanism persisted that information.
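    As a rough illustration of the retrieval idea described above, the sketch below scores memory slots by content match and then biases the scores by how many timesteps each entry was persisted. The abstract does not spell out the exact combination rule, so the additive log-persistence bias (and the function name) is an assumption.

```python
import numpy as np

def persistence_weighted_attention(query, memory, persistence):
    """Attention over memory slots biased by how long each slot's content
    was persisted by the LSTM gates (illustrative sketch of the idea).

    query:       (d,)   current LSTM hidden state
    memory:      (n, d) stored past hidden states
    persistence: (n,)   timesteps each entry survived the forget gate
    """
    scores = memory @ query / np.sqrt(len(query))   # content-based match
    scores = scores + np.log1p(persistence)         # favour persistent entries (assumed form)
    weights = np.exp(scores - scores.max())         # stable softmax
    weights /= weights.sum()
    return weights @ memory                         # retrieved summary vector
```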

    Scalable Probabilistic Model Selection for Network Representation Learning in Biological Network Inference

    A biological system is a complex network of heterogeneous molecular entities and their interactions, which together give rise to the system's biological characteristics. Although biological networks provide both an elegant theoretical framework and a mathematical foundation for analyzing, understanding, and learning from complex biological systems, their reconstruction remains an important and unsolved problem. Current biological networks are noisy, sparse, and incomplete, which limits our ability to form a holistic view of the reconstructed networks and thus to gain a system-level understanding of biological phenomena. Experimental identification of missing interactions is both time-consuming and expensive. Recent advances in high-throughput data generation and significant improvements in computational power have led to novel computational methods for predicting missing interactions. However, these methods still face several unresolved challenges. It is difficult to extract information about interactions and to incorporate that information into a computational model. Furthermore, biological data are not only heterogeneous but also high-dimensional and sparse, making it hard to model from indirect measurements. The heterogeneous and sparse nature of biological data also complicates the design of deep neural network structures, which typically rely on empirical or heuristic model selection. These unscalable methods depend heavily on expertise and experimentation, a time-consuming and error-prone process, and they are prone to overfitting. Moreover, complex deep networks tend to be poorly calibrated, assigning high confidence to incorrect predictions. In this dissertation, we describe novel algorithms that address these challenges. In Part I, we design novel neural network structures to learn representations for biological entities and extend the model to integrate heterogeneous biological data for interaction prediction. In Part II, we develop a novel Bayesian model selection method to infer the most plausible network structures warranted by the data. We demonstrate that our methods achieve state-of-the-art performance on tasks across various domains, including interaction prediction. Experimental studies on various interaction networks show that our method makes accurate and calibrated predictions. Our novel probabilistic model selection approach enables the network structures to evolve dynamically to accommodate incrementally available data. In conclusion, we discuss the limitations of the proposed work and directions for future research.
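    Part I of the dissertation learns vector representations of biological entities for interaction prediction; the abstract does not give the architectures, so the following is only a generic dot-product embedding sketch of that idea, with all names and hyperparameters assumed.

```python
import numpy as np

def train_interaction_embeddings(edges, num_nodes, dim=32, epochs=200, lr=0.05, seed=0):
    """Toy embedding model for interaction (link) prediction: each entity gets
    a vector, and the probability of an interaction is the sigmoid of the dot
    product of the two vectors.  Illustrative only; it does not reproduce the
    dissertation's architectures or its Bayesian model selection."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(num_nodes, dim))
    for _ in range(epochs):
        for (u, v) in edges:
            # One observed interaction plus one sampled non-interaction.
            for (a, b, y) in [(u, v, 1.0), (u, int(rng.integers(num_nodes)), 0.0)]:
                p = 1.0 / (1.0 + np.exp(-emb[a] @ emb[b]))
                g = p - y                       # gradient of the log loss
                ga, gb = g * emb[b], g * emb[a]
                emb[a] -= lr * ga
                emb[b] -= lr * gb
    return emb

# Usage: score a candidate interaction between entities 0 and 3.
emb = train_interaction_embeddings([(0, 1), (1, 2), (2, 3)], num_nodes=4)
score = 1.0 / (1.0 + np.exp(-emb[0] @ emb[3]))
```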

    Learn, don't forget: constructive methods for effective continual learning

    The hallmark of artificial intelligence lies in agents with capabilities to adapt to continuous streams of information and tasks. Continual Learning aims to address this challenge. However, machine learning models accumulate knowledge in a manner different from humans, and learning new tasks leads to degradation in past ones, a phenomenon aptly named "catastrophic forgetting". Most continual learning methods either penalize the change of parameters deemed important for past tasks (regularization-based methods) or employ a small replay buffer (replay-based methods) that feeds the model examples from past tasks in order to preserve performance. However, the role and nature of the regularization, and the other possible factors that make the continual learning process effective, are not well understood. The project sheds light on these questions and suggests ways to improve the performance of continual learning in vision tasks such as classification.
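    The two families of methods named in the abstract can be illustrated in a few lines of numpy: a small replay buffer that keeps a uniform sample of the stream of past examples, and an EWC-style quadratic penalty on moving parameters that mattered for earlier tasks. This is a generic sketch of those ideas, not the project's code; the names and constants are assumptions.

```python
import numpy as np

class ReplayBuffer:
    """Small reservoir-style replay buffer (sketch of the replay-based idea)."""
    def __init__(self, capacity=200, seed=0):
        self.capacity, self.data = capacity, []
        self.seen, self.rng = 0, np.random.default_rng(seed)

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Reservoir sampling keeps a uniform sample of everything seen so far.
            j = int(self.rng.integers(self.seen))
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        idx = self.rng.choice(len(self.data), size=min(k, len(self.data)), replace=False)
        return [self.data[i] for i in idx]

def regularised_loss(params, old_params, importance, task_loss, lam=1.0):
    """Regularization-based idea: quadratic penalty on moving parameters that
    were important for past tasks (EWC-style; `importance` plays the role of
    the diagonal Fisher information)."""
    penalty = sum(np.sum(f * (p - q) ** 2)
                  for p, q, f in zip(params, old_params, importance))
    return task_loss + 0.5 * lam * penalty
```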