    Hierarchical neural networks perform both serial and parallel processing

    In this work we study a Hebbian neural network in which neurons are arranged according to a hierarchical architecture such that their couplings scale with their reciprocal distance. As a full statistical-mechanics solution is not yet available, after a streamlined introduction to the state of the art via that route, the problem is consistently approached through the signal-to-noise technique and extensive numerical simulations. Focusing on the low-storage regime, where the number of stored patterns grows at most logarithmically with the system size, we prove that these non-mean-field Hopfield-like networks display a richer phase diagram than their classical counterparts. In particular, these networks are able to perform serial processing (i.e. retrieve one pattern at a time through a complete rearrangement of the whole ensemble of neurons) as well as parallel processing (i.e. retrieve several patterns simultaneously, delegating the management of different patterns to the diverse communities that build up the network). The switch between the two regimes is governed by the rate of the coupling decay and by the level of noise affecting the system. The price to pay for these remarkable capabilities is a network capacity smaller than that of the mean-field counterpart, yielding a new budget principle: the wider the multitasking capabilities, the lower the network load, and vice versa. This may have important implications for our understanding of biological complexity.
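
    As a rough illustration of the class of models described above, the sketch below builds a small Hopfield-like network whose Hebbian couplings are damped by the hierarchical distance between neurons, then runs noiseless retrieval dynamics and reports the pattern overlaps. The dyadic tree, the decay form 2^(-2*sigma*d), and all sizes are assumptions made for illustration rather than the paper's exact Hamiltonian; sigma plays the role of the coupling-decay rate that the abstract identifies as the knob between the serial and parallel regimes.

        import numpy as np

        rng = np.random.default_rng(0)

        k = 7                  # depth of the assumed dyadic hierarchy
        N = 2 ** k             # number of neurons (leaves of the tree)
        P = 2                  # low-storage regime: a handful of patterns
        sigma = 0.6            # coupling-decay rate

        xi = rng.choice([-1, 1], size=(P, N))   # random binary patterns

        def hier_distance(i, j):
            """Level of the lowest common ancestor of leaves i and j."""
            d = 0
            while i != j:
                i //= 2
                j //= 2
                d += 1
            return d

        # Hebbian couplings damped by hierarchical distance: J_ij ~ 2^(-2*sigma*d_ij)
        J = np.zeros((N, N))
        for i in range(N):
            for j in range(N):
                if i != j:
                    J[i, j] = (2.0 ** (-2 * sigma * hier_distance(i, j))
                               * np.dot(xi[:, i], xi[:, j]) / N)

        # Noiseless asynchronous dynamics from a random initial state
        s = rng.choice([-1, 1], size=N).astype(float)
        for _ in range(20):
            for i in rng.permutation(N):
                s[i] = 1.0 if J[i] @ s >= 0 else -1.0

        overlaps = xi @ s / N   # Mattis overlaps with each stored pattern
        print("overlaps:", overlaps)

    Loosely, a small sigma keeps the couplings nearly uniform so a single overlap dominates (serial retrieval), while a large sigma decouples the communities so several patterns can be partially retrieved at once (parallel retrieval).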

    Top-Down Processing: Top-Down Network Combines Back-Propagation with Attention

    Early neural network models relied exclusively on bottom-up processing, going from the input signals to higher-level representations. Many recent models also incorporate top-down networks going in the opposite direction. Top-down processing in deep learning models plays two primary roles: learning and directing attention. These two roles are accomplished in current models through distinct mechanisms. While top-down attention is often implemented by extending the model's architecture with additional units that propagate information from high to low levels of the network, learning is typically accomplished by an external learning algorithm such as back-propagation. In the current work, we present an integration of these two seemingly unrelated functions using a single unified mechanism. We propose a novel symmetric bottom-up top-down network structure that integrates standard bottom-up networks with a symmetric top-down counterpart, allowing each network to guide and influence the other. The same top-down network is used both for learning, by back-propagating feedback signals, and for top-down attention, by guiding the bottom-up network to perform a selected task. We show that our method achieves competitive performance on a standard multi-task learning benchmark while relying on standard single-task architectures and optimizers, without any task-specific parameters. Additionally, our learning algorithm addresses in a new way several neuroscience issues that arise in biological modeling of learning in the brain.
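
    To make the shared-pathway idea concrete, here is a minimal numpy sketch, under assumed layer sizes and a deliberately simple gating scheme, of a network in which a single top-down signal conditioned on a task embedding both gates the bottom-up hidden units (attention) and carries the feedback used for the weight update (learning). It illustrates the principle rather than the paper's actual architecture; the names and the task-embedding matrix E are hypothetical.

        import numpy as np

        rng = np.random.default_rng(0)

        D, H, C, T = 16, 32, 4, 3              # input dim, hidden dim, classes, tasks (assumed)
        params = {
            "W1": rng.normal(0, 0.1, (H, D)),  # bottom-up weights, layer 1
            "W2": rng.normal(0, 0.1, (C, H)),  # bottom-up weights, layer 2
            "E":  rng.normal(0, 0.1, (H, T)),  # task embedding injected from the top (assumption)
        }

        def step(params, x, task, target=None, lr=0.1):
            W1, W2, E = params["W1"], params["W2"], params["E"]
            relu = lambda z: np.maximum(z, 0.0)
            # Top-down attention: the task signal selects which hidden units participate.
            gate = relu(E[:, task])
            # Bottom-up pass, modulated by the top-down gates.
            h = relu(W1 @ x) * gate
            y = W2 @ h
            if target is None:
                return y
            # Learning: the same top-down direction (through W2^T, masked by the gates)
            # carries the error signal, so attention and credit assignment share one pathway.
            e_top = y - target                 # gradient of 0.5 * ||y - target||^2
            e_hid = (W2.T @ e_top) * (h > 0) * gate
            params["W2"] -= lr * np.outer(e_top, h)
            params["W1"] -= lr * np.outer(e_hid, x)
            return y

        x = rng.normal(size=D)
        step(params, x, task=0, target=np.eye(C)[1])   # one combined attention + learning step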

    Multitask Protein Function Prediction Through Task Dissimilarity

    Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein function) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited to unbalanced protein function prediction problems. We support our claim by showing that a multitask extension of the label propagation algorithm empirically works best when the task-relatedness information is represented by a dissimilarity matrix rather than a similarity matrix. Moreover, an experimental comparison carried out on three model organisms shows that our method has more stable performance in both "protein-centric" and "function-centric" evaluation settings.
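
    The following sketch shows one plausible way task dissimilarity can enter a label-propagation update: each iteration performs the standard propagation step and then subtracts a term that pushes a function's scores away from the scores of dissimilar functions. The update form, the parameters alpha and mu, and the toy data are assumptions for illustration; the paper's multitask extension may differ in its details.

        import numpy as np

        def multitask_label_propagation(W, Y, D, alpha=0.9, mu=0.1, iters=50):
            """
            W : (n, n) symmetric protein-similarity graph
            Y : (n, T) initial annotations, one column per protein function (task)
            D : (T, T) task-dissimilarity matrix (larger = more dissimilar)
            """
            deg = W.sum(axis=1)
            S = W / np.sqrt(np.outer(deg, deg))          # symmetrically normalised graph
            F = Y.astype(float).copy()
            for _ in range(iters):
                F = alpha * (S @ F) + (1 - alpha) * Y    # standard label propagation per task
                F = F - mu * (F @ D)                     # repel scores of dissimilar tasks
                F = np.clip(F, 0.0, None)
            return F                                     # ranking scores per protein and function

        # Toy usage on a random graph with rare positive annotations (assumed data).
        rng = np.random.default_rng(0)
        n, T = 30, 4
        W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
        Y = (rng.random((n, T)) < 0.05).astype(float)
        D = 1.0 - np.eye(T)                              # toy matrix: every other task is dissimilar
        scores = multitask_label_propagation(W, Y, D)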

    Automatically Extracting Information in Medical Dialogue: Expert System And Attention for Labelling

    Medical dialogue information extraction is becoming an increasingly significant problem in modern medical care. It is difficult to extract key information from electronic medical records (EMRs) because of their sheer volume. Previously, researchers proposed attention-based models for retrieving features from EMRs, but a key limitation was their inability to recognize different categories in medical dialogues. In this paper, we propose a novel model, Expert System and Attention for Labelling (ESAL). We use a mixture of experts and a pre-trained BERT to retrieve the semantics of the different categories, enabling the model to fuse the differences between them. In our experiments, ESAL was applied to a public dataset, and the results indicate that ESAL significantly improves the performance of medical information classification.
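
    The sketch below illustrates the general mixture-of-experts pattern the abstract describes: a gating network weights several category experts over an utterance embedding, which in practice would come from the pre-trained BERT encoder; here the embedding is a random stand-in. The expert and label counts, the gating form, and every weight matrix are assumptions, not ESAL's actual implementation.

        import numpy as np

        rng = np.random.default_rng(0)
        d, n_experts, n_labels = 768, 4, 6                      # BERT-sized embedding, assumed counts

        W_gate = rng.normal(0, 0.02, (n_experts, d))            # gating network
        W_exp = rng.normal(0, 0.02, (n_experts, n_labels, d))   # one classifier head per expert

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        def moe_classify(h):
            """h: (d,) utterance embedding produced by the pre-trained encoder (assumed given)."""
            gate = softmax(W_gate @ h)                          # contribution of each category expert
            expert_logits = np.einsum("eld,d->el", W_exp, h)    # (n_experts, n_labels)
            logits = gate @ expert_logits                       # fuse experts by gate weight
            return softmax(logits)                              # label distribution for the utterance

        probs = moe_classify(rng.normal(size=d))
        print(probs.shape)                                      # (n_labels,)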

    A Marr's Three‐Level Analytical Framework for Neuromorphic Electronic Systems

    Neuromorphic electronics, an emerging field that aims to build electronic mimics of the biological brain, holds promise for reshaping the frontiers of information technology and enabling a more intelligent and efficient computing paradigm. Like their biological counterpart, neuromorphic electronic systems are complex, with multiple levels of organization. Inspired by David Marr's famous three-level analytical framework developed for neuroscience, advances in neuromorphic electronic systems are selectively surveyed and placed, as appropriate, at the computational, algorithmic, or implementation level. Under this framework, the problem of how to build a neuromorphic electronic system is defined in a tractable way. In conclusion, the development of neuromorphic electronic systems confronts a challenge similar to the one neuroscience faces, namely the limited capacity of low-level knowledge (implementations and algorithms) to construct high-level, brain-like (human-level) computational functions. An opportunity arises from communication among the different levels and from their codesign. Neuroscience lab-on-neuromorphic-chip platforms offer an additional opportunity for mutual benefit between the two disciplines.

    Mnemosyne: Learning to Train Transformers with Transformers

    In this work, we propose a new class of learnable optimizers, called Mnemosyne. It is based on novel spatio-temporal low-rank implicit attention Transformers that can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) outperforms popular LSTM optimizers (even when these are augmented with new feature engineering to mitigate catastrophic forgetting in LSTMs), (b) can successfully train Transformers while using simple meta-training strategies that require minimal computational resources, and (c) matches the accuracy of SOTA hand-designed optimizers with carefully tuned hyper-parameters (often producing top-performing models). Furthermore, Mnemosyne provides space complexity comparable to that of its hand-designed first-order counterparts, which allows it to scale to training larger sets of parameters. We conduct an extensive empirical evaluation of Mnemosyne on: (a) fine-tuning a wide range of Vision Transformers (ViTs), from medium-size architectures to massive ViT-Hs (36 layers, 16 heads), (b) pre-training BERT models, and (c) soft prompt-tuning large 11B+ T5XXL models. We complement our results with a comprehensive theoretical analysis of the compact associative memory used by Mnemosyne, which, to our knowledge, has not been carried out before.
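
    To show the interface such learnable optimizers expose, the toy sketch below replaces a hand-designed update rule with a small attention module over the recent gradient history and plugs it into an ordinary training loop on a quadratic objective. The module, its random (un-meta-trained) parameters, and all sizes are illustrative assumptions; Mnemosyne's spatio-temporal low-rank implicit attention and its meta-training procedure are not reproduced here.

        import numpy as np

        rng = np.random.default_rng(0)
        hist_len, d_feat = 8, 4                 # gradient-history window and feature size (assumed)

        # Stand-ins for the optimizer's meta-parameters; a real learned optimizer meta-trains these.
        Wq = rng.normal(0, 0.1, (d_feat, 1))
        Wk = rng.normal(0, 0.1, (d_feat, 1))
        Wv = rng.normal(0, 0.1, (1, 1))

        def learned_update(grad_history):
            """grad_history: (hist_len, n_params) recent gradients; returns a parameter update."""
            g = grad_history[:, :, None]                             # (T, n, 1)
            q, k, v = g @ Wq.T, g @ Wk.T, g @ Wv.T                   # per-coordinate features
            att = np.einsum("tnf,snf->nts", q, k) / np.sqrt(d_feat)  # attention over the time axis
            att = np.exp(att - att.max(axis=-1, keepdims=True))
            att /= att.sum(axis=-1, keepdims=True)
            mixed = np.einsum("nts,snf->tnf", att, v)                # temporally mixed gradient signal
            return -0.01 * mixed[-1, :, 0]                           # update proposed for the latest step

        # Usage: the learned rule slots into a training loop exactly like a hand-designed optimizer.
        theta = rng.normal(size=5)
        history = np.zeros((hist_len, theta.size))
        for step in range(100):
            grad = 2 * theta                                         # gradient of ||theta||^2
            history = np.roll(history, -1, axis=0)
            history[-1] = grad
            theta = theta + learned_update(history)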