Hierarchical neural networks perform both serial and parallel processing
In this work we study a Hebbian neural network, where neurons are arranged
according to a hierarchical architecture such that their couplings scale with
their reciprocal distance. As a full statistical mechanics solution is not yet
available, after a streamlined introduction to the state of the art via that
route, the problem is consistently approached through the signal-to-noise
technique and extensive numerical simulations. Focusing on the low-storage
regime, where the number of stored patterns grows at most logarithmically with
the system size, we prove that these non-mean-field Hopfield-like networks
display a richer phase diagram than their classical counterparts. In
particular, these networks are able to perform serial processing (i.e. retrieve
one pattern at a time through a complete rearrangement of the whole ensemble of
neurons) as well as parallel processing (i.e. retrieve several patterns
simultaneously, delegating the management of different patterns to diverse
communities that build the network). The tuning between the two regimes is set by
the rate of the coupling decay and by the level of noise affecting the system.
The price to pay for these remarkable capabilities is a network capacity
smaller than that of the mean-field counterpart, thus yielding a new budget
principle: the wider the multitasking capabilities, the lower the network load,
and vice versa. This may have important implications for our understanding of
biological complexity.
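The hierarchical-coupling construction described above can be sketched in a toy simulation. The tree depth, decay rate, Hebbian rule, and noiseless update dynamics below are our own illustrative choices, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Hopfield-like network on 2^k neurons at the leaves of a binary tree:
# couplings decay with the ultrametric (tree) distance between neurons,
# J_ij ~ 2^(-sigma * d(i, j)).
k, sigma = 6, 1.0
N = 2 ** k

def tree_distance(i, j):
    """Number of levels up to the lowest common ancestor of leaves i, j."""
    d = 0
    while i != j:
        i //= 2
        j //= 2
        d += 1
    return d

# Hebbian storage of P patterns (low-storage regime), modulated by the decay.
P = 2
xi = rng.choice([-1, 1], size=(P, N))
J = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if i != j:
            decay = 2.0 ** (-sigma * tree_distance(i, j))
            J[i, j] = decay * (xi[:, i] @ xi[:, j]) / N

def retrieve(s, sweeps=20):
    """Zero-temperature (noiseless) asynchronous dynamics."""
    s = s.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if J[i] @ s >= 0 else -1
    return s

# Start from a noisy copy of pattern 0 and measure Mattis overlaps.
s = retrieve(xi[0] * rng.choice([1, -1], size=N, p=[0.9, 0.1]))
overlaps = xi @ s / N
print(overlaps)
```

Varying `sigma` and adding thermal noise to the update rule is what moves such a network between the serial regime (one overlap near 1) and the parallel regime (several communities each aligned with a different pattern).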
Top-Down Processing: Top-Down Network Combines Back-Propagation with Attention
Early neural network models relied exclusively on bottom-up processing going
from the input signals to higher-level representations. Many recent models also
incorporate top-down networks going in the opposite direction. Top-down
processing in deep learning models plays two primary roles: learning and
directing attention. These two roles are accomplished in current models through
distinct mechanisms. While top-down attention is often implemented by extending
the model's architecture with additional units that propagate information from
high to low levels of the network, learning is typically accomplished by an
external learning algorithm such as back-propagation. In the current work, we
present an integration of the two functions above, which appear unrelated,
using a single unified mechanism. We propose a novel symmetric bottom-up
top-down network structure that can integrate standard bottom-up networks with
a symmetric top-down counterpart, allowing each network to guide and influence
the other. The same top-down network is used both for learning, by
back-propagating feedback signals, and for top-down attention, by guiding the
bottom-up network to perform a selected task. We show that our method achieves
competitive performance on a standard multi-task learning benchmark while
relying only on standard single-task architectures and optimizers, without any
task-specific parameters. Additionally, our learning algorithm addresses, in a
new way, several neuroscience issues that arise in biological modeling of
learning in the brain.
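The dual role of a top-down pathway can be illustrated with a minimal toy construction of our own (not the paper's architecture): the transposed bottom-up weights serve both as a task-conditioned attention gate and as the feedback path for learning.

```python
import numpy as np

rng = np.random.default_rng(1)

# One hidden layer; the top-down pass reuses the transposed bottom-up weights
# both (a) to turn a task vector into a multiplicative attention gate over
# hidden units and (b) to carry error feedback for learning.  Sizes, the
# gating form, and the loss are illustrative assumptions.
d_in, d_h, d_out = 8, 16, 4
W1 = rng.normal(0, 0.3, (d_h, d_in))
W2 = rng.normal(0, 0.3, (d_out, d_h))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, task_vec):
    g = sigmoid(W2.T @ task_vec)      # top-down pass -> attention gate
    h = np.tanh(W1 @ x) * g           # bottom-up pass, modulated top-down
    return W2 @ h, h, g

def train_step(x, task_vec, y, lr=0.1):
    global W1, W2
    out, h, g = forward(x, task_vec)
    err = out - y                     # gradient of squared error at the output
    # Feedback travels through the SAME top-down weights (W2.T); we ignore
    # the gate's own dependence on W2, a deliberate simplification here.
    dh = (W2.T @ err) * g * (1.0 - np.tanh(W1 @ x) ** 2)
    W2 -= lr * np.outer(err, h)
    W1 -= lr * np.outer(dh, x)
    return float(err @ err)

x = rng.normal(size=d_in)
task = np.eye(d_out)[0]               # "which task" signal injected at the top
y = np.eye(d_out)[0]
losses = [train_step(x, task, y) for _ in range(50)]
```

The point of the sketch is structural: a single set of top-down connections carries both the attention signal (forward use of `W2.T`) and the learning signal (backward use of `W2.T`), with no task-specific parameters.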
Multitask Protein Function Prediction Through Task Dissimilarity
Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein function) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited to solving unbalanced protein function prediction problems. We support our claim by showing that a multitask extension of the label propagation algorithm empirically works best when the task relatedness information is represented using a dissimilarity matrix as opposed to a similarity matrix. Moreover, an experimental comparison carried out on three model organisms shows that our method has a more stable performance in both "protein-centric" and "function-centric" evaluation settings.
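The dissimilarity idea can be illustrated with a toy variant of label propagation. The update rule and all constants below are our own sketch, not the paper's algorithm: scores diffuse over a protein similarity graph, while a task dissimilarity matrix subtracts evidence coming from dissimilar tasks, keeping a rare label separated from a frequent one.

```python
import numpy as np

rng = np.random.default_rng(2)

n, T = 30, 2                      # proteins, tasks (functions)
X = rng.normal(size=(n, 3))       # stand-in protein features
W = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))
np.fill_diagonal(W, 0.0)
S = W / W.sum(axis=1, keepdims=True)        # row-normalized similarity graph

Y = np.zeros((n, T))
Y[:3, 0] = 1.0                    # rare function: 3 annotated proteins
Y[3:15, 1] = 1.0                  # frequent function: 12 annotated proteins

D = np.array([[0.0, 1.0],         # the two functions are dissimilar
              [1.0, 0.0]])

alpha, beta = 0.8, 0.1
F = Y.copy()
for _ in range(100):
    # Diffuse over the graph, re-inject known labels, and subtract
    # score mass arriving from dissimilar tasks.
    F = alpha * (S @ F) + (1 - alpha) * Y - beta * (F @ D)
    F = np.clip(F, 0.0, None)     # keep scores non-negative
```

With `beta = 0` this reduces to ordinary per-task label propagation; the `F @ D` term is what encodes task *dissimilarity* rather than similarity.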
Automatically Extracting Information in Medical Dialogue: Expert System And Attention for Labelling
Medical dialogue information extraction is becoming an increasingly
significant problem in modern medical care. It is difficult to extract key
information from electronic medical records (EMRs) due to their sheer volume.
Previously, researchers proposed attention-based models for retrieving features
from EMRs, but their limitations were reflected in their inability to recognize
different categories in medical dialogues. In this paper, we propose a novel
model, Expert System and Attention for Labelling (ESAL). We use a mixture of
experts and a pre-trained BERT model to retrieve the semantics of different
categories, enabling the model to fuse the differences between them. In our
experiments, ESAL was applied to a public dataset, and the results indicate
that it significantly improves the performance of medical information
classification.
A Marr's Three‐Level Analytical Framework for Neuromorphic Electronic Systems
Neuromorphic electronics, an emerging field that aims to build electronic mimics of the biological brain, holds promise for reshaping the frontiers of information technology and enabling a more intelligent and efficient computing paradigm. Like their biological counterpart, neuromorphic electronic systems are complex, with multiple levels of organization. Inspired by David Marr's famous three-level analytical framework developed for neuroscience, advances in neuromorphic electronic systems are selectively surveyed and situated, as appropriate, at the computational, algorithmic, or implementation level. Under this framework, the problem of how to build a neuromorphic electronic system is defined in a tractable way. In conclusion, the development of neuromorphic electronic systems confronts a challenge similar to the one neuroscience faces: the limited ability of low-level knowledge (implementations and algorithms) to account for high-level, brain-like (human-level) computational functions. An opportunity arises from communication among the different levels and from their codesign. Neuroscience lab-on-neuromorphic-chip platforms offer an additional opportunity for mutual benefit between the two disciplines.
Mnemosyne: Learning to Train Transformers with Transformers
In this work, we propose a new class of learnable optimizers, called
Mnemosyne. It is based on novel spatio-temporal low-rank implicit
attention Transformers that can learn to train entire neural network
architectures, including other Transformers, without any task-specific
optimizer tuning. We show that Mnemosyne: (a) outperforms popular LSTM
optimizers (also with new feature engineering to mitigate catastrophic
forgetting of LSTMs), (b) can successfully train Transformers while using
simple meta-training strategies that require minimal computational resources,
(c) matches the accuracy of SOTA hand-designed optimizers with carefully
tuned hyper-parameters (often producing top-performing models). Furthermore,
Mnemosyne provides space complexity comparable to that of its hand-designed
first-order counterparts, which allows it to scale to training larger sets of
parameters. We conduct an extensive empirical evaluation of Mnemosyne on: (a)
fine-tuning a wide range of Vision Transformers (ViTs) from medium-size
architectures to massive ViT-Hs (36 layers, 16 heads), (b) pre-training BERT
models and (c) soft prompt-tuning large 11B+ T5XXL models. We complement our
results with a comprehensive theoretical analysis of the compact associative
memory used by Mnemosyne which, to our knowledge, has not been done before.
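The general learned-optimizer setup, stripped of Mnemosyne's attention architecture, can be sketched as follows. Here the optimizer's parameters `theta` are hand-set for illustration; in an actual learned optimizer they would be meta-trained across many optimization tasks:

```python
import numpy as np

rng = np.random.default_rng(4)

# A "learnable optimizer" is itself a parametric function: it maps
# per-parameter gradient features to parameter updates.  This toy version
# uses two features (gradient and momentum) and a linear map theta.
theta = np.array([-0.05, -0.01])    # hand-set, NOT meta-trained

def learned_update(grad, momentum, theta):
    return theta[0] * grad + theta[1] * momentum

# Optimizee: quadratic loss 0.5 * ||w - w_star||^2 over 10 parameters.
w_star = rng.normal(size=10)
w = np.zeros(10)
m = np.zeros(10)
losses = []
for _ in range(100):
    grad = w - w_star               # exact gradient of the quadratic loss
    m = 0.9 * m + grad              # running momentum feature
    w = w + learned_update(grad, m, theta)
    losses.append(0.5 * float(np.sum((w - w_star) ** 2)))
```

Meta-training would adjust `theta` (or, in Mnemosyne's case, the weights of an attention Transformer playing the role of `learned_update`) so that the optimizee's loss falls quickly across a distribution of tasks, removing the need for per-task optimizer tuning.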
- …