A neural blackboard architecture of sentence structure
We present a neural architecture for sentence representation. Sentences are represented in terms of word representations as constituents. A word representation consists of a neural assembly distributed over the brain. Sentence representation does not result from associations between neural word assemblies. Instead, word assemblies are embedded in a neural architecture in which the structural (thematic) relations between words can be represented. Arbitrary thematic relations between arguments and verbs can be represented, and arguments can consist of nouns and phrases, as in sentences with relative clauses. A number of sentences can be stored simultaneously in this architecture. We simulate how probe questions about thematic relations can be answered. We discuss how differences in sentence complexity, such as the difference between subject-extracted and object-extracted relative clauses and the difference between right-branching and center-embedded structures, can be related to the underlying neural dynamics of the model. Finally, we illustrate how memory capacity for sentence representation can be related to the nature of the reverberating neural activity used to store information temporarily in this architecture.
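As a toy illustration of the binding idea, the sketch below reduces the blackboard to a set of thematic-role slots that word labels are temporarily bound into and that probe questions query. This is a drastically simplified symbolic stand-in, assuming invented class and method names; the actual architecture implements binding with gated neural populations and reverberating activity.

```python
# A minimal toy sketch of blackboard-style binding. Word "assemblies" are just
# labels here, and the blackboard is a list of clause structures whose
# agent/theme slots are temporarily bound to them. All names are invented for
# illustration; the real model uses gated neural populations, not symbols.

class Blackboard:
    def __init__(self):
        self.clauses = []  # each clause: {"verb": ..., "agent": ..., "theme": ...}

    def bind(self, verb, agent, theme=None):
        """Temporarily bind word assemblies into one thematic structure."""
        self.clauses.append({"verb": verb, "agent": agent, "theme": theme})

    def probe(self, verb, role):
        """Answer a probe question such as 'who is the agent of chase?'"""
        return [c[role] for c in self.clauses if c["verb"] == verb]

bb = Blackboard()
# "The cat that the dog chased fled": two clauses sharing the assembly "cat".
bb.bind("chase", agent="dog", theme="cat")
bb.bind("flee", agent="cat")
print(bb.probe("chase", "agent"))  # -> ['dog']
print(bb.probe("flee", "agent"))   # -> ['cat']
```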
The role of recurrent networks in neural architectures of grounded cognition: learning of control
Recurrent networks have been used as neural models of language processing, with mixed results. Here, we discuss the role of recurrent networks in a neural architecture of grounded cognition. In particular, we discuss how the control of binding in this architecture can be learned. We trained a simple recurrent network (SRN) and a feedforward network (FFN) on this task. The results show that information from the architecture is needed as input for these networks to learn control of binding; thus, both control systems are recurrent. We found that the recurrent system consisting of the architecture with an SRN or an FFN as its "core" can learn basic (but recursive) sentence structures. Problems with control of binding arise when the system with the SRN is tested on a number of new sentence structures. In contrast, control of binding for these structures succeeds with the FFN. Yet, for some structures with (unlimited) embeddings, difficulties arise due to dynamical binding conflicts in the architecture itself. In closing, we discuss potential future developments of the architecture presented here.
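For readers unfamiliar with the SRN component, the following is a minimal Elman-style SRN in NumPy, trained on an invented stand-in for the control task: one-hot word-category inputs mapped to binary "control" outputs. The task, dimensions, and learning rate are illustrative assumptions; training follows Elman's original one-step scheme, treating the context layer as a fixed input.

```python
import numpy as np

# Minimal Elman-style simple recurrent network (SRN). The toy task below is an
# invented stand-in for the paper's control-of-binding task: per step, a
# one-hot category symbol comes in and two control bits must come out.

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 2
Wxh = rng.normal(0, 0.5, (n_hid, n_in))
Whh = rng.normal(0, 0.5, (n_hid, n_hid))
Why = rng.normal(0, 0.5, (n_out, n_hid))

def step(x, h_prev):
    h = np.tanh(Wxh @ x + Whh @ h_prev)       # new hidden (context) state
    y = 1.0 / (1.0 + np.exp(-(Why @ h)))      # sigmoid control outputs
    return h, y

X = np.eye(4)                                 # toy input sequence (one-hots)
T = np.array([[1, 0], [0, 1], [1, 0], [0, 0]], float)  # invented targets

lr = 0.5
for epoch in range(2000):
    h = np.zeros(n_hid)
    for x, t in zip(X, T):
        h_prev = h
        h, y = step(x, h_prev)
        dy = (y - t) * y * (1 - y)            # output delta
        dh = (Why.T @ dy) * (1 - h ** 2)      # one-step hidden delta
        Why -= lr * np.outer(dy, h)
        Wxh -= lr * np.outer(dh, x)
        Whh -= lr * np.outer(dh, h_prev)

h = np.zeros(n_hid)
for x in X:
    h, y = step(x, h)
    print(np.round(y, 2))                     # approaches the target rows
```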
GATology for Linguistics: What Syntactic Dependencies It Knows
Graph Attention Network (GAT) is a graph neural network, one of the strategies for modeling and representing explicit syntactic knowledge, and it can work with pre-trained models such as BERT in downstream tasks. There is still little investigation into how GAT learns syntactic knowledge from the perspective of model structure, and as strategies for modeling explicit syntactic knowledge, GAT and BERT have not previously been applied and discussed in Machine Translation (MT) scenarios. We design a dependency relation prediction task to study how GAT learns the syntactic knowledge of three languages as a function of the number of attention heads and layers. We also use a paired t-test and F1-scores to clarify the differences in syntactic dependency prediction between GAT and BERT fine-tuned on the MT task (MT-B). The experiments show that better performance can be achieved by appropriately increasing the number of attention heads with two GAT layers; with more than two layers, learning suffers. Moreover, GAT is more competitive in training speed and syntactic dependency prediction than MT-B, which may indicate that it incorporates explicit syntactic knowledge more effectively and suggests the possibility of combining GAT and BERT in MT tasks.
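To make the GAT component concrete, here is a sketch of a single GAT attention head computed over a dependency graph, following the standard GAT formulation (Veličković et al., 2018). The dimensions, the toy sentence, and its dependency edges are invented for illustration; the paper's multi-head, multi-layer setup stacks such heads.

```python
import numpy as np

# One Graph Attention (GAT) head over a dependency graph: each token attends
# only to its syntactic neighbors (plus itself) when updating its features.

rng = np.random.default_rng(1)
n_nodes, d_in, d_out = 4, 8, 6          # 4 tokens; dims are illustrative
H = rng.normal(size=(n_nodes, d_in))    # token features (e.g., BERT vectors)
W = rng.normal(size=(d_in, d_out))      # shared linear transform
a = rng.normal(size=(2 * d_out,))       # attention parameter vector

# Dependency adjacency with self-loops for a toy 4-token sentence.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], bool)

Z = H @ W
# Raw logits e_ij = LeakyReLU(a^T [z_i || z_j]) for each dependency edge.
logits = np.full((n_nodes, n_nodes), -np.inf)
for i in range(n_nodes):
    for j in range(n_nodes):
        if A[i, j]:
            e = a @ np.concatenate([Z[i], Z[j]])
            logits[i, j] = e if e > 0 else 0.2 * e  # LeakyReLU

# Softmax over each node's neighborhood, then aggregate neighbor features.
alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)
H_out = alpha @ Z                        # syntax-aware token features
print(H_out.shape)                       # (4, 6)
```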
Syntactic Knowledge via Graph Attention with BERT in Machine Translation
Although the Transformer model can effectively acquire context features via its self-attention mechanism, deeper syntactic knowledge is still not effectively modeled. To alleviate this problem, we propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios. A Graph Attention Network (GAT) and BERT jointly represent syntactic dependency features as explicit knowledge of the source language to enrich source-language representations and guide target-language generation. Our experiments use gold syntax-annotated sentences and a Quality Estimation (QE) model to obtain interpretability of the translation quality improvement attributable to syntactic knowledge, without being limited to BLEU scores. The experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores. We investigate which source-sentence lengths benefit the most and which dependencies are better identified by the SGB engines. We also find that the learning of specific dependency relations by GAT is reflected in the translation quality of sentences containing such relations, and that syntax on the graph leads to new modeling of syntactic aspects of source sentences in the middle and bottom layers of BERT.
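One plausible way such a joint representation could be formed is a learned gate that mixes GAT-encoded syntax into BERT's token states, sketched below. The gate design, dimensions, and variable names are illustrative assumptions, not the SGB paper's exact fusion mechanism.

```python
import numpy as np

# Sketch of gated fusion of syntax-aware GAT states with BERT token states.
# All parameters here are random stand-ins; in a real system Wg is trained.

rng = np.random.default_rng(2)
n_tok, d = 5, 16
h_bert = rng.normal(size=(n_tok, d))   # contextual token states from BERT
h_gat = rng.normal(size=(n_tok, d))    # syntax-aware states from a GAT layer
Wg = rng.normal(size=(2 * d, d))       # gate parameters (would be learned)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A per-token gate decides how much explicit syntax to mix into each position.
g = sigmoid(np.concatenate([h_bert, h_gat], axis=1) @ Wg)
h_fused = g * h_gat + (1.0 - g) * h_bert
print(h_fused.shape)  # (5, 16): enriched source representations for decoding
```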
Population density equations for stochastic processes with memory kernels
We present a method for solving population density equations (PDEs), a mean-field technique describing homogeneous populations of uncoupled neurons, where the populations can be subject to non-Markov noise for arbitrary distributions of jump sizes. The method combines recent developments in two disciplines that traditionally have had limited interaction: computational neuroscience and the theory of random networks. It uses a geometric binning scheme, based on the method of characteristics, to capture the deterministic neurodynamics of the population, separating the deterministic and stochastic processes cleanly. We can independently vary the choice of the deterministic model and the model for the stochastic process, leading to a highly modular numerical solution strategy. We demonstrate this by replacing the master equation implicit in many formulations of the PDE formalism with the generalized Montroll-Weiss equation, a recent result from random network theory describing a random walker subject to transitions realized by a non-Markovian process. We demonstrate the method for leaky- and quadratic-integrate-and-fire neurons subject to spike trains with Poisson and gamma-distributed interspike intervals, and we accurately model the jump response of both models to both excitatory and inhibitory input under the assumption that all inputs are generated by a single renewal process.
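For intuition about the quantity such a method predicts, the sketch below is a brute-force Monte Carlo of leaky integrate-and-fire (LIF) neurons driven by input spike trains with gamma-distributed interspike intervals. All parameter values are illustrative assumptions; the paper's method solves the population density equation directly rather than simulating individual neurons.

```python
import numpy as np

# Monte Carlo reference for a population of LIF neurons, each driven by one
# gamma renewal process of input spikes. Parameters are invented for the demo.

rng = np.random.default_rng(3)
n_neurons, t_end, dt = 1000, 0.5, 1e-3         # population size, horizon (s)
tau, v_th, v_reset, h = 0.02, 1.0, 0.0, 0.15   # membrane params, jump size
shape, rate = 2.0, 800.0                       # gamma ISIs; mean rate rate/shape

def spike_times():
    """Draw one neuron's input spike train from the gamma renewal process."""
    t, out = 0.0, []
    while t < t_end:
        t += rng.gamma(shape, 1.0 / rate)
        out.append(t)
    return out

trains = [spike_times() for _ in range(n_neurons)]
idx = np.zeros(n_neurons, dtype=int)
v = np.zeros(n_neurons)
pop_rate = []
for step_i in range(int(t_end / dt)):
    t = step_i * dt
    v *= 1.0 - dt / tau                        # deterministic leak toward 0
    for n in range(n_neurons):                 # deliver this bin's input spikes
        while idx[n] < len(trains[n]) and trains[n][idx[n]] < t + dt:
            v[n] += h
            idx[n] += 1
    fired = v >= v_th                          # threshold crossing and reset
    v[fired] = v_reset
    pop_rate.append(fired.sum() / (n_neurons * dt))

print(f"steady-state rate ~ {np.mean(pop_rate[len(pop_rate)//2:]):.1f} Hz")
```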