Stochastic Answer Networks for Machine Reading Comprehension
We propose a simple yet robust stochastic answer network (SAN) that simulates
multi-step reasoning in machine reading comprehension. Compared to previous
work such as ReasoNet, which used reinforcement learning to determine the number
of steps, the unique feature of SAN is the use of a kind of stochastic prediction
dropout on the answer module (final layer) of the neural network during
training. We show that this simple trick improves robustness and achieves
results competitive with the state of the art on the Stanford Question Answering
Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine Reading
COmprehension Dataset (MS MARCO). Comment: 11 pages, 5 figures, Accepted to ACL 201
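As a rough illustration of prediction dropout on an answer module, the sketch below averages the answer distributions produced at several reasoning steps and, during training, randomly discards whole steps before averaging. The function name `average_predictions`, the `drop_rate` parameter, and the fallback to a single step are assumptions made for this example; the abstract does not specify these details.

```python
import random

def average_predictions(step_probs, drop_rate=0.0, rng=None):
    """Average per-step answer distributions, randomly dropping whole steps.

    step_probs: list of T probability distributions (lists of equal length).
    drop_rate:  chance of dropping each step; 0.0 keeps every step (inference).
    """
    rng = rng or random.Random()
    kept = [p for p in step_probs if rng.random() >= drop_rate]
    if not kept:
        # Assumed safeguard: never drop every step, keep one at random.
        kept = [rng.choice(step_probs)]
    n = len(kept)
    return [sum(column) / n for column in zip(*kept)]

# Three reasoning steps, each giving a distribution over two candidate answers.
steps = [[0.2, 0.8], [0.6, 0.4], [0.4, 0.6]]
inference = average_predictions(steps)
training = average_predictions(steps, drop_rate=0.4, rng=random.Random(0))
```

Because each kept step is itself a probability distribution, the average remains a valid distribution whichever steps are dropped.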
A Neural Model of How The Brain Represents and Compares Numbers
Many psychophysical experiments have shown that the representation of numbers and numerical quantities in humans and animals is related to number magnitude. A neural network model is proposed to quantitatively simulate error rates in quantification and numerical comparison tasks, and reaction times for number priming and numerical assessment and comparison tasks. Transient responses to inputs are integrated before they activate an ordered spatial map that selectively responds to the number of events in a sequence. The dynamics of numerical comparison are encoded in activity pattern changes within this spatial map. Such changes cause a "directional comparison wave" whose properties mimic data about numerical comparison. These model mechanisms are variants of neural mechanisms that have elsewhere been used to explain data about motion perception, attention shifts, and target tracking. Thus, the present model suggests how numerical representations may have emerged as specializations of more primitive mechanisms in the cortical Where processing stream. National Science Foundation (IRI-97-20333); Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-95-1-0409); National Institute of Health (1-R29-DC02952-01)
Multilingual Language Processing From Bytes
We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads
text as bytes and outputs span annotations of the form [start, length, label]
where start positions, lengths, and labels are separate entries in our
vocabulary. Because we operate directly on Unicode bytes rather than
language-specific words or characters, we can analyze text in many languages
with a single model. Due to the small vocabulary size, these multilingual
models are very compact, but produce results similar to or better than
state-of-the-art systems for Part-of-Speech tagging and Named Entity Recognition that
use only the provided training datasets (no external data sources). Our models
are learning "from scratch" in that they do not rely on any elements of the
standard pipeline in Natural Language Processing (including tokenization), and
thus can run in standalone fashion on raw text.
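To make the byte-level span format concrete, here is a minimal sketch of reading text as bytes and resolving annotations of the form [start, length, label] back to text. It assumes UTF-8 encoding; the helper `span_text`, the sample sentence, and the labels are invented for illustration and are not taken from the paper.

```python
def span_text(text, start, length):
    """Return the text covered by a byte-level span (UTF-8 is assumed)."""
    data = text.encode("utf-8")
    return data[start:start + length].decode("utf-8")

text = "Zoë lives in Kraków"
data = text.encode("utf-8")

# Non-ASCII characters take more than one byte, so byte offsets and
# character offsets differ: 19 characters, 21 bytes.
print(len(text), len(data))

# Hypothetical annotations in the [start, length, label] form.
spans = [(0, 4, "PER"), (14, 7, "LOC")]
for start, length, label in spans:
    print(label, span_text(text, start, length))
```

Since every input symbol is a single byte, the input side of the vocabulary needs at most 256 entries regardless of language, which is consistent with the compactness the abstract describes.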
Storing cycles in Hopfield-type networks with pseudoinverse learning rule: admissibility and network topology
Cyclic patterns of neuronal activity are ubiquitous in animal nervous
systems, and partially responsible for generating and controlling rhythmic
movements such as locomotion, respiration, swallowing and so on. Clarifying the
role of the network connectivities for generating cyclic patterns is
fundamental for understanding the generation of rhythmic movements. In this
paper, the storage of binary cycles in neural networks is investigated. We call
a cycle Σ admissible if a connectivity matrix satisfying the cycle's
transition conditions exists, and construct it using the pseudoinverse learning
rule. Our main focus is on the structural features of admissible cycles and
the corresponding network topology. We show that Σ is admissible if and only
if its discrete Fourier transform contains exactly r = rank(Σ) nonzero
columns. Based on the decomposition of the rows of Σ into loops, where a
loop is the set of all cyclic permutations of a row, cycles are classified as
simple cycles, separable or inseparable composite cycles. Simple cycles contain
rows from one loop only, and the network topology is a feedforward chain with
feedback to one neuron if the loop-vectors in Σ are cyclic permutations
of each other. Composite cycles contain rows from at least two disjoint loops,
and the neurons corresponding to the rows in Σ from the same loop are
identified with a cluster. Networks constructed from separable composite cycles
decompose into completely isolated clusters. For inseparable composite cycles
at least two clusters are connected, and the cluster-connectivity is related to
the intersections of the spaces spanned by the loop-vectors of the clusters.
Simulations showing successfully retrieved cycles in continuous-time
Hopfield-type networks and in networks of spiking neurons are presented. Comment: 48 pages, 3 figures
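The idea of storing a cycle with a pseudoinverse rule can be sketched as follows: put the cycle's states in the columns of a matrix X and their successors in the columns of Y, then set J = Y X⁺ so that J maps each state to the next one. This is a simplified sketch, not the paper's construction: it assumes the states are linearly independent, so that X⁺ = (XᵀX)⁻¹Xᵀ, and all function names are invented for the example. Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Gauss-Jordan inverse of a nonsingular square matrix, in exact arithmetic."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def store_cycle(states):
    """Return J with J @ states[k] == states[k + 1], wrapping around at the end.

    Assumes linearly independent states, so pinv(X) = (X^T X)^-1 X^T.
    """
    x = transpose(states)                    # N x p, states as columns
    y = transpose(states[1:] + states[:1])   # successor states as columns
    xt = transpose(x)
    x_pinv = matmul(inverse(matmul(xt, x)), xt)
    return matmul(y, x_pinv)

def step(j, state):
    """One synchronous update: the sign of each neuron's weighted input."""
    return [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1 for row in j]

# A three-state cycle on four binary (+1/-1) neurons.
cycle = [[1, -1, -1, -1], [-1, 1, -1, -1], [-1, -1, 1, -1]]
J = store_cycle(cycle)
state = cycle[0]
for _ in range(3):
    state = step(J, state)   # visits cycle[1], cycle[2], then cycle[0] again
```

The linear-independence assumption is what makes the transition conditions hold exactly here; cycles with linearly dependent states are where an admissibility criterion becomes necessary.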
A Neural Model of How the Brain Represents and Compares Multi-Digit Numbers: Spatial and Categorical Processes
Both animals and humans are capable of representing and comparing numerical quantities, but only humans seem to have evolved multi-digit place-value number systems. This article develops a neural model, called the Spatial Number Network, or SpaN model, which predicts how these shared numerical capabilities are computed using a spatial representation of number quantities in the Where cortical processing stream, notably the Inferior Parietal Cortex. Multi-digit numerical representations that obey a place-value principle are proposed to arise through learned interactions between categorical language representations in the What cortical processing stream and the Where spatial representation. It is proposed that learned semantic categories that symbolize separate digits, as well as place markers like "tens," "hundreds," "thousands," etc., are associated through learning with the corresponding spatial locations of the Where representation, leading to a place-value number system as an emergent property of What-Where information fusion. The model quantitatively simulates error rates in quantification and numerical comparison tasks, and reaction times for number priming and numerical assessment and comparison tasks. In the Where cortical process, it is proposed that transient responses to inputs are integrated before they activate an ordered spatial map that selectively responds to the number of events in a sequence. Neural mechanisms are defined which give rise to an ordered spatial numerical map and Weber law characteristics as emergent properties. The dynamics of numerical comparison are encoded in activity pattern changes within this spatial map. Such changes cause a "directional comparison wave" whose properties mimic data about numerical comparison. These model mechanisms are variants of neural mechanisms that have elsewhere been used to explain data about motion perception, attention shifts, and target tracking. Thus, the present model suggests how numerical representations may have emerged as specializations of more primitive mechanisms in the cortical Where processing stream. The model's What-Where interactions can explain human psychophysical data, such as error rates and reaction times, about multi-digit (base 10) numerical stimuli, and describe how such a competence can develop through learning. The SpaN model and its explanatory range are compared with other models of numerical representation. Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-95-1-0409); National Science Foundation (IRI-97-20333)
Neural network modeling of memory deterioration in Alzheimer's disease
The clinical course of Alzheimer's disease (AD) is generally characterized by progressive gradual deterioration, although large clinical variability exists. Motivated by the recent quantitative reports of synaptic changes in AD, we use a neural network model to investigate how the interplay between synaptic deletion and compensation determines the pattern of memory deterioration, a clinical hallmark of AD. Within the model we show that the deterioration of memory retrieval due to synaptic deletion can be much delayed by multiplying all the remaining synaptic weights by a common factor, which keeps the average input to each neuron at the same level. This parallels the experimental observation that the total synaptic area per unit volume (TSA) is initially preserved when synaptic deletion occurs. By using different dependencies of the compensatory factor on the amount of synaptic deletion one can define various compensation strategies, which can account for the observed variation in the severity and progression rate of AD.
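The interplay between deletion and compensation can be illustrated with a small sketch: a fraction d of one neuron's incoming synapses is removed, and the survivors are multiplied by a common factor. The one-parameter family `1 + k*d/(1 - d)` used here is an assumed parametrization chosen for the example (k = 0 means no compensation, k = 1 gives 1/(1 - d), which restores the expected total input); the abstract does not state the exact form of the compensatory factor.

```python
import random

def compensation_factor(d, k=1.0):
    """Assumed strategy family: k=0 is no compensation, k=1 gives 1/(1-d)."""
    return 1.0 + k * d / (1.0 - d)

def delete_and_compensate(weights, d, k=1.0, rng=None):
    """Zero out a fraction d (0 <= d < 1) of the weights and rescale the rest."""
    rng = rng or random.Random()
    n_deleted = round(d * len(weights))
    deleted = set(rng.sample(range(len(weights)), n_deleted))
    c = compensation_factor(d, k)
    return [0.0 if i in deleted else w * c for i, w in enumerate(weights)]

# Ten equal incoming synapses on one neuron, 40% of them deleted.
weights = [1.0] * 10
uncompensated = delete_and_compensate(weights, 0.4, k=0.0, rng=random.Random(0))
compensated = delete_and_compensate(weights, 0.4, k=1.0, rng=random.Random(0))
print(sum(uncompensated), sum(compensated))
```

With equal weights, full compensation (k = 1) returns the summed input to its original level, while intermediate values of k leave part of the loss uncorrected. Varying k is one simple way to express different compensation strategies.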
Supervised Learning in Spiking Neural Networks for Precise Temporal Encoding
Precise spike timing as a means to encode information in neural networks is
biologically supported, and is advantageous over frequency-based codes by
processing input features on a much shorter time-scale. For these reasons, much
recent attention has been focused on the development of supervised learning
rules for spiking neural networks that utilise a temporal coding scheme.
However, despite significant progress in this area, there is still a lack of rules that
have a theoretical basis and yet can be considered biologically relevant. Here
we examine the general conditions under which synaptic plasticity most
effectively takes place to support the supervised learning of a precise
temporal code. As part of our analysis we examine two spike-based learning
methods: one of which relies on an instantaneous error signal to modify
synaptic weights in a network (INST rule), and the other one on a filtered
error signal for smoother synaptic weight modifications (FILT rule). We test
the accuracy of the solutions provided by each rule with respect to their
temporal encoding precision, and then measure the maximum number of input
patterns they can learn to memorise using the precise timings of individual
spikes as an indication of their storage capacity. Our results demonstrate the
high performance of FILT in most cases, underpinned by the rule's
error-filtering mechanism, which is predicted to provide smooth convergence
towards a desired solution during learning. We also find FILT to be most
efficient at performing input pattern memorisations, and most noticeably when
patterns are identified using spikes with sub-millisecond temporal precision.
In comparison with existing work, we determine the performance of FILT to be
consistent with that of the highly efficient E-learning Chronotron, but with
the distinct advantage that FILT is also implementable as an online method for
increased biological realism. Comment: 26 pages, 10 figures, this version is published in PLoS ONE and
incorporates reviewer comments
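The contrast between an instantaneous and a filtered error signal can be sketched on discretized spike trains. In this illustration the instantaneous error is the per-bin difference between target and actual spikes, and the filtered error is a leaky running sum of that difference. The exponential filter, the time constant `tau`, and the function name are assumptions for the example, not the rules as defined in the paper.

```python
import math

def error_signals(target, actual, tau=5.0):
    """Return (instantaneous, filtered) error traces for binary spike trains.

    target, actual: lists of 0/1 values, one per time bin.
    tau: assumed decay constant of the exponential filter, in bins.
    """
    inst = [t - a for t, a in zip(target, actual)]
    decay = math.exp(-1.0 / tau)
    filt, trace = [], 0.0
    for e in inst:
        trace = trace * decay + e   # leaky accumulation of the error
        filt.append(trace)
    return inst, filt

# The neuron should have fired in bin 2 but fired in bin 4 instead.
target = [0, 0, 1, 0, 0, 0]
actual = [0, 0, 0, 0, 1, 0]
inst, filt = error_signals(target, actual)
```

The instantaneous error is nonzero only in the two bins containing a spike, whereas the filtered error decays gradually after each spike, so a weight update driven by it changes more smoothly over time.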