How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?
Long short-term memory (LSTM) networks and their variants are capable of
encapsulating long-range dependencies, which is evident from their performance
on a variety of linguistic tasks. On the other hand, simple recurrent networks
(SRNs), which appear more biologically grounded in terms of synaptic
connections, have generally been less successful at capturing long-range
dependencies as well as the loci of grammatical errors in an unsupervised
setting. In this paper, we seek to develop models that bridge the gap between
biological plausibility and linguistic competence. We propose a new
architecture, the Decay RNN, which incorporates the decaying nature of neuronal
activations and models the excitatory and inhibitory connections in a
population of neurons. Besides its biological inspiration, our model also shows
competitive performance relative to LSTMs on subject-verb agreement, sentence
grammaticality, and language modeling tasks. These results provide some
pointers towards probing the nature of the inductive biases required for RNN
architectures to model linguistic phenomena successfully. Comment: 11 pages, 5 figures (including appendix); to appear at ACL SRW 202
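The decay mechanism this abstract describes can be sketched as a leaky hidden-state update combined with sign-constrained excitatory/inhibitory recurrent weights. This is a minimal illustrative sketch, not the paper's implementation; the update rule, constant `alpha`, and the half-excitatory/half-inhibitory split are assumptions:

```python
import numpy as np

def decay_rnn_step(h_prev, x, W_in, W_rec, b, alpha=0.8):
    """One leaky ('decay') update: the hidden state relaxes toward
    the driven activation instead of being fully overwritten."""
    drive = np.tanh(W_in @ x + W_rec @ h_prev + b)
    return alpha * h_prev + (1.0 - alpha) * drive

rng = np.random.default_rng(0)
n_hid, n_in = 8, 4
W_in = 0.1 * rng.standard_normal((n_hid, n_in))
# Sign-constrained recurrence: each presynaptic neuron (column) is
# purely excitatory (+) or purely inhibitory (-), Dale's-law style.
signs = np.where(np.arange(n_hid) < n_hid // 2, 1.0, -1.0)
W_rec = 0.1 * np.abs(rng.standard_normal((n_hid, n_hid))) * signs

h = np.zeros(n_hid)
for t in range(5):
    h = decay_rnn_step(h, rng.standard_normal(n_in), W_in, W_rec,
                       np.zeros(n_hid))
```

Because `alpha < 1` and `tanh` is bounded, the state stays bounded, which is one practical benefit of the decay term.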
Dataflow Matrix Machines as a Generalization of Recurrent Neural Networks
Dataflow matrix machines are a powerful generalization of recurrent neural
networks. They work with multiple types of arbitrary linear streams, multiple
types of powerful neurons, and allow the incorporation of higher-order constructions.
We expect them to be useful in machine learning and probabilistic programming,
and in the synthesis of dynamic systems and of deterministic and probabilistic
programs. Comment: 4 pages position paper (v2 - update references)
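The generalization can be sketched as the dataflow cycle alternating two phases: each neuron applies an arbitrary transformation to its input stream, then a matrix linearly recombines neuron outputs into the next inputs. A toy sketch over plain numeric streams with illustrative names, not the authors' code:

```python
import numpy as np

# Each "neuron" is an arbitrary transformation of its input stream value;
# an ordinary RNN is the special case where all neurons share one
# pointwise nonlinearity.
neurons = [np.tanh, np.sin, lambda v: v * 0.5]

def dmm_step(inputs, matrix):
    """One cycle of a toy dataflow matrix machine:
    neurons transform their inputs, then the matrix linearly
    mixes the outputs into new inputs."""
    outputs = np.array([f(v) for f, v in zip(neurons, inputs)])
    return matrix @ outputs

M = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
state = np.array([1.0, 2.0, 3.0])
for _ in range(3):
    state = dmm_step(state, M)
```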
Learning Numeracy: Binary Arithmetic with Neural Turing Machines
One of the main problems encountered so far with recurrent neural networks is
that they struggle to retain long-time information dependencies in their
recurrent connections. Neural Turing Machines (NTMs) attempt to mitigate this
issue by providing the neural network with an external portion of memory, in
which information can be stored and manipulated later on. The whole mechanism
is differentiable end-to-end, allowing the network to learn how to utilise this
long-term memory via stochastic gradient descent. This allows NTMs to infer
simple algorithms directly from data sequences. Nonetheless, the model can be
hard to train due to its large number of parameters and interacting components,
and little related work exists. In this work we use NTMs to learn and
generalise two arithmetical tasks: binary addition and multiplication. These
tasks are two fundamental algorithmic examples in computer science, and are
considerably more challenging than those previously explored; with them we aim
to shed some light on the real capabilities of this neural model.
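The differentiable external-memory read the abstract refers to can be sketched with NTM-style content-based addressing: a softmax over similarities between a key and each memory row. A simplified sketch only; the full model adds location-based addressing, write heads, and gating:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def content_read(memory, key, beta=5.0):
    """Differentiable read: attend to memory rows by cosine
    similarity to `key`, sharpened by strength `beta`."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    w = softmax(beta * sims)
    return w @ memory, w

M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
read, weights = content_read(M, np.array([1.0, 0.0, 0.0]))
```

Because every operation here is differentiable, gradients can flow through the read weights during training, which is what lets the controller learn to use the memory.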
Few-Shot Generalization Across Dialogue Tasks
Machine-learning based dialogue managers are able to learn complex behaviors
in order to complete a task, but it is not straightforward to extend their
capabilities to new domains. We investigate different policies' ability to
handle uncooperative user behavior, and how well expertise in completing one
task (such as restaurant reservations) can be reapplied when learning a new one
(e.g. booking a hotel). We introduce the Recurrent Embedding Dialogue Policy
(REDP), which embeds system actions and dialogue states in the same vector
space. REDP contains a memory component and attention mechanism based on a
modified Neural Turing Machine, and significantly outperforms a baseline LSTM
classifier on this task. We also show that both our architecture and baseline
solve the bAbI dialogue task, achieving 100% test accuracy.
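The shared-embedding idea can be sketched as follows: encode the dialogue state and every candidate system action into the same vector space and rank actions by similarity. A minimal sketch with made-up action names; the actual REDP adds a recurrent state encoder, attention, and the NTM-style memory:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
# Illustrative action inventory, not from the paper.
actions = ["utter_greet", "utter_ask_cuisine", "action_book_hotel"]
action_emb = {a: rng.standard_normal(dim) for a in actions}

def rank_actions(state_vec):
    """Score each system action by dot product with the embedded
    dialogue state; the highest-scoring action is selected."""
    scores = {a: float(v @ state_vec) for a, v in action_emb.items()}
    return max(scores, key=scores.get), scores

best, scores = rank_actions(action_emb["utter_greet"])
```

Sharing one space between states and actions is what allows expertise from one task to transfer: a new action only needs an embedding near the states where it applies.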
Fast Transient Simulation of High-Speed Channels Using Recurrent Neural Network
Generating eye diagrams by using a circuit simulator can be very
computationally intensive, especially in the presence of nonlinearities. It
often involves multiple Newton-like iterations at every time step when a
SPICE-like circuit simulator handles a nonlinear system in the transient
regime. In this paper, we leverage machine learning methods, specifically
the recurrent neural network (RNN), to generate black-box macromodels and
achieve significant reduction of computation time. Through the proposed
approach, an RNN model is first trained and then validated on a relatively
short sequence generated from a circuit simulator. Once the training completes,
the RNN can be used to make predictions on the remaining sequence in order to
generate an eye diagram. The training cost can also be amortized when the
trained RNN starts making predictions. Moreover, the proposed approach requires
neither complex circuit simulations nor substantial domain knowledge. We use two
high-speed link examples to demonstrate that the proposed approach provides
adequate accuracy while the computation time can be dramatically reduced. In
the high-speed link example with a PAM4 driver, the eye diagram generated by
RNN models shows good agreement with that obtained from a commercial circuit
simulator. This paper also investigates the impacts of various RNN topologies,
training schemes, and tunable parameters on both the accuracy and the
generalization capability of an RNN model. We find that the long short-term
memory (LSTM) network outperforms the vanilla RNN in the accuracy of
predicting transient waveforms.
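The workflow the abstract describes — fit a one-step-ahead model on a short simulator-generated prefix, then roll it forward autoregressively to synthesize the rest of the waveform — can be sketched as below. To keep the sketch dependency-free, a linear AR(2) model stands in for the RNN macromodel, and a sine wave stands in for the simulator output; both are assumptions, not the paper's setup:

```python
import numpy as np

# "Simulator" output: a noiseless sinusoid standing in for a
# transient circuit response.
t = np.arange(200)
wave = np.sin(0.1 * t)

# Train a one-step-ahead predictor on a short prefix only
# (least-squares AR(2), standing in for the trained RNN).
train = wave[:60]
X = np.column_stack([train[1:-1], train[:-2]])
y = train[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Autoregressive rollout: feed predictions back in to generate the
# remaining waveform without running the "simulator" again.
pred = list(train[:2])
for _ in range(len(wave) - 2):
    pred.append(coef[0] * pred[-1] + coef[1] * pred[-2])
pred = np.array(pred)
```

The training cost is paid once on the short prefix; every subsequent sample comes from the cheap rollout, which is where the amortization the abstract mentions comes from.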
Exploring Models and Data for Remote Sensing Image Caption Generation
Inspired by recent developments in satellite technology, remote sensing images
have attracted extensive attention. Recently, noticeable progress has been made
in scene classification and target detection. However, it is still not clear
how to describe the remote sensing image content with accurate and concise
sentences. In this paper, we investigate how to describe the remote sensing
images with accurate and flexible sentences. First, some annotated instructions
are
presented to better describe the remote sensing images considering the special
characteristics of remote sensing images. Second, in order to exhaustively
exploit the contents of remote sensing images, a large-scale aerial image data
set is constructed for remote sensing image captioning. Finally, a
comprehensive review is presented on the proposed data set to fully advance the
task of remote sensing captioning. Extensive experiments on the proposed data
set
demonstrate that the content of the remote sensing image can be completely
described by generating language descriptions. The data set is available at
https://github.com/201528014227051/RSICD_optimal Comment: 14 pages, 8 figures
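Caption generation of this kind is typically done with an encoder-decoder model decoded greedily: condition on an image feature, emit one word at a time until an end token. A toy sketch with an invented vocabulary and random weights, only to show the decoding loop, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "airport", "many", "planes", "</s>"]  # illustrative
dim = 8
word_emb = rng.standard_normal((len(vocab), dim))
W = rng.standard_normal((len(vocab), 2 * dim))  # scores from [image; word]

def greedy_caption(image_feat, max_len=6):
    """Greedy decoding: at each step, score the vocabulary from the
    image feature and the previous word, then pick the argmax."""
    words, prev = [], word_emb[vocab.index("<s>")]
    for _ in range(max_len):
        scores = W @ np.concatenate([image_feat, prev])
        idx = int(scores.argmax())
        if vocab[idx] == "</s>":
            break
        words.append(vocab[idx])
        prev = word_emb[idx]
    return words

caption = greedy_caption(rng.standard_normal(dim))
```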
A Selective Overview of Deep Learning
Deep learning has arguably achieved tremendous success in recent years. In
simple words, deep learning uses the composition of many nonlinear functions to
model the complex dependency between input features and labels. While neural
networks have a long history, recent advances have greatly improved their
performance in computer vision, natural language processing, etc. From the
statistical and scientific perspective, it is natural to ask: What is deep
learning? What are the new characteristics of deep learning, compared with
classical methods? What are the theoretical foundations of deep learning? To
answer these questions, we introduce common neural network models (e.g.,
convolutional neural nets, recurrent neural nets, generative adversarial nets)
and training techniques (e.g., stochastic gradient descent, dropout, batch
normalization) from a statistical point of view. Along the way, we highlight
new characteristics of deep learning (including depth and over-parametrization)
and explain their practical and theoretical benefits. We also sample recent
results on theories of deep learning, many of which are only suggestive. While
a complete understanding of deep learning remains elusive, we hope that our
perspectives and discussions serve as a stimulus for new statistical research.
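Two of the training techniques the abstract names, stochastic gradient descent and (inverted) dropout, fit in a few lines of numpy. A pedagogical sketch, not a library implementation:

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    """Plain stochastic gradient descent: move each parameter
    against its (mini-batch) gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def dropout(h, p, rng):
    """Inverted dropout: zero each unit with probability p at train
    time and rescale by 1/(1-p) so the expected activation is
    unchanged (no rescaling needed at test time)."""
    if p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((4, 10))
h_drop = dropout(h, 0.5, rng)
w_new, = sgd_step([np.array([1.0])], [np.array([2.0])], lr=0.1)
```

Dropout is one concrete way over-parametrized networks are regularized in practice, which connects to the depth/over-parametrization discussion the overview highlights.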
Efficient Probabilistic Inference in Generic Neural Networks Trained with Non-Probabilistic Feedback
Animals perform near-optimal probabilistic inference in a wide range of
psychophysical tasks. Probabilistic inference requires trial-to-trial
representation of the uncertainties associated with task variables and
subsequent use of this representation. Previous work has implemented such
computations using neural networks with hand-crafted and task-dependent
operations. We show that generic neural networks trained with a simple
error-based learning rule perform near-optimal probabilistic inference in nine
common psychophysical tasks. In a probabilistic categorization task,
error-based learning in a generic network simultaneously explains a monkey's
learning curve and the evolution of qualitative aspects of its choice behavior.
In all tasks, the number of neurons required for a given level of performance
grows sub-linearly with the input population size, a substantial improvement on
previous implementations of probabilistic inference. The trained networks
develop a novel sparsity-based probabilistic population code. Our results
suggest that probabilistic inference emerges naturally in generic neural
networks trained with error-based learning rules. Comment: 30 pages, 10 figures, 6 supplementary figures
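The normative target in such psychophysical tasks is Bayes-optimal cue combination: weight each cue by its inverse variance. The trained networks approximate this computation; the ideal-observer reference itself is two lines (this is the benchmark computation, not the paper's network):

```python
def fuse_cues(mu1, var1, mu2, var2):
    """Bayes-optimal fusion of two independent Gaussian cues:
    inverse-variance weighting of the means; the fused variance is
    smaller than either input variance."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return mu, var

mu, var = fuse_cues(mu1=2.0, var1=1.0, mu2=4.0, var2=1.0)
```

Matching this ideal observer requires the network to carry trial-to-trial uncertainty about each cue, which is exactly the representation the abstract says emerges from error-based learning.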
Equilibrated Recurrent Neural Network: Neuronal Time-Delayed Self-Feedback Improves Accuracy and Stability
We propose a novel {\it Equilibrated Recurrent Neural Network} (ERNN) to
combat the issues of inaccuracy and instability in conventional RNNs. Drawing
upon the concept of autapse in neuroscience, we propose augmenting an RNN with
a time-delayed self-feedback loop. Our sole purpose is to modify the dynamics
of each internal RNN state and, at any time, enforce it to evolve close to the
equilibrium point associated with the input signal at that time. We show that
such self-feedback helps stabilize the hidden state transitions leading to fast
convergence during training while efficiently learning discriminative latent
features that yield state-of-the-art performance on several benchmark datasets
at test-time. We propose a novel inexact Newton method to solve fixed-point
conditions given model parameters for generating the latent features at each
hidden state. We prove that our inexact Newton method converges locally with
linear rate (under mild conditions). We leverage this result for efficient
training of ERNNs based on backpropagation.
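The equilibrium condition can be sketched as a fixed point h* = tanh(W h* + U x + b + gamma * h_prev). The paper solves it with an inexact Newton method; with small recurrent weights the map is a contraction, so plain fixed-point iteration already converges, which suffices to illustrate the idea (names and constants here are illustrative):

```python
import numpy as np

def ernn_equilibrium(x, h_prev, W, U, b, gamma=0.5, iters=50):
    """Find h* = tanh(W h* + U x + b + gamma * h_prev) by fixed-point
    iteration (a stand-in for the paper's inexact Newton solver;
    iteration converges here because small W makes the map a
    contraction)."""
    h = np.zeros_like(h_prev)
    for _ in range(iters):
        h = np.tanh(W @ h + U @ x + b + gamma * h_prev)
    return h

rng = np.random.default_rng(0)
n, m = 6, 3
W = 0.1 * rng.standard_normal((n, n))   # small: contraction mapping
U = rng.standard_normal((n, m))
b = np.zeros(n)
h_prev = np.zeros(n)
x = rng.standard_normal(m)
h_star = ernn_equilibrium(x, h_prev, W, U, b)
residual = np.max(np.abs(h_star - np.tanh(W @ h_star + U @ x + b
                                          + 0.5 * h_prev)))
```

The `gamma * h_prev` term is the time-delayed self-feedback: each new state is pulled toward an equilibrium anchored at the previous state, which is the stabilizing effect the abstract claims.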
Compositional generalization in a deep seq2seq model by separating syntax and semantics
Standard methods in deep learning for natural language processing fail to
capture the compositional structure of human language that allows for
systematic generalization outside of the training distribution. However, human
learners readily generalize in this way, e.g. by applying known grammatical
rules to novel words. Inspired by work in neuroscience suggesting separate
brain systems for syntactic and semantic processing, we implement a
modification to standard approaches in neural machine translation, imposing an
analogous separation. The novel model, which we call Syntactic Attention,
substantially outperforms standard methods in deep learning on the SCAN
dataset, a compositional generalization task, without any hand-engineered
features or additional supervision. Our work suggests that separating syntactic
from semantic learning may be a useful heuristic for capturing compositional
structure. Comment: 18 pages, 15 figures, preprint version of submission to NeurIPS 2019, under review
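The separation can be sketched as an attention layer in which alignment weights are computed only from one representation (the "syntactic" stream) while the values being mixed come only from a different one (the "semantic" stream). A toy numpy sketch with illustrative shapes, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def syntactic_attention(query, syn_keys, sem_values):
    """Attention weights come only from syntactic encodings; the
    attended output mixes only semantic encodings, keeping the two
    streams factorized."""
    weights = softmax(syn_keys @ query / np.sqrt(len(query)))
    return weights @ sem_values, weights

rng = np.random.default_rng(0)
T, d_syn, d_sem = 5, 8, 12
syn = rng.standard_normal((T, d_syn))    # e.g. from a recurrent encoder
sem = rng.standard_normal((T, d_sem))    # e.g. direct word embeddings
out, w = syntactic_attention(rng.standard_normal(d_syn), syn, sem)
```

Because word meanings only enter through `sem_values`, swapping in a novel word leaves the alignment (where to attend) untouched, which is one intuition for the compositional generalization reported on SCAN.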