Unsupervised Recurrent Neural Network Grammars
Recurrent neural network grammars (RNNG) are generative models of language
which jointly model syntax and surface structure by incrementally generating a
syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs
achieve strong language modeling and parsing performance, but require an
annotated corpus of parse trees. In this work, we experiment with unsupervised
learning of RNNGs. Since directly marginalizing over the space of latent trees
is intractable, we instead apply amortized variational inference. To maximize
the evidence lower bound, we develop an inference network parameterized as a
neural CRF constituency parser. On language modeling, unsupervised RNNGs
perform as well as their supervised counterparts on benchmarks in English and
Chinese. On constituency grammar induction, they are competitive with recent
neural language models that induce tree structures from words through attention
mechanisms.
Comment: NAACL 2019
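For readers less familiar with amortized variational inference, the quantity
being maximized here is the standard evidence lower bound (ELBO); with x the
observed sentence and z the latent tree, it takes the usual generic form (this
is the textbook objective, not a formula quoted from the paper):

    \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big]

Here p_\theta is the RNNG's generative model and q_\phi is the inference
network (the neural CRF parser), which amortizes inference across sentences.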
What's Going On in Neural Constituency Parsers? An Analysis
A number of differences have emerged between modern and classic approaches to
constituency parsing in recent years, with structural components like grammars
and feature-rich lexicons becoming less central while recurrent neural network
representations rise in popularity. The goal of this work is to analyze the
extent to which information provided directly by the model structure in
classical systems is still being captured by neural methods. To this end, we
propose a high-performance neural model (92.08 F1 on PTB) that is
representative of recent work and perform a series of investigative
experiments. We find that our model implicitly learns to encode much of the
same information that was explicitly provided by grammars and lexicons in the
past, indicating that this scaffolding can largely be subsumed by powerful
general-purpose neural machinery.
Comment: NAACL 2018
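As background, span-based neural parsers of this family typically score each
span with a neural network and decode the best tree with a CKY-style dynamic
program. The sketch below illustrates that decoding idea only; it is not the
paper's code, and `span_score` is a hypothetical table of neural span scores.

    # Illustrative CKY decoder over additive span scores (unlabeled,
    # binary trees). span_score[(i, j)] is assumed to be precomputed
    # by a neural network for every 0 <= i < j <= n.
    def cky_best(n, span_score):
        best = {}
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                j = i + length
                if length == 1:
                    best[(i, j)] = span_score[(i, j)]
                else:
                    best[(i, j)] = span_score[(i, j)] + max(
                        best[(i, k)] + best[(k, j)] for k in range(i + 1, j))
        return best[(0, n)]  # score of the best tree over the sentence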
The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations
In order for neural networks to learn complex languages or grammars, they
must have sufficient computational power or resources to recognize or generate
such languages. Though many approaches have been discussed, one obvious
approach to enhancing the processing power of a recurrent neural network is to
couple it with an external stack memory - in effect creating a neural network
pushdown automaton (NNPDA). This paper discusses this NNPDA in detail - its
construction, how it can be trained and how useful symbolic information can be
extracted from the trained network.
In order to couple the external stack to the neural network, an optimization
method is developed which uses an error function that connects the learning of
the state automaton of the neural network to the learning of the operation of
the external stack. To minimize the error function using gradient descent
learning, an analog stack is designed such that the action and storage of
information in the stack are continuous. One interpretation of a continuous
stack is the probabilistic storage of and action on data. After training on
sample strings of an unknown source grammar, a quantization procedure extracts
from the analog stack and neural network a discrete pushdown automaton (PDA).
Simulations show that in learning deterministic context-free grammars - the
balanced parenthesis language, 1^n0^n, and the deterministic palindrome - the
extracted PDA is correct in the sense that it can correctly recognize unseen
strings of arbitrary length. In addition, the extracted PDAs can be shown to be
identical or equivalent to the PDAs of the source grammars which were used to
generate the training strings.
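To make the continuous stack concrete, here is a minimal sketch of one common
realization: each stack element carries a real-valued strength, a positive
action pushes fractionally, and a negative action pops strength off the top.
The variable names and exact mechanics are illustrative assumptions, not the
paper's precise formulation.

    import numpy as np

    def stack_step(values, strengths, action, new_value):
        """Continuous stack update with action in [-1, 1].
        action >= 0 pushes new_value with weight `action`;
        action < 0 removes |action| total strength from the top."""
        if action >= 0:
            values = np.append(values, new_value)
            strengths = np.append(strengths, action)
        else:
            remaining = -action
            for k in range(len(strengths) - 1, -1, -1):
                removed = min(strengths[k], remaining)
                strengths[k] -= removed
                remaining -= removed
                if remaining <= 0:
                    break
        return values, strengths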
An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks
Rule extraction from black-box models is critical in domains that require
model validation before implementation, as can be the case in credit scoring
and medical diagnosis. Though already a challenging problem in statistical
learning in general, the difficulty is even greater when highly non-linear,
recursive models, such as recurrent neural networks (RNNs), are fit to data.
Here, we study the extraction of rules from second-order recurrent neural
networks trained to recognize the Tomita grammars. We show that production
rules can be stably extracted from trained RNNs and that in certain cases the
rules outperform the trained RNNs.
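Extraction in this line of work generally follows a common recipe: quantize
the RNN's continuous hidden states into a finite set of abstract states, then
read transitions off the network by breadth-first exploration. A hedged sketch
of that recipe follows; `rnn.step` and `quantize` are assumed interfaces, not
this paper's API.

    # Illustrative automaton extraction by state quantization. Each
    # quantized state is represented by the first hidden vector that
    # mapped to it, a standard approximation in rule-extraction methods.
    def extract_automaton(rnn, alphabet, h0, quantize, max_states=50):
        start = quantize(h0)
        transitions = {}
        frontier, seen = [(start, h0)], {start}
        while frontier:
            state, h = frontier.pop()
            for sym in alphabet:
                h_next = rnn.step(h, sym)
                nxt = quantize(h_next)
                transitions[(state, sym)] = nxt
                if nxt not in seen and len(seen) < max_states:
                    seen.add(nxt)
                    frontier.append((nxt, h_next))
        return start, transitions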
Learning finite state machines with self-clustering recurrent networks
Recent work has shown that recurrent neural networks have the ability to learn finite state automata from examples. In particular, networks using second-order units have been successful at this task. In studying the performance and learning behavior of such networks we have found that the second-order network model attempts to form clusters in activation space as its internal representation of states. However, these learned states become unstable as longer and longer test input strings are presented to the network. In essence, the network “forgets” where the individual states are in activation space. In this paper we propose a new method to force such a network to learn stable states by introducing discretization into the network and using a pseudo-gradient learning rule to perform training. The essence of the learning rule is that in doing gradient descent, it makes use of the gradient of a sigmoid function as a heuristic hint in place of that of the hard-limiting function, while still using the discretized value in the feedback update path. The new structure uses isolated points in activation space instead of vague clusters as its internal representation of states. It is shown to have capabilities in learning finite state automata similar to those of the original network, but without the instability problem. The proposed pseudo-gradient learning rule may also be used as a basis for training other types of networks that have hard-limiting threshold activation functions.
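The pseudo-gradient rule described here is closely related to what is now
called a straight-through estimator: the forward pass uses the hard-limiting
function, while the backward pass substitutes the sigmoid's gradient. A modern
PyTorch rendering of that idea (an anachronistic illustration, since the paper
predates such frameworks) might look like:

    import torch

    class HardThresholdSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return (x > 0).float()          # discretized value in the forward path

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            s = torch.sigmoid(x)
            return grad_out * s * (1 - s)   # sigmoid gradient as the heuristic hint

    # usage: y = HardThresholdSTE.apply(pre_activation)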
Which Neural Network Architecture matches Human Behavior in Artificial Grammar Learning?
In recent years, artificial neural networks have achieved performance close to or
better than humans in several domains: tasks that were previously human
prerogatives, such as language processing, have witnessed remarkable
improvements in state of the art models. One advantage of this technological
boost is to facilitate comparison between different neural networks and human
performance, in order to deepen our understanding of human cognition. Here, we
investigate which neural network architecture (feed-forward vs. recurrent)
matches human behavior in artificial grammar learning, a crucial aspect of
language acquisition. Prior experimental studies proved that artificial
grammars can be learnt by human subjects after little exposure and often
without explicit knowledge of the underlying rules. We tested four grammars
with different complexity levels both in humans and in feedforward and
recurrent networks. Our results show that both architectures can 'learn' (via
error back-propagation) the grammars after the same number of training
sequences as humans do, but recurrent networks perform closer to humans than
feedforward ones, irrespective of the grammar complexity level. Moreover,
similar to visual processing, in which feedforward and recurrent architectures
have been related to unconscious and conscious processes, our results suggest
that explicit learning is best modeled by recurrent architectures, whereas
feedforward networks better capture the dynamics involved in implicit learning.
Inducing Regular Grammars Using Recurrent Neural Networks
Grammar induction is the task of learning a grammar from a set of examples.
Recently, neural networks have been shown to be powerful learning machines that
can identify patterns in streams of data. In this work we investigate their
effectiveness in inducing a regular grammar from data, without any assumptions
about the grammar. We train a recurrent neural network to distinguish between
strings that are in or outside a regular language, and utilize an algorithm for
extracting the learned finite-state automaton. We apply this method to several
regular languages and find unexpected results regarding the connections between
the network's states that may be regarded as evidence for generalization.
Comment: Accepted to L&R 2018 workshop, ICML & IJCAI
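Before any extraction, the training setup described is a simple binary
acceptor: an RNN reads a string and emits a membership decision. A minimal
sketch under that reading (PyTorch and this exact architecture are
assumptions; the abstract does not specify them):

    import torch
    import torch.nn as nn

    class Acceptor(nn.Module):
        """GRU that classifies whether a token sequence is in the language."""
        def __init__(self, vocab_size, hidden=32):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, hidden)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)

        def forward(self, tokens):              # tokens: (batch, time) int ids
            _, h = self.rnn(self.emb(tokens))
            return self.out(h[-1]).squeeze(-1)  # membership logit per string

    # train with nn.BCEWithLogitsLoss against in/out-of-language labels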
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Despite the recent achievements in machine learning, we are still very far
from achieving real artificial intelligence. In this paper, we discuss the
limitations of standard deep learning approaches and show that some of these
limitations can be overcome by learning how to grow the complexity of a model
in a structured way. Specifically, we study the simplest sequence prediction
problems that are beyond the scope of what is learnable with standard recurrent
networks: algorithmically generated sequences that can only be learned by
models which have the capacity to count and to memorize sequences. We show that
some basic algorithms can be learned from sequential data using a recurrent
network associated with a trainable memory.
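For context, the trainable memory in stack-augmented recurrent nets is
typically a differentiable ("soft") stack: at each step the controller emits
action probabilities, and the new stack is a convex combination of the push,
pop, and no-op outcomes. The sketch below shows that update for a stack of
scalar cells; it is a simplified illustration, not the paper's exact model
(which can, for instance, store a vector per cell).

    import numpy as np

    def soft_stack_update(stack, a_push, a_pop, a_noop, pushed_value):
        """stack: 1-D array, index 0 is the top; the three actions sum to 1."""
        push = np.concatenate(([pushed_value], stack[:-1]))  # shift down, new top
        pop = np.concatenate((stack[1:], [0.0]))             # shift up, drop top
        return a_push * push + a_pop * pop + a_noop * stack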
Human Action Forecasting by Learning Task Grammars
For effective human-robot interaction, it is important that a robotic
assistant can forecast the next action a human will consider in a given task.
Unfortunately, real-world tasks are often very long, complex, and repetitive;
as a result, forecasting is not trivial. In this paper, we propose a novel deep
recurrent architecture that takes as input features from a two-stream Residual
action recognition framework, and learns to estimate the progress of human
activities from video sequences -- this surrogate progress estimation task
implicitly learns a temporal task grammar with respect to which activities can
be localized and forecasted. To learn the task grammar, we propose a stacked
LSTM based multi-granularity progress estimation framework that uses a novel
cumulative Euclidean loss as objective. To demonstrate the effectiveness of our
proposed architecture, we showcase experiments on two challenging robotic
assistive tasks, namely (i) assembling an Ikea table from its constituents, and
(ii) changing the tires of a car. Our results demonstrate that learning task
grammars offers highly discriminative cues, improving the forecasting accuracy
by more than 9% over the baseline two-stream forecasting model, while also
outperforming other competitive schemes.
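The abstract does not define the cumulative Euclidean loss, so the following
is only one plausible reading, stated as an assumption: penalize the
accumulated progress-prediction error at every time step rather than the
instantaneous error alone.

    import torch

    def cumulative_euclidean_loss(pred, target):
        """Assumed form, not the paper's definition.
        pred/target: (batch, time) progress values in [0, 1]."""
        cum_err = torch.cumsum(pred - target, dim=1)  # running error
        return (cum_err ** 2).mean()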
Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets
In order to build efficient deep recurrent neural architectures, it is
essential to analyze the complexity of long distance dependencies (LDDs) of
the dataset being modeled. In this context, in this paper, we present a
detailed analysis of the complexity and the degree of LDDs (or LDD
characteristics) exhibited by various sequential benchmark datasets. We
observe that datasets sampled from a similar process or task (e.g. natural
language, or sequential MNIST, etc.) display similar LDD characteristics. Upon
analysing the LDD characteristics, we were able to analyze the factors
influencing them, such as (i) the number of unique symbols in a dataset, (ii)
the size of the dataset, (iii) the number of interacting symbols within a
given LDD, and (iv) the distance between the interacting symbols. We
demonstrate that analysing LDD characteristics can inform the selection of
optimal hyper-parameters for SOTA deep recurrent neural architectures. This
analysis can directly contribute to the development of more accurate and
efficient sequential models. We also introduce the use of Strictly k-Piecewise
languages as a process to generate synthesized datasets for language
modelling. The advantage of these synthesized datasets is that they enable
targeted testing of deep recurrent neural architectures in terms of their
ability to model LDDs with different characteristics. Moreover, using a
variety of Strictly k-Piecewise languages we generate a number of new
benchmarking datasets, and analyse the performance of a number of SOTA
recurrent architectures on these new benchmarks.
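Strictly k-Piecewise languages are a convenient generator for such benchmarks
because membership is decided purely by forbidden subsequences of length at
most k, which directly controls long-distance interactions. A small
membership test under that standard definition (the function names are ours):

    from itertools import combinations

    def subsequences(s, k):
        """All length-k subsequences of string s (fine for short strings;
        the count grows combinatorially with len(s))."""
        return {''.join(c) for c in combinations(s, k)}

    def in_sp_language(s, forbidden, k):
        """True iff s avoids every forbidden subsequence of length k."""
        return not (subsequences(s, k) & forbidden)

    # e.g. with forbidden = {'ab'} and k = 2, 'ba' is in the language
    # but 'acb' is not, however far apart the 'a' and 'b' occur.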