Neural Network Generation of Temporal Sequences from Single Static Vector Inputs using Varying Length Distal Target Sequences
Training an agent to operate in an environment whose
mappings are largely unknown is generally recognized to be exceptionally
difficult. Further, granting such a learning agent the ability to
produce an appropriate sequence of actions entirely from a single input
stimulus remains a key problem. Various reinforcement learning
techniques have been utilized to handle such learning tasks, but
convergence to optimal policies is not guaranteed for many of these
methods. Traditional supervised learning methods hold more assurances of
convergence, but these methods are not well suited for tasks where
desired actions in the output space of the learner, termed proximal
actions, are not available for training. Rather, target outputs from the
environment are distal from where the learning takes place. For example,
a child acquiring language who makes speech errors must learn to correct
them based on heard information that reaches his/her auditory cortex
which is distant from the motor cortical regions that control speech
output. While distal supervised learning techniques for neural networks
have been devised, it remains to be established how they can be trained
to produce sequences of proximal actions from only a single static
input. In this research, I develop an architecture that incorporates recurrent multi-layered neural networks, which retain a history of their behavior in a context vector, into the distal supervised learning framework, enabling it to learn to generate correct proximal sequences from single static input stimuli. This contrasts with existing distal learning methods, which were designed for non-recurrent neural network learners that keep no memory of their prior behavior. I also adapt a technique known as teacher forcing for use in distal sequential learning settings, which is shown to yield more efficient use of the recurrent neural network's context layer. The effectiveness of my approach is demonstrated by applying it
to acquire varying-length phoneme sequence generation behavior using only previously heard and stored auditory phoneme sequences. The results indicate that simple recurrent backpropagation networks can be integrated with distal learning methods to create effective sequence generators, even when they do not constantly update current state information.
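A minimal sketch of the core idea may help. Assuming illustrative layer sizes and a toy forward model (none of which come from the paper), the snippet below shows an Elman-style recurrent controller that unrolls a proximal action sequence from one static input; the loss is measured in the distal space through a frozen forward model so the error backpropagates through it to the controller. The paper's teacher-forcing adaptation is omitted here.

```python
import torch
import torch.nn as nn

class RecurrentController(nn.Module):
    """Elman-style network: a static input plus a context vector yields a proximal action."""
    def __init__(self, in_dim, ctx_dim, act_dim):
        super().__init__()
        self.rnn_cell = nn.RNNCell(in_dim, ctx_dim)  # the context vector carries the history
        self.readout = nn.Linear(ctx_dim, act_dim)   # proximal action layer

    def forward(self, static_x, steps):
        ctx = torch.zeros(static_x.size(0), self.rnn_cell.hidden_size)
        actions = []
        for _ in range(steps):
            ctx = self.rnn_cell(static_x, ctx)       # the same static input at every step
            actions.append(torch.tanh(self.readout(ctx)))
        return torch.stack(actions, dim=1)           # (batch, steps, act_dim)

# Toy forward model mapping proximal actions to the distal (e.g. auditory) space;
# assume it was trained beforehand, and freeze it.
forward_model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 12))
for p in forward_model.parameters():
    p.requires_grad_(False)

controller = RecurrentController(in_dim=4, ctx_dim=32, act_dim=8)
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)

static_x = torch.randn(5, 4)             # one static stimulus per sequence
distal_targets = torch.randn(5, 10, 12)  # stands in for stored heard sequences

for _ in range(100):
    proximal = controller(static_x, steps=10)
    distal_pred = forward_model(proximal)  # the error is measured distally...
    loss = nn.functional.mse_loss(distal_pred, distal_targets)
    opt.zero_grad()
    loss.backward()                        # ...and backpropagated through the frozen
    opt.step()                             # forward model into the controller
```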
Improving Language Modelling with Noise-Contrastive Estimation
Neural language models do not scale well when the vocabulary is large.
Noise-contrastive estimation (NCE) is a sampling-based method that allows for
fast learning with large vocabularies. Although NCE has shown promising
performance in neural machine translation, it has been considered an unsuccessful approach for language modelling. A thorough investigation of the hyperparameters of NCE-based neural language models has also been missing. In this paper, we show that NCE can be a successful approach to neural language modelling when the hyperparameters of the neural network are tuned appropriately. We introduce the 'search-then-converge' learning rate schedule for NCE and design a heuristic that specifies how to use this schedule. We also demonstrate the impact of other important hyperparameters, such as the dropout rate and the weight initialisation range. We show that appropriately tuned NCE-based neural language models outperform state-of-the-art single-model methods on a popular benchmark.
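To make the sampling idea concrete, here is a minimal sketch of the NCE objective for a language model. The vocabulary size, the uniform noise distribution, and the number of noise samples k are illustrative assumptions (in practice a unigram noise distribution is typical), not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def nce_loss(hidden, targets, out_embed, out_bias, noise_dist, k=25):
    """hidden: (B, H) context vectors; targets: (B,) ids of the true next words."""
    B = targets.size(0)
    noise = torch.multinomial(noise_dist, B * k, replacement=True).view(B, k)

    # Unnormalised model scores s(w, h) = h . e_w + b_w for true and noise words.
    s_true = (hidden * out_embed[targets]).sum(-1) + out_bias[targets]           # (B,)
    s_noise = torch.einsum('bh,bkh->bk', hidden, out_embed[noise]) + out_bias[noise]

    # NCE classifies data vs noise: P(data | w, h) = sigmoid(s - log(k * P_noise(w))).
    log_kp_true = torch.log(k * noise_dist[targets] + 1e-10)
    log_kp_noise = torch.log(k * noise_dist[noise] + 1e-10)
    obj = F.logsigmoid(s_true - log_kp_true) \
        + F.logsigmoid(-(s_noise - log_kp_noise)).sum(-1)
    return -obj.mean()

# Illustrative sizes: vocabulary V, hidden width H, batch B.
V, H, B = 10000, 256, 32
out_embed = torch.randn(V, H, requires_grad=True)
out_bias = torch.zeros(V, requires_grad=True)
noise_dist = torch.ones(V) / V  # uniform here; a unigram distribution is typical
loss = nce_loss(torch.randn(B, H), torch.randint(0, V, (B,)),
                out_embed, out_bias, noise_dist)
loss.backward()
```

The per-example cost scales with k rather than with the vocabulary size, which is what makes NCE attractive for large vocabularies.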
Feedback Control by Online Learning an Inverse Model
A model, predictor, or error estimator is often used by a feedback controller to control a plant. Creating such a model is difficult when the plant exhibits nonlinear behavior. In this paper, a novel online learning control framework is proposed that does not require explicit knowledge about the plant. This framework uses two learning modules: one for creating an inverse model and one for actually controlling the plant. Except for their inputs, they are identical. The inverse model learns from the exploration performed by the not-yet-fully-trained controller, while the actual controller is based on the currently learned model. The proposed framework allows fast online learning of an accurate controller, which can be applied to a broad range of tasks with different dynamic characteristics. We validate this claim by applying our control framework to several control tasks: 1) the heating tank problem (slow nonlinear dynamics); 2) flight pitch control (slow linear dynamics); and 3) the balancing problem of a double inverted pendulum (fast linear and nonlinear dynamics). The results of these experiments show that fast learning and accurate control can be achieved. Furthermore, a comparison is made with some classical control approaches, and observations concerning convergence and stability are made.
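A minimal sketch of the two-module idea, under strong simplifying assumptions: a scalar toy plant and a linear-in-features model stand in for the paper's learning modules. The inverse model learns online from observed transitions, while the controller is the same model queried with the desired next state instead of the observed one.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant(x, u):
    return 0.9 * x + 0.5 * np.tanh(u)  # toy nonlinear plant dynamics

def features(x, x_next):
    return np.array([x, x_next, x * x_next, 1.0])

w = np.zeros(4)                # weights shared by both modules
x, target, lr = 0.0, 0.8, 0.05

for step in range(500):
    # Controller: the inverse model evaluated on the *desired* next state,
    # plus annealed exploration noise while the model is still untrained.
    u = features(x, target) @ w + rng.normal(0.0, max(0.2 * 0.99 ** step, 0.01))
    x_next = plant(x, u)

    # Inverse model: learn online to predict the action that produced the
    # observed transition (least-mean-squares update).
    phi = features(x, x_next)
    w += lr * (u - phi @ w) * phi
    x = x_next

print(f"final state {x:.3f} vs target {target}")  # typically settles near the target
```

The exploration noise plays the role described in the abstract: early, poorly informed control actions still generate useful (state, next state, action) training data for the inverse model.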
Echo State Condition at the Critical Point
Recurrent networks whose transfer functions are Lipschitz continuous with constant K=1 may be echo state networks if certain restrictions are placed on the recurrent connectivity. It has been shown that it is sufficient if the largest singular value of the recurrent connectivity is smaller than 1. The main achievement of this paper is a proof of the conditions under which the network is an echo state network even if the largest singular value is exactly one. It turns out that in this critical case the exact shape of the transfer function plays a decisive role in determining whether the network still fulfills the echo state condition. In addition, several examples with one-neuron networks are outlined to illustrate the effects of critical connectivity. Moreover, a mathematical definition of a critical echo state network is suggested.
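The sufficient condition is easy to check numerically. Below is a minimal sketch, with an illustrative reservoir size, that rescales a random recurrent weight matrix to largest singular value 0.95 (safely inside the sufficient condition) or exactly 1 (the critical case analysed here), and demonstrates the echo state property by driving the network from two different initial states.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100                                   # illustrative reservoir size
W = rng.normal(size=(N, N))
sigma_max = np.linalg.svd(W, compute_uv=False)[0]

W_safe = 0.95 * W / sigma_max     # largest singular value 0.95 < 1: sufficient condition
W_critical = W / sigma_max        # largest singular value exactly 1: the critical case,
                                  # where the transfer function's shape decides

def run(W_rec, u, steps=200):
    """Drive the reservoir from a random initial state; tanh is Lipschitz with K=1."""
    x = rng.normal(size=N)
    for _ in range(steps):
        x = np.tanh(W_rec @ x + u)
    return x

# Echo state property: states driven by the same input signal from different
# initial conditions converge to each other (here, two random initial states).
u = rng.normal(size=N)
print(np.linalg.norm(run(W_safe, u) - run(W_safe, u)))  # near 0 for W_safe
```

For W_safe the update map is a contraction with factor 0.95, so the distance between the two trajectories shrinks geometrically; at the critical scaling of W_critical this simple argument no longer applies, which is exactly the gap the paper's proof addresses.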
Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling
Recurrent neural networks have shown remarkable success in modeling sequences. However, low-resource situations still adversely affect the generalizability of these models. We introduce a new family of models, called Lattice Recurrent Units (LRU), to address the challenge of learning deep multi-layer recurrent models with limited resources. LRU models achieve this goal by creating distinct (but coupled) flows of information inside the units: a first flow along the time dimension and a second along the depth dimension. The design also offers symmetry in how information can flow horizontally and vertically. We analyze the effects of decoupling three different components of our LRU model: the Reset Gate, the Update Gate, and the Projected State. We evaluate this family of new LRU models on computational convergence rate and statistical efficiency.
Our experiments are performed on four publicly-available datasets, comparing
with Grid-LSTM and Recurrent Highway networks. Our results show that LRU has
better empirical computational convergence rates and statistical efficiency
values, along with learning more accurate language models.
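As a schematic illustration only (the gate equations below are assumptions, not the paper's exact LRU formulation), the sketch shows a cell with GRU-style reset and update gates that receives one state along the time axis and one along the depth axis, and emits updated states in both directions, giving the horizontal/vertical symmetry described above.

```python
import torch
import torch.nn as nn

class LatticeCell(nn.Module):
    """One lattice cell: states flow in along time and depth, and out along both."""
    def __init__(self, dim):
        super().__init__()
        self.gates = nn.Linear(2 * dim, 2 * dim)  # reset and update gates
        self.cand = nn.Linear(2 * dim, dim)       # candidate ("projected") state

    def step(self, h_time, h_depth):
        z = torch.cat([h_time, h_depth], dim=-1)
        r, u = torch.sigmoid(self.gates(z)).chunk(2, dim=-1)
        # The reset gate modulates both incoming states before the candidate.
        cand = torch.tanh(self.cand(torch.cat([r * h_time, r * h_depth], dim=-1)))
        # The update gate interpolates; the shared candidate feeds both output
        # directions, treating the horizontal and vertical flows symmetrically.
        return u * h_time + (1 - u) * cand, u * h_depth + (1 - u) * cand

# Unrolling a depth-by-time lattice with illustrative sizes.
dim, T, L = 16, 5, 2
cells = nn.ModuleList([LatticeCell(dim) for _ in range(L)])
x = torch.randn(T, 1, dim)                        # inputs enter along the depth axis
h_time = [torch.zeros(1, dim) for _ in range(L)]  # one time-state per layer
for t in range(T):
    h_depth = x[t]
    for l in range(L):
        h_time[l], h_depth = cells[l].step(h_time[l], h_depth)
output = h_depth                                  # top of the lattice at the last step
```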