Early Improving Recurrent Elastic Highway Network
To model time-varying nonlinear temporal dynamics in sequential data, a
recurrent network capable of varying and adjusting the recurrence depth between
input intervals is examined. The recurrence depth is extended by several
intermediate hidden state units, and the weight parameters involved in
determining these units are dynamically calculated. The motivation behind the
paper lies in overcoming two deficiencies of Recurrent Highway Networks (RHNs),
which are currently at the forefront of recurrent architectures, and improving
their performance: 1) Determining the appropriate recurrent depth of an RHN for
a given task is a heavy burden, and simply setting it to a large number is
computationally wasteful, with possible repercussions in performance
degradation and high latency. Expanding on the idea of adaptive computation
time (ACT), the proposed model uses an elastic gate in the form of a rectified
exponentially decreasing function, taking the previous hidden state and the
input as arguments, to evaluate the appropriate recurrent depth for each input.
The rectified gating function lets the most significant intermediate hidden
state updates come early, so that most of the performance gain is achieved
early in the recurrence. 2) Deriving each intermediate layer's weights from
those of the previous intermediate layer offers a richer representation than
sharing one set of weights across all intermediate recurrence layers. This
weight update procedure is an extension of the idea underlying hypernetworks.
To substantiate the
effectiveness of the proposed network, we conducted three experiments:
regression on synthetic data, human activity recognition, and language modeling
on the Penn Treebank dataset. The proposed networks showed better performance
than other state-of-the-art recurrent networks in all three experiments.

Comment: 9 pages, 3 figures
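To make the elastic-gate idea concrete, the sketch below shows one plausible
reading in NumPy: each input gets a variable number of intermediate
micro-steps, and a rectified, exponentially decreasing gate computed from the
previous hidden state and the input both weights the updates and cuts the
recurrence off early. The plain tanh micro-cell, the 0.05 rectification
threshold, and the decay rate alpha are illustrative assumptions, not the
paper's exact formulation.

import numpy as np

def elastic_highway_step(x, h, params, max_depth=10):
    # One input interval: run up to max_depth intermediate micro-steps,
    # gating each update with a rectified, exponentially decreasing gate.
    W_x, W_h, b, w_g, b_g, alpha = params
    for l in range(max_depth):
        # Gate magnitude depends on the previous hidden state and the
        # input, and decays exponentially with the micro-step index l.
        sig = 1.0 / (1.0 + np.exp(-(w_g @ np.concatenate([h, x]) + b_g)))
        gate = max(0.0, sig * np.exp(-alpha * l) - 0.05)  # rectified decay
        if gate == 0.0:
            break  # elastic depth: easy inputs stop early
        h_cand = np.tanh(W_x @ x + W_h @ h + b)   # candidate update
        h = gate * h_cand + (1.0 - gate) * h      # highway-style blend
    return h

# Hypothetical usage with hidden size d and input size k.
rng = np.random.default_rng(0)
d, k = 4, 3
params = (rng.normal(size=(d, k)), rng.normal(size=(d, d)), np.zeros(d),
          rng.normal(size=d + k), 0.0, 0.5)
h_new = elastic_highway_step(rng.normal(size=k), np.zeros(d), params)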
Layer Flexible Adaptive Computational Time for Recurrent Neural Networks
Deep recurrent neural networks perform well on sequence data and are the
model of choice. Deciding the number of layers, however, is a daunting task,
especially since steps within a sequence differ in difficulty and therefore in
computational need. We propose a layer-flexible recurrent neural network with
adaptive computation time, and extend it to a sequence-to-sequence model. In
contrast to the adaptive computation time model, our model has
a dynamic number of transmission states which vary by step and sequence. We
evaluate the model on a financial data set and Wikipedia language modeling.
Experimental results show a performance improvement of 8% to 12% and indicate
the model's ability to dynamically change the number of layers.
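The general ACT mechanism this paper builds on can be sketched as follows: at
each step, layers are stacked until an accumulated halting probability crosses
a threshold, and the intermediate states are combined with the halting weights,
so the effective depth varies by step. This is a minimal sketch of Graves-style
ACT, not the paper's layer-flexible variant with dynamic transmission states;
cell, w_halt, b_halt, and eps are assumed names.

import numpy as np

def act_depth_step(x, h, cell, w_halt, b_halt, eps=0.01, max_layers=8):
    # Stack layers until the cumulative halting probability exceeds
    # 1 - eps, then return the halting-weighted mixture of states.
    states, weights, cum = [], [], 0.0
    for n in range(max_layers):
        h = cell(x, h)                                    # one more layer
        p = float(1.0 / (1.0 + np.exp(-(w_halt @ h + b_halt))))
        if cum + p >= 1.0 - eps or n == max_layers - 1:
            weights.append(1.0 - cum)                     # use the remainder
            states.append(h)
            break
        weights.append(p)
        states.append(h)
        cum += p
    return sum(w * s for w, s in zip(weights, states))    # weighted state

# Hypothetical usage with a plain tanh cell.
rng = np.random.default_rng(1)
d, k = 4, 3
W_x, W_h = rng.normal(size=(d, k)), rng.normal(size=(d, d))
cell = lambda x, h: np.tanh(W_x @ x + W_h @ h)
h_new = act_depth_step(rng.normal(size=k), np.zeros(d), cell,
                       rng.normal(size=d), 0.0)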
Relaxed Parameter Sharing: Effectively Modeling Time-Varying Relationships in Clinical Time-Series
Recurrent neural networks (RNNs) are commonly applied to clinical time-series
data with the goal of learning patient risk stratification models. Their
effectiveness is due, in part, to their use of parameter sharing over time
(i.e., cells are repeated over time, hence the name recurrent). We hypothesize, however,
that this trait also contributes to the increased difficulty such models have
with learning relationships that change over time. Conditional shift, i.e.,
changes in the relationship between the input X and the output y, arises when
risk factors associated with the event of interest change over the course of a
patient admission. While in theory RNNs, and gated RNNs (e.g., LSTMs) in
particular, should be capable of learning time-varying relationships, when
training data are limited such models often fail to accurately capture these
dynamics. We illustrate the advantages and disadvantages of complete parameter
sharing (RNNs) by comparing an LSTM with shared parameters to a sequential
architecture with time-varying parameters on prediction tasks involving three
clinically-relevant outcomes: acute respiratory failure (ARF), shock, and
in-hospital mortality. In experiments using synthetic data, we demonstrate how
parameter sharing in LSTMs leads to worse performance in the presence of
conditional shift. To improve upon the dichotomy between complete parameter
sharing and no parameter sharing, we propose a novel RNN formulation based on a
mixture model in which we relax parameter sharing over time. The proposed
method outperforms standard LSTMs and other state-of-the-art baselines across
all tasks. In settings with limited data, relaxed parameter sharing can lead to
improved patient risk stratification performance.

Comment: Machine Learning for Healthcare 2019
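The mixture idea can be pictured with a short sketch: instead of one cell whose
parameters are shared across all time steps, keep K candidate cells and blend
their outputs with time-dependent mixture weights, so the effective parameters
can drift over the course of an admission. This is a minimal sketch under
assumed names; plain tanh cells and softmax weights over the K cells stand in
for the paper's LSTM-based mixture formulation.

import numpy as np

def mixture_rnn_step(x, h, cells, mix_logits_t):
    # Relaxed parameter sharing: blend K candidate cells with
    # per-timestep softmax mixture weights instead of reusing one cell.
    a = np.exp(mix_logits_t - mix_logits_t.max())
    a = a / a.sum()                                  # softmax over K cells
    return sum(w * cell(x, h) for w, cell in zip(a, cells))

# Hypothetical usage: K tanh cells, one mixture-weight vector per step.
rng = np.random.default_rng(2)
d, k, K, T = 4, 3, 3, 10
make_cell = lambda W_x, W_h: (lambda x, h: np.tanh(W_x @ x + W_h @ h))
cells = [make_cell(rng.normal(size=(d, k)), rng.normal(size=(d, d)))
         for _ in range(K)]
mix_logits = rng.normal(size=(T, K))
h = np.zeros(d)
for t in range(T):
    h = mixture_rnn_step(rng.normal(size=k), h, cells, mix_logits[t])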