A Primal-Dual Method for Training Recurrent Neural Networks Constrained by the Echo-State Property
We present an architecture of a recurrent neural network (RNN) with a
fully-connected deep neural network (DNN) as its feature extractor. The RNN is
equipped with both causal temporal prediction and non-causal look-ahead, via
auto-regression (AR) and moving-average (MA), respectively. The focus of this
paper is a primal-dual training method that formulates the learning of the RNN
as a formal optimization problem with an inequality constraint that provides a
sufficient condition for the stability of the network dynamics. Experimental
results demonstrate the effectiveness of this new method, which achieves 18.86%
phone recognition error on the TIMIT benchmark for the core test set. The
result approaches the best result of 17.7%, which was obtained by using an RNN
with long short-term memory (LSTM). The results also show that the proposed
primal-dual training method produces lower recognition errors than popular
earlier RNN methods that rely on a carefully tuned threshold parameter to
heuristically prevent the gradient from exploding.
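To make the constrained formulation concrete, here is a minimal NumPy sketch (ours, with illustrative names) of a primal descent / dual ascent step on the Lagrangian of a spectral-norm constraint ||W||_2 <= gamma < 1, a standard sufficient condition for stable echo-state dynamics; the paper's exact constraint and updates may differ:

```python
import numpy as np

def top_singular(W, iters=50):
    """Leading singular triple (sigma, u, v) via power iteration."""
    v = np.random.randn(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v), u, v

def primal_dual_step(W, grad_loss, lam, lr=1e-3, lr_dual=1e-2, gamma=0.95):
    """One step on L(W, lam) = loss(W) + lam * (||W||_2 - gamma), lam >= 0."""
    sigma, u, v = top_singular(W)
    # d||W||_2 / dW = u v^T at the leading singular pair
    W_new = W - lr * (grad_loss + lam * np.outer(u, v))  # primal descent
    lam_new = max(0.0, lam + lr_dual * (sigma - gamma))  # dual ascent
    return W_new, lam_new
```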
Effective Representations of Clinical Notes
Clinical notes are a rich source of information about patient state. However,
using them to predict clinical events with machine learning models is
challenging. They are very high-dimensional, sparse, and have complex structure.
Furthermore, training data is often scarce because it is expensive to obtain
reliable labels for many clinical events. These difficulties have traditionally
been addressed by manual feature engineering that encodes task-specific domain
knowledge. We explored the use of neural networks and transfer learning to
learn representations of clinical notes that are useful for predicting future
clinical events of interest, such as all-cause mortality, inpatient
admissions, and emergency room visits. Our data comprised 2.7 million notes and
115 thousand patients at Stanford Hospital. We used the learned
representations, along with commonly used bag-of-words and topic model
representations, as features for predictive models of clinical events. We
evaluated the effectiveness of these representations with respect to the
performance of the models trained on small datasets. Models using the neural
network-derived representations performed significantly better than models
using the baseline representations on small training datasets.
The learned representations offer significant performance gains over commonly
used baseline representations for a range of predictive modeling tasks and
cohort sizes, providing an effective alternative to task-specific feature
engineering when plentiful labeled training data is not available.
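As a hedged sketch of the small-cohort evaluation described above: freeze the learned representations and fit a simple classifier on a small labeled set. Random arrays stand in for the pretrained note embeddings; all names and sizes are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Placeholders standing in for representations from the pretrained encoder.
X = rng.normal(size=(200, 128))    # one vector per patient's notes
y = rng.integers(0, 2, size=200)   # e.g., 1 = future inpatient admission

clf = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
probs = clf.predict_proba(X[150:])[:, 1]
print("held-out AUROC:", roc_auc_score(y[150:], probs))
```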
Deep Learning with the Random Neural Network and its Applications
The random neural network (RNN) is a mathematical model for an "integrate and
fire" spiking network that closely resembles the stochastic behaviour of
neurons in mammalian brains. Since its proposal in 1989, there have been
numerous investigations into the RNN's applications and learning algorithms.
Deep learning (DL) has achieved great success in machine learning. Recently,
the properties of the RNN relevant to DL have been investigated, with the aim
of combining their strengths. Recent results demonstrate that the gap between
RNNs and DL can be bridged, and that DL tools based on the RNN are faster and
can potentially be used with less energy expenditure than existing methods.
Comment: 23 pages, 19 figures
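For readers new to the model: in the random neural network, each neuron's steady-state excitation probability satisfies q_i = lambda_plus_i / (r_i + lambda_minus_i), where the lambdas are total excitatory and inhibitory arrival rates. A hedged fixed-point sketch in our own notation (not code from the paper):

```python
import numpy as np

def rnn_steady_state(Lambda, lam, r, Wp, Wm, iters=200):
    # Lambda/lam: exogenous excitatory/inhibitory arrival rates (length n)
    # r: firing rates; Wp[j, i] / Wm[j, i]: excitatory/inhibitory rate j -> i
    q = np.zeros(len(r))
    for _ in range(iters):
        lp = Lambda + q @ Wp                  # total excitatory arrivals
        lm = lam + q @ Wm                     # total inhibitory arrivals
        q = np.clip(lp / (r + lm), 0.0, 1.0)  # clip is a pragmatic guard
    return q
```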
Deep Frank-Wolfe For Neural Network Optimization
Learning a deep neural network requires solving a challenging optimization
problem: it is a high-dimensional, non-convex and non-smooth minimization
problem with a large number of terms. The current practice in neural network
optimization is to rely on the stochastic gradient descent (SGD) algorithm or
its adaptive variants. However, SGD requires a hand-designed schedule for the
learning rate. In addition, its adaptive variants tend to produce solutions
that generalize less well on unseen data than SGD with a hand-designed
schedule. We present an optimization method that offers empirically the best of
both worlds: our algorithm yields good generalization performance while
requiring only one hyper-parameter. Our approach is based on a composite
proximal framework, which exploits the compositional nature of deep neural
networks and can leverage powerful convex optimization algorithms by design.
Specifically, we employ the Frank-Wolfe (FW) algorithm for SVM, which computes
an optimal step-size in closed form at each time-step. We further show that the
descent direction is given by a simple backward pass in the network, yielding
the same computational cost per iteration as SGD. We present experiments on the
CIFAR and SNLI data sets, where we demonstrate the significant superiority of
our method over Adam, Adagrad, as well as the recently proposed BPGrad and
AMSGrad. Furthermore, we compare our algorithm to SGD with a hand-designed
learning rate schedule, and show that it provides similar generalization while
converging faster. The code is publicly available at
https://github.com/oval-group/dfw.
Comment: Published as a conference paper at ICLR 2019
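To illustrate the closed-form step-size ingredient in isolation: for a quadratic objective, exact line search along the Frank-Wolfe direction is available analytically. This is the generic FW step, not DFW's exact proximal subproblem; the `lmo` oracle and names are ours:

```python
import numpy as np

def fw_step(x, H, b, lmo):
    """One Frank-Wolfe step on f(x) = 0.5 x'Hx + b'x over a convex set.

    lmo(g) is the linear minimization oracle: argmin over the set of <g, s>.
    """
    g = H @ x + b                      # gradient
    d = lmo(g) - x                     # FW direction
    # Exact line search: minimize f(x + gamma * d), clipped to [0, 1]
    gamma = np.clip(-(g @ d) / (d @ H @ d + 1e-12), 0.0, 1.0)
    return x + gamma * d
```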
Combining Neural Networks and Log-linear Models to Improve Relation Extraction
The last decade has witnessed the success of the traditional feature-based
method on exploiting the discrete structures such as words or lexical patterns
to extract relations from text. Recently, convolutional and recurrent neural
networks have provided very effective mechanisms to capture the hidden
structures within sentences via continuous representations, thereby
significantly advancing the performance of relation extraction. The advantage
of convolutional neural networks is their capacity to generalize over
consecutive k-grams in sentences, while recurrent neural networks are
effective at encoding long-range sentence context. This paper proposes to
combine the traditional feature-based method, the convolutional and recurrent
neural networks to simultaneously benefit from their advantages. Our systematic
evaluation of different network architectures and combination methods
demonstrates the effectiveness of this approach and results in
state-of-the-art performance on the ACE 2005 and SemEval datasets.
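One plausible combination strategy consistent with the description above, sketched in PyTorch: concatenate CNN k-gram features, bidirectional-RNN context features, and sparse hand-crafted (log-linear) features before a single linear layer. Sizes and names are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class HybridRelationExtractor(nn.Module):
    def __init__(self, emb_dim=100, n_feats=500, n_classes=19, hid=64):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, hid, kernel_size=3, padding=1)
        self.rnn = nn.LSTM(emb_dim, hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(3 * hid + n_feats, n_classes)

    def forward(self, emb, feats):
        # emb: (B, T, emb_dim) word embeddings; feats: (B, n_feats) binary
        # indicators for the discrete (log-linear) features.
        c = torch.relu(self.conv(emb.transpose(1, 2))).max(dim=2).values
        h, _ = self.rnn(emb)
        r = h.max(dim=1).values
        return self.out(torch.cat([c, r, feats], dim=1))
```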
Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
Encoder-decoder models have become an effective approach for sequence
learning tasks like machine translation, image captioning and speech
recognition, but have yet to show competitive results for handwritten text
recognition. To this end, we propose an attention-based sequence-to-sequence
model. It combines a convolutional neural network as a generic feature
extractor with a recurrent neural network to encode both the visual
information, as well as the temporal context between characters in the input
image, and uses a separate recurrent neural network to decode the actual
character sequence. We make experimental comparisons between various attention
mechanisms and positional encodings, in order to find an appropriate alignment
between the input and output sequence. The model can be trained end-to-end and
the optional integration of a hybrid loss allows the encoder to retain an
interpretable and usable output, if desired. We achieve competitive results on
the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without
the use of a language model, and we significantly improve over recent
sequence-to-sequence approaches.
Comment: 8 pages, 1 figure, 8 tables
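One of the attention mechanisms typically compared in such a study is additive (Bahdanau-style) attention; a minimal NumPy sketch with illustrative weight names:

```python
import numpy as np

def additive_attention(dec_state, enc_states, Wd, We, v):
    # dec_state: (h,) decoder state; enc_states: (T, e) encoder outputs
    # Wd: (k, h), We: (k, e), v: (k,) are learned projections
    scores = np.tanh(enc_states @ We.T + Wd @ dec_state) @ v   # (T,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                       # softmax
    return alpha @ enc_states                                  # context (e,)
```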
Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models
The surging availability of electronic medical records (EHR) leads to
increased research interest in medical predictive modeling. Recently, many
deep learning based predictive models have been developed for EHR data and
have demonstrated impressive performance. However, a series of recent studies showed
that these deep models are not safe: they suffer from certain vulnerabilities.
In short, a well-trained deep network can be extremely sensitive to inputs with
negligible changes. These inputs are referred to as adversarial examples. In
the context of medical informatics, such attacks could alter the result of a
high performance deep predictive model by slightly perturbing a patient's
medical records. Such instability not only reflects a weakness of deep
architectures; more importantly, it offers guidance for detecting susceptible
parts of the inputs. In this paper, we propose an efficient and effective framework
that learns a time-preferential minimum attack targeting the LSTM model with
EHR inputs, and we leverage this attack strategy to screen medical records of
patients and identify susceptible events and measurements. The efficient
screening procedure can help decision makers pay extra attention to the
locations that can cause severe consequences if not measured correctly. We
conduct extensive empirical studies on a real-world urgent care cohort and
demonstrate the effectiveness of the proposed screening approach.
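A hedged PyTorch sketch of a gradient-based attack in this spirit; the exponential weighting toward recent time steps is our illustrative stand-in for the paper's time-preferential objective, and all names are ours:

```python
import torch

def susceptibility_attack(model, x, y, loss_fn, eps=0.01, decay=0.9):
    # x: (T, d) one patient's record sequence; model takes (B, T, d)
    x = x.clone().requires_grad_(True)
    loss_fn(model(x.unsqueeze(0)), y).backward()
    T = x.shape[0]
    w = decay ** torch.arange(T - 1, -1, -1, dtype=x.dtype)  # favor late steps
    delta = eps * w.unsqueeze(1) * x.grad.sign()             # minimal attack
    scores = x.grad.abs() * w.unsqueeze(1)   # per-location susceptibility
    return (x + delta).detach(), scores
```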
Visual Attention Model for Cross-sectional Stock Return Prediction and End-to-End Multimodal Market Representation Learning
Technical and fundamental analysis are traditional tools used to analyze
individual stocks; however, the finance literature has shown that the price
movement of each individual stock correlates heavily with other stocks,
especially those within the same sector. In this paper we propose a general
purpose market representation that incorporates fundamental and technical
indicators and relationships between individual stocks. We treat the daily
stock market as a "market image" where rows (grouped by market sector)
represent individual stocks and columns represent indicators. We apply a
convolutional neural network over this market image to build market features in
a hierarchical way. We use a recurrent neural network, with an attention
mechanism over the market feature maps, to model temporal dynamics in the
market. We show that our proposed model outperforms strong baselines in both
short-term and long-term stock return prediction tasks. We also show another
use for our market image: to construct concise and dense market embeddings
suitable for downstream prediction tasks.
Comment: Accepted as a full paper at the 32nd International FLAIRS Conference
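A minimal sketch of assembling such a "market image" (structure only; the sector/ticker layout is an illustrative choice):

```python
import numpy as np

def market_image(indicators_by_sector):
    # indicators_by_sector: {sector: {ticker: 1-D indicator vector}}
    rows = []
    for sector in sorted(indicators_by_sector):          # group by sector
        for ticker in sorted(indicators_by_sector[sector]):
            rows.append(indicators_by_sector[sector][ticker])
    return np.stack(rows)   # (n_stocks, n_indicators), ready for a 2-D CNN
```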
FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs
It is well known that many types of artificial neural networks, including
recurrent networks, can achieve a high classification accuracy even with
low-precision weights and activations. The reduction in precision generally
yields much more efficient hardware implementations in terms of hardware
cost, memory requirements, energy, and achievable throughput. In this paper, we
present the first systematic exploration of this design space as a function of
precision for a Bidirectional Long Short-Term Memory (BiLSTM) neural network.
Specifically, we include an in-depth investigation of precision vs. accuracy
using a fully hardware-aware training flow, where quantization of all aspects
of the network, including weights, inputs, outputs, and in-memory cell
activations, is taken into consideration during training. In addition, hardware resource
cost, power consumption and throughput scalability are explored as a function
of precision for FPGA-based implementations of BiLSTM, and multiple approaches
of parallelizing the hardware. We provide the first open source HLS library
extension of FINN for parameterizable hardware architectures of LSTM layers on
FPGAs, which offers full precision flexibility and allows for parameterizable
performance scaling with different levels of parallelism within the
architecture. Based on this library, we present an FPGA-based accelerator for
BiLSTM neural network designed for optical character recognition, along with
numerous other experimental proof points for a Zynq UltraScale+ XCZU7EV MPSoC
within the given design space.
Comment: Accepted for publication at the 28th International Conference on Field Programmable Logic and Applications (FPL), August 2018, Dublin, Ireland
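As a concrete, hedged illustration of the kind of uniform fixed-point quantizer used in hardware-aware training flows (not FINN's exact scheme):

```python
import numpy as np

def quantize(x, bits=4, scale=1.0):
    # Signed uniform quantization of x to 'bits' bits over [-scale, scale).
    levels = 2 ** (bits - 1)
    q = np.clip(np.round(x / scale * levels), -levels, levels - 1)
    return q * scale / levels

print(quantize(np.array([0.31, -0.72]), bits=3))  # snapped to a 3-bit grid
```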
Learning Simpler Language Models with the Differential State Framework
Learning useful information across long time lags is a critical and difficult
problem for temporal neural models in tasks such as language modeling. Existing
architectures that address the issue are often complex and costly to train. The
Differential State Framework (DSF) is a simple and high-performing design that
unifies previously introduced gated neural models. DSF models maintain
longer-term memory by learning to interpolate between a fast-changing
data-driven representation and a slowly changing, implicitly stable state. This
requires hardly any more parameters than a classical, simple recurrent network.
Within the DSF framework, a new architecture is presented, the Delta-RNN. In
language modeling at the word and character levels, the Delta-RNN outperforms
popular complex architectures, such as the Long Short-Term Memory (LSTM) and
the Gated Recurrent Unit (GRU), and, when regularized, performs comparably to
several state-of-the-art baselines. At the subword level, the Delta-RNN's
performance is comparable to that of complex gated architectures.
Comment: Edits/revisions applied throughout the document
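The core interpolation idea can be sketched in a few lines (our notation; the actual Delta-RNN parameterizes the gate and the proposal differently):

```python
import numpy as np

def dsf_step(s_prev, x, W, U, r):
    # r in [0, 1]: interpolation gate (possibly learned, elementwise)
    proposal = np.tanh(U @ x + W @ s_prev)    # fast, data-driven state
    return (1.0 - r) * proposal + r * s_prev  # mix with slow, stable state
```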