Deep Learning for Inflexible Multi-Asset Hedging of incomplete market
Models trained under complete-market assumptions are usually ineffective in
incomplete markets. This paper solves the hedging problem in an
incomplete market with three sources of incompleteness: risk factor,
illiquidity, and discrete transaction dates. A new jump-diffusion model is
proposed to describe stochastic asset prices. Three neural networks (RNN,
LSTM, and Mogrifier-LSTM) are used to obtain hedging strategies, with MSE loss
and Huber loss implemented and compared. As a result, Mogrifier-LSTM is the
fastest model with the best results under both MSE and Huber loss.
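For readers comparing the two objectives named in the abstract above, the following is a minimal sketch of MSE and Huber loss in plain Python. The function names, the sample hedging errors, and the threshold `delta=1.0` are illustrative assumptions and are not taken from the paper.

```python
def mse_loss(pred, target):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear beyond `delta`."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)


# Hypothetical terminal values of a hedged portfolio versus the option payoff.
# The third path contains a large miss, such as one caused by a price jump.
portfolio = [0.1, -0.2, 3.0]
payoff = [0.0, 0.0, 0.0]

print("MSE:  ", mse_loss(portfolio, payoff))
print("Huber:", huber_loss(portfolio, payoff))
```

The single large error dominates the MSE, while the Huber loss grows only linearly in it. This is one plausible reason to compare the two losses when asset prices can jump.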
CCheXR-Attention: Clinical concept extraction and chest x-ray reports classification using modified Mogrifier and bidirectional LSTM with multihead attention
Radiology reports cover different aspects, from radiological observation to the diagnosis of an imaging examination, such as X-rays, MRI, and CT scans. Abundant patient information presented in radiology reports poses a few major challenges. First, radiology reports follow a free-text reporting format, which causes the loss of a large amount of information in unstructured text. Second, the extraction of important features from these reports is a huge bottleneck for machine learning models. These challenges are important, particularly the extraction of key features such as symptoms, comparison/priors, technique, finding, and impression because they facilitate the decision-making on patients’ health. To alleviate these issues, a novel architecture, CCheXR-Attention, is proposed to extract the clinical features from the radiological reports and classify each report into normal and abnormal categories based on the extracted information. We have proposed a modified Mogrifier LSTM model and integrated a multihead attention method to extract the most relevant features. Experimental outcomes on two benchmark datasets demonstrated that the proposed model surpassed state-of-the-art models.
Vision Transformer Based Model for Describing a Set of Images as a Story
Visual Story-Telling is the process of forming a multi-sentence story from a
set of images. Appropriately including visual variation and contextual
information captured inside the input images is one of the most challenging
aspects of visual storytelling. Consequently, stories developed from a set of
images often lack cohesiveness, relevance, and semantic relationship. In this
paper, we propose a novel Vision Transformer Based Model for describing a set
of images as a story. The proposed method extracts the distinct features of the
input images using a Vision Transformer (ViT). Firstly, input images are
divided into 16×16 patches and bundled into a linear projection of flattened
patches. The transformation from a single image to multiple image patches
captures the visual variety of the input visual patterns. These features are
used as input to a Bidirectional-LSTM which is part of the sequence encoder.
This captures the past and future image context of all image patches. Then, an
attention mechanism is implemented and used to increase the discriminatory
capacity of the data fed into the language model, i.e. a Mogrifier-LSTM. The
performance of our proposed model is evaluated using the Visual Story-Telling
dataset (VIST), and the results show that our model outperforms the current
state-of-the-art models.
Comment: This paper has been accepted at the 35th Australasian Joint
Conference on Artificial Intelligence 2022 (camera-ready version is attached).
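The patch step described in the abstract above (splitting each image into 16×16 patches, flattening them, and applying a linear projection) can be sketched with NumPy as follows. The 224×224×3 image size, the embedding width of 64, the random projection matrix, and the function name `image_to_patches` are assumptions made for illustration. This is not the authors' implementation.

```python
import numpy as np


def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, each flattened to patch * patch * C values.
    return patches.reshape(rows * cols, patch_size * patch_size * c)


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))           # assumed input size
patches = image_to_patches(image)           # 14 * 14 patches of 16 * 16 * 3 values

embed_dim = 64                              # assumed embedding width
projection = rng.normal(size=(patches.shape[1], embed_dim))
tokens = patches @ projection               # linear projection of flattened patches

print(patches.shape, tokens.shape)          # (196, 768) (196, 64)
```

In a trained model the projection matrix is learned, and the resulting token sequence is what a downstream sequence encoder would consume.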
Gates Are Not What You Need in RNNs
Recurrent neural networks have flourished in many areas. Consequently, we can
see new RNN cells being developed continuously, usually by creating or using
gates in a new, original way. But what if we told you that gates in RNNs are
redundant? In this paper, we propose a new recurrent cell called Residual
Recurrent Unit (RRU) which beats traditional cells and does not employ a single
gate. It is based on the residual shortcut connection, linear transformations,
ReLU, and normalization. To evaluate our cell's effectiveness, we compare its
performance against the widely-used GRU and LSTM cells and the recently
proposed Mogrifier LSTM on several tasks, including polyphonic music modeling,
language modeling, and sentiment analysis. Our experiments show that RRU
outperforms the traditional gated units on most of these tasks. Also, it has
better robustness to parameter selection, allowing immediate application in new
tasks without much tuning. We have implemented the RRU in TensorFlow, and the
code is made available at https://github.com/LUMII-Syslab/RRU .
Comment: Published in Artificial Intelligence and Soft Computing. ICAISC 2023.
Lecture Notes in Computer Science, vol 14125. Springer, Cham. Available
online at https://doi.org/10.1007/978-3-031-42505-9_2
Circling Back to Recurrent Models of Language
Just because some purely recurrent models suffer from being hard to optimize
and inefficient on today's hardware, they are not necessarily bad models of
language. We demonstrate this by the extent to which these models can still be
improved by a combination of a slightly better recurrent cell, architecture,
objective, as well as optimization. In the process, we establish a new state of
the art for language modelling on small datasets and on Enwik8 with dynamic
evaluation.
Long Distance Relationships without Time Travel: Boosting the Performance of a Sparse Predictive Autoencoder in Sequence Modeling
In sequence learning tasks such as language modelling, Recurrent Neural
Networks must learn relationships between input features separated by time.
State of the art models such as LSTM and Transformer are trained by
backpropagation of losses into prior hidden states and inputs held in memory.
This allows gradients to flow from present to past and effectively learn with
perfect hindsight, but at a significant memory cost. In this paper we show that
it is possible to train high performance recurrent networks using information
that is local in time, and thereby achieve a significantly reduced memory
footprint. We describe a predictive autoencoder called bRSM featuring recurrent
connections, sparse activations, and a boosting rule for improved cell
utilization. The architecture demonstrates near optimal performance on a
non-deterministic (stochastic) partially-observable sequence learning task
consisting of high-Markov-order sequences of MNIST digits. We find that this
model learns these sequences faster and more completely than an LSTM, and offer
several possible explanations why the LSTM architecture might struggle with the
partially observable sequence structure in this task. We also apply our model
to a next word prediction task on the Penn Treebank (PTB) dataset. We show that
a 'flattened' RSM network, when paired with a modern semantic word embedding
and the addition of boosting, achieves 103.5 PPL (a 20-point improvement over
the best N-gram models), beating ordinary RNNs trained with BPTT and
approaching the scores of early LSTM implementations. This work provides
encouraging evidence that strong results on challenging tasks such as language
modelling may be possible using less memory intensive, biologically-plausible
training regimes.
Comment: 9 pages, 6 figures, 4 tables
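The PPL figure quoted in the abstract above is perplexity, the exponential of the average per-token negative log-likelihood. The sketch below shows that calculation in plain Python. The function name and the example probabilities are assumptions for illustration and are unrelated to the paper's model or data.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the true tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that always spreads its belief evenly over 4 candidates has a
# perplexity of about 4: it is as uncertain as a fair 4-way choice.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# Assigning every true token a probability of 1/103.5 yields a perplexity
# of about 103.5, the same scale as the PTB score quoted above.
print(perplexity([1 / 103.5] * 10))
```

Lower perplexity means the model assigns higher probability to the observed next words.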