Learning with Interpretable Structure from Gated RNN
The interpretability of deep learning models has attracted increasing attention
in recent years. It would be beneficial to learn an interpretable structure from
deep learning models. In this paper, we focus on Recurrent Neural Networks
(RNNs), especially gated RNNs, whose inner mechanism is still not clearly
understood. We find that a Finite State Automaton (FSA), which processes
sequential data, has a more interpretable inner mechanism according to the
definition of interpretability, and that it can be learned from RNNs as the
interpretable structure. We propose two methods to learn an FSA from an RNN,
based on two different clustering methods. Through experiments on artificial
and real datasets with the learned FSA, we find that the FSA is more
trustworthy than the RNN from which it was learned, which gives FSAs a chance
to replace RNNs in applications involving human lives or dangerous facilities.
Besides, we analyze how the number of gates affects the performance of RNNs.
Our results suggest that gates in RNNs are important but that fewer is better,
which could guide the design of other RNNs. Finally, we observe that the FSA
learned from an RNN yields semantically aggregated states, and its transition
graph gives an interesting view of how RNNs intrinsically handle text
classification tasks.
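The general idea of learning an FSA from an RNN can be sketched as follows: cluster the RNN's hidden states into k abstract states, then read off transitions by majority vote over (state, symbol) pairs. This is a minimal sketch under stated assumptions: it uses plain k-means with farthest-first initialisation, which may differ from the two clustering methods used in the paper, and the names `extract_fsa` and `kmeans` are hypothetical. Hidden-state traces are assumed to be recorded from an already-trained RNN.

```python
from collections import Counter, defaultdict


def dist2(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each hidden state joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def extract_fsa(traces, k):
    """Hypothetical helper. traces: list of (symbols, states), where states[0] is
    the initial hidden state and states[t + 1] follows reading symbols[t]."""
    points = [h for _, states in traces for h in states]
    labels = kmeans(points, k)
    votes = defaultdict(Counter)
    offset = 0
    for symbols, states in traces:
        ids = labels[offset:offset + len(states)]
        offset += len(states)
        for t, sym in enumerate(symbols):
            votes[(ids[t], sym)][ids[t + 1]] += 1
    # Majority vote keeps one destination per (state, symbol): a deterministic FSA.
    delta = {key: cnt.most_common(1)[0][0] for key, cnt in votes.items()}
    return labels[0], delta
```

The majority vote is what turns noisy, continuous hidden-state trajectories into a deterministic transition table; with well-separated clusters (for example, a parity-tracking RNN) the recovered table matches the underlying automaton.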
Recurrent Additive Networks
We introduce recurrent additive networks (RANs), a new gated RNN which is
distinguished by the use of purely additive latent state updates. At every time
step, the new state is computed as a gated component-wise sum of the input and
the previous state, without any of the non-linearities commonly used in RNN
transition dynamics. We formally show that RAN states are weighted sums of the
input vectors, and that the gates only contribute to computing the weights of
these sums. Despite this relatively simple functional form, experiments
demonstrate that RANs perform on par with LSTMs on benchmark language modeling
problems. This result shows that many of the non-linear computations in LSTMs
and related networks are not essential, at least for the problems we consider,
and suggests that the gates are doing more of the computational work than
previously understood.
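To make the "weighted sums of the input vectors" claim concrete, the sketch below runs a RAN-style update and then rebuilds the final state as a sum over time steps with weights formed only from the gates. It assumes our reading of the formulation: a linear content layer x̃_t = W_cx x_t, gates computed from h_{t-1} = tanh(c_{t-1}) and x_t, and c_t = i_t ⊙ x̃_t + f_t ⊙ c_{t-1}. Parameter names such as `W_cx` are illustrative, not taken from a released implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def init_params(d_in, d_hid, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W_cx": mat(d_hid, d_in), "W_ih": mat(d_hid, d_hid), "W_ix": mat(d_hid, d_in),
            "W_fh": mat(d_hid, d_hid), "W_fx": mat(d_hid, d_in),
            "b_i": [0.0] * d_hid, "b_f": [0.0] * d_hid}


def ran_step(p, x, c_prev):
    h_prev = [math.tanh(c) for c in c_prev]  # assumed tanh output variant
    x_tilde = matvec(p["W_cx"], x)            # linear content layer, no non-linearity
    i = [sigmoid(a + b + bi) for a, b, bi in
         zip(matvec(p["W_ih"], h_prev), matvec(p["W_ix"], x), p["b_i"])]
    f = [sigmoid(a + b + bf) for a, b, bf in
         zip(matvec(p["W_fh"], h_prev), matvec(p["W_fx"], x), p["b_f"])]
    # Purely additive update: gated component-wise sum of input and previous state.
    c = [ig * xt + fg * cp for ig, xt, fg, cp in zip(i, x_tilde, f, c_prev)]
    return c, i, f, x_tilde


def run_ran(p, xs, d_hid):
    c, history = [0.0] * d_hid, []
    for x in xs:
        c, i, f, x_tilde = ran_step(p, x, c)
        history.append((i, f, x_tilde))
    return c, history


def as_weighted_sum(history, d_hid):
    """Rebuild c_T = sum_j (i_j * prod_{k>j} f_k) * x_tilde_j, per component."""
    c, T = [0.0] * d_hid, len(history)
    for j, (i_j, _, xt_j) in enumerate(history):
        for d in range(d_hid):
            weight = i_j[d]
            for k in range(j + 1, T):
                weight *= history[k][1][d]
            c[d] += weight * xt_j[d]
    return c
```

Because the gates only enter through the weights, the two computations agree to floating-point precision, which is the sense in which the gates "only contribute to computing the weights of these sums".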
Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record
The wide implementation of electronic health record (EHR) systems facilitates
the collection of large-scale health data from real clinical settings. Despite
the significant increase in the adoption of EHR systems, these data remain
largely unexplored, yet they are a rich source for knowledge discovery from
patient health histories in tasks such as understanding disease correlations
and predicting health outcomes. However, the heterogeneity, sparsity, noise,
and bias in these data present many complex challenges, which make it
difficult to translate potentially relevant information into machine learning
algorithms. In this paper, we propose a computational framework, Patient2Vec,
to learn an interpretable deep representation of longitudinal EHR data that is
personalized for each patient. To evaluate this approach, we apply it to the
prediction of future hospitalizations using real EHR data and compare its
predictive performance with baseline methods. Patient2Vec produces a vector
space with meaningful structure and achieves an AUC of around 0.799,
outperforming the baseline methods. Finally, the learned feature importance can
be visualized and interpreted at both the individual and population levels to
bring clinical insights.
Comment: Accepted by IEEE Access
Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis
The past decade has seen an explosion in the amount of digital information
stored in electronic health records (EHR). While primarily designed for
archiving patient clinical information and administrative healthcare tasks,
many researchers have found secondary uses for these records in various clinical
informatics tasks. Over the same period, the machine learning community has
seen widespread advances in deep learning techniques, which have also been
successfully applied to the vast amount of EHR data. In this paper, we review
these deep EHR systems, examining architectures, technical aspects, and
clinical applications. We also identify shortcomings of current techniques and
discuss avenues of future research for EHR-based deep learning.
Comment: Accepted for publication in the Journal of Biomedical and Health
Informatics: http://ieeexplore.ieee.org/abstract/document/8086133
Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks
In this paper, we propose an addition-subtraction twin-gated recurrent network
(ATR) to simplify neural machine translation. The recurrent units of ATR are
heavily simplified to have the smallest number of weight matrices among units
of all existing gated RNNs. With the simple addition and subtraction operation,
we introduce a twin-gated mechanism to build input and forget gates which are
highly correlated. Despite this simplification, the essential non-linearities
and capability of modeling long-distance dependencies are preserved.
Additionally, the proposed ATR is more transparent than LSTM/GRU due to the
simplification. Forward self-attention can be easily established in ATR, which
makes the proposed network interpretable. Experiments on WMT14 translation
tasks demonstrate that ATR-based neural machine translation can yield
competitive performance on English-German and English-French language pairs in
terms of both translation quality and speed. Further experiments on NIST
Chinese-English translation, natural language inference and Chinese word
segmentation verify the generality and applicability of ATR on different
natural language processing tasks.
Comment: EMNLP 2018, long paper, source code released
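The twin-gated mechanism can be sketched in a few lines. This assumes our reading of ATR: an input projection p_t = W_hx x_t and a recurrent projection q_t = W_hh h_{t-1} (the only two weight matrices), an input gate σ(p_t + q_t) built by addition, a forget gate σ(p_t − q_t) built by subtraction, and h_t = i_t ⊙ p_t + f_t ⊙ h_{t-1}. The function names are hypothetical and bias terms are omitted for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def atr_step(W_hx, W_hh, x, h_prev):
    """One ATR-style step using only two weight matrices."""
    p = matvec(W_hx, x)       # input projection
    q = matvec(W_hh, h_prev)  # recurrent projection
    i = [sigmoid(a + b) for a, b in zip(p, q)]  # input gate: addition
    f = [sigmoid(a - b) for a, b in zip(p, q)]  # forget gate: subtraction
    # The new state mixes the projected input with the previous state.
    h = [ig * a + fg * hp for ig, a, fg, hp in zip(i, p, f, h_prev)]
    return h, i, f


def run_atr(W_hx, W_hh, xs):
    h, states = [0.0] * len(W_hh), []
    for x in xs:
        h, _, _ = atr_step(W_hx, W_hh, x, h)
        states.append(h)
    return states
```

Since both gates share p_t and q_t, they are highly correlated by construction: when the recurrent projection is zero the two gates coincide, and no extra gate-specific weight matrices are needed.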
Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks
A variety of real-world processes (over networks) produce sequences of data
whose complex temporal dynamics need to be studied. In particular, event
timestamps can carry important information about the underlying network
dynamics that is otherwise unavailable from time series sampled evenly from
continuous signals. Moreover, in most complex processes, event sequences and
evenly sampled time series data can interact with each other, which makes
joint modeling of these two data sources necessary. To tackle these problems,
in this paper we use the rich framework of (temporal) point processes to model
event data and update its intensity function in a timely manner with
synergistic twin Recurrent Neural Networks (RNNs). In the proposed
architecture, the intensity function is synergistically modulated by one RNN
with asynchronous events as input and another RNN with time series as input.
Furthermore, to enhance the interpretability of the model, an attention
mechanism for the neural point process is introduced. The whole model, with
event type and timestamp prediction output layers, can be trained end-to-end
and allows a black-box treatment for modeling the intensity. We substantiate
the superiority of our model on synthetic data and three real-world benchmark
datasets.
Comment: 14 pages
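As an illustration of how two RNN states can jointly modulate an intensity function, the sketch below assumes an exponential, RMTPP-style conditional intensity λ*(t_j + s) = exp(v · [h_event; h_series] + w·s + b), where s is the time elapsed since the last event. This functional form, the concatenation of the two states, and all names are assumptions for illustration; the paper's exact parameterization (including its attention mechanism) is not reproduced here. The log-density of the next event time follows from log f*(s) = log λ*(s) − ∫₀ˢ λ*(u) du, which has a closed form for this intensity.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conditional_intensity(v, h_event, h_series, w, b, s):
    """lambda*(t_j + s) = exp(v . [h_event; h_series] + w * s + b)."""
    h = h_event + h_series  # concatenate the two RNN states (list concatenation)
    return math.exp(dot(v, h) + w * s + b)


def log_density_next_event(v, h_event, h_series, w, b, s):
    """log f*(t_j + s) = log lambda*(t_j + s) - integral_0^s lambda*(t_j + u) du.
    Closed form for the exponential intensity above; requires w != 0."""
    a = dot(v, h_event + h_series) + b
    return a + w * s + (math.exp(a) - math.exp(a + w * s)) / w
```

Maximizing this log-density over observed inter-event times (plus a cross-entropy term for event types) is one common way such a model is trained end-to-end; with w > 0 the density integrates to one over s ≥ 0.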
MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks
We introduce MinimalRNN, a new recurrent neural network architecture that
achieves performance comparable to popular gated RNNs with a simplified
structure. It employs minimal updates within the RNN, which lead not only to
efficient training and testing but, more importantly, to better
interpretability and trainability. We demonstrate that by adopting the more
restrictive update rule, MinimalRNN learns disentangled RNN states. We further
examine the learning dynamics of different RNN structures using input-output
Jacobians, and show that MinimalRNN is able to capture longer-range
dependencies than existing RNN architectures.
Comment: Presented at NIPS 2017 Symposium on Interpretable Machine Learning
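A minimal update of this kind can be sketched as follows, assuming our reading of the update rule: the input is first encoded as z_t = Φ(x_t), an update gate u_t = σ(U_h h_{t-1} + U_z z_t + b_u) is computed, and the new state is h_t = u_t ⊙ h_{t-1} + (1 − u_t) ⊙ z_t. Here Φ is assumed to be a single tanh layer with hypothetical parameters `W_x` and `b_z`; other encoders are possible.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def minimal_rnn_step(params, x, h_prev):
    # Encoder Phi: an assumed single tanh layer mapping x_t into the latent space.
    z = [math.tanh(a + b) for a, b in zip(matvec(params["W_x"], x), params["b_z"])]
    # Update gate from the previous state and the encoded input.
    u = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(params["U_h"], h_prev), matvec(params["U_z"], z), params["b_u"])]
    # Minimal update: a per-unit convex combination of the old state and z_t.
    h = [g * hp + (1.0 - g) * zz for g, hp, zz in zip(u, h_prev, z)]
    return h, z


def random_params(d_in, d_hid, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W_x": mat(d_hid, d_in), "b_z": [0.0] * d_hid,
            "U_h": mat(d_hid, d_hid), "U_z": mat(d_hid, d_hid), "b_u": [0.0] * d_hid}
```

Because each new state component lies between the previous state and the encoded input, the state stays inside the latent space defined by Φ, which is one way to see why such restrictive updates aid interpretability.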
Learning Noun Cases Using Sequential Neural Networks
Morphological declension, which aims to inflect nouns to indicate number,
case and gender, is an important task in natural language processing (NLP).
This research proposal investigates how efficiently Recurrent Neural Networks
(RNNs) learn to decline nouns by case. Given the challenge of data sparsity in
processing morphologically rich languages, and the flexible sentence structure
of such languages, we believe that modeling morphological dependencies can
improve the performance of neural network models. We propose to carry out
various experiments to understand the interpretable features that may lead to
better generalization of the learned models on cross-lingual tasks.
Comment: 3-page research proposal
SESA: Supervised Explicit Semantic Analysis
In recent years, supervised representation learning has provided state-of-the-art
or near state-of-the-art results in semantic analysis tasks
including ranking and information retrieval. The core idea is to learn how to
embed items into a latent space such that they optimize a supervised objective
in that latent space. The dimensions of the latent space have no clear
semantics, and this reduces the interpretability of the system. For example, in
personalization models, it is hard to explain why a particular item is ranked
high for a given user profile. We propose a novel model of representation
learning called Supervised Explicit Semantic Analysis (SESA) that is trained in
a supervised fashion to embed items to a set of dimensions with explicit
semantics. The model learns to compare two objects by representing them in this
explicit space, where each dimension corresponds to a concept from a knowledge
base. This work extends Explicit Semantic Analysis (ESA) with a supervised
model for ranking problems. We apply this model to the task of Job-Profile
relevance in LinkedIn in which a set of skills defines our explicit dimensions
of the space. Every profile and job is encoded into this set of skills, and
their similarity is calculated in this space. We use RNNs to embed text input into
this space. In addition to interpretability, our model makes use of the
web-scale collaborative skills data that is provided by users for each LinkedIn
profile. Our model achieves state-of-the-art results while remaining
interpretable.
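The interpretability benefit of an explicit space can be shown with a toy scorer: when every dimension is a named skill, the similarity decomposes into per-skill contributions that explain the ranking. This sketch is only illustrative. The `SKILLS` list is hypothetical, and the encoder is a hand-written keyword counter standing in for the supervised RNN encoder described above, so nothing here is learned.

```python
import math

SKILLS = ["python", "sql", "machine learning", "sales"]  # hypothetical explicit dimensions


def encode(text):
    """Stand-in encoder: counts skill mentions (the described model uses an RNN)."""
    t = text.lower()
    return [float(t.count(s)) for s in SKILLS]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


def score_with_explanation(profile_text, job_text):
    """Cosine similarity in the skill space plus each skill's share of it."""
    p, j = normalize(encode(profile_text)), normalize(encode(job_text))
    contributions = {s: a * b for s, a, b in zip(SKILLS, p, j)}
    score = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return score, ranked
```

For a profile mentioning Python twice and SQL once, scored against a job asking for Python and machine learning, the whole score comes from the "python" dimension, which is exactly the kind of explanation a latent space cannot give directly.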
SAM: Semantic Attribute Modulation for Language Modeling and Style Variation
This paper presents Semantic Attribute Modulation (SAM) for language
modeling and style variation. The semantic attribute modulation includes
various document attributes, such as titles, authors, and document categories.
We consider two types of attributes (title attributes and category
attributes) and a flexible attribute selection scheme that automatically scores
them via an attribute attention mechanism. The semantic attributes are embedded
into the hidden semantic space as the generation inputs. With the attributes
properly harnessed, our proposed SAM can generate interpretable texts with
regard to the input attributes. Qualitative analysis, including word semantic
analysis and attention values, shows the interpretability of SAM. On several
typical text datasets, we empirically demonstrate the superiority of the
Semantic Attribute Modulated language model with different combinations of
document attributes. Moreover, we present style variation for lyric
generation using SAM, which shows a strong connection between style
variation and the semantic attributes.
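An attribute attention step of the kind described above can be sketched as a softmax over attribute scores followed by a weighted sum of attribute embeddings. The dot-product scoring function, the function names, and the example embeddings are assumptions for illustration; the paper's exact scoring function may differ.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]


def attend_attributes(hidden, attributes):
    """hidden: current decoder state; attributes: dict name -> embedding.
    Returns per-attribute attention weights and the attended attribute vector."""
    names = list(attributes)
    scores = [sum(h * a for h, a in zip(hidden, attributes[n])) for n in names]
    weights = dict(zip(names, softmax(scores)))
    context = [sum(weights[n] * attributes[n][d] for n in names)
               for d in range(len(hidden))]
    return weights, context


weights, context = attend_attributes(
    [1.0, 0.0, 0.5],
    {"title": [0.9, 0.1, 0.4], "category": [-0.5, 0.8, 0.0]})
```

In a generator, the attended vector could be concatenated with the word embedding at each step as part of the generation input, and the attention weights themselves are what make the attribute selection inspectable.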
- …