Mortality Prediction Models with Clinical Notes Using Sparse Attention at the Word and Sentence Levels
Intensive Care in-hospital mortality prediction has various clinical
applications. Neural prediction models, especially when capitalising on
clinical notes, have been put forward as an improvement on existing
models. However, to be acceptable, these models should be both performant and
transparent. This work studies different attention mechanisms for clinical
neural prediction models in terms of their discrimination and calibration.
Specifically, we investigate sparse attention as an alternative to dense
attention weights in the task of in-hospital mortality prediction from clinical
notes. We evaluate the attention mechanisms based on: i) local self-attention
over words in a sentence, and ii) global self-attention with a transformer
architecture across sentences. We demonstrate that the sparse mechanism
approach outperforms the dense one for the local self-attention in terms of
predictive performance on a publicly available dataset, and assigns higher
attention to prespecified relevant directive words. The performance at the
sentence level, however, deteriorates, as sentences containing the influential
directive words tend to be dropped altogether.
Comment: Technical Reports at the Department of Medical Informatics, Amsterdam UMC, 2021. https://kik.amc.nl/KIK/reports/TR2021-01.pd
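At the word level, the sparse variant can be realised by swapping softmax for sparsemax (Martins & Astudillo, 2016), which projects the attention scores onto the probability simplex and zeroes out low-scoring words. The following is a minimal PyTorch sketch of sparsemax with a toy word-attention pooling; the dimensions and the simple linear scorer are illustrative assumptions, not the report's exact architecture.

```python
import torch

def sparsemax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    # scores onto the probability simplex; low scores get exactly zero.
    z, _ = torch.sort(scores, dim=dim, descending=True)
    z_cumsum = z.cumsum(dim)
    k = torch.arange(1, scores.size(dim) + 1,
                     device=scores.device, dtype=scores.dtype)
    shape = [1] * scores.dim()
    shape[dim] = -1
    k = k.view(shape)
    # Size of the support: the largest k with 1 + k * z_k > cumsum_k.
    support = (1 + k * z > z_cumsum).to(scores.dtype)
    k_support = support.sum(dim=dim, keepdim=True)
    # Threshold tau chosen so that the clipped weights sum to one.
    tau_sum = torch.gather(z_cumsum, dim, k_support.long() - 1)
    tau = (tau_sum - 1) / k_support
    return torch.clamp(scores - tau, min=0)

# Toy word-level attention pooling (dimensions are illustrative):
words = torch.randn(2, 12, 64)                 # (batch, words, hidden)
scorer = torch.nn.Linear(64, 1)                # simple linear scorer
alpha = sparsemax(scorer(words).squeeze(-1))   # most words get weight 0
sentence = (alpha.unsqueeze(-1) * words).sum(dim=1)  # sentence vectors
```

Unlike softmax, whose weights are always strictly positive, the sparsemax weights here are exactly zero for most words, which is what makes the attended words directly inspectable.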
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
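A concrete instance of the second category is the diagnostic ("probing") classifier: a simple model trained to predict a linguistic property from frozen intermediate representations. The sketch below uses random stand-in arrays in place of real encoder activations and labels; only the probing recipe itself is standard.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins for real data: `hidden_states` would be frozen activations
# from a trained encoder, `labels` a linguistic property per example
# (e.g. part-of-speech tags). Both are random here for illustration.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((5000, 768))
labels = rng.integers(0, 17, 5000)

X_tr, X_te, y_tr, y_te = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Accuracy well above the majority-class baseline suggests the property
# is linearly decodable from the representations.
print("probe accuracy:", probe.score(X_te, y_te))
```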
Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph
The complexity of multiagent reinforcement learning (MARL) in multiagent
systems grows exponentially with the number of agents. This
scalability issue prevents MARL from being applied in large-scale multiagent
systems. However, one critical feature in MARL that is often neglected is that
the interactions between agents are quite sparse. Without exploiting this
sparsity structure, existing works aggregate information from all of the agents
and thus have a high sample complexity. To address this issue, we propose an
adaptive sparse attention mechanism by generalizing a sparsity-inducing
activation function. Then a sparse communication graph in MARL is learned by
graph neural networks based on this new attention mechanism. Through this
sparsity structure, the agents can communicate both effectively and
efficiently by selectively attending only to the agents that matter most; the
scale of the MARL problem is thus reduced with little loss of optimality.
Comparative results show that our algorithm learns an interpretable sparse
structure and outperforms previous works by a significant margin on
applications involving large-scale multiagent systems.
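To illustrate the mechanism: each agent scores every other agent, and the scores pass through a sparsity-inducing activation, so zero-weight pairs simply drop out of the communication graph. The sketch below uses plain sparsemax from the open-source entmax package as a stand-in for the paper's generalized activation; the agent count, dimensions, and linear maps are assumptions, and the GNN message-passing layers are omitted.

```python
import torch
from entmax import sparsemax  # open-source sparsemax implementation

# Hypothetical setup: N agents with d-dimensional state embeddings.
N, d = 8, 16
states = torch.randn(N, d)
W_q, W_k = torch.nn.Linear(d, d), torch.nn.Linear(d, d)

# Pairwise attention scores between all agents, scaled dot-product style.
scores = W_q(states) @ W_k(states).T / d ** 0.5   # (N, N)
weights = sparsemax(scores, dim=-1)               # rows sum to 1, mostly zero

# The surviving nonzero entries define a sparse communication graph:
# agent i aggregates messages only from agents j with weights[i, j] > 0.
edges = (weights > 0).nonzero()
messages = weights @ states                       # sparse neighbourhood mixing
```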
Not All Attention Is Needed: Gated Attention Network for Sequence Data
Although deep neural networks generally have fixed structures, the concept of
dynamic mechanisms has drawn increasing attention in recent years.
Attention mechanisms compute input-dependent dynamic attention weights for
aggregating a sequence of hidden states. Dynamic network configuration in
convolutional neural networks (CNNs) selectively activates only part of the
network at a time for different inputs. In this paper, we combine the two
dynamic mechanisms for text classification tasks. Traditional attention
mechanisms attend to the whole sequence of hidden states for an input sentence,
while in most cases not all attention is needed, especially for long sequences.
We propose a novel method called Gated Attention Network (GA-Net) to
dynamically select a subset of elements to attend to using an auxiliary
network, and compute attention weights to aggregate the selected elements. It
avoids a significant amount of unnecessary computation on unattended elements,
and allows the model to pay attention to important parts of the sequence.
Experiments on various datasets show that the proposed method outperforms all
baseline models with global or local attention while requiring less
computation and offering better interpretability. The idea is also promising
to extend to more complex attention-based models, such as transformers and
seq-to-seq models.
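A minimal sketch of the gating idea: an auxiliary network emits a relaxed binary gate per position, and attention is computed only over the kept positions. The binary-concrete (Gumbel-sigmoid) relaxation and the layer shapes here are assumptions for illustration, not necessarily GA-Net's exact parameterisation.

```python
import torch
import torch.nn.functional as F

class GatedAttention(torch.nn.Module):
    # Sketch of a GA-Net-style layer: an auxiliary gate network picks a
    # subset of positions; attention is computed only over that subset.
    def __init__(self, d: int, tau: float = 0.5):
        super().__init__()
        self.gate_net = torch.nn.Linear(d, 1)   # auxiliary selection network
        self.score_net = torch.nn.Linear(d, 1)  # attention scorer
        self.tau = tau                           # relaxation temperature

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d) hidden states from some encoder.
        logits = self.gate_net(h).squeeze(-1)
        u = torch.rand_like(logits)
        noise = torch.log(u) - torch.log1p(-u)   # logistic noise
        gate = torch.sigmoid((logits + noise) / self.tau)  # relaxed 0/1 gate
        scores = self.score_net(h).squeeze(-1)
        # Unselected positions are excluded from the softmax entirely
        # (assumes at least one gate stays open per sequence).
        scores = scores.masked_fill(gate < 0.5, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        return (attn.unsqueeze(-1) * h).sum(dim=1)  # (batch, d) summary

# Usage: pool a batch of 12-step sequences into fixed-size vectors.
layer = GatedAttention(d=64)
pooled = layer(torch.randn(4, 12, 64))
```

Because the softmax only ever sees the gated subset, the cost of the attention step scales with the number of kept positions rather than the full sequence length, which is the source of the computation savings the abstract describes.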