A Neural Attention Model for Categorizing Patient Safety Events
Medical errors are a leading cause of death in the US, so preventing them is paramount to promoting health care. Patient Safety Event reports are narratives describing potential adverse events to patients and are important for identifying and preventing medical errors. We present a neural network architecture for identifying the type of safety event, which is the first step in understanding these narratives. Our proposed model is based on a soft neural attention mechanism that improves the effectiveness of encoding long sequences. Empirical results on two large-scale real-world datasets of patient safety reports demonstrate the effectiveness of our method, with significant improvements over existing methods.
Comment: ECIR 201
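As a concrete illustration of the soft attention idea described above, here is a minimal sketch of an attention-weighted sequence classifier. The GRU encoder, layer sizes, and module names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentiveClassifier(nn.Module):
    """Soft-attention classifier sketch for long report narratives
    (sizes and encoder choice are assumptions, not the paper's design)."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # scores each time step
        self.out = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))       # (batch, seq_len, 2*hidden)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)  # attention weights
        ctx = (a.unsqueeze(-1) * h).sum(dim=1)    # weighted sum over time
        return self.out(ctx)                      # event-type logits
```

The weighted sum lets the classifier focus on the few informative spans of a long narrative instead of compressing everything into a single final hidden state.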
Exploiting Cognitive Structure for Adaptive Learning
Adaptive learning, also known as adaptive teaching, relies on learning path
recommendation, which sequentially recommends personalized learning items
(e.g., lectures, exercises) to satisfy the unique needs of each learner.
Although it is well known that modeling the cognitive structure including
knowledge level of learners and knowledge structure (e.g., the prerequisite
relations) of learning items is important for learning path recommendation,
existing methods for adaptive learning often separately focus on either
knowledge levels of learners or knowledge structure of learning items. To fully
exploit the multifaceted cognitive structure for learning path recommendation,
we propose a Cognitive Structure Enhanced framework for Adaptive Learning,
named CSEAL. By viewing path recommendation as a Markov Decision Process and
applying an actor-critic algorithm, CSEAL can sequentially identify the right learning items for different learners. Specifically, we first utilize a
recurrent neural network to trace the evolving knowledge levels of learners at
each learning step. Then, we design a navigation algorithm on the knowledge
structure to ensure the logicality of learning paths, which reduces the search
space in the decision process. Finally, the actor-critic algorithm, whose parameters are dynamically updated along the learning path, is used to determine what to learn next. Extensive experiments on real-world data demonstrate the effectiveness and robustness of CSEAL.
Comment: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19)
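As a rough illustration of how a knowledge structure can prune the decision space, the sketch below masks an actor-critic's action distribution to items whose prerequisites are already mastered. The module name, tensor shapes, and masking rule are assumptions for illustration, not CSEAL's actual navigation algorithm.

```python
import torch
import torch.nn as nn

class MaskedActorCritic(nn.Module):
    """Actor-critic head restricted to learning items whose prerequisites
    are mastered (an illustrative sketch, not CSEAL's exact design)."""
    def __init__(self, state_dim, n_items):
        super().__init__()
        self.actor = nn.Linear(state_dim, n_items)
        self.critic = nn.Linear(state_dim, 1)

    def forward(self, state, prereq, mastered):
        # prereq: (n_items, n_items) float 0/1, prereq[i, j] = 1 if item j
        # must precede item i; mastered: (batch, n_items) float 0/1.
        met = mastered @ prereq.T                  # mastered-prereq counts
        unmet = met < prereq.sum(dim=1)            # any prerequisite missing
        logits = self.actor(state).masked_fill(unmet, float("-inf"))
        return torch.softmax(logits, dim=-1), self.critic(state)
```

Masking invalid items before the softmax is one simple way to enforce logical learning paths while shrinking the search space the policy must explore.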
An investigation of a deep learning based malware detection system
We investigate a Deep Learning based system for malware detection. In the investigation, we experiment with different combinations of Deep Learning architectures, including Auto-Encoders and Deep Neural Networks with varying numbers of layers, on the Malicia malware dataset, on which earlier studies obtained an accuracy of 98% with an acceptable False Positive Rate of 1.07%. Those results, however, relied on extensive hand-crafted domain features and the corresponding feature engineering and design effort. Our proposed approach requires no such feature engineering, and improving on the previous best results (99.21% accuracy and a False Positive Rate of 0.19%) indicates that Deep Learning based systems can deliver an effective defense against malware. Since Deep Learning is good at automatically extracting higher-level conceptual features from data, Deep Learning based systems could provide an effective, general and scalable mechanism for the detection of existing and unknown malware.
Comment: 13 Pages, 4 figures
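A minimal sketch of the kind of pipeline the abstract describes, pairing an auto-encoder with a deep classifier over the learned code; the input dimensionality, layer sizes, and joint forward pass are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class AEMalwareDetector(nn.Module):
    """Auto-encoder features feeding a malware/benign classifier
    (dimensions and layout are illustrative assumptions)."""
    def __init__(self, in_dim=1024, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.classifier = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(),
                                        nn.Linear(32, 2))

    def forward(self, x):
        code = self.encoder(x)                  # learned compressed features
        return self.decoder(code), self.classifier(code)  # recon + logits
```

The reconstruction loss trains the encoder to extract features from raw inputs automatically, replacing the hand-crafted domain features used in earlier work.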
A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion
Users may strive to formulate an adequate textual query for their information
need. Search engines assist the users by presenting query suggestions. To
preserve the original search intent, suggestions should be context-aware and
account for the previous queries issued by the user. Achieving context
awareness is challenging due to data sparsity. We present a probabilistic
suggestion model that is able to account for sequences of previous queries of
arbitrary lengths. Our novel hierarchical recurrent encoder-decoder
architecture allows the model to be sensitive to the order of queries in the
context while avoiding data sparsity. Additionally, our model can produce suggestions for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models, which rely upon machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next-query prediction setting. In addition to query suggestion, our model is general enough to be used in a variety of other applications.
Comment: To appear in Conference on Information and Knowledge Management (CIKM) 201
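The hierarchical idea can be sketched as follows: a query-level GRU encodes each query, a session-level GRU runs over those encodings, and its final state initialises a word-by-word decoder for the suggested next query. Sizes and module names below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HRED(nn.Module):
    """Minimal hierarchical recurrent encoder-decoder sketch."""
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.query_enc = nn.GRU(dim, dim, batch_first=True)
        self.session_enc = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, session, target):
        # session: (batch, n_queries, q_len) word ids; target: (batch, t_len)
        b, q, l = session.shape
        _, q_vec = self.query_enc(self.embed(session.reshape(b * q, l)))
        _, s_vec = self.session_enc(q_vec.reshape(b, q, -1))  # session state
        dec, _ = self.decoder(self.embed(target), s_vec)      # conditioned
        return self.out(dec)                                  # next-word logits
```

Because the session-level recurrence summarises arbitrarily long query sequences into a fixed-size state, the decoder stays context-aware without running into the data sparsity of count-based suggestion models.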
Powerpropagation: A sparsity inducing weight reparameterisation
The training of sparse neural networks is becoming an increasingly important tool
for reducing the computational footprint of models at training and evaluation, as
well as enabling the effective scaling up of models. Whereas much work over the
years has been dedicated to specialised pruning techniques, little attention has
been paid to the inherent effect of gradient based training on model sparsity. In
this work, we introduce Powerpropagation, a new weight reparameterisation for
neural networks that leads to inherently sparse models. Exploiting the behaviour
of gradient descent, our method gives rise to weight updates exhibiting a “rich get
richer” dynamic, leaving low-magnitude parameters largely unaffected by learning.
Models trained in this manner exhibit similar performance, but have a distribution
with markedly higher density at zero, allowing more parameters to be pruned safely.
Powerpropagation is general, intuitive, cheap and straightforward to implement
and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings: Firstly, following a recent
line of work, we investigate its effect on sparse training for resource-constrained
settings. Here, we combine Powerpropagation with a traditional weight-pruning
technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing
superior performance on the ImageNet benchmark. Secondly, we advocate the use
of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all
cases our reparameterisation considerably increases the efficacy of the off-the-shelf
methods.
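A minimal sketch of the reparameterisation, assuming the form w = θ·|θ|^(α−1) so that the gradient with respect to θ is scaled by the weight's own magnitude; the layer name and initialisation details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PowerpropLinear(nn.Module):
    """Linear layer with a Powerpropagation-style reparameterisation.

    Effective weight: w = theta * |theta|**(alpha - 1). The chain rule adds
    a factor proportional to |theta|**(alpha - 1) to every update, so large
    weights learn faster and low-magnitude weights are left largely
    unaffected ("rich get richer"), concentrating density at zero.
    """
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        w0 = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w0)
        # Initialise theta so the *effective* weights match a standard init.
        self.theta = nn.Parameter(w0.sign() * w0.abs().pow(1.0 / alpha))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.theta * self.theta.abs().pow(self.alpha - 1.0)
        return nn.functional.linear(x, w, self.bias)
```

With alpha = 1 the layer reduces to a standard linear layer, which is why the scheme composes cleanly with off-the-shelf pruning and sparse-training methods.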
On the Role of Optimization in Double Descent: A Least Squares Study
Empirically it has been observed that the performance of deep neural networks
steadily improves as we increase model size, contradicting the classical view
on overfitting and generalization. Recently, the double descent phenomenon has been proposed to reconcile this observation with theory, suggesting that the
test error has a second descent when the model becomes sufficiently
overparameterized, as the model size itself acts as an implicit regularizer. In
this paper we add to the growing body of work in this space, providing a
careful study of learning dynamics as a function of model size for the least
squares scenario. We show an excess risk bound for the gradient descent
solution of the least squares objective. The bound depends on the smallest
non-zero eigenvalue of the covariance matrix of the input features, via a
functional form that has the double descent behavior. This gives a new
perspective on the double descent curves reported in the literature. Our
analysis of the excess risk allows us to decouple the effects of optimization and generalization error. In particular, we find that in the case of noiseless
regression, double descent is explained solely by optimization-related
quantities, which was missed in studies focusing on the Moore-Penrose
pseudoinverse solution. We believe that our derivation provides an alternative
view compared to existing work, shedding some light on a possible cause of this phenomenon, at least in the considered least squares setting. We empirically
explore if our predictions hold for neural networks, in particular whether the
covariance of intermediary hidden activations has a similar behavior as the one
predicted by our derivations.
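The setting is easy to reproduce numerically. The toy experiment below, a sketch under simple Gaussian-feature assumptions rather than the paper's exact construction, fits noiseless least squares by gradient descent while the model size d sweeps through the interpolation threshold d = n_train; the test error typically peaks near that threshold and descends again.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_max = 40, 2000, 120
X = rng.normal(size=(n_train + n_test, d_max))
w_star = rng.normal(size=d_max) / np.sqrt(d_max)
y = X @ w_star                                  # noiseless labels

for d in [10, 20, 35, 40, 45, 60, 120]:         # sweeps through d = n_train
    Xtr, ytr = X[:n_train, :d], y[:n_train]
    Xte, yte = X[n_train:, :d], y[n_train:]
    w = np.zeros(d)
    # step size 1/largest eigenvalue of the empirical covariance
    lr = 1.0 / np.linalg.eigvalsh(Xtr.T @ Xtr / n_train).max()
    for _ in range(20_000):                     # plain gradient descent
        w -= lr * Xtr.T @ (Xtr @ w - ytr) / n_train
    print(f"d={d:3d}  test MSE={np.mean((Xte @ w - yte) ** 2):.4f}")
```

Near d = n_train the smallest non-zero eigenvalue of the covariance approaches zero, so gradient descent converges slowly along those directions, which is the optimization effect the excess risk bound isolates.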
A convolution BiLSTM neural network model for Chinese event extraction
Chinese event extraction is a challenging task in information extraction. Previous approaches depend heavily on sophisticated feature engineering and complicated natural language processing (NLP) tools. In this paper, we first examine the language-specific issues in Chinese event extraction, and then propose a convolution bidirectional LSTM neural network that combines LSTM and CNN to capture both sentence-level and lexical information without any hand-crafted features. Experiments on the ACE 2005 dataset show that our approach achieves competitive performance in both trigger labeling and argument role labeling.
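A compact sketch of the combined architecture: a BiLSTM provides sentence-level context, a convolution provides local lexical features, and their concatenation feeds a per-token classifier. Layer sizes and the fusion-by-concatenation choice are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConvBiLSTM(nn.Module):
    """Convolution BiLSTM tagger sketch for trigger/argument labeling."""
    def __init__(self, vocab, n_labels, dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.bilstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.out = nn.Linear(3 * dim, n_labels)

    def forward(self, tokens):                       # (batch, seq_len)
        e = self.embed(tokens)
        h, _ = self.bilstm(e)                        # sentence-level context
        c = torch.relu(self.conv(e.transpose(1, 2))).transpose(1, 2)  # lexical
        return self.out(torch.cat([h, c], dim=-1))   # per-token label logits
```

The recurrent branch carries long-range sentence context while the convolutional branch captures character/word n-gram cues, replacing the hand-crafted lexical features of earlier pipelines.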
Dynamic Key-Value Memory Networks for Knowledge Tracing
Knowledge Tracing (KT) is the task of tracing the evolving knowledge state of students with respect to one or more concepts as they engage in a sequence of
learning activities. One important purpose of KT is to personalize the practice
sequence to help students learn knowledge concepts efficiently. However,
existing methods such as Bayesian Knowledge Tracing and Deep Knowledge Tracing
either model knowledge state for each predefined concept separately or fail to
pinpoint exactly which concepts a student is good at or unfamiliar with. To
solve these problems, this work introduces a new model called Dynamic Key-Value
Memory Networks (DKVMN) that can exploit the relationships between underlying
concepts and directly output a student's mastery level of each concept. Unlike standard memory-augmented neural networks, which use either a single memory matrix or two static memory matrices, our model has one static matrix, called the key, which stores the knowledge concepts, and one dynamic matrix, called the value, which stores and updates the mastery levels of the corresponding concepts. Experiments show that our model consistently outperforms the state-of-the-art model on a range of KT datasets. Moreover, the DKVMN model can automatically discover the underlying concepts of exercises, a task typically performed by human annotators, and depict the changing knowledge state of a student.
Comment: To appear in 26th International Conference on World Wide Web (WWW), 201
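One read/write step of the key-value memory can be sketched as below: correlation weights come from the exercise embedding attended against the static key matrix, the read is a weighted sum over the dynamic value matrix, and the write uses erase and add vectors. Gating details and dimensions are simplified assumptions.

```python
import torch
import torch.nn as nn

class DKVMNCell(nn.Module):
    """One key-value memory read/write step, in the spirit of DKVMN."""
    def __init__(self, n_concepts, key_dim, value_dim):
        super().__init__()
        self.key = nn.Parameter(torch.randn(n_concepts, key_dim))  # static
        self.erase = nn.Linear(value_dim, value_dim)
        self.add = nn.Linear(value_dim, value_dim)

    def forward(self, q_embed, v_embed, value_mem):
        # q_embed: (batch, key_dim) exercise embedding
        # v_embed: (batch, value_dim) exercise-response embedding
        # value_mem: (batch, n_concepts, value_dim) dynamic value matrix
        w = torch.softmax(q_embed @ self.key.T, dim=-1)   # correlation weights
        read = (w.unsqueeze(-1) * value_mem).sum(dim=1)   # mastery readout
        e = torch.sigmoid(self.erase(v_embed)).unsqueeze(1)  # erase gate
        a = torch.tanh(self.add(v_embed)).unsqueeze(1)       # add vector
        value_mem = value_mem * (1 - w.unsqueeze(-1) * e) + w.unsqueeze(-1) * a
        return read, value_mem
```

Because the correlation weights tie each exercise to the concepts it exercises, the per-concept rows of the value matrix double as interpretable mastery levels.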
Visual Depth Mapping from Monocular Images using Recurrent Convolutional Neural Networks
A reliable sense-and-avoid system is critical to enabling safe autonomous
operation of unmanned aircraft. Existing sense-and-avoid methods often require
specialized sensors that are too large or power intensive for use on small
unmanned vehicles. This paper presents a method to estimate object distances
based on visual image sequences, allowing for the use of low-cost, on-board
monocular cameras as simple collision avoidance sensors. We present a deep
recurrent convolutional neural network and training method to generate depth
maps from video sequences. Our network is trained using simulated camera and
depth data generated with Microsoft's AirSim simulator. Empirically, we show
that our model achieves superior performance compared to models generated using
prior methods. We further demonstrate that the method can be used for sense-and-avoid of obstacles in simulation.
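An illustrative sketch of a recurrent convolutional depth estimator, not the authors' architecture: convolutional features per frame, a GRU over the frame sequence, and a small decoder upsampling back to a per-pixel depth map.

```python
import torch
import torch.nn as nn

class RecurrentDepthNet(nn.Module):
    """Monocular video -> coarse depth map sketch (purely illustrative)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8))                 # (64, 8, 8) per frame
        self.rnn = nn.GRU(64 * 8 * 8, hidden, batch_first=True)
        self.dec = nn.Sequential(
            nn.Linear(hidden, 64 * 8 * 8), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, frames):                       # (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        f = self.enc(frames.flatten(0, 1)).flatten(1)  # per-frame features
        h, _ = self.rnn(f.reshape(b, t, -1))           # temporal context
        return self.dec(h[:, -1])                      # coarse depth map
```

The recurrence is what lets a single monocular camera recover distance cues: apparent motion across frames carries the depth information a single image lacks.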
Deep Learning of Representations: Looking Forward
Deep learning research aims at discovering learning algorithms that discover
multiple levels of distributed representations, with higher levels representing
more abstract concepts. Although the study of deep learning has already led to
impressive theoretical results, learning algorithms and breakthrough
experiments, several challenges lie ahead. This paper proposes to examine some
of these challenges, centering on the questions of scaling deep learning
algorithms to much larger models and datasets, reducing optimization
difficulties due to ill-conditioning or local minima, designing more efficient
and powerful inference and sampling procedures, and learning to disentangle the
factors of variation underlying the observed data. It also proposes a few
forward-looking research directions aimed at overcoming these challenges.