688 research outputs found
TandemNet: Distilling Knowledge from Medical Images Using Diagnostic Reports as Optional Semantic References
In this paper, we introduce the semantic knowledge of medical images from
their diagnostic reports to provide an inspirational network training and an
interpretable prediction mechanism with our proposed novel multimodal neural
network, namely TandemNet. Inside TandemNet, a language model is used to
represent report text, which cooperates with the image model in a tandem
scheme. We propose a novel dual-attention model that facilitates high-level
interactions between visual and semantic information and effectively distills
useful features for prediction. In the testing stage, TandemNet can make
accurate image prediction with an optional report text input. It also
interprets its prediction by producing attention on the image and text
informative feature pieces, and further generating diagnostic report
paragraphs. Based on a pathological bladder cancer images and their diagnostic
reports (BCIDR) dataset, sufficient experiments demonstrate that our method
effectively learns and integrates knowledge from multimodalities and obtains
significantly improved performance than comparing baselines.Comment: MICCAI2017 Ora
How did the discussion go: Discourse act classification in social media conversations
We propose a novel attention based hierarchical LSTM model to classify
discourse act sequences in social media conversations, aimed at mining data
from online discussion using textual meanings beyond sentence level. The very
uniqueness of the task is the complete categorization of possible pragmatic
roles in informal textual discussions, contrary to extraction of
question-answers, stance detection or sarcasm identification which are very
much role specific tasks. Early attempt was made on a Reddit discussion
dataset. We train our model on the same data, and present test results on two
different datasets, one from Reddit and one from Facebook. Our proposed model
outperformed the previous one in terms of domain independence; without using
platform-dependent structural features, our hierarchical LSTM with word
relevance attention mechanism achieved F1-scores of 71\% and 66\% respectively
to predict discourse roles of comments in Reddit and Facebook discussions.
Efficiency of recurrent and convolutional architectures in order to learn
discursive representation on the same task has been presented and analyzed,
with different word and comment embedding schemes. Our attention mechanism
enables us to inquire into relevance ordering of text segments according to
their roles in discourse. We present a human annotator experiment to unveil
important observations about modeling and data annotation. Equipped with our
text-based discourse identification model, we inquire into how heterogeneous
non-textual features like location, time, leaning of information etc. play
their roles in charaterizing online discussions on Facebook
Zero-shot keyword spotting for visual speech recognition in-the-wild
Visual keyword spotting (KWS) is the problem of estimating whether a text
query occurs in a given recording using only video information. This paper
focuses on visual KWS for words unseen during training, a real-world, practical
setting which so far has received no attention by the community. To this end,
we devise an end-to-end architecture comprising (a) a state-of-the-art visual
feature extractor based on spatiotemporal Residual Networks, (b) a
grapheme-to-phoneme model based on sequence-to-sequence neural networks, and
(c) a stack of recurrent neural networks which learn how to correlate visual
features with the keyword representation. Different to prior works on KWS,
which try to learn word representations merely from sequences of graphemes
(i.e. letters), we propose the use of a grapheme-to-phoneme encoder-decoder
model which learns how to map words to their pronunciation. We demonstrate that
our system obtains very promising visual-only KWS results on the challenging
LRS2 database, for keywords unseen during training. We also show that our
system outperforms a baseline which addresses KWS via automatic speech
recognition (ASR), while it drastically improves over other recently proposed
ASR-free KWS methods.Comment: Accepted at ECCV-201
Evolutionary multi-stage financial scenario tree generation
Multi-stage financial decision optimization under uncertainty depends on a
careful numerical approximation of the underlying stochastic process, which
describes the future returns of the selected assets or asset categories.
Various approaches towards an optimal generation of discrete-time,
discrete-state approximations (represented as scenario trees) have been
suggested in the literature. In this paper, a new evolutionary algorithm to
create scenario trees for multi-stage financial optimization models will be
presented. Numerical results and implementation details conclude the paper
Compound memory networks for few-shot video classification
© Springer Nature Switzerland AG 2018. In this paper, we propose a new memory network structure for few-shot video classification by making the following contributions. First, we propose a compound memory network (CMN) structure under the key-value memory network paradigm, in which each key memory involves multiple constituent keys. These constituent keys work collaboratively for training, which enables the CMN to obtain an optimal video representation in a larger space. Second, we introduce a multi-saliency embedding algorithm which encodes a variable-length video sequence into a fixed-size matrix representation by discovering multiple saliencies of interest. For example, given a video of car auction, some people are interested in the car, while others are interested in the auction activities. Third, we design an abstract memory on top of the constituent keys. The abstract memory and constituent keys form a layered structure, which makes the CMN more efficient and capable of being scaled, while also retaining the representation capability of the multiple keys. We compare CMN with several state-of-the-art baselines on a new few-shot video classification dataset and show the effectiveness of our approach
Activity Recognition and Prediction in Real Homes
In this paper, we present work in progress on activity recognition and
prediction in real homes using either binary sensor data or depth video data.
We present our field trial and set-up for collecting and storing the data, our
methods, and our current results. We compare the accuracy of predicting the
next binary sensor event using probabilistic methods and Long Short-Term Memory
(LSTM) networks, include the time information to improve prediction accuracy,
as well as predict both the next sensor event and its mean time of occurrence
using one LSTM model. We investigate transfer learning between apartments and
show that it is possible to pre-train the model with data from other apartments
and achieve good accuracy in a new apartment straight away. In addition, we
present preliminary results from activity recognition using low-resolution
depth video data from seven apartments, and classify four activities - no
movement, standing up, sitting down, and TV interaction - by using a relatively
simple processing method where we apply an Infinite Impulse Response (IIR)
filter to extract movements from the frames prior to feeding them to a
convolutional LSTM network for the classification.Comment: 12 pages, Symposium of the Norwegian AI Society NAIS 201
A Comparison of Classical Versus Deep Learning Techniques for Abusive Content Detection on Social Media Sites
The automated detection of abusive content on social media websites faces a variety of challenges including imbalanced training sets, the identification of an appropriate feature representation and the selection of optimal classifiers. Classifiers such as support vector machines (SVM), combined with bag of words or ngram feature representation, have traditionally dominated in text classification for decades. With the recent emergence of deep learning and word embeddings, an increasing number of researchers have started to focus on deep neural networks. In this paper, our aim is to explore cutting-edge techniques in automated abusive content detection. We use two deep learning approaches: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We apply these to 9 public datasets derived from various social media websites. Firstly, we show that word embeddings pre-trained on the same data source as the subsequent classification task improves the prediction accuracy of deep learning models. Secondly, we investigate the impact of different levels of training set imbalances on classifier types. In comparison to the traditional SVM classifier, we identify that although deep learning models can outperform the classification results of the traditional SVM classifier when the associated training dataset is seriously imbalanced, the performance of the SVM classifier can be dramatically improved through the use of oversampling, surpassing the deep learning models. Our work can inform researchers in selecting appropriate text classification strategies in the detection of abusive content, including scenarios where the training datasets suffer from class imbalance
Application of Deep Learning Long Short-Term Memory in Energy Demand Forecasting
The smart metering infrastructure has changed how electricity is measured in
both residential and industrial application. The large amount of data collected
by smart meter per day provides a huge potential for analytics to support the
operation of a smart grid, an example of which is energy demand forecasting.
Short term energy forecasting can be used by utilities to assess if any
forecasted peak energy demand would have an adverse effect on the power system
transmission and distribution infrastructure. It can also help in load
scheduling and demand side management. Many techniques have been proposed to
forecast time series including Support Vector Machine, Artificial Neural
Network and Deep Learning. In this work we use Long Short Term Memory
architecture to forecast 3-day ahead energy demand across each month in the
year. The results show that 3-day ahead demand can be accurately forecasted
with a Mean Absolute Percentage Error of 3.15%. In addition to that, the paper
proposes way to quantify the time as a feature to be used in the training phase
which is shown to affect the network performance
DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
Personalized predictive medicine necessitates the modeling of patient illness
and care processes, which inherently have long-term temporal dependencies.
Healthcare observations, recorded in electronic medical records, are episodic
and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural
network that reads medical records, stores previous illness history, infers
current illness states and predicts future medical outcomes. At the data level,
DeepCare represents care episodes as vectors in space, models patient health
state trajectories through explicit memory of historical records. Built on Long
Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle
irregular timed events by moderating the forgetting and consolidation of memory
cells. DeepCare also incorporates medical interventions that change the course
of illness and shape future medical risk. Moving up to the health state level,
historical and present health states are then aggregated through multiscale
temporal pooling, before passing through a neural network that estimates future
outcomes. We demonstrate the efficacy of DeepCare for disease progression
modeling, intervention recommendation, and future risk prediction. On two
important cohorts with heavy social and economic burden -- diabetes and mental
health -- the results show improved modeling and risk prediction accuracy.Comment: Accepted at JBI under the new name: "Predicting healthcare
trajectories from medical records: A deep learning approach
- …