52 research outputs found
ADADELTA: An Adaptive Learning Rate Method
We present a novel per-dimension learning rate method for gradient descent
called ADADELTA. The method dynamically adapts over time using only first-order
information and has minimal computational overhead beyond vanilla stochastic
gradient descent. The method requires no manual tuning of a learning rate and
appears robust to noisy gradient information, different model architecture
choices, various data modalities and selection of hyperparameters. We show
promising results compared to other methods on the MNIST digit classification
task using a single machine and on a large scale voice dataset in a distributed
cluster environment.
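The per-dimension rule the abstract describes can be sketched in a few lines: keep a decaying average of squared gradients and of squared updates, and scale each step by RMS[Δx]/RMS[g]. This is a minimal pure-Python rendering; the ρ = 0.95 and ε = 1e-6 defaults follow common practice, and the toy quadratic objective is ours, not the paper's.

```python
import math

def adadelta_step(params, grads, eg2, edx2, rho=0.95, eps=1e-6):
    """One ADADELTA update over a list of scalar parameters.

    eg2  -- running average of squared gradients, E[g^2] (updated in place)
    edx2 -- running average of squared updates,  E[dx^2] (updated in place)
    """
    new_params = []
    for i, (x, g) in enumerate(zip(params, grads)):
        eg2[i] = rho * eg2[i] + (1 - rho) * g * g          # accumulate gradient
        dx = -math.sqrt(edx2[i] + eps) / math.sqrt(eg2[i] + eps) * g
        edx2[i] = rho * edx2[i] + (1 - rho) * dx * dx      # accumulate update
        new_params.append(x + dx)
    return new_params

# toy example: minimize f(x) = x^2, whose gradient is 2x
x, eg2, edx2 = [3.0], [0.0], [0.0]
for _ in range(500):
    x = adadelta_step(x, [2 * x[0]], eg2, edx2)
```

Note the characteristic warm-up: with ε = 1e-6 the first steps are tiny, and the effective step size grows as E[Δx²] accumulates; at no point is a learning rate set by hand.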
Visualizing and Understanding Convolutional Networks
Large Convolutional Network models have recently demonstrated impressive
classification performance on the ImageNet benchmark. However, there is no clear
understanding of why they perform so well, or how they might be improved. In
this paper we address both issues. We introduce a novel visualization technique
that gives insight into the function of intermediate feature layers and the
operation of the classifier. We also perform an ablation study to discover the
performance contribution from different model layers. This enables us to find
model architectures that outperform Krizhevsky et al. on the ImageNet
classification benchmark. We show our ImageNet model generalizes well to other
datasets: when the softmax classifier is retrained, it convincingly beats the
current state-of-the-art results on the Caltech-101 and Caltech-256 datasets.
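The visualization technique in this paper (a deconvolutional network) projects feature activations back to pixel space by inverting the network's pooling stages with recorded "switch" locations. A minimal 1D sketch of that pooling/unpooling pair, simplified from the 2D feature maps the paper actually operates on:

```python
def maxpool_with_switches(x, size=2):
    """Non-overlapping 1D max pooling that records the argmax ('switch')
    position of each window, as needed to approximately invert pooling."""
    pooled, switches = [], []
    for i in range(0, len(x) - size + 1, size):
        window = x[i:i + size]
        j = max(range(size), key=lambda k: window[k])
        pooled.append(window[j])
        switches.append(i + j)
    return pooled, switches

def unpool(pooled, switches, length):
    """Approximate inverse: place each pooled value back at its recorded
    switch location and fill everywhere else with zeros."""
    out = [0.0] * length
    for v, s in zip(pooled, switches):
        out[s] = v
    return out

x = [1.0, 3.0, 2.0, 0.5]
p, s = maxpool_with_switches(x)   # p == [3.0, 2.0], s == [1, 2]
r = unpool(p, s, len(x))          # r == [0.0, 3.0, 2.0, 0.0]
```

The switches are what make the inversion activation-specific: the same pooled values placed at different switch positions reconstruct different input structures.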
Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks
Predicting the future health information of patients from the historical
Electronic Health Records (EHR) is a core research task in the development of
personalized healthcare. Patient EHR data consist of sequences of visits over
time, where each visit contains multiple medical codes, including diagnosis,
medication, and procedure codes. The most important challenges for this task
are to model the temporality and high dimensionality of sequential EHR data and
to interpret the prediction results. Existing work addresses this problem by
employing recurrent neural networks (RNNs) to model EHR data and utilizing a
simple attention mechanism to interpret the results. However, RNN performance
drops when sequences grow long, and current RNN-based approaches ignore the
relationships between subsequent visits. To address these issues, we propose
Dipole, an end-to-end, simple and robust model for predicting patients'
future health information. Dipole employs bidirectional recurrent
neural networks to capture information from both past and future visits, and
it introduces three attention mechanisms to measure the relationships between
different visits for prediction. With the attention
mechanisms, Dipole can interpret the prediction results effectively. Dipole
also allows us to interpret the learned medical code representations, which
medical experts have positively confirmed. Experimental results on two
real-world EHR datasets show that the proposed Dipole significantly improves
prediction accuracy compared with state-of-the-art diagnosis prediction
approaches and provides clinically meaningful interpretations.
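The abstract mentions three attention mechanisms over visit-level hidden states. As an illustration only, here is a sketch of one common variant (location-based attention: a score per hidden state, softmax weights, weighted-sum context vector); the parameters w and b are ours, not Dipole's actual learned weights.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    z = sum(exps)
    return [e / z for e in exps]

def location_score(h, w, b):
    """Location-based attention score for one hidden state: w . h + b."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b

def attend(hidden_states, w, b):
    """Context vector: attention-weighted sum of per-visit hidden states."""
    weights = softmax([location_score(h, w, b) for h in hidden_states])
    dim = len(hidden_states[0])
    ctx = [sum(a * h[d] for a, h in zip(weights, hidden_states))
           for d in range(dim)]
    return ctx, weights

# two visits with orthogonal hidden states; a zero score vector gives
# uniform attention, so the context is their average
ctx, weights = attend([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0], b=0.0)
```

The attention weights themselves are what make the prediction interpretable: each weight says how much a past (or, with the bidirectional RNN, future) visit contributed to the context vector.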
CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection
Robust face detection in the wild is an essential component in supporting
various facial analysis problems, e.g. unconstrained face recognition, facial
periocular recognition, facial landmarking and pose estimation, facial
expression recognition, 3D facial model construction, etc. Although the face
detection problem has been intensely studied for decades with various
commercial applications, it still fails in some real-world scenarios due to
numerous challenges, e.g. heavy facial occlusion, extremely low resolution,
strong illumination, extreme pose variation, and image or video compression
artifacts. In this paper, we present a face detection approach named
Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN)
to robustly solve the problems mentioned above. Similar to the region-based
CNNs, our proposed network consists of the region proposal component and the
region-of-interest (RoI) detection component. However, unlike those networks,
our proposed network makes two main contributions that play a significant role
in achieving state-of-the-art performance in face detection. Firstly,
multi-scale information is aggregated in both the region proposal and RoI
detection stages to deal with tiny face regions. Secondly, our proposed
network allows explicit body contextual reasoning, inspired by the intuition
of the human vision system. The proposed approach is benchmarked on two recent
challenging face detection databases: the WIDER FACE Dataset, which contains
a high degree of variability, and the Face Detection Dataset and Benchmark
(FDDB). The experimental results show that our proposed approach, trained on
the WIDER FACE Dataset, outperforms strong baselines on that dataset by a
large margin and consistently achieves competitive results on FDDB against
recent state-of-the-art face detection methods.
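The body-context idea amounts to pooling features from an enlarged region derived from each face proposal. A hedged sketch of such a box projection follows; the 2x/3x expansion ratios are illustrative assumptions, not the paper's exact geometry.

```python
def body_context_box(face_box, img_w, img_h, scale_w=2.0, scale_h=3.0):
    """Derive a body-context region from a face proposal (x1, y1, x2, y2).

    The context box is centered horizontally on the face, starts at the top
    of the face, and extends downward to cover the (assumed) body; it is
    clipped to the image bounds. The scale factors are illustrative only.
    """
    x1, y1, x2, y2 = face_box
    w, h = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2.0
    bw, bh = w * scale_w, h * scale_h
    bx1 = max(0.0, cx - bw / 2.0)
    bx2 = min(float(img_w), cx + bw / 2.0)
    by1 = max(0.0, y1)                 # context starts at the top of the face
    by2 = min(float(img_h), y1 + bh)
    return (bx1, by1, bx2, by2)

box = body_context_box((40.0, 40.0, 60.0, 60.0), img_w=200, img_h=200)
# → (30.0, 40.0, 70.0, 100.0)
```

Features pooled from this larger box would then be concatenated with the face RoI features, giving the classifier evidence (shoulders, torso) even when the face itself is tiny or occluded.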
Efficient and Extensible Policy Mining for Relationship-Based Access Control
Relationship-based access control (ReBAC) is a flexible and expressive
framework that allows policies to be expressed in terms of chains of
relationships between entities as well as attributes of entities. ReBAC policy
mining algorithms have the potential to significantly reduce the cost of
migration from legacy access control systems to ReBAC, by partially automating
the development of a ReBAC policy. Existing ReBAC policy mining algorithms
support a policy language with a limited set of operators; this limits their
applicability. This paper presents a ReBAC policy mining algorithm designed to
be both (1) easily extensible (to support additional policy language features)
and (2) scalable. The algorithm is based on Bui et al.'s evolutionary
algorithm for ReBAC policy mining. First, we simplify their algorithm to make
it easier to extend, and we provide a methodology for extending it to handle
new policy language features. However, extending the policy language increases
the search space of candidate policies explored by the evolutionary algorithm,
causing longer running times and/or worse results. To address this problem,
we enhance the algorithm with a feature selection phase. The enhancement
utilizes a neural network to identify useful features. We use the result of
feature selection to reduce the evolutionary algorithm's search space. The new
algorithm is easy to extend and, as shown by our experiments, is more efficient
and produces better policies.
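The pipeline described here (feature selection that prunes the evolutionary algorithm's search space) can be caricatured on a toy problem. Everything below is an illustrative assumption, not Bui et al.'s actual encoding: rules are conjunctions of boolean features, and the score function stands in for the neural network's feature ranking.

```python
import random

def mine_policy(examples, candidate_features, score, k=3, generations=50,
                pop=20, seed=0):
    """Toy evolutionary search for a conjunctive rule over boolean features.

    score(feature) stands in for the neural feature-selection phase: only
    the top-k features enter the search space. examples is a list of
    (feature_dict, permitted) pairs; fitness is the accuracy of the rule
    'permit iff all chosen features hold'.
    """
    rng = random.Random(seed)
    pool = sorted(candidate_features, key=score, reverse=True)[:k]  # feature selection

    def fitness(rule):
        return sum(all(feats[f] for f in rule) == permitted
                   for feats, permitted in examples) / len(examples)

    # seed with all singletons, then random subsets of the reduced pool
    population = [frozenset({f}) for f in pool]
    population += [frozenset(rng.sample(pool, rng.randint(1, k)))
                   for _ in range(pop - len(pool))]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]            # elitist selection
        children = []
        for p in parents:
            child = set(p)
            f = rng.choice(pool)                    # mutate: toggle one feature
            child.symmetric_difference_update({f})
            children.append(frozenset(child) or frozenset({f}))
        population = parents + children
    return max(population, key=fitness)
```

The point of the sketch is the first line of the search: restricting the mutation pool to the top-k scored features shrinks the candidate space from 2^|features| to 2^k, which is exactly the trade the paper makes to keep the extended policy language tractable.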
LAL: Linguistically Aware Learning for Scene Text Recognition
Scene text recognition is the task of recognizing character sequences in images of natural scenes. The considerable diversity in the appearance of text in a scene image and potentially highly complex backgrounds make text recognition challenging. Previous approaches employ character sequence generators to analyze text regions and, subsequently, compare the candidate character sequences against a language model. In this work, we propose a bimodal framework that simultaneously utilizes visual and linguistic information to enhance recognition performance. Our linguistically aware learning (LAL) method effectively learns visual embeddings using a rectifier, encoder, and attention decoder approach, and linguistic embeddings, using a deep next-character prediction model. We present an innovative way of combining these two embeddings effectively. Our experiments on eight standard benchmarks show that our method outperforms previous methods by large margins, particularly on rotated, foreshortened, and curved text. We show that the bimodal approach has a statistically significant impact. We also contribute a new dataset, and show robust performance when LAL is combined with a text detector in a pipelined text spotting framework.
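The abstract does not specify how the visual and linguistic embeddings are combined. One common scheme for such bimodal fusion is a learned gate, sketched here purely as an illustration; the gate parameters and the scheme itself are assumptions, not LAL's actual mechanism.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(visual, linguistic, gate_w_v, gate_w_l, gate_b):
    """Fuse per-character visual and linguistic embeddings with a learned
    scalar gate g in (0, 1): out = g * visual + (1 - g) * linguistic."""
    g = sigmoid(sum(wv * v for wv, v in zip(gate_w_v, visual)) +
                sum(wl * l for wl, l in zip(gate_w_l, linguistic)) + gate_b)
    return [g * v + (1 - g) * l for v, l in zip(visual, linguistic)]

# with zero gate weights and a large positive bias, the gate saturates
# toward the visual embedding
fused = gated_fusion([1.0, 2.0], [0.0, 0.0],
                     gate_w_v=[0.0, 0.0], gate_w_l=[0.0, 0.0], gate_b=10.0)
```

A gate of this shape lets the model lean on the language prior when the image evidence is weak (blur, curvature) and on the visual evidence when the text is out-of-vocabulary.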
Deep Unsupervised Multi-View Detection of Video Game Stream Highlights
We consider the problem of automatic highlight detection in video game streams. Currently, the vast majority of highlight-detection systems for games are triggered by the occurrence of hard-coded game events (e.g., score change, end-game), while most advanced tools and techniques are based on detection of highlights via visual analysis of game footage. We argue that in the context of game streaming, events that may constitute highlights are not only dependent on game footage, but also on social signals that are conveyed by the streamer during the play session (e.g., when interacting with viewers, or when commenting on and reacting to the game). In this light, we present a multi-view unsupervised deep learning methodology for novelty-based highlight detection. The method jointly analyses both game footage and social signals such as the player's facial expressions and speech, and shows promising results for generating highlights on streams of popular games such as PlayerUnknown's Battlegrounds.
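Novelty-based detection of this kind can be caricatured without the deep model: score each frame's fused multi-view features by their distance from a running mean and flag outliers. The distance-from-mean score below is a stand-in assumption for the paper's learned model (which would use something like reconstruction error); only the multi-view fusion and thresholding structure is meant to match the description.

```python
import math

def novelty_scores(frames, eps=1e-8):
    """Per-frame novelty as Euclidean distance of the fused (concatenated)
    multi-view feature vector from a running mean of past frames. Each
    frame is a tuple of per-view feature lists (e.g. footage, audio)."""
    scores, mean, n = [], None, 0
    for views in frames:
        fused = [v for view in views for v in view]    # concatenate views
        if mean is None:
            mean = list(fused)
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(fused, mean)))
        scores.append(d)
        n += 1
        mean = [m + (f - m) / n for m, f in zip(mean, fused)]  # update mean
    return scores

def highlights(scores, k=2.0):
    """Flag frames whose novelty exceeds mean + k * stddev of all scores."""
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    return [i for i, s in enumerate(scores) if s > mu + k * sd]

# nine quiet frames and one spike across both views
frames = ([([0.0, 0.0], [0.0])] * 7 + [([5.0, 5.0], [4.0])]
          + [([0.0, 0.0], [0.0])] * 2)
flagged = highlights(novelty_scores(frames))   # → [7]
```

The key property shared with the paper's method is that no labels are needed: a frame is a highlight candidate simply because it is unlike what the model has seen so far, in any of the views.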
Measuring, Understanding, and Classifying News Media Sympathy on Twitter after Crisis Events
This paper investigates bias in coverage between Western and Arab media on
Twitter after the November 2015 Beirut and Paris terror attacks. Using two
Twitter datasets covering each attack, we investigate how Western and Arab
media differed in coverage bias, sympathy bias, and resulting information
propagation. We crowdsourced sympathy and sentiment labels for 2,390 tweets
across four languages (English, Arabic, French, German), built a regression
model to characterize sympathy, and thereafter trained a deep convolutional
neural network to predict sympathy. Key findings show that: (a) both events
were disproportionately covered; (b) Western media exhibited less sympathy,
and each region's media was more sympathetic towards the affected country in
its own region; (c) sympathy predictions supported the ground-truth analysis
that Western media was less sympathetic than Arab media; and (d) sympathetic
tweets did not spread any further. We discuss our results in light of global
news flow,
Twitter affordances, and public perception impact.
Comment: In Proc. CHI 2018 Papers program. Please cite: El Ali, A., Stratmann,
T., Park, S., Schöning, J., Heuten, W. & Boll, S. (2018). Measuring,
Understanding, and Classifying News Media Sympathy on Twitter after Crisis
Events. In Proceedings of the 2018 CHI Conference on Human Factors in
Computing Systems (CHI '18). ACM, New York, NY, USA. DOI:
https://doi.org/10.1145/3173574.317413
Automatically Segmenting the Left Atrium from Cardiac Images Using Successive 3D U-Nets and a Contour Loss
Radiological imaging offers effective measurement of anatomy, which is useful in disease diagnosis and assessment. Previous studies have shown that left atrial wall remodeling can provide information to predict treatment outcome in atrial fibrillation. Nevertheless, segmentation of the left atrial structures from medical images is still very time-consuming. Current advances in neural networks may help create automatic segmentation models that reduce the workload for clinicians. In this preliminary study, we propose automated, two-stage, three-dimensional U-Nets for the challenging task of left atrial segmentation. Unlike previous two-dimensional image segmentation methods, we use 3D U-Nets to obtain the heart cavity directly in 3D. The dual 3D U-Net structure consists of a first U-Net to coarsely segment and locate the left atrium, and a second U-Net to accurately segment the left atrium at higher resolution. In addition, we introduce a Contour loss based on additional distance information to adjust the final segmentation. We randomly split the data into training (80 subjects) and validation (20 subjects) datasets to train multiple models with different augmentation settings. Experiments show that the average Dice coefficients for the validation datasets are around 0.91-0.92, the sensitivity around 0.90-0.94, and the specificity 0.99. Compared with the traditional Dice loss, models trained with the Contour loss generally offer a smaller Hausdorff distance with a similar Dice coefficient, and have fewer connected components in their predictions. Finally, we integrate several trained models in an ensemble prediction to segment the testing datasets.
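The two quantities the abstract compares can be sketched over flat binary masks. The Dice coefficient is standard; the contour-loss formulation below (predicted foreground weighted by a distance map to the true boundary) is a plausible reading of "a Contour loss based on additional distance information", not necessarily the paper's exact definition.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|P ∩ T| / (|P| + |T|) over flat binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def contour_loss(pred_probs, dist_map):
    """Penalize predicted foreground by its distance to the ground-truth
    contour: sum(p * d). dist_map holds, per pixel, the distance to the
    nearest true boundary (0 on the contour), so far-away false positives
    cost more than near misses -- which is why this loss tends to shrink
    the Hausdorff distance and suppress stray connected components."""
    return sum(p * d for p, d in zip(pred_probs, dist_map))

d = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])    # 2*1/(2+1) ≈ 0.667
c = contour_loss([1.0, 0.0, 1.0], [0.0, 1.0, 3.0])  # on-contour pixel is
                                                    # free; far pixel costs 3
```

This contrast mirrors the reported results: Dice is insensitive to *where* an error sits, while a distance-weighted term directly punishes spatially distant mistakes.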