52 research outputs found
ADADELTA: An Adaptive Learning Rate Method
We present a novel per-dimension learning rate method for gradient descent
called ADADELTA. The method dynamically adapts over time using only first-order
information and has minimal computational overhead beyond vanilla stochastic
gradient descent. The method requires no manual tuning of a learning rate and
appears robust to noisy gradient information, different model architecture
choices, various data modalities and selection of hyperparameters. We show
promising results compared to other methods on the MNIST digit classification
task using a single machine and on a large scale voice dataset in a distributed
cluster environment.
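The per-dimension rule the abstract describes can be sketched in a few lines: keep a decaying average of squared gradients and of squared updates, and scale each step by RMS[Δx]/RMS[g]. This is a minimal pure-Python rendering; the ρ = 0.95 and ε = 1e-6 defaults follow common practice, and the toy quadratic objective is ours, not the paper's.

```python
import math

def adadelta_step(params, grads, eg2, edx2, rho=0.95, eps=1e-6):
    """One ADADELTA update over a list of scalar parameters.

    eg2  -- running average of squared gradients, E[g^2] (updated in place)
    edx2 -- running average of squared updates,  E[dx^2] (updated in place)
    """
    new_params = []
    for i, (x, g) in enumerate(zip(params, grads)):
        eg2[i] = rho * eg2[i] + (1 - rho) * g * g          # accumulate gradient
        dx = -math.sqrt(edx2[i] + eps) / math.sqrt(eg2[i] + eps) * g
        edx2[i] = rho * edx2[i] + (1 - rho) * dx * dx      # accumulate update
        new_params.append(x + dx)
    return new_params

# toy example: minimize f(x) = x^2, whose gradient is 2x
x, eg2, edx2 = [3.0], [0.0], [0.0]
for _ in range(500):
    x = adadelta_step(x, [2 * x[0]], eg2, edx2)
```

Note the characteristic warm-up: with ε = 1e-6 the first steps are tiny, and the effective step size grows as E[Δx²] accumulates; at no point is a learning rate set by hand.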
Visualizing and Understanding Convolutional Networks
Large Convolutional Network models have recently demonstrated impressive
classification performance on the ImageNet benchmark. However, there is no clear
understanding of why they perform so well, or how they might be improved. In
this paper we address both issues. We introduce a novel visualization technique
that gives insight into the function of intermediate feature layers and the
operation of the classifier. We also perform an ablation study to discover the
performance contribution from different model layers. This enables us to find
model architectures that outperform Krizhevsky et al. on the ImageNet
classification benchmark. We show our ImageNet model generalizes well to other
datasets: when the softmax classifier is retrained, it convincingly beats the
current state-of-the-art results on the Caltech-101 and Caltech-256 datasets.
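The visualization technique in this paper (a deconvolutional network) projects feature activations back to pixel space by inverting the network's pooling stages with recorded "switch" locations. A minimal 1D sketch of that pooling/unpooling pair, simplified from the 2D feature maps the paper actually operates on:

```python
def maxpool_with_switches(x, size=2):
    """Non-overlapping 1D max pooling that records the argmax ('switch')
    position of each window, as needed to approximately invert pooling."""
    pooled, switches = [], []
    for i in range(0, len(x) - size + 1, size):
        window = x[i:i + size]
        j = max(range(size), key=lambda k: window[k])
        pooled.append(window[j])
        switches.append(i + j)
    return pooled, switches

def unpool(pooled, switches, length):
    """Approximate inverse: place each pooled value back at its recorded
    switch location and fill everywhere else with zeros."""
    out = [0.0] * length
    for v, s in zip(pooled, switches):
        out[s] = v
    return out

x = [1.0, 3.0, 2.0, 0.5]
p, s = maxpool_with_switches(x)   # p == [3.0, 2.0], s == [1, 2]
r = unpool(p, s, len(x))          # r == [0.0, 3.0, 2.0, 0.0]
```

The switches are what make the inversion activation-specific: the same pooled values placed at different switch positions reconstruct different input structures.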
Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks
Predicting the future health information of patients from the historical
Electronic Health Records (EHR) is a core research task in the development of
personalized healthcare. Patient EHR data consist of sequences of visits over
time, where each visit contains multiple medical codes, including diagnosis,
medication, and procedure codes. The most important challenges for this task
are to model the temporality and high dimensionality of sequential EHR data and
to interpret the prediction results. Existing work addresses this problem by
employing recurrent neural networks (RNNs) to model EHR data and utilizing a
simple attention mechanism to interpret the results. However, RNN performance
drops when sequences grow long, and current RNN-based approaches ignore the
relationships between subsequent visits. To address these issues, we propose
Dipole, an end-to-end, simple and robust model for predicting patients'
future health information. Dipole employs bidirectional recurrent
neural networks to capture information from both past and future visits, and
it introduces three attention mechanisms to measure the relationships between
different visits for prediction. With the attention
mechanisms, Dipole can interpret the prediction results effectively. Dipole
also allows us to interpret the learned medical code representations, which
medical experts have positively confirmed. Experimental results on two
real-world EHR datasets show that the proposed Dipole significantly improves
prediction accuracy compared with state-of-the-art diagnosis prediction
approaches and provides clinically meaningful interpretations.
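The abstract mentions three attention mechanisms over visit-level hidden states. As an illustration only, here is a sketch of one common variant (location-based attention: a score per hidden state, softmax weights, weighted-sum context vector); the parameters w and b are ours, not Dipole's actual learned weights.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    z = sum(exps)
    return [e / z for e in exps]

def location_score(h, w, b):
    """Location-based attention score for one hidden state: w . h + b."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b

def attend(hidden_states, w, b):
    """Context vector: attention-weighted sum of per-visit hidden states."""
    weights = softmax([location_score(h, w, b) for h in hidden_states])
    dim = len(hidden_states[0])
    ctx = [sum(a * h[d] for a, h in zip(weights, hidden_states))
           for d in range(dim)]
    return ctx, weights

# two visits with orthogonal hidden states; a zero score vector gives
# uniform attention, so the context is their average
ctx, weights = attend([[1.0, 0.0], [0.0, 1.0]], w=[0.0, 0.0], b=0.0)
```

The attention weights themselves are what make the prediction interpretable: each weight says how much a past (or, with the bidirectional RNN, future) visit contributed to the context vector.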
CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection
Robust face detection in the wild is an essential component in supporting
various facial analysis problems, e.g. unconstrained face recognition, facial
periocular recognition, facial landmarking and pose estimation, facial
expression recognition, 3D facial model construction, etc. Although the face
detection problem has been intensely studied for decades with various
commercial applications, it still fails in some real-world scenarios due to
numerous challenges, e.g. heavy facial occlusion, extremely low resolution,
strong illumination, extreme pose variation, and image or video compression
artifacts. In this paper, we present a face detection approach named
Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN)
to robustly solve the problems mentioned above. Similar to the region-based
CNNs, our proposed network consists of the region proposal component and the
region-of-interest (RoI) detection component. However, unlike those networks,
our proposed network makes two main contributions that play a significant role
in achieving state-of-the-art performance in face detection. Firstly,
multi-scale information is aggregated in both the region proposal and RoI
detection stages to deal with tiny face regions. Secondly, our proposed
network allows explicit body contextual reasoning, inspired by the intuition
of the human vision system. The proposed approach is benchmarked on two recent
challenging face detection databases: the WIDER FACE Dataset, which contains
a high degree of variability, and the Face Detection Dataset and Benchmark
(FDDB). The experimental results show that our proposed approach, trained on
the WIDER FACE Dataset, outperforms strong baselines on that dataset by a
large margin and consistently achieves competitive results on FDDB against
recent state-of-the-art face detection methods.
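The body-context idea amounts to pooling features from an enlarged region derived from each face proposal. A hedged sketch of such a box projection follows; the 2x/3x expansion ratios are illustrative assumptions, not the paper's exact geometry.

```python
def body_context_box(face_box, img_w, img_h, scale_w=2.0, scale_h=3.0):
    """Derive a body-context region from a face proposal (x1, y1, x2, y2).

    The context box is centered horizontally on the face, starts at the top
    of the face, and extends downward to cover the (assumed) body; it is
    clipped to the image bounds. The scale factors are illustrative only.
    """
    x1, y1, x2, y2 = face_box
    w, h = x2 - x1, y2 - y1
    cx = (x1 + x2) / 2.0
    bw, bh = w * scale_w, h * scale_h
    bx1 = max(0.0, cx - bw / 2.0)
    bx2 = min(float(img_w), cx + bw / 2.0)
    by1 = max(0.0, y1)                 # context starts at the top of the face
    by2 = min(float(img_h), y1 + bh)
    return (bx1, by1, bx2, by2)

box = body_context_box((40.0, 40.0, 60.0, 60.0), img_w=200, img_h=200)
# → (30.0, 40.0, 70.0, 100.0)
```

Features pooled from this larger box would then be concatenated with the face RoI features, giving the classifier evidence (shoulders, torso) even when the face itself is tiny or occluded.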
Efficient and Extensible Policy Mining for Relationship-Based Access Control
Relationship-based access control (ReBAC) is a flexible and expressive
framework that allows policies to be expressed in terms of chains of
relationships between entities as well as attributes of entities. ReBAC policy
mining algorithms have the potential to significantly reduce the cost of
migration from legacy access control systems to ReBAC, by partially automating
the development of a ReBAC policy. Existing ReBAC policy mining algorithms
support a policy language with a limited set of operators; this limits their
applicability. This paper presents a ReBAC policy mining algorithm designed to
be both (1) easily extensible (to support additional policy language features)
and (2) scalable. The algorithm is based on Bui et al.'s evolutionary
algorithm for ReBAC policy mining. First, we simplify their algorithm to make
it easier to extend, and we provide a methodology for extending it to handle
new policy language features. However, extending the policy language increases
the search space of candidate policies explored by the evolutionary algorithm,
causing longer running times and/or worse results. To address this problem,
we enhance the algorithm with a feature selection phase. The enhancement
utilizes a neural network to identify useful features. We use the result of
feature selection to reduce the evolutionary algorithm's search space. The new
algorithm is easy to extend and, as shown by our experiments, is more efficient
and produces better policies.
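The pipeline described here (feature selection that prunes the evolutionary algorithm's search space) can be caricatured on a toy problem. Everything below is an illustrative assumption, not Bui et al.'s actual encoding: rules are conjunctions of boolean features, and the score function stands in for the neural network's feature ranking.

```python
import random

def mine_policy(examples, candidate_features, score, k=3, generations=50,
                pop=20, seed=0):
    """Toy evolutionary search for a conjunctive rule over boolean features.

    score(feature) stands in for the neural feature-selection phase: only
    the top-k features enter the search space. examples is a list of
    (feature_dict, permitted) pairs; fitness is the accuracy of the rule
    'permit iff all chosen features hold'.
    """
    rng = random.Random(seed)
    pool = sorted(candidate_features, key=score, reverse=True)[:k]  # feature selection

    def fitness(rule):
        return sum(all(feats[f] for f in rule) == permitted
                   for feats, permitted in examples) / len(examples)

    # seed with all singletons, then random subsets of the reduced pool
    population = [frozenset({f}) for f in pool]
    population += [frozenset(rng.sample(pool, rng.randint(1, k)))
                   for _ in range(pop - len(pool))]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]            # elitist selection
        children = []
        for p in parents:
            child = set(p)
            f = rng.choice(pool)                    # mutate: toggle one feature
            child.symmetric_difference_update({f})
            children.append(frozenset(child) or frozenset({f}))
        population = parents + children
    return max(population, key=fitness)
```

The point of the sketch is the first line of the search: restricting the mutation pool to the top-k scored features shrinks the candidate space from 2^|features| to 2^k, which is exactly the trade the paper makes to keep the extended policy language tractable.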
LAL: Linguistically Aware Learning for Scene Text Recognition
Scene text recognition is the task of recognizing character sequences in images of natural scenes. The considerable diversity in the appearance of text in a scene image and potentially highly complex backgrounds make text recognition challenging. Previous approaches employ character sequence generators to analyze text regions and, subsequently, compare the candidate character sequences against a language model. In this work, we propose a bimodal framework that simultaneously utilizes visual and linguistic information to enhance recognition performance. Our linguistically aware learning (LAL) method effectively learns visual embeddings using a rectifier, encoder, and attention decoder approach, and linguistic embeddings, using a deep next-character prediction model. We present an innovative way of combining these two embeddings effectively. Our experiments on eight standard benchmarks show that our method outperforms previous methods by large margins, particularly on rotated, foreshortened, and curved text. We show that the bimodal approach has a statistically significant impact. We also contribute a new dataset, and show robust performance when LAL is combined with a text detector in a pipelined text spotting framework.
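The abstract does not specify how the visual and linguistic embeddings are combined. One common scheme for such bimodal fusion is a learned gate, sketched here purely as an illustration; the gate parameters and the scheme itself are assumptions, not LAL's actual mechanism.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(visual, linguistic, gate_w_v, gate_w_l, gate_b):
    """Fuse per-character visual and linguistic embeddings with a learned
    scalar gate g in (0, 1): out = g * visual + (1 - g) * linguistic."""
    g = sigmoid(sum(wv * v for wv, v in zip(gate_w_v, visual)) +
                sum(wl * l for wl, l in zip(gate_w_l, linguistic)) + gate_b)
    return [g * v + (1 - g) * l for v, l in zip(visual, linguistic)]

# with zero gate weights and a large positive bias, the gate saturates
# toward the visual embedding
fused = gated_fusion([1.0, 2.0], [0.0, 0.0],
                     gate_w_v=[0.0, 0.0], gate_w_l=[0.0, 0.0], gate_b=10.0)
```

A gate of this shape lets the model lean on the language prior when the image evidence is weak (blur, curvature) and on the visual evidence when the text is out-of-vocabulary.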
Deep Unsupervised Multi-View Detection of Video Game Stream Highlights
We consider the problem of automatic highlight detection in video game streams. Currently, the vast majority of highlight-detection systems for games are triggered by the occurrence of hard-coded game events (e.g., score change, end-game), while most advanced tools and techniques are based on detection of highlights via visual analysis of game footage. We argue that in the context of game streaming, events that may constitute highlights are not only dependent on game footage, but also on social signals that are conveyed by the streamer during the play session (e.g., when interacting with viewers, or when commenting on and reacting to the game). In this light, we present a multi-view unsupervised deep learning methodology for novelty-based highlight detection. The method jointly analyses both game footage and social signals such as the player's facial expressions and speech, and shows promising results for generating highlights on streams of popular games such as PlayerUnknown's Battlegrounds.
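Novelty-based detection of this kind can be caricatured without the deep model: score each frame's fused multi-view features by their distance from a running mean and flag outliers. The distance-from-mean score below is a stand-in assumption for the paper's learned model (which would use something like reconstruction error); only the multi-view fusion and thresholding structure is meant to match the description.

```python
import math

def novelty_scores(frames, eps=1e-8):
    """Per-frame novelty as Euclidean distance of the fused (concatenated)
    multi-view feature vector from a running mean of past frames. Each
    frame is a tuple of per-view feature lists (e.g. footage, audio)."""
    scores, mean, n = [], None, 0
    for views in frames:
        fused = [v for view in views for v in view]    # concatenate views
        if mean is None:
            mean = list(fused)
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(fused, mean)))
        scores.append(d)
        n += 1
        mean = [m + (f - m) / n for m, f in zip(mean, fused)]  # update mean
    return scores

def highlights(scores, k=2.0):
    """Flag frames whose novelty exceeds mean + k * stddev of all scores."""
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    return [i for i, s in enumerate(scores) if s > mu + k * sd]

# nine quiet frames and one spike across both views
frames = ([([0.0, 0.0], [0.0])] * 7 + [([5.0, 5.0], [4.0])]
          + [([0.0, 0.0], [0.0])] * 2)
flagged = highlights(novelty_scores(frames))   # → [7]
```

The key property shared with the paper's method is that no labels are needed: a frame is a highlight candidate simply because it is unlike what the model has seen so far, in any of the views.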
Measuring, Understanding, and Classifying News Media Sympathy on Twitter after Crisis Events
This paper investigates bias in coverage between Western and Arab media on
Twitter after the November 2015 Beirut and Paris terror attacks. Using two
Twitter datasets covering each attack, we investigate how Western and Arab
media differed in coverage bias, sympathy bias, and resulting information
propagation. We crowdsourced sympathy and sentiment labels for 2,390 tweets
across four languages (English, Arabic, French, German), built a regression
model to characterize sympathy, and thereafter trained a deep convolutional
neural network to predict sympathy. Key findings show that: (a) both events
were disproportionately covered; (b) Western media exhibited less sympathy,
and each region's media was more sympathetic towards the affected country in
its own region; (c) sympathy predictions supported the ground-truth analysis
that Western media was less sympathetic than Arab media; and (d) sympathetic
tweets did not spread any further. We discuss our results in light of global
news flow,
Twitter affordances, and public perception impact.
Comment: In Proc. CHI 2018 Papers program. Please cite: El Ali, A., Stratmann,
T., Park, S., Schöning, J., Heuten, W. & Boll, S. (2018). Measuring,
Understanding, and Classifying News Media Sympathy on Twitter after Crisis
Events. In Proceedings of the 2018 CHI Conference on Human Factors in
Computing Systems (CHI '18). ACM, New York, NY, USA. DOI:
https://doi.org/10.1145/3173574.317413
Automatically Segmenting the Left Atrium from Cardiac Images Using Successive 3D U-Nets and a Contour Loss
Radiological imaging offers effective measurement of anatomy, which is useful in disease diagnosis and assessment. Previous studies have shown that left atrial wall remodeling can provide information to predict treatment outcome in atrial fibrillation. Nevertheless, segmentation of the left atrial structures from medical images is still very time-consuming. Current advances in neural networks may help create automatic segmentation models that reduce the workload for clinicians. In this preliminary study, we propose automated, two-stage, three-dimensional U-Nets for the challenging task of left atrial segmentation. Unlike previous two-dimensional image segmentation methods, we use 3D U-Nets to obtain the heart cavity directly in 3D. The dual 3D U-Net structure consists of a first U-Net to coarsely segment and locate the left atrium, and a second U-Net to accurately segment the left atrium at higher resolution. In addition, we introduce a Contour loss based on additional distance information to adjust the final segmentation. We randomly split the data into training (80 subjects) and validation (20 subjects) datasets to train multiple models with different augmentation settings. Experiments show that the average Dice coefficients for the validation datasets are around 0.91-0.92, the sensitivity around 0.90-0.94, and the specificity 0.99. Compared with the traditional Dice loss, models trained with the Contour loss generally offer a smaller Hausdorff distance with a similar Dice coefficient, and have fewer connected components in their predictions. Finally, we integrate several trained models in an ensemble prediction to segment the testing datasets.
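The two quantities the abstract compares can be sketched over flat binary masks. The Dice coefficient is standard; the contour-loss formulation below (predicted foreground weighted by a distance map to the true boundary) is a plausible reading of "a Contour loss based on additional distance information", not necessarily the paper's exact definition.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|P ∩ T| / (|P| + |T|) over flat binary masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def contour_loss(pred_probs, dist_map):
    """Penalize predicted foreground by its distance to the ground-truth
    contour: sum(p * d). dist_map holds, per pixel, the distance to the
    nearest true boundary (0 on the contour), so far-away false positives
    cost more than near misses -- which is why this loss tends to shrink
    the Hausdorff distance and suppress stray connected components."""
    return sum(p * d for p, d in zip(pred_probs, dist_map))

d = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])    # 2*1/(2+1) ≈ 0.667
c = contour_loss([1.0, 0.0, 1.0], [0.0, 1.0, 3.0])  # on-contour pixel is
                                                    # free; far pixel costs 3
```

This contrast mirrors the reported results: Dice is insensitive to *where* an error sits, while a distance-weighted term directly punishes spatially distant mistakes.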