How Does Batch Normalization Help Optimization?
Batch Normalization (BatchNorm) is a widely adopted technique that enables
faster and more stable training of deep neural networks (DNNs). Despite its
pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly
understood. The popular belief is that this effectiveness stems from
controlling the change of the layers' input distributions during training to
reduce the so-called "internal covariate shift". In this work, we demonstrate
that such distributional stability of layer inputs has little to do with the
success of BatchNorm. Instead, we uncover a more fundamental impact of
BatchNorm on the training process: it makes the optimization landscape
significantly smoother. This smoothness induces a more predictive and stable
behavior of the gradients, allowing for faster training.
Comment: In NeurIPS'18
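To make "more predictive gradients" concrete, here is a minimal sketch (our illustration, not the authors' code) that probes how the loss changes when stepping along the current gradient for a toy network with and without BatchNorm; on a smoother landscape the probed losses vary more gradually and predictably with step size.

```python
# Sketch: probe loss along the gradient direction for nets
# with and without BatchNorm (illustrative, not the paper's code).
import torch
import torch.nn as nn

def make_net(use_bn: bool) -> nn.Sequential:
    layers = [nn.Linear(32, 64)]
    if use_bn:
        layers.append(nn.BatchNorm1d(64))
    layers += [nn.ReLU(), nn.Linear(64, 10)]
    return nn.Sequential(*layers)

def loss_along_gradient(net, x, y, step_sizes):
    """Loss at parameters w - eta * grad(w) for several step sizes eta."""
    criterion = nn.CrossEntropyLoss()
    grads = torch.autograd.grad(criterion(net(x), y), list(net.parameters()))
    losses = []
    with torch.no_grad():
        for eta in step_sizes:
            for p, g in zip(net.parameters(), grads):
                p -= eta * g                    # step along the gradient
            losses.append(criterion(net(x), y).item())
            for p, g in zip(net.parameters(), grads):
                p += eta * g                    # undo the step
    return losses

torch.manual_seed(0)
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
for use_bn in (False, True):
    print("BatchNorm" if use_bn else "plain    ",
          loss_along_gradient(make_net(use_bn), x, y, [0.01, 0.1, 0.5]))
```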
Multi-Level Batch Normalization In Deep Networks For Invasive Ductal Carcinoma Cell Discrimination In Histopathology Images
Breast cancer is the most commonly diagnosed cancer and a leading cause of
cancer death in women worldwide. Imaging techniques such as breast cancer
pathology help in the diagnosis and monitoring of the disease. However,
identification of malignant cells can be challenging given the high
heterogeneity in tissue absorption of staining agents. In this work, we
present a novel approach for Invasive Ductal Carcinoma (IDC) cell
discrimination in histopathology slides. We propose a model derived from the
Inception architecture, introducing a multi-level batch normalization module
between each convolutional step. This module is used as the base block for
feature extraction in a CNN architecture. On the open IDC dataset, we obtain
a balanced accuracy of 0.89 and an F1 score of 0.90, surpassing recent
state-of-the-art classification algorithms tested on this public dataset.
Comment: 4 pages, 5 figures
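The abstract does not spell out the module's internal structure; the following is a hypothetical sketch of the general idea of inserting batch normalization after every convolutional step of an Inception-style block (the class name and layer sizes are our assumptions, not the paper's architecture).

```python
# Hypothetical sketch of a conv block with batch normalization
# between each convolutional step (illustrative only).
import torch.nn as nn

class MultiLevelBNBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),    # normalize after the 1x1 conv
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),    # normalize after the 3x3 conv
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```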
Context-Aware Systems for Sequential Item Recommendation
Quizlet is the most popular online learning tool in the United States, used
by over two-thirds of high school students and half of college students. With
more than 95% of Quizlet users reporting improved grades as a result, the
platform has become the de facto tool used in millions of classrooms. In this
paper, we explore the task of recommending suitable content for a student to
study, given their prior interests, as well as what their peers are studying.
We propose a novel approach, the Neural Educational Recommendation Engine
(NERE), to recommend educational content by leveraging student behaviors rather
than ratings. We have found that this approach better captures social factors
that are more aligned with learning. NERE is based on a recurrent neural
network that includes collaborative and content-based approaches for
recommendation, and takes into account any particular student's speed, mastery,
and experience to recommend the appropriate task. We train NERE by jointly
learning the user embeddings and content embeddings, and attempt to predict the
content embedding for the final timestamp. We also develop a confidence
estimator for our neural network, which is a crucial requirement for
productionizing this model. We apply NERE to Quizlet's proprietary dataset, and
present our results. We achieve an R^2 score of 0.81 in the content embedding
space and a recall@100 of 54% over the nearest neighbors, vastly exceeding
the recall@100 of 12% that a standard matrix-factorization approach provides.
We conclude with a discussion of how NERE will be deployed,
and position our work as one of the first educational recommender systems for
the K-12 space.
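A hedged sketch of the general idea: a recurrent network consumes a student's sequence of studied-content embeddings, conditioned on a learned user embedding, and regresses the content embedding expected at the final timestamp. All class names, sizes, and wiring here are our illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a sequential content-embedding recommender.
import torch
import torch.nn as nn

class SequentialRecommender(nn.Module):
    def __init__(self, n_users: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)       # learned jointly
        self.rnn = nn.GRU(emb_dim * 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, emb_dim)  # predicted content embedding

    def forward(self, user_ids, content_seq):
        # content_seq: (batch, seq_len, emb_dim) embeddings of studied items
        u = self.user_emb(user_ids)                          # (batch, emb_dim)
        u = u.unsqueeze(1).expand(-1, content_seq.size(1), -1)
        out, _ = self.rnn(torch.cat([content_seq, u], dim=-1))
        return self.head(out[:, -1])   # embedding for the final timestamp

model = SequentialRecommender(n_users=1000)
pred = model(torch.randint(0, 1000, (8,)), torch.randn(8, 20, 64))
# At serving time, recommend the content items nearest to `pred`.
```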
CrossNorm: Normalization for Off-Policy TD Reinforcement Learning
Off-policy temporal difference (TD) methods are a powerful class of
reinforcement learning (RL) algorithms. Intriguingly, deep off-policy TD
algorithms are not commonly used in combination with feature normalization
techniques, despite positive effects of normalization in other domains. We show
that naive application of existing normalization techniques is indeed not
effective, but that well-designed normalization improves optimization stability
and removes the necessity of target networks. In particular, we introduce a
normalization based on a mixture of on- and off-policy transitions, which we
call cross-normalization. It can be regarded as an extension of batch
normalization that re-centers data for two different distributions, as present
in off-policy learning. Applied to DDPG and TD3, cross-normalization improves
over the state of the art across a range of MuJoCo benchmark tasks.
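A minimal sketch of the core idea as the abstract describes it: normalize features using statistics pooled over a mixture of on-policy and off-policy batches, rather than over either distribution alone. This illustrates the concept only; the function name and the inclusion of variance rescaling alongside the re-centering the abstract emphasizes are our assumptions, not the authors' exact layer.

```python
# Sketch: batch-norm-style normalization with mixture statistics
# shared across on- and off-policy batches (illustrative only).
import torch

def cross_normalize(on_policy: torch.Tensor,
                    off_policy: torch.Tensor,
                    eps: float = 1e-5):
    """Normalize both (batch, features) batches with shared stats
    computed over the mixture of the two transition distributions."""
    mixed = torch.cat([on_policy, off_policy], dim=0)
    mean = mixed.mean(dim=0, keepdim=True)
    var = mixed.var(dim=0, unbiased=False, keepdim=True)
    scale = torch.rsqrt(var + eps)
    return (on_policy - mean) * scale, (off_policy - mean) * scale
```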
A Domain Agnostic Normalization Layer for Unsupervised Adversarial Domain Adaptation
We propose a normalization layer for unsupervised domain adaptation in semantic
scene segmentation. Normalization layers are known to improve convergence and
generalization and are part of many state-of-the-art fully-convolutional neural
networks. We show that conventional normalization layers worsen the performance
of current Unsupervised Adversarial Domain Adaptation (UADA), which is a method
to improve network performance on unlabeled datasets and the focus of our
research. Therefore, we propose a novel Domain Agnostic Normalization layer and
thereby unlock the benefits of normalization layers for unsupervised
adversarial domain adaptation. In our evaluation, we adapt from the synthetic
GTA5 dataset to the real Cityscapes dataset, a common benchmark experiment,
and surpass the state of the art. As our normalization layer is domain agnostic
at test time, we furthermore demonstrate that UADA using Domain Agnostic
Normalization improves performance on unseen domains, specifically on
Apolloscape and Mapillary.
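A hedged sketch of one plausible form such a layer could take: each forward pass normalizes with the statistics of the batch it sees, while the learned affine parameters are shared across domains, so no domain label or domain-specific running statistic is needed at test time. The details below are our assumptions, not the paper's code.

```python
# Sketch: a normalization layer whose behavior does not depend
# on which domain the input comes from (illustrative only).
import torch
import torch.nn as nn

class DomainAgnosticNorm2d(nn.Module):
    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))   # shared gamma
        self.bias = nn.Parameter(torch.zeros(num_features))    # shared beta

    def forward(self, x):
        # x: (batch, channels, H, W); statistics come from this batch
        # alone, so the layer acts identically on any input domain.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mean) * torch.rsqrt(var + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```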