Robust Semi-Supervised Learning with Out of Distribution Data
Recent semi-supervised learning (SSL) work shows significant improvements in SSL algorithms' performance by using better unlabeled-data representations. However, recent work [Oliver et al., 2018] shows that an SSL algorithm's performance can degrade when the unlabeled set contains out-of-distribution examples (OODs). In this work, we first study the critical causes of OODs' negative impact on SSL algorithms. We find that (1) an OOD example's effect on an SSL algorithm's performance increases as its distance to the decision boundary decreases, and (2) Batch Normalization (BN), a popular module, can degrade performance rather than improve it when the unlabeled set contains OODs. To address these causes, we propose a novel unified robust SSL approach that can be easily extended to many existing SSL algorithms and improves their robustness against OODs. In particular, we propose a simple modification of batch normalization, called weighted batch normalization, that improves BN's robustness against OODs. We also develop two efficient hyper-parameter optimization algorithms with different trade-offs between computational efficiency and accuracy. Extensive experiments on synthetic and real-world datasets show that our proposed approaches significantly improve the robustness of four representative SSL algorithms against OODs, compared with four state-of-the-art robust SSL approaches.
Comment: Preprint
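The abstract does not spell out the formulation of weighted batch normalization, but a minimal sketch of the underlying idea, assuming per-sample weights that down-weight suspected OOD examples when batch statistics are computed, might look like the following. The class name WeightedBN, the weight normalization scheme, and the PyTorch framing are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a "weighted" batch-normalization layer (assumption: suspected OOD
# samples receive small weights so they barely influence the batch statistics;
# the paper's exact formulation may differ).
import torch
import torch.nn as nn

class WeightedBN(nn.Module):
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))   # learned scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learned shift
        self.eps = eps

    def forward(self, x, sample_weights):
        # x: (batch, features); sample_weights: (batch,) in [0, 1],
        # e.g. an estimate of how "in-distribution" each example is.
        w = sample_weights / (sample_weights.sum() + self.eps)   # normalized weights
        mean = (w[:, None] * x).sum(dim=0)                       # weighted mean
        var = (w[:, None] * (x - mean) ** 2).sum(dim=0)          # weighted variance
        x_hat = (x - mean) / torch.sqrt(var + self.eps)          # normalize
        return self.gamma * x_hat + self.beta                    # scale and shift

# Usage: OOD-looking unlabeled examples get small weights, so the statistics
# used to normalize in-distribution examples are barely affected by them.
x = torch.randn(8, 16)
weights = torch.tensor([1.0, 1.0, 1.0, 0.1, 1.0, 0.05, 1.0, 1.0])
out = WeightedBN(16)(x, weights)
```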
Implicit Filter Sparsification In Convolutional Neural Networks
We show that implicit filter-level sparsity manifests in convolutional neural networks (CNNs) that employ Batch Normalization and ReLU activation and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. Through an extensive empirical study (Mehta et al., 2019), we hypothesize the mechanism behind the sparsification process and find surprising links to certain filter sparsification heuristics proposed in the literature. The emergence, and subsequent pruning, of selective features is observed to be one of the contributing mechanisms, leading to feature sparsity on par with or better than certain explicit sparsification/pruning approaches. In this workshop article we summarize our findings and point out corollaries of selective-feature penalization which could also be employed as heuristics for filter pruning.
Comment: ODML-CDNNR 2019 (ICML'19 workshop) extended abstract of the CVPR 2019 paper "On Implicit Filter Level Sparsity in Convolutional Neural Networks", Mehta et al. (arXiv:1811.12495)
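As a rough illustration of how filter-level sparsity can be probed after training, the sketch below counts filters whose BatchNorm scale is effectively zero. The threshold and the use of |gamma| as the sparsity criterion are assumptions made for illustration and may differ from the measurement used in the paper.

```python
# Minimal probe of implicit filter sparsity (assumption: a filter counts as
# "off" when its BatchNorm scale |gamma| falls below a small threshold).
import torch
import torch.nn as nn

def filter_sparsity(model: nn.Module, threshold: float = 1e-3) -> float:
    """Return the fraction of BN-scaled filters with |gamma| below `threshold`."""
    off, total = 0, 0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma = m.weight.detach().abs()
            off += int((gamma < threshold).sum())
            total += gamma.numel()
    return off / max(total, 1)

# Example: per the paper's observation, training a block like this with an
# adaptive optimizer plus L2 / weight decay tends to drive some gammas to zero.
model = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU())
print(f"off-filter fraction: {filter_sparsity(model):.2%}")
```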
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models
Training large vocabulary Neural Network Language Models (NNLMs) is a
difficult task due to the explicit requirement of the output layer
normalization, which typically involves the evaluation of the full softmax
function over the complete vocabulary. This paper proposes a Batch Noise
Contrastive Estimation (B-NCE) approach to alleviate this problem. This is
achieved by reducing the vocabulary, at each time step, to the target words in
the batch and then replacing the softmax by the noise contrastive estimation
approach, where these words play the role of targets and noise samples at the
same time. In doing so, the proposed approach can be fully formulated and
implemented using optimal dense matrix operations. Applying B-NCE to train
different NNLMs on the Large Text Compression Benchmark (LTCB) and the One
Billion Word Benchmark (OBWB) shows a significant reduction of the training
time with no noticeable degradation of the models' performance. This paper also
presents a new baseline comparative study of different standard NNLMs on the
large OBWB on a single Titan-X GPU.
Comment: Accepted for publication at INTERSPEECH'1
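A simplified sketch of the batch-NCE idea, restricting each time step's output layer to the batch's own target words and letting those words act as both targets and noise, could look like the following. This omits the NCE noise-distribution correction term and uses hypothetical names, so it illustrates the idea rather than the paper's exact objective.

```python
# Simplified batch-NCE loss sketch (assumption: each example's target is a
# positive and the other targets in the batch serve as noise; the full B-NCE
# objective additionally corrects for the noise distribution).
import torch
import torch.nn.functional as F

def batch_nce_loss(hidden, targets, output_emb, output_bias):
    # hidden: (B, d) hidden states at one time step; targets: (B,) target word ids
    # output_emb: (V, d) output embedding table; output_bias: (V,) output biases
    w = output_emb[targets]                    # (B, d) rows for the batch's targets only
    b = output_bias[targets]                   # (B,)
    scores = hidden @ w.t() + b                # (B, B) dense scores, no full-vocab softmax
    labels = torch.eye(len(targets), device=scores.device)  # positives on the diagonal
    return F.binary_cross_entropy_with_logits(scores, labels)

# Toy usage: the cost per step scales with the batch size, not the vocabulary.
B, d, V = 4, 8, 1000
hidden = torch.randn(B, d)
targets = torch.randint(0, V, (B,))
emb, bias = torch.randn(V, d), torch.zeros(V)
loss = batch_nce_loss(hidden, targets, emb, bias)
```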