
    Foothill: A Quasiconvex Regularization for Edge Computing of Deep Neural Networks

    Deep neural networks (DNNs) have demonstrated success for many supervised learning tasks, ranging from voice recognition and object detection to image classification. However, their increasing complexity can yield poor generalization error and make them hard to deploy on edge devices. Quantization is an effective approach to compress DNNs in order to meet these constraints. Using a quasiconvex base function to construct a binary quantizer helps in training binary neural networks (BNNs), and adding noise to the input data or using a concrete regularization function helps to improve generalization error. Here we introduce the foothill function, an infinitely differentiable quasiconvex function. This regularizer is flexible enough to deform towards $L_1$ and $L_2$ penalties. Foothill can be used as a binary quantizer, as a regularizer, or as a loss. In particular, we show that this regularizer reduces the accuracy gap between BNNs and their full-precision counterparts for image classification on ImageNet.
    Comment: Accepted at the 16th International Conference on Image Analysis and Recognition (ICIAR 2019)
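
    The abstract does not reproduce the functional form of the penalty, so the sketch below is only illustrative: it assumes a smooth quasiconvex penalty of the form alpha * x * tanh(beta * x), which has the interpolation behaviour described above (roughly quadratic, i.e. $L_2$-like, for small beta * x and close to $L_1$ for large beta). The function name, parameterization, and hyperparameter values are assumptions, not taken from the paper.

```python
import numpy as np

def foothill_like_penalty(x, alpha=1.0, beta=1.0):
    """Smooth quasiconvex penalty of the assumed form alpha * x * tanh(beta * x).

    For small beta * x it behaves like an L2 penalty (~ alpha * beta * x**2);
    as beta grows it approaches an L1 penalty (~ alpha * |x|).
    """
    return alpha * x * np.tanh(beta * x)

# Toy usage: add the penalty of a weight vector to a task loss.
w = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
lambda_reg = 1e-3
reg_term = lambda_reg * foothill_like_penalty(w, alpha=1.0, beta=5.0).sum()
print(reg_term)
```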

    Dropout Training as Adaptive Regularization

    Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an $L_2$ regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
    Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS), 2013
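
    For intuition, the sketch below writes out the quadratic (first-order) approximation of the dropout regularizer for logistic regression that the abstract describes: an $L_2$ penalty on the coefficients weighted by an estimate of the diagonal Fisher information. The function names and the toy data are ours; this is a sketch of the analysis, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dropout_quadratic_penalty(X, beta, delta):
    """Sketch of the quadratic approximation of the dropout regularizer
    for logistic regression with dropout probability delta:

        R(beta) ~= delta / (2 * (1 - delta)) * sum_j beta_j**2 * I_jj,

    where I_jj = sum_i p_i * (1 - p_i) * X_ij**2 estimates the diagonal of
    the Fisher information, i.e. an L2 penalty after Fisher scaling.
    """
    p = sigmoid(X @ beta)                     # predicted probabilities
    fisher_diag = (p * (1.0 - p)) @ (X ** 2)  # one entry per feature
    return delta / (2.0 * (1.0 - delta)) * np.sum(beta ** 2 * fisher_diag)

# Toy usage on random data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta = rng.normal(size=5)
print(dropout_quadratic_penalty(X, beta, delta=0.5))
```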

    Curriculum Dropout

    Dropout is a very effective way of regularizing neural networks. Stochastically "dropping out" units with a certain probability discourages over-specific co-adaptations of feature detectors, preventing overfitting and improving network generalization. In addition, dropout can be interpreted as an approximate model aggregation technique, in which an exponential number of smaller networks are averaged in order to obtain a more powerful ensemble. In this paper, we show that using a fixed dropout probability during training is a suboptimal choice. We thus propose a time schedule for the probability of retaining neurons in the network. This induces an adaptive regularization scheme that smoothly increases the difficulty of the optimization problem. This idea of "starting easy" and adaptively increasing the difficulty of the learning problem has its roots in curriculum learning and allows one to train better models. Indeed, we prove that our optimization strategy implements a very general curriculum scheme, by gradually adding noise to both the input and intermediate feature representations within the network architecture. Experiments on seven image classification datasets and different network architectures show that our method, named Curriculum Dropout, frequently yields better generalization and, at worst, performs just as well as standard Dropout.
    Comment: Accepted at ICCV (International Conference on Computer Vision) 2017
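
    The abstract does not state the schedule here, so the sketch below is only one plausible reading of the "starting easy" idea: the probability of retaining a unit starts at 1 (no noise) and decays exponentially toward a target value. The schedule form, the hyperparameters theta_bar and gamma, and the function names are assumptions for illustration.

```python
import numpy as np

def retain_probability(t, theta_bar=0.5, gamma=1e-3):
    """Assumed schedule for the probability of retaining a unit at step t.

    Starts at 1 (no dropout, an 'easy' problem) and decays exponentially
    toward theta_bar, so the injected noise, and hence the difficulty of
    the optimization problem, grows smoothly over training.
    """
    return (1.0 - theta_bar) * np.exp(-gamma * t) + theta_bar

def curriculum_dropout(activations, t, rng, theta_bar=0.5, gamma=1e-3):
    """Apply (inverted) dropout with the scheduled retain probability."""
    theta = retain_probability(t, theta_bar, gamma)
    mask = rng.random(activations.shape) < theta
    return activations * mask / theta

# Example: the retain probability shrinks as training progresses.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
for step in (0, 1000, 10000):
    print(step, retain_probability(step), curriculum_dropout(h, step, rng).mean())
```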