Improved Dropout for Shallow and Deep Learning
Dropout has achieved great success in training deep neural
networks by independently zeroing out the outputs of neurons at random. It has
also received a surge of interest for shallow learning, e.g., logistic
regression. However, independent sampling for dropout can be suboptimal
in terms of convergence. In this paper, we propose to use multinomial
sampling for dropout, i.e., sampling features or neurons according to a
multinomial distribution with different probabilities for different
features/neurons. To derive the optimal dropout probabilities, we analyze
shallow learning with multinomial dropout and establish a risk bound for
stochastic optimization. By minimizing a sampling-dependent factor in the risk
bound, we obtain a distribution-dependent dropout with sampling probabilities
dependent on the second-order statistics of the data distribution. To address
the evolving distribution of neurons in deep learning, we propose an
efficient adaptive dropout (named evolutional dropout) that computes
the sampling probabilities on-the-fly from a mini-batch of examples. Empirical
studies on several benchmark datasets demonstrate that the proposed dropouts
achieve not only much faster convergence but also a smaller testing error
than standard dropout. For example, on the CIFAR-100 data, evolutional
dropout achieves relative improvements of over 10% in prediction performance
and over 50% in convergence speed compared with standard dropout.
Comment: In NIPS 201
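A minimal NumPy sketch of multinomial dropout with probabilities recomputed from each mini-batch, in the spirit of evolutional dropout. The abstract only says the probabilities depend on second-order statistics, so the rule used here is an assumption: p_j is taken proportional to the square root of the mini-batch second moment of feature j. Each example keeps the features hit by k multinomial draws, and feature j is rescaled by count_j / (k p_j) so the corrupted input is unbiased. The function names and the parameter k are illustrative, not the authors' code.

```python
import numpy as np

def evolutional_probs(batch: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Assumed rule: p_j proportional to sqrt(mean over the mini-batch of x_j^2)."""
    second_moment = np.mean(batch ** 2, axis=0)       # shape (d,)
    scores = np.sqrt(second_moment) + eps              # eps keeps every p_j > 0
    return scores / scores.sum()

def multinomial_dropout(batch: np.ndarray, k: int, rng: np.random.Generator):
    """Per example, draw k features from Multinomial(k, p) and keep only those,
    scaling feature j by count_j / (k * p_j) so that E[corrupted] = batch."""
    p = evolutional_probs(batch)
    counts = rng.multinomial(k, p, size=batch.shape[0])  # shape (n, d); each row sums to k
    return batch * counts / (k * p), p

rng = np.random.default_rng(0)
batch = rng.normal(size=(64, 10)) * np.arange(1, 11)    # feature j has scale j + 1
corrupted, probs = multinomial_dropout(batch, k=5, rng=rng)
```

Features with larger second moments are sampled more often, whereas standard dropout keeps every unit with the same probability regardless of its statistics.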
Altitude Training: Strong Bounds for Single-Layer Dropout
Dropout training, originally designed for deep neural networks, has been
successful on high-dimensional single-layer natural language tasks. This paper
proposes a theoretical explanation for this phenomenon: we show that, under a
generative Poisson topic model with long documents, dropout training improves
the exponent in the generalization bound for empirical risk minimization.
Dropout achieves this gain much like a marathon runner who practices at
altitude: once a classifier learns to perform reasonably well on training
examples that have been artificially corrupted by dropout, it will do very well
on the uncorrupted test set. We also show that, under similar conditions,
dropout preserves the Bayes decision boundary and should therefore induce
minimal bias in high dimensions.
Comment: Advances in Neural Information Processing Systems (NIPS), 201
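One way to picture training examples "artificially corrupted by dropout" under a Poisson topic model is token-level dropout on a bag-of-words count vector. The sketch below assumes each token is dropped independently with probability delta, which amounts to binomial thinning of the counts; the document generator, vocabulary, and parameter values are hypothetical and not taken from the paper.

```python
import numpy as np

def sample_poisson_document(word_probs: np.ndarray, length: float,
                            rng: np.random.Generator) -> np.ndarray:
    """Hypothetical generator: the count of word w is Poisson(length * word_probs[w])."""
    return rng.poisson(length * word_probs)

def dropout_tokens(counts: np.ndarray, delta: float, rng: np.random.Generator) -> np.ndarray:
    """Drop every token independently with probability delta (binomial thinning).
    No rescaling is applied; this only produces the corrupted counts."""
    return rng.binomial(counts, 1.0 - delta)

rng = np.random.default_rng(0)
word_probs = np.array([0.5, 0.3, 0.2])     # made-up topic over a 3-word vocabulary
docs = np.stack([sample_poisson_document(word_probs, 1000, rng) for _ in range(2000)])
corrupted = dropout_tokens(docs, delta=0.5, rng=rng)
```

Binomial thinning of a Poisson count is again Poisson, so under these assumptions a corrupted long document looks like a shorter document drawn from the same model, with mean and variance both scaled by 1 - delta.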
Bayesian Dropout
Dropout has recently emerged as a powerful and simple method for training
neural networks that prevents co-adaptation by stochastically omitting neurons.
Dropout is currently not grounded in explicit modelling assumptions, which has
so far precluded its adoption in Bayesian modelling. Using Bayesian entropic
reasoning, we show that dropout can be interpreted as optimal inference under
constraints. We demonstrate this on an analytically tractable regression model,
providing a Bayesian interpretation of its mechanism for regularizing and
preventing co-adaptation, as well as of its connection to other Bayesian
techniques. We also discuss two general approximate techniques for applying
Bayesian dropout to general models, one based on an analytical approximation
and the other on stochastic variational techniques. These techniques are then
applied to a Bayesian logistic regression problem and are shown to improve
performance as the model becomes more misspecified. Our framework establishes
dropout as a theoretically justified and practical tool for statistical
modelling, allowing Bayesians to tap into the benefits of dropout training.
Comment: 21 pages, 3 figures. Manuscript prepared 2014 and awaiting submission
Identifying Attrition Phases in Survey Data: Applicability and Assessment Study
Background: Although Web-based questionnaires are an efficient, increasingly popular mode of data collection, their utility is often challenged by high participant dropout. Researchers can gain insight into potential causes of high participant dropout by analyzing the dropout patterns.
Objective: This study proposed applying user-specified and existing hypothesis testing methods in a novel setting, survey dropout data, and assessed their use for identifying phases of higher or lower survey dropout.
Methods: First, we proposed applying user-specified thresholds to identify abrupt differences in the dropout rate. Second, we proposed applying two existing hypothesis testing methods to detect significant differences in participant dropout. We assessed these methods through a simulation study and through application to a case study featuring a questionnaire on decision-making surrounding cancer screening.
Results: The user-specified method set to a low threshold performed best at accurately detecting phases of high attrition in both the simulation study and test case application, although all proposed methods were too sensitive.
Conclusions: The user-specified method set to a low threshold correctly identified the attrition phases. Hypothesis testing methods, although sensitive at times, were unable to accurately identify the attrition phases. These results strengthen the case for further development of, and research on, the science of attrition.
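A user-specified threshold rule of this kind can be sketched in a few lines. The abstract does not define the dropout rate or the exact rule, so both are assumptions here: the per-question rate is the share of respondents who reached a question and answered nothing after it, and a question is flagged when its rate differs from the previous question's rate by more than a user-chosen threshold. The function names and the data are made up for illustration.

```python
from typing import List

def dropout_rates(last_answered: List[int], n_questions: int) -> List[float]:
    """Per-question dropout rate for questions 0 .. n_questions - 2.

    last_answered[r] is the index of the last question respondent r answered
    (n_questions - 1 means the respondent completed the survey). Assumption:
    a respondent drops out at question q if they answered q but nothing after it.
    """
    rates = []
    for q in range(n_questions - 1):
        reached = sum(1 for a in last_answered if a >= q)
        dropped = sum(1 for a in last_answered if a == q)
        rates.append(dropped / reached if reached else 0.0)
    return rates

def flag_abrupt_changes(rates: List[float], threshold: float) -> List[int]:
    """Flag questions whose dropout rate differs from the previous question's
    rate by more than the user-specified threshold (a possible phase boundary)."""
    return [q for q in range(1, len(rates)) if abs(rates[q] - rates[q - 1]) > threshold]

# Made-up data: 100 respondents, 5 questions, a spike in dropout at question 2.
last_answered = [0] * 2 + [1] * 2 + [2] * 20 + [3] * 2 + [4] * 74
rates = dropout_rates(last_answered, n_questions=5)
flagged = flag_abrupt_changes(rates, threshold=0.05)
```

Here question 2 is flagged as the start of a high-attrition phase and question 3 as its end. A lower threshold flags smaller changes, so the threshold directly controls how sensitive the rule is.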
Dropout Training as Adaptive Regularization
Dropout and other feature noising schemes control overfitting by artificially
corrupting the training data. For generalized linear models, dropout performs a
form of adaptive regularization. Using this viewpoint, we show that the dropout
regularizer is first-order equivalent to an L2 regularizer applied after
scaling the features by an estimate of the inverse diagonal Fisher information
matrix. We also establish a connection to AdaGrad, an online learning
algorithm, and find that a close relative of AdaGrad operates by repeatedly
solving linear dropout-regularized problems. By casting dropout as
regularization, we develop a natural semi-supervised algorithm that uses
unlabeled data to create a better adaptive regularizer. We apply this idea to
document classification tasks, and show that it consistently boosts the
performance of dropout training, improving on state-of-the-art results on the
IMDB reviews dataset.
Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS), 201
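For logistic regression, the adaptive-regularization view can be checked numerically. The sketch below assumes features are dropped with probability delta and the survivors rescaled by 1/(1 - delta); it computes a quadratic penalty, delta / (2 (1 - delta)) times the sum of I_jj beta_j^2 with I_jj the diagonal of the Fisher information (an L2 penalty on sqrt(I_jj) beta_j), and compares it with a Monte Carlo estimate of the exact expected dropout penalty. The function names and toy data are hypothetical.

```python
import numpy as np

def log_partition(z):
    """A(z) = log(1 + exp(z)) for logistic regression, computed stably."""
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_dropout_penalty(X, beta, delta):
    """delta / (2 (1 - delta)) * sum_j I_jj * beta_j^2, where
    I_jj = sum_i p_i (1 - p_i) x_ij^2 is the diagonal of the Fisher information."""
    p = sigmoid(X @ beta)
    fisher_diag = (p * (1.0 - p)) @ X**2               # shape (d,)
    return 0.5 * delta / (1.0 - delta) * np.sum(fisher_diag * beta**2)

def monte_carlo_dropout_penalty(X, beta, delta, n_draws, rng):
    """Estimate sum_i E[A(x~_i . beta)] - A(x_i . beta), with x~ = x * keep / (1 - delta)
    and keep ~ Bernoulli(1 - delta). The term A'(x . beta) (x~ . beta - x . beta) has mean
    zero, so subtracting it is a control variate: same expectation, lower variance."""
    keep = rng.random((n_draws,) + X.shape) >= delta    # True with probability 1 - delta
    X_tilde = X * keep / (1.0 - delta)                  # shape (n_draws, n, d)
    z = X @ beta                                        # shape (n,)
    z_tilde = X_tilde @ beta                            # shape (n_draws, n)
    diff = log_partition(z_tilde) - log_partition(z) - sigmoid(z) * (z_tilde - z)
    return diff.mean(axis=0).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                            # toy features; no labels needed
beta = 0.1 * rng.normal(size=5)
q_pen = quadratic_dropout_penalty(X, beta, delta=0.5)
mc_pen = monte_carlo_dropout_penalty(X, beta, delta=0.5, n_draws=4000, rng=rng)
```

Neither function uses the labels, which is what makes the semi-supervised idea possible: unlabeled examples can be appended to X when estimating the regularizer.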
