Dropout Training as Adaptive Regularization
Dropout and other feature noising schemes control overfitting by artificially
corrupting the training data. For generalized linear models, dropout performs a
form of adaptive regularization. Using this viewpoint, we show that the dropout
regularizer is first-order equivalent to an L2 regularizer applied after
scaling the features by an estimate of the inverse diagonal Fisher information
matrix. We also establish a connection to AdaGrad, an online learning
algorithm, and find that a close relative of AdaGrad operates by repeatedly
solving linear dropout-regularized problems. By casting dropout as
regularization, we develop a natural semi-supervised algorithm that uses
unlabeled data to create a better adaptive regularizer. We apply this idea to
document classification tasks, and show that it consistently boosts the
performance of dropout training, improving on state-of-the-art results on the
IMDB reviews dataset.
Comment: 11 pages. Advances in Neural Information Processing Systems (NIPS), 2013.
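For logistic regression, the stated equivalence can be made concrete. Below is a minimal NumPy sketch (our illustration, not the authors' code) of the first-order dropout-equivalent penalty: an L2 penalty in which each coefficient is weighted by an estimate of the corresponding diagonal entry of the Fisher information, which is the same as plain L2 after rescaling the features by the inverse square-root diagonal Fisher information.

    import numpy as np

    def dropout_equivalent_penalty(X, beta, delta=0.5):
        # Predicted probabilities of the logistic GLM.
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        # Estimated diagonal Fisher information: I_jj = sum_i p_i (1 - p_i) x_ij^2.
        fisher_diag = (p * (1.0 - p)) @ (X ** 2)
        # First-order dropout regularizer for dropout rate delta:
        # (delta / (2 * (1 - delta))) * sum_j I_jj * beta_j^2.
        return delta / (2.0 * (1.0 - delta)) * np.sum(fisher_diag * beta ** 2)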
OCReP: An Optimally Conditioned Regularization for Pseudoinversion Based Neural Training
In this paper we consider the training of single-hidden-layer neural networks
by pseudoinversion, which, in spite of its popularity, is sometimes affected by
numerical instability issues. Regularization is known to be effective in such
cases, so we introduce, in the framework of Tikhonov regularization, a
matrix reformulation of the problem that allows us to use the condition
number as a diagnostic tool for identifying instability. By imposing
well-conditioning requirements on the relevant matrices, our theoretical
analysis allows the identification of an optimal value for the regularization
parameter from the standpoint of stability. We compare this value with the one
derived by cross-validation for overfitting control and optimization of the
generalization performance. We test our method on both regression and
classification tasks. The proposed method is quite effective in terms of
predictivity, often improving on the performance of the reference cases
considered. Because the regularization parameter is determined analytically,
this approach dramatically reduces the computational load required by many
other techniques.
Comment: Published in Neural Networks.
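As an illustration of the setting (not the paper's algorithm), the following sketch trains a single-hidden-layer network by regularized pseudoinversion and reports the condition number used as the instability diagnostic; the fixed parameter lam stands in for OCReP's analytically determined value, and the tanh activation and random input weights are assumptions.

    import numpy as np

    def train_pseudoinverse(X, Y, n_hidden=100, lam=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Random input-to-hidden weights; only the output layer is learned.
        W = rng.standard_normal((X.shape[1], n_hidden))
        H = np.tanh(X @ W)  # hidden-layer activations
        # Tikhonov-regularized normal equations for the output weights.
        A = H.T @ H + lam * np.eye(n_hidden)
        print("condition number of A:", np.linalg.cond(A))  # diagnostic
        beta = np.linalg.solve(A, H.T @ Y)
        return W, beta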
LambdaOpt: Learn to Regularize Recommender Models in Finer Levels
Recommendation models mainly deal with categorical variables, such as
user/item IDs and attributes. Besides the high-cardinality issue, the
interactions among such categorical variables are usually long-tailed, with the
head made up of highly frequent values and a long tail of rare ones. This
phenomenon results in the data sparsity issue, making it essential to
regularize the models to ensure generalization. The common practice is to
employ grid search to manually tune regularization hyperparameters based on the
validation data. However, searching the whole candidate space requires
non-trivial effort and large computational resources; even so, it may not
yield the optimal choice, since different parameters may warrant different
regularization strengths. In this paper, we propose a hyperparameter
optimization method, LambdaOpt, which automatically and adaptively enforces
regularization during training. Specifically, it updates the regularization
coefficients based on performance on the validation data. With LambdaOpt, the
notorious tuning of regularization hyperparameters can be avoided; more
importantly, it allows fine-grained regularization (i.e. each parameter can
have an individualized regularization coefficient), leading to better
generalized models. We show how to employ LambdaOpt on matrix factorization, a
classical model that is representative of a large family of recommender models.
Extensive experiments on two public benchmarks demonstrate the superiority of
our method in boosting the performance of top-K recommendation.
Comment: Accepted by KDD 2019.
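The alternating scheme can be sketched generically. The toy step below (our simplification, not the authors' exact algorithm) takes one SGD step on the lambda-regularized training loss, then updates the per-parameter regularization coefficients by the hypergradient of the validation loss through that one-step lookahead; grad_train and grad_val are assumed gradient oracles.

    import numpy as np

    def lambdaopt_step(theta, lam, grad_train, grad_val, lr=0.01, lr_lam=0.1):
        # One-step lookahead on the L2-regularized training loss.
        theta_next = theta - lr * (grad_train(theta) + lam * theta)
        # Hypergradient of the validation loss w.r.t. lam through the lookahead:
        # d theta_next / d lam_j = -lr * theta_j, so by the chain rule:
        hyper_grad = -lr * theta * grad_val(theta_next)
        # Gradient step on the per-parameter coefficients, kept non-negative.
        lam = np.maximum(lam - lr_lam * hyper_grad, 0.0)
        return theta_next, lam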
Local-Aggregate Modeling for Big-Data via Distributed Optimization: Applications to Neuroimaging
Technological advances have led to a proliferation of structured big data
that have matrix-valued covariates. We are specifically motivated to build
predictive models for multi-subject neuroimaging data based on each subject's
brain imaging scans. This is an ultra-high-dimensional problem that consists of
a matrix of covariates (brain locations by time points) for each subject; few
methods currently exist to fit supervised models directly to this tensor data.
We propose a novel modeling and algorithmic strategy to apply generalized
linear models (GLMs) to this massive tensor data in which one set of variables
is associated with locations. Our method begins by fitting GLMs to each
location separately, and then builds an ensemble by blending information across
locations through regularization with what we term an aggregating penalty. Our
so-called Local-Aggregate Model can be fit in a completely distributed manner
over the locations using an Alternating Direction Method of Multipliers (ADMM)
strategy, and thus greatly reduces the computational burden. Furthermore, we
propose to select the appropriate model through a novel sequence of faster
algorithmic solutions, similar to regularization paths. We demonstrate
both the computational and predictive modeling advantages of our methods via
simulations and an EEG classification problem.
Comment: 41 pages, 5 figures, and 3 tables.
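A minimal sketch of the distributed fitting strategy under simplifying assumptions: least-squares local losses in place of general GLMs, and a per-coefficient group lasso across locations as a stand-in for the paper's aggregating penalty. The local updates are independent, so they could run on separate machines.

    import numpy as np

    def local_aggregate_admm(Xs, y, lam=0.1, rho=1.0, n_iter=50):
        L, p = len(Xs), Xs[0].shape[1]
        B = np.zeros((L, p)); Z = np.zeros((L, p)); U = np.zeros((L, p))
        # Pre-build each local system (X_l^T X_l + rho I).
        lhs = [Xs[l].T @ Xs[l] + rho * np.eye(p) for l in range(L)]
        for _ in range(n_iter):
            # Local least-squares updates: fully parallel over locations.
            for l in range(L):
                B[l] = np.linalg.solve(lhs[l], Xs[l].T @ y + rho * (Z[l] - U[l]))
            # Aggregation: group soft-threshold each coefficient across locations.
            V = B + U
            norms = np.maximum(np.linalg.norm(V, axis=0, keepdims=True), 1e-12)
            Z = V * np.maximum(1.0 - lam / (rho * norms), 0.0)
            U += B - Z
        return Z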