172 research outputs found
Convex Optimization for Big Data
This article reviews recent advances in convex optimization algorithms for
Big Data, which aim to reduce the computational, storage, and communications
bottlenecks. We provide an overview of this emerging field, describe
contemporary approximation techniques like first-order methods and
randomization for scalability, and survey the important role of parallel and
distributed computation. The new Big Data algorithms are based on surprisingly
simple principles and attain staggering accelerations even on classical
problems.Comment: 23 pages, 4 figurs, 8 algorithm
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
We consider a discrete optimization formulation for learning sparse
classifiers, where the outcome depends upon a linear combination of a small
subset of features. Recent work has shown that mixed integer programming (MIP)
can be used to solve (to optimality) -regularized regression problems
at scales much larger than what was conventionally considered possible. Despite
their usefulness, MIP-based global optimization approaches are significantly
slower compared to the relatively mature algorithms for -regularization
and heuristics for nonconvex regularized problems. We aim to bridge this gap in
computation times by developing new MIP-based algorithms for
-regularized classification. We propose two classes of scalable
algorithms: an exact algorithm that can handle features in a
few minutes, and approximate algorithms that can address instances with
in times comparable to the fast -based algorithms. Our
exact algorithm is based on the novel idea of \textsl{integrality generation},
which solves the original problem (with binary variables) via a sequence of
mixed integer programs that involve a small number of binary variables. Our
approximate algorithms are based on coordinate descent and local combinatorial
search. In addition, we present new estimation error bounds for a class of
-regularized estimators. Experiments on real and synthetic data
demonstrate that our approach leads to models with considerably improved
statistical performance (especially, variable selection) when compared to
competing methods.Comment: To appear in JML
The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial
In this tutorial paper, we first define mean squared error, variance,
covariance, and bias of both random variables and classification/predictor
models. Then, we formulate the true and generalization errors of the model for
both training and validation/test instances where we make use of the Stein's
Unbiased Risk Estimator (SURE). We define overfitting, underfitting, and
generalization using the obtained true and generalization errors. We introduce
cross validation and two well-known examples which are -fold and
leave-one-out cross validations. We briefly introduce generalized cross
validation and then move on to regularization where we use the SURE again. We
work on both and norm regularizations. Then, we show that
bootstrap aggregating (bagging) reduces the variance of estimation. Boosting,
specifically AdaBoost, is introduced and it is explained as both an additive
model and a maximum margin model, i.e., Support Vector Machine (SVM). The upper
bound on the generalization error of boosting is also provided to show why
boosting prevents from overfitting. As examples of regularization, the theory
of ridge and lasso regressions, weight decay, noise injection to input/weights,
and early stopping are explained. Random forest, dropout, histogram of oriented
gradients, and single shot multi-box detector are explained as examples of
bagging in machine learning and computer vision. Finally, boosting tree and SVM
models are mentioned as examples of boosting.Comment: 23 pages, 9 figure
- …