138 research outputs found

    Efficient Elastic Net Regularization for Sparse Linear Models

    This paper presents an algorithm for efficient training of sparse linear models with elastic net regularization. Extending previous work on delayed updates, the new algorithm applies stochastic gradient updates to non-zero features only, bringing weights current as needed with closed-form updates. Closed-form delayed updates for the $\ell_1$, $\ell_\infty$, and rarely used $\ell_2$ regularizers have been described previously. This paper provides closed-form updates for the popular squared norm $\ell_2^2$ and elastic net regularizers. We provide dynamic programming algorithms that perform each delayed update in constant time. The new $\ell_2^2$ and elastic net methods handle both fixed and varying learning rates, and both standard stochastic gradient descent (SGD) and forward backward splitting (FoBoS). Experimental results show that on a bag-of-words dataset with 260,941 features, but only 88 nonzero features on average per training example, the dynamic programming method trains a logistic regression classifier with elastic net regularization over 2000 times faster than otherwise.
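
    The delayed-update idea can be illustrated with a minimal sketch. The Python/NumPy code below shows lazy updates for the squared-norm $\ell_2^2$ penalty only, assuming a fixed learning rate and plain SGD; the function name lazy_l2_sgd_logreg and the hyperparameter values are illustrative, and the paper's dynamic-programming formulation (which also covers varying rates, the elastic net penalty, and FoBoS) is not reproduced here.

```python
import numpy as np

def lazy_l2_sgd_logreg(X_rows, y, n_features, lr=0.1, lam=1e-4, epochs=1):
    """Sketch of lazily applied l2^2-regularized SGD for sparse logistic regression.

    X_rows: list of examples, each a list of (feature_index, value) pairs.
    y: labels in {0, 1}.
    """
    w = np.zeros(n_features)
    last_step = np.zeros(n_features, dtype=np.int64)  # step at which each weight was last brought current
    decay = 1.0 - lr * lam  # per-step multiplicative shrinkage from the l2^2 penalty
    t = 0
    for _ in range(epochs):
        for row, label in zip(X_rows, y):
            # Bring only the active weights current by applying the shrinkage they skipped.
            for j, _ in row:
                w[j] *= decay ** (t - last_step[j])
                last_step[j] = t
            # Logistic-loss gradient, computed on the active features only.
            margin = sum(w[j] * v for j, v in row)
            g = 1.0 / (1.0 + np.exp(-margin)) - label
            for j, v in row:
                w[j] -= lr * g * v
            t += 1
    # Flush the pending shrinkage for every weight before returning.
    w *= decay ** (t - last_step)
    return w
```

    With a fixed learning rate the accumulated shrinkage is simply a power of the per-step factor, so each delayed update costs constant time; handling varying rates and the $\ell_1$ part of the elastic net is what requires the closed forms and dynamic programming developed in the paper.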

    Input and Weight Space Smoothing for Semi-supervised Learning

    We propose regularizing the empirical loss for semi-supervised learning by acting on both the input (data) space and the weight (parameter) space. We show that the two are not equivalent, and are in fact complementary: one affects the minimality of the resulting representation, the other its insensitivity to nuisance variability. We propose a method to perform such smoothing, which combines known input-space smoothing with a novel weight-space smoothing based on a min-max (adversarial) optimization. The resulting Adversarial Block Coordinate Descent (ABCD) algorithm performs gradient ascent with a small learning rate on a random subset of the weights, and standard gradient descent on the remaining weights in the same mini-batch. It achieves performance comparable to the state of the art without resorting to heavy data augmentation, using a relatively simple architecture.
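
    A rough sketch of one such update step is given below, in Python, assuming a PyTorch model; the function name abcd_step, the coordinate-level random mask, and the learning rates and mask fraction are illustrative assumptions rather than details taken from the paper.

```python
import torch

def abcd_step(model, loss_fn, inputs, targets, lr=0.1, lr_ascent=0.01, ascent_frac=0.1):
    """One ABCD-style update (sketch): gradient ascent with a small rate on a random
    subset of weight coordinates, standard gradient descent on the rest, using
    gradients from the same mini-batch."""
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            # Randomly mark a fraction of the coordinates for adversarial ascent.
            mask = (torch.rand_like(p) < ascent_frac).to(p.dtype)
            # Small-step ascent on the masked coordinates ...
            p += lr_ascent * mask * p.grad
            # ... and standard descent on the remaining coordinates.
            p -= lr * (1.0 - mask) * p.grad
    return loss.item()
```

    In this reading, the ascent on a small random block of weights plays the role of the weight-space smoothing, while the usual descent on the remaining weights drives the empirical loss down.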