This paper presents an algorithm for efficient training of sparse linear
models with elastic net regularization. Extending previous work on delayed
updates, the new algorithm applies stochastic gradient updates only to features
that are non-zero in the current example, bringing their weights current as
needed with closed-form updates.
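As a minimal sketch of this idea (our own illustration, not the paper's implementation; the sparse-row layout, logistic loss, and fixed learning rate are assumptions), each weight records the step at which it was last brought current, and skipped regularization-only steps are applied in a single closed-form catch-up:

    import numpy as np

    def lazy_sgd_l2sq(rows, labels, num_features, eta=0.1, lam=1e-4):
        # One epoch of SGD with the penalty (lam/2)*||w||^2, touching only active features.
        # rows: list of dicts {feature_index: value}; labels: iterable of +/-1 labels.
        w = np.zeros(num_features)
        last = np.zeros(num_features, dtype=int)  # step at which each weight is current
        shrink = 1.0 - eta * lam                  # factor of a regularization-only step
        for t, (row, y) in enumerate(zip(rows, labels)):
            for j in row:                         # closed-form catch-up for skipped steps
                w[j] *= shrink ** (t - last[j])
            margin = y * sum(w[j] * v for j, v in row.items())
            g = -y / (1.0 + np.exp(margin))       # logistic-loss gradient w.r.t. the margin
            for j, v in row.items():              # full update: loss gradient plus shrinkage
                w[j] = shrink * w[j] - eta * g * v
                last[j] = t + 1
        w *= shrink ** (len(rows) - last)         # flush: bring every weight current
        return w

The per-example cost is proportional to the number of active features rather than to the full dimensionality, which is what makes the delayed scheme attractive for sparse data.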
Closed-form delayed updates for the ℓ1, ℓ∞, and rarely used
ℓ2 regularizers have been described previously. This paper provides
closed-form updates for the popular squared norm ℓ2² and elastic net
regularizers.
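For illustration, in notation we introduce here (λ2 for the ℓ2² strength, η for a constant learning rate), a step in which feature j is inactive reduces to pure multiplicative shrinkage under the penalty (λ2/2)‖w‖², so k consecutive skipped steps collapse into one closed-form update:
\[
  w_j \leftarrow (1 - \eta \lambda_2)\, w_j
  \qquad\Longrightarrow\qquad
  w_j^{(t+k)} = (1 - \eta \lambda_2)^{k}\, w_j^{(t)}.
\]
The elastic net adds an ℓ1 term, for which the paper derives the corresponding closed-form and dynamic programming updates.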
We provide dynamic programming algorithms that perform each delayed update in
constant time. The new ℓ2² and elastic net methods handle both fixed and
varying learning rates, and both standard stochastic gradient descent (SGD)
and forward backward splitting (FoBoS).
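As a rough sketch of the constant-time bookkeeping under a varying learning-rate schedule (names and details are our assumptions, not the paper's code), caching the running product of per-step shrinkage factors lets the catch-up factor for any skipped interval be read off as a ratio of two cached values:

    def make_catch_up(etas, lam):
        # prefix[t] = product over i < t of (1 - etas[i] * lam); assumes no factor is zero.
        prefix = [1.0]
        for eta in etas:
            prefix.append(prefix[-1] * (1.0 - eta * lam))
        def catch_up(w_j, s, t):
            # O(1) closed-form shrinkage of w_j from step s to step t under an l2^2 penalty.
            return w_j * (prefix[t] / prefix[s])
        return catch_up

The elastic net case needs additional cached quantities beyond this single product; the snippet only illustrates the ℓ2² shrinkage factor.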
Experimental results show that on a bag-of-words dataset with 260,941 features,
but only 88 nonzero features on average per training example, the dynamic
programming method trains a logistic regression classifier with elastic net
regularization over 2000 times faster than an otherwise identical
implementation that applies dense updates to every weight.