This paper presents an algorithm for efficient training of sparse linear
models with elastic net regularization. Extending previous work on delayed
updates, the new algorithm applies stochastic gradient updates to nonzero
features only, bringing each remaining weight current with a closed-form update when it is next needed.
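To make the delayed-update idea concrete, here is a minimal sketch of the simplest case: logistic regression trained by plain SGD with a squared ℓ₂ penalty (λ/2)‖w‖² and a fixed learning rate η. It assumes each example is a dict from feature index to nonzero value and that ηλ < 1; the function names are illustrative, not from any library. A weight whose feature is absent from k consecutive examples only undergoes k regularization steps, each multiplying it by (1 − ηλ), so it can be brought current with a single power instead of k separate updates.

```python
import math
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # feature index -> nonzero value


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lazy_sgd_l2sq(data: List[Tuple[SparseVec, int]], dim: int,
                  eta: float, lam: float) -> List[float]:
    """One SGD pass for logistic regression with penalty (lam/2)*||w||^2,
    touching only each example's nonzero features."""
    w = [0.0] * dim
    psi = [0] * dim           # psi[j]: number of steps already applied to w[j]
    decay = 1.0 - eta * lam   # shrinkage from one regularization-only step
    for t, (x, y) in enumerate(data):
        # bring current: w[j] missed (t - psi[j]) regularization-only steps
        for j in x:
            w[j] *= decay ** (t - psi[j])
        g = sigmoid(sum(w[j] * v for j, v in x.items())) - y
        # full SGD step t, applied only where x[j] != 0
        for j, v in x.items():
            w[j] = decay * w[j] - eta * g * v
            psi[j] = t + 1
    # final catch-up so every weight reflects all len(data) steps
    for j in range(dim):
        w[j] *= decay ** (len(data) - psi[j])
    return w


def dense_sgd_l2sq(data: List[Tuple[SparseVec, int]], dim: int,
                   eta: float, lam: float) -> List[float]:
    """Reference: the same SGD, but every weight is updated at every step."""
    w = [0.0] * dim
    for x, y in data:
        g = sigmoid(sum(w[j] * v for j, v in x.items())) - y
        w = [(1.0 - eta * lam) * wj for wj in w]
        for j, v in x.items():
            w[j] -= eta * g * v
    return w
```

Both functions should return the same weights up to floating-point rounding; only the lazy version's per-example cost scales with the number of nonzero features rather than with `dim`.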
Closed-form delayed updates for the ℓ₁, ℓ∞, and rarely used
ℓ₂ regularizers have been described previously. This paper provides
closed-form updates for the popular squared norm ℓ₂² and elastic net
regularizers.
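For elastic net under FoBoS with a fixed learning rate η, one possible closed form follows from unrolling the regularization-only step. Assuming the penalty λ₁|w| + (λ₂/2)w², each skipped step maps the magnitude a to β(a − ηλ₁), clipped at zero, with β = 1/(1 + ηλ₂). Summing the geometric series gives a_k = β^k(a₀ + λ₁/λ₂) − λ₁/λ₂ before clipping. The sketch below checks this against the step-by-step loop; the function names are hypothetical, and this is an illustrative derivation under these assumptions, not necessarily the exact form used in the paper.

```python
import math


def prox_enet(w: float, eta: float, lam1: float, lam2: float) -> float:
    """One FoBoS regularization step for lam1*|w| + (lam2/2)*w**2."""
    mag = max(abs(w) - eta * lam1, 0.0) / (1.0 + eta * lam2)
    return math.copysign(mag, w)


def catch_up_naive(w: float, k: int, eta: float, lam1: float, lam2: float) -> float:
    """Apply k regularization-only steps one at a time: O(k)."""
    for _ in range(k):
        w = prox_enet(w, eta, lam1, lam2)
    return w


def catch_up_closed_form(w: float, k: int, eta: float, lam1: float, lam2: float) -> float:
    """Same result in O(1) for a fixed learning rate eta."""
    if k == 0 or w == 0.0:
        return w
    if lam2 > 0.0:
        beta = 1.0 / (1.0 + eta * lam2)  # squared-l2 shrink factor per step
        r = lam1 / lam2
        # unrolled a <- beta * (a - eta*lam1), summed as a geometric series
        mag = beta ** k * (abs(w) + r) - r
    else:
        mag = abs(w) - k * eta * lam1    # pure l1: constant shrink per step
    # once the unclipped magnitude reaches <= 0 it stays <= 0, so a single
    # clip at the end matches clipping after every step
    return math.copysign(max(mag, 0.0), w)
```

The single final clip is valid because a step applied to a nonpositive magnitude, β(a − ηλ₁), stays nonpositive, so the truncated weight never "comes back".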
We provide dynamic programming algorithms that perform each delayed update in
constant time. The new ℓ₂² and elastic net methods handle both fixed and
varying learning rates, and both standard stochastic gradient descent (SGD)
and forward-backward splitting (FoBoS). Experimental results show that on a
bag-of-words dataset with 260,941 features but only 88 nonzero features on
average per training example, the dynamic programming method trains a logistic
regression classifier with elastic net regularization more than 2000 times faster
than a standard implementation that updates every weight at every step.
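With a varying learning rate η_t, the per-step factors differ, but constant-time catch-up is still possible by caching prefix quantities as training proceeds. The sketch below assumes FoBoS with the elastic net penalty λ₁|w| + (λ₂/2)w², per-step factors β_τ = 1/(1 + η_τλ₂) and c_τ = η_τλ₁, a prefix product P[t] = ∏_{τ<t} β_τ, and a prefix sum S[t] = Σ_{τ<t} c_τ/P[τ]. Unrolling gives a_t = P[t]·(a_ψ/P[ψ] − (S[t] − S[ψ])) for a weight last updated at step ψ. The class name `EnetCatchUp` is hypothetical. Over very long runs P[t] shrinks toward zero and 1/P[τ] grows, so a practical implementation would need some form of rescaling; this sketch omits that.

```python
import math
from typing import List


def prox_enet(w: float, eta: float, lam1: float, lam2: float) -> float:
    """One FoBoS regularization step for lam1*|w| + (lam2/2)*w**2."""
    mag = max(abs(w) - eta * lam1, 0.0) / (1.0 + eta * lam2)
    return math.copysign(mag, w)


class EnetCatchUp:
    """Hypothetical helper: O(1) delayed FoBoS elastic-net catch-up under a
    varying learning rate, using prefix products P and prefix sums S."""

    def __init__(self, lam1: float, lam2: float) -> None:
        self.lam1, self.lam2 = lam1, lam2
        self.P: List[float] = [1.0]  # P[t] = prod_{tau < t} beta_tau
        self.S: List[float] = [0.0]  # S[t] = sum_{tau < t} c_tau / P[tau]

    def advance(self, eta: float) -> None:
        """Record the next step's learning rate eta in O(1)."""
        beta = 1.0 / (1.0 + eta * self.lam2)
        c = eta * self.lam1
        # S must use P[tau] for the current step, so append it before P grows
        self.S.append(self.S[-1] + c / self.P[-1])
        self.P.append(self.P[-1] * beta)

    def catch_up(self, w: float, psi: int, t: int) -> float:
        """Value of w after regularization-only steps psi, ..., t-1, in O(1)."""
        if w == 0.0 or psi == t:
            return w
        # unrolled a <- beta_tau * (a - c_tau) over tau = psi .. t-1
        mag = self.P[t] * (abs(w) / self.P[psi] - (self.S[t] - self.S[psi]))
        # a nonpositive magnitude stays nonpositive, so clip once at the end
        return math.copysign(max(mag, 0.0), w)
```

Each call to `advance` costs O(1) per training step, and each `catch_up` costs O(1) no matter how many steps the weight skipped.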