Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks, and to Deep Learning in
particular, may seem to involve many bells and whistles known as
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradients and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when many
hyper-parameters are allowed to be adjusted. Overall, it describes elements of
the practice used to successfully and efficiently train and debug large-scale
and often deep multi-layer neural networks. It closes with open questions
about the training difficulties observed with deeper architectures.
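As a concrete point of reference, the following is a minimal sketch of the kind of
minibatch gradient-based training loop whose hyper-parameters the chapter discusses.
The hyper-parameter names and values here are illustrative assumptions, not
recommendations taken from the chapter itself.

```python
import numpy as np

# Illustrative hyper-parameters of the kind the chapter tunes; the values
# below are placeholders, not the chapter's recommendations.
learning_rate = 0.01   # step size for each gradient update
batch_size = 32        # examples per stochastic gradient estimate
num_epochs = 10        # passes over the training set

def sgd_train(w, X, y, grad_fn):
    """Minibatch stochastic gradient descent: the basic loop of
    gradient-based optimization with back-propagated gradients.
    grad_fn(w, X_batch, y_batch) is assumed to return the average
    gradient of the loss on the batch (a hypothetical callback)."""
    n = len(X)
    for _ in range(num_epochs):
        order = np.random.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            w -= learning_rate * grad_fn(w, X[batch], y[batch])
    return w
```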
Efficient Elastic Net Regularization for Sparse Linear Models
This paper presents an algorithm for efficient training of sparse linear
models with elastic net regularization. Extending previous work on delayed
updates, the new algorithm applies stochastic gradient updates to non-zero
features only, bringing weights current as needed with closed-form updates.
Closed-form delayed updates for the ℓ₁, ℓ∞, and rarely used ℓ₂
regularizers have been described previously. This paper provides
closed-form updates for the popular squared norm ℓ₂² and elastic net
regularizers.
We provide dynamic programming algorithms that perform each delayed update in
constant time. The new ℓ₂² and elastic net methods handle both fixed and
varying learning rates, and both standard stochastic gradient descent (SGD)
and forward-backward splitting (FoBoS). Experimental results show that on a
high-dimensional bag-of-words dataset in which only a small fraction of
features are nonzero in the average training example, the dynamic programming
method trains a logistic regression classifier with elastic net regularization
substantially faster than it otherwise would.
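To make the lazy-update idea concrete, here is a minimal sketch (under stated
assumptions, not the paper's exact algorithm or API) of closed-form delayed ℓ₂²
regularization for SGD on sparse data. A running product of per-step shrinkage
factors lets a weight that was untouched for k steps be brought current with a
single multiplication, which is the constant-time delayed update that a dynamic
programming formulation provides; all class and method names below are hypothetical.

```python
import numpy as np

class LazyL2SGD:
    """Sketch of lazily regularized SGD for sparse data with an l2-squared
    penalty (lam/2)*||w||^2. Illustrative only; names are assumptions.

    A regularization-only step with learning rate eta scales every weight
    by (1 - eta*lam). Storing the running product of these factors lets us
    apply all the shrinkage a skipped coordinate owes in O(1) time."""

    def __init__(self, n_features, lam=1e-4):
        self.w = np.zeros(n_features)
        self.lam = lam
        self.prod = 1.0                  # cumulative product of (1 - eta_s*lam)
        self.last = np.ones(n_features)  # prod value when each weight was last current

    def _catch_up(self, idx):
        # Closed-form delayed update: apply all skipped shrinkage at once.
        self.w[idx] *= self.prod / self.last[idx]
        self.last[idx] = self.prod

    def step(self, idx, x_vals, y, eta):
        """One SGD step on a sparse example with logistic loss.
        idx: nonzero feature indices; x_vals: their values; y in {-1, +1}.
        Assumes eta*lam < 1 so the shrinkage factor stays positive."""
        self._catch_up(idx)              # only touched weights are updated
        margin = y * self.w[idx].dot(x_vals)
        g = -y / (1.0 + np.exp(margin))  # d(logistic loss)/d(w.x)
        self.w[idx] = (1.0 - eta * self.lam) * self.w[idx] - eta * g * x_vals
        # Record the shrinkage every untouched coordinate now owes.
        self.prod *= (1.0 - eta * self.lam)
        self.last[idx] = self.prod       # touched weights are fully current
        # NOTE: a production version would renormalize prod (or track its log)
        # periodically to avoid floating-point underflow.

    def finalize(self):
        # Bring all weights current before the model is used.
        self.w *= self.prod / self.last
        self.last[:] = self.prod
        return self.w
```

The elastic net case additionally soft-thresholds each weight with the
accumulated ℓ₁ penalty during the catch-up step, which is the part the
closed-form FoBoS updates described in the abstract address.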