
    Practical recommendations for gradient-based training of deep architectures

    Learning algorithms related to artificial neural networks, and in particular to Deep Learning, may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when one is allowed to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
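The abstract stays at a high level. To illustrate what adjusting several hyper-parameters at once can look like in practice, here is a minimal sketch of random search over the learning rate and minibatch size of minibatch SGD. It is not code from the chapter. The toy dataset, the search ranges, the number of trials and epochs, and the helper names (`make_data`, `train_sgd`, `random_search`) are all hypothetical choices made for this example.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def make_data(n: int, rng: random.Random) -> List[Point]:
    """Toy 1-D regression data: y = 3x - 1 plus Gaussian noise (an arbitrary choice)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x - 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def mse(w: float, b: float, data: List[Point]) -> float:
    """Mean squared error of the linear model y_hat = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_sgd(train: List[Point], lr: float, batch_size: int,
              epochs: int, rng: random.Random) -> Tuple[float, float]:
    """Minibatch SGD on mean squared error for the model y_hat = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        order = list(range(len(train)))
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for start in range(0, len(order), batch_size):
            batch = [train[i] for i in order[start:start + batch_size]]
            residuals = [(w * x + b - y, x) for x, y in batch]
            # Gradients of the minibatch mean squared error w.r.t. w and b.
            gw = sum(2.0 * r * x for r, x in residuals) / len(batch)
            gb = sum(2.0 * r for r, _ in residuals) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b


def random_search(train: List[Point], valid: List[Point],
                  n_trials: int, seed: int = 0) -> Tuple[float, float, int]:
    """Sample hyper-parameters at random; keep the setting with the lowest validation MSE."""
    rng = random.Random(seed)
    best = None  # (validation loss, learning rate, minibatch size)
    for _ in range(n_trials):
        lr = 10.0 ** rng.uniform(-3.0, -0.5)  # learning rate sampled log-uniformly
        batch_size = rng.choice([1, 8, 32])
        w, b = train_sgd(train, lr, batch_size, epochs=20, rng=rng)
        loss = mse(w, b, valid)
        if best is None or loss < best[0]:
            best = (loss, lr, batch_size)
    return best


data_rng = random.Random(42)
train, valid = make_data(200, data_rng), make_data(50, data_rng)
best_loss, best_lr, best_bs = random_search(train, valid, n_trials=10)
print(f"best lr={best_lr:.4g}, batch size={best_bs}, valid MSE={best_loss:.4f}")
```

The learning rate is sampled on a log scale because its useful values span several orders of magnitude. Each setting is also scored on held-out validation data rather than on the training set.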

    Efficient Elastic Net Regularization for Sparse Linear Models

    This paper presents an algorithm for efficient training of sparse linear models with elastic net regularization. Extending previous work on delayed updates, the new algorithm applies stochastic gradient updates to non-zero features only, bringing weights current as needed with closed-form updates. Closed-form delayed updates for the $\ell_1$, $\ell_\infty$, and rarely used $\ell_2$ regularizers have been described previously. This paper provides closed-form updates for the popular squared norm $\ell_2^2$ and elastic net regularizers. We provide dynamic programming algorithms that perform each delayed update in constant time. The new $\ell_2^2$ and elastic net methods handle both fixed and varying learning rates, and both standard stochastic gradient descent (SGD) and forward backward splitting (FoBoS). Experimental results show that on a bag-of-words dataset with 260,941 features, but only 88 nonzero features on average per training example, the dynamic programming method trains a logistic regression classifier with elastic net regularization over 2000 times faster than training without the delayed updates.
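To make the constant-time delayed update concrete for the $\ell_2^2$ case, here is a minimal sketch. It assumes plain SGD on the logistic loss with penalty $\frac{\lambda}{2}\|w\|_2^2$ and learning rates $\eta_t$ satisfying $\eta_t\lambda < 1$. Each step multiplies every weight by $(1-\eta_t\lambda)$ and then subtracts $\eta_t$ times the loss gradient, and that gradient is zero for features absent from the example. A weight last touched before step $s$ therefore only needs the factor $\prod_{\tau=s}^{t-1}(1-\eta_\tau\lambda) = P_t/P_s$, where $P_t = \prod_{\tau<t}(1-\eta_\tau\lambda)$ is a running prefix product. With a fixed rate, this factor reduces to $(1-\eta\lambda)^{t-s}$. The sketch is an illustration, not the paper's algorithm or code. It omits the $\ell_1$ part of the elastic net as well as FoBoS, and the function names `lazy_l2_sgd` and `dense_l2_sgd` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One training example: sparse features {index: value} and a label in {0, 1}.
Example = Tuple[Dict[int, float], int]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lazy_l2_sgd(data: Sequence[Example], n_features: int, lam: float,
                etas: Sequence[float]) -> List[float]:
    """Logistic-regression SGD with penalty (lam/2)*||w||^2, decaying weights lazily.

    Step t applies w <- (1 - etas[t]*lam) * w - etas[t] * grad_loss, but only the
    weights of features that are nonzero in example t are touched explicitly.
    Assumes etas[t] * lam < 1 so every decay factor is positive.
    """
    w = [0.0] * n_features
    last = [0] * n_features  # w[j] already includes the decay of all steps < last[j]
    prefix = [1.0]           # prefix[t] = product of (1 - etas[tau]*lam) for tau < t
    for t, ((x, y), eta) in enumerate(zip(data, etas)):
        # Catch up: apply the decay of steps last[j] .. t-1 in O(1) per active feature.
        for j in x:
            w[j] *= prefix[t] / prefix[last[j]]
        p = sigmoid(sum(w[j] * v for j, v in x.items()))
        decay = 1.0 - eta * lam
        # Step t for active features: decay, then subtract the logistic-loss gradient.
        for j, v in x.items():
            w[j] = decay * w[j] - eta * (p - y) * v
            last[j] = t + 1
        # Inactive features owe this step's decay; record it in the prefix product.
        prefix.append(prefix[-1] * decay)
    # Bring every weight current before returning.
    steps = len(prefix) - 1
    return [wj * prefix[steps] / prefix[last[j]] for j, wj in enumerate(w)]


def dense_l2_sgd(data: Sequence[Example], n_features: int, lam: float,
                 etas: Sequence[float]) -> List[float]:
    """Reference version: decays every weight at every step (O(n_features) per step)."""
    w = [0.0] * n_features
    for (x, y), eta in zip(data, etas):
        p = sigmoid(sum(w[j] * v for j, v in x.items()))
        w = [(1.0 - eta * lam) * wj for wj in w]
        for j, v in x.items():
            w[j] -= eta * (p - y) * v
    return w


data = [({0: 1.0, 3: 2.0}, 1), ({1: 1.5}, 0),
        ({3: -1.0, 4: 0.5}, 1), ({0: 0.5, 2: 1.0}, 0)] * 5
etas = [0.5 / math.sqrt(t + 1) for t in range(len(data))]  # a varying learning rate
lazy = lazy_l2_sgd(data, 5, 0.1, etas)
dense = dense_l2_sgd(data, 5, 0.1, etas)
print(max(abs(a - b) for a, b in zip(lazy, dense)))  # should be at rounding-error level
```

Dividing prefix products keeps each catch-up O(1). Over very long runs, however, the products can underflow, so a practical version would need a guard, for example by periodically bringing all weights current and restarting the product.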