Practical recommendations for gradient-based training of deep architectures

Bengio, Yoshua

Authors: Yoshua Bengio
Publication date: 16 September 2012

Abstract

Learning algorithms related to artificial neural networks, and in particular to Deep Learning, may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when one is allowed to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
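The abstract refers to hyper-parameters of gradient-based optimization only in general terms. As a minimal illustration (a sketch under stated assumptions, not the chapter's own code or recommendations), the example below trains a hypothetical one-dimensional linear model with minibatch stochastic gradient descent, exposing the learning rate, minibatch size and number of epochs as hyper-parameters, and selects the learning rate by a simple random search scored on a held-out validation set. The toy data, the log-uniform search range and all function names (`make_data`, `sgd`, `random_search`) are assumptions made for this example.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical toy data: y = 3x + 1 plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def mse(params, data):
    """Mean squared error of the linear model w*x + b on data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def sgd(train, learning_rate, batch_size, epochs, seed=0):
    """Minibatch SGD; learning_rate, batch_size and epochs are the hyper-parameters."""
    rng = random.Random(seed)
    data = list(train)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a new random order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the minibatch mean squared error w.r.t. w and b.
            gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2.0 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * gw
            b -= learning_rate * gb
    return w, b


def random_search(train, valid, trials, batch_size=16, epochs=50, seed=0):
    """Sample learning rates log-uniformly in [1e-3, 10**-0.5]; keep the lowest validation MSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        lr = 10 ** rng.uniform(-3.0, -0.5)
        params = sgd(train, lr, batch_size, epochs)
        err = mse(params, valid)
        # Skip diverged runs (inf/nan) so they can never be selected.
        if math.isfinite(err) and (best is None or err < best[0]):
            best = (err, lr, params)
    return best


if __name__ == "__main__":
    train = make_data(200, seed=1)
    valid = make_data(100, seed=2)
    err, lr, (w, b) = random_search(train, valid, trials=20)
    print(f"learning rate {lr:.3g}: valid MSE {err:.4f}, w={w:.3f}, b={b:.3f}")
```

Sampling the learning rate on a log scale is a design choice for the sketch: plausible values span several orders of magnitude, so a uniform draw on a linear scale would waste most trials on the largest values.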

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.760.1...