Practical Gauss-Newton Optimisation for Deep Learning
We present an efficient block-diagonal approximation to the Gauss-Newton
matrix for feedforward neural networks. Our resulting algorithm is
competitive against state-of-the-art first-order optimisation methods, with
sometimes significant improvement in optimisation performance. Unlike
first-order methods, for which hyperparameter tuning of the optimisation
parameters is often a laborious process, our approach can provide good
performance even when used with default settings. A side result of our work
is that for piecewise linear transfer functions, the network objective
function can have no differentiable local maxima, which may partially explain
why such transfer functions facilitate effective optimisation. Comment: ICML 2017
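To make the block-diagonal idea concrete, here is a minimal sketch assuming a squared-error loss, for which the Gauss-Newton matrix reduces to JᵀJ: each layer's parameter tensor gets its own damped Gauss-Newton block, solved independently of the others. This is not the paper's algorithm (which, as I understand it, computes the blocks via recursive Kronecker-factored approximations rather than materializing them); materializing full blocks as below only scales to tiny networks. The network shape, damping value, and step size are illustrative choices.

```python
import jax
import jax.numpy as jnp

def net(params, x):
    # Two-layer feedforward network (tanh hidden layer).
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def residuals(params, X, Y):
    # Squared-error residuals, flattened into one vector r.
    return (net(params, X) - Y).ravel()

def gn_step(params, X, Y, damping=1e-3, lr=0.5):
    r = residuals(params, X, Y)              # current residual vector
    J = jax.jacfwd(residuals)(params, X, Y)  # Jacobian for every parameter tensor
    new = {}
    for k, p in params.items():
        Jk = J[k].reshape(r.size, -1)        # this block's Jacobian
        g = Jk.T @ r                         # block gradient J_k^T r
        # Damped diagonal block of the Gauss-Newton matrix: J_k^T J_k + damping*I.
        H = Jk.T @ Jk + damping * jnp.eye(Jk.shape[1])
        new[k] = p - lr * jnp.linalg.solve(H, g).reshape(p.shape)
    return new

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 4))
Y = jnp.sin(X).sum(axis=1, keepdims=True)
params = {"W1": 0.3 * jax.random.normal(key, (4, 8)), "b1": jnp.zeros(8),
          "W2": 0.3 * jax.random.normal(key, (8, 1)), "b2": jnp.zeros(1)}
for _ in range(20):
    params = gn_step(params, X, Y)
print(float(jnp.mean(residuals(params, X, Y) ** 2)))  # loss after 20 GN steps
```

Ignoring the off-diagonal cross-layer blocks is what makes each solve small: the cost per layer is cubic in that layer's parameter count rather than in the total parameter count.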
Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
Current techniques and systems for distributed model training mostly assume
that clusters are composed of homogeneous servers with constant resource
availability. However, cluster heterogeneity is pervasive in computing
infrastructure, and is a fundamental characteristic of low-cost transient
resources (such as EC2 spot instances). In this paper, we develop a dynamic
batching technique for distributed data-parallel training that adjusts the
mini-batch size on each worker based on its resource availability and
throughput. Our mini-batch controller seeks to equalize iteration times
across all workers, facilitating training on clusters whose servers have
different amounts of CPU and GPU resources. This variable mini-batch technique
uses proportional control and ideas from PID controllers to find stable
mini-batch sizes. Our empirical evaluation shows that dynamic batching can
reduce model training times by more than 4x on heterogeneous clusters
- …
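As a rough illustration of the proportional-control idea described above (not the paper's actual controller), the sketch below nudges each worker's mini-batch size so that all iteration times converge toward the cluster mean. The class name, gain `k_p`, and throughput model are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class WorkerStats:
    batch_size: int    # current per-worker mini-batch size
    iter_time: float   # measured seconds per training iteration

def adjust_batches(workers, k_p=0.5, min_batch=8):
    """One proportional-control step: workers slower than the mean shrink
    their batch; faster workers grow it. Each worker's throughput
    (samples/sec) is assumed roughly constant over one step."""
    target = sum(w.iter_time for w in workers) / len(workers)
    for w in workers:
        throughput = w.batch_size / w.iter_time   # samples per second
        error = target - w.iter_time              # positive => worker is fast
        # Proportional update: convert the time error into a batch-size delta.
        w.batch_size = max(min_batch,
                           int(w.batch_size + k_p * error * throughput))
    return workers

# Example: a fast GPU worker and a slow CPU worker start with equal batches.
workers = [WorkerStats(batch_size=128, iter_time=0.2),
           WorkerStats(batch_size=128, iter_time=0.8)]
for _ in range(10):
    workers = adjust_batches(workers)
    # Re-simulate iteration times from each worker's fixed throughput.
    for w, tput in zip(workers, (640.0, 160.0)):
        w.iter_time = w.batch_size / tput
print([w.batch_size for w in workers])  # fast worker ends with the larger batch
```

Note that this sketch only shows the equalization dynamic; a real data-parallel system would likely also renormalize so the global batch size stays within a target range, since the per-worker sizes here are free to drift.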