A supervised learning algorithm searches over a set of functions A→B
parametrised by a space P to find the best approximation to some ideal
function f:A→B. It does this by taking examples (a,f(a))∈A×B, and updating the parameter according to some rule. We define a
category where these update rules may be composed, and show that gradient
descent---with respect to a fixed step size and an error function satisfying a
certain property---defines a monoidal functor from a category of parametrised
functions to this category of update rules. This provides a structural
perspective on backpropagation, as well as a broad generalisation of neural
networks.
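As an informal illustration only (a rough Python sketch, not the paper's formal construction), the following shows one way a parametrised function could be packaged with a gradient-descent update rule for a fixed step size eps and a squared-error loss, together with a backpropagated "request", and how two such update rules compose in the style of backpropagation. The names Learner, gradient_descent_learner, and compose, and the particular form of the request map, are assumptions made for this sketch.

    # A "learner" packages a parametrised function implement(p, a) with an
    # update rule and a "request": the input it passes back to the previous
    # stage.  All names and signatures here are illustrative assumptions,
    # not the paper's formal definitions.
    class Learner:
        def __init__(self, implement, update, request):
            self.implement = implement   # (p, a) -> b
            self.update = update         # (p, a, b) -> updated parameter
            self.request = request       # (p, a, b) -> input passed backwards

    def gradient_descent_learner(implement, grad_p, grad_a, eps):
        """Build an update rule from a parametrised function by gradient
        descent with step size eps; grad_p and grad_a are assumed to give
        the partial derivatives of a squared-error loss with respect to
        the parameter and the input."""
        def update(p, a, b):
            return p - eps * grad_p(p, a, b)
        def request(p, a, b):
            return a - grad_a(p, a, b)
        return Learner(implement, update, request)

    def compose(g, f):
        """Compose two learners (f followed by g).  The request of g acts
        as the training target for f, which is how backpropagation threads
        error information through a composite."""
        def implement(pq, a):
            p, q = pq
            return g.implement(q, f.implement(p, a))
        def update(pq, a, c):
            p, q = pq
            b = f.implement(p, a)
            return (f.update(p, a, g.request(q, b, c)), g.update(q, b, c))
        def request(pq, a, c):
            p, q = pq
            b = f.implement(p, a)
            return f.request(p, a, g.request(q, b, c))
        return Learner(implement, update, request)

Under this informal reading, composing the update rules of successive layers reproduces the familiar backward pass of a layered network; the monoidal functor described in the abstract is what makes this compositional picture precise.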