The remarkable performance of overparameterized deep neural networks (DNNs)
must arise from an interplay between network architecture, training algorithms,
and structure in the data. To disentangle these three components, we apply a
Bayesian picture, based on the functions expressed by a DNN, to supervised
learning. The prior over functions is determined by the network, and is varied
by exploiting a transition between ordered and chaotic regimes. For Boolean
function classification, we approximate the likelihood using the error spectrum
of functions on data. When combined with the prior, this accurately predicts
the posterior, measured for DNNs trained with stochastic gradient descent. This
analysis reveals that structured data, combined with an intrinsic Occam's
razor-like inductive bias towards (Kolmogorov) simple functions that is strong
enough to counteract the exponential growth of the number of functions with
complexity, is a key to the success of DNNs.
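
As a rough illustration of the Bayesian picture described above, the following sketch estimates a prior over Boolean functions by sampling random parameters of a small network, then conditions on a few training examples to obtain a posterior. Everything specific here is an assumption made for illustration: the tiny tanh architecture, the Gaussian initialization, the use of the weight scale `sigma_w` as the knob that changes the prior, the 0-1 likelihood (a function is either consistent with the data or not), and the helper names `sample_prior` and `posterior`. It is not the procedure used in the paper, which approximates the likelihood via the error spectrum and compares against SGD-trained networks.

```python
import itertools
import numpy as np


def sample_prior(n_inputs=3, width=16, n_samples=2000, sigma_w=1.0, seed=0):
    """Monte Carlo estimate of P(f): how often random parameters express f.

    Hypothetical setup: one hidden tanh layer with Gaussian weights.
    sigma_w is an assumed knob for changing the prior.
    """
    rng = np.random.default_rng(seed)
    # All 2^n Boolean inputs, one per row.
    X = np.array(list(itertools.product([0.0, 1.0], repeat=n_inputs)))
    counts = {}
    for _ in range(n_samples):
        W1 = rng.normal(0.0, sigma_w / np.sqrt(n_inputs), size=(n_inputs, width))
        b1 = rng.normal(0.0, 1.0, size=width)
        W2 = rng.normal(0.0, sigma_w / np.sqrt(width), size=width)
        b2 = rng.normal(0.0, 1.0)
        out = np.tanh(X @ W1 + b1) @ W2 + b2
        # A function is identified by its outputs on every input.
        f = tuple(int(v) for v in (out > 0))
        counts[f] = counts.get(f, 0) + 1
    prior = {f: c / n_samples for f, c in counts.items()}
    return X, prior


def posterior(prior, train_idx, train_labels):
    """Bayes' rule with an assumed 0-1 likelihood on the training set."""
    consistent = {
        f: p
        for f, p in prior.items()
        if all(f[i] == y for i, y in zip(train_idx, train_labels))
    }
    z = sum(consistent.values())  # marginal likelihood estimate
    if z == 0:
        return {}
    return {f: p / z for f, p in consistent.items()}


if __name__ == "__main__":
    X, prior = sample_prior()
    # Condition on two labelled inputs: index 0 -> 0, index 7 -> 1.
    post = posterior(prior, train_idx=[0, 7], train_labels=[0, 1])
    for f, p in sorted(post.items(), key=lambda kv: -kv[1])[:5]:
        print(f, round(p, 3))
```

In this toy setting, changing `sigma_w` and re-running `sample_prior` gives a different prior, and the posterior over functions consistent with the data shifts accordingly.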