We study a class of stochastic optimization problems of the mean-field type
arising in the optimal training of a deep residual neural network. We consider
the sampling problem that arises from a continuous-layer idealization and
establish the existence of optimal relaxed controls when the training set has
finite size. The core of our paper is to prove the Gamma-convergence of the
sequence of sampled objective functionals, i.e., to show that as the size of
the training set grows large, the minimizer of the sampled relaxed problem
converges to that of the limiting optimization problem. We connect the
large-sample limit of the objective functional to the unique solution, in the
trajectory sense, of a nonlinear Fokker-Planck-Kolmogorov (FPK) equation in a
random environment. We construct an example to show that, under mild
assumptions, the optimal network weights can be numerically computed by solving
a second-order differential equation with Neumann boundary conditions in the
sense of distributions.
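
For orientation, a generic nonlinear FPK equation of the type referred to above reads, in one spatial dimension,
$$\partial_t \mu_t = \tfrac{1}{2}\,\partial_{xx}\!\big(\sigma^2(x,\mu_t)\,\mu_t\big) - \partial_x\!\big(b(x,\mu_t)\,\mu_t\big),$$
where the dependence of the drift $b$ and diffusion $\sigma$ on the law $\mu_t$ encodes the mean-field interaction. The abstract does not specify the paper's coefficients, so $b$ and $\sigma$ here are illustrative placeholders.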
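As a minimal numerical sketch of the last point, the following solves a model second-order two-point problem with homogeneous Neumann boundary conditions. The equation $u'' - u = -\cos(\pi x)$ on $[0,1]$ with $u'(0) = u'(1) = 0$ is an illustrative stand-in (the paper's actual weight equation is not stated in the abstract), and scipy.integrate.solve_bvp is one off-the-shelf solver for such problems.

    # Model Neumann BVP: u'' - u = -cos(pi x) on [0, 1], u'(0) = u'(1) = 0.
    # Illustrative placeholder, not the paper's actual weight equation.
    import numpy as np
    from scipy.integrate import solve_bvp

    def rhs(x, y):
        # First-order system: y[0] = u, y[1] = u'; hence y[1]' = u'' = u - cos(pi x).
        return np.vstack([y[1], y[0] - np.cos(np.pi * x)])

    def bc(ya, yb):
        # Homogeneous Neumann conditions: u'(0) = 0 and u'(1) = 0.
        return np.array([ya[1], yb[1]])

    x = np.linspace(0.0, 1.0, 50)
    y0 = np.zeros((2, x.size))          # initial guess for (u, u')
    sol = solve_bvp(rhs, bc, x, y0)

    # This model problem has the closed-form solution u(x) = cos(pi x) / (pi^2 + 1).
    xs = np.linspace(0.0, 1.0, 200)
    exact = np.cos(np.pi * xs) / (np.pi**2 + 1.0)
    print("max error:", np.max(np.abs(sol.sol(xs)[0] - exact)))

Since the operator $u \mapsto u'' - u$ with Neumann conditions is invertible (zero is not an eigenvalue), this model problem is well posed and the collocation solver converges from the zero initial guess.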