When are Neural ODE Solutions Proper ODEs?
A key appeal of the recently proposed Neural Ordinary Differential
Equation (ODE) framework is that it seems to provide a continuous-time extension
of discrete residual neural networks. As we show herein, though, trained Neural
ODE models actually depend on the specific numerical method used during
training. If the trained model is supposed to be a flow generated from an ODE,
it should be possible to choose another numerical solver with equal or smaller
numerical error without loss of performance. We observe that if training relies
on a solver with overly coarse discretization, then testing with another solver
of equal or smaller numerical error results in a sharp drop in accuracy. In
such cases, the combination of vector field and numerical method cannot be
interpreted as a flow generated from an ODE, which arguably poses a fatal
breakdown of the Neural ODE concept. We observe, however, that there exists a
critical step size beyond which the training yields a valid ODE vector field.
We propose a method that monitors the behavior of the ODE solver during
training to adapt its step size, aiming to ensure a valid ODE without
unnecessarily increasing computational cost. We verify this adaptation
algorithm on two common benchmark datasets as well as a synthetic dataset.
Furthermore, we introduce a novel synthetic dataset in which the underlying
ODE directly generates a classification task.
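To make the failure mode the abstract describes concrete, below is a minimal sketch (not the authors' code; all names such as VectorField, euler_solve, and the step counts are illustrative assumptions) of the consistency check: integrate a learned vector field with the coarse solver used in training and again with a finer solver of smaller numerical error, then compare the outputs. A large discrepancy suggests the model depends on the discretization rather than defining a valid ODE flow.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Hypothetical learned vector field f(t, x) of a Neural ODE."""
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, t, x):
        return self.net(x)

def euler_solve(f, x0, t0=0.0, t1=1.0, n_steps=4):
    """Fixed-step forward Euler integration of dx/dt = f(t, x)."""
    x, t = x0, t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(t, x)
        t += h
    return x

f = VectorField()
x0 = torch.randn(128, 2)
with torch.no_grad():
    coarse = euler_solve(f, x0, n_steps=4)   # coarse solver, as in training
    fine = euler_solve(f, x0, n_steps=64)    # finer solver, smaller error
# If f defines a proper ODE flow, this gap should be small; a sharp jump
# signals the model has overfit to the coarse discretization.
print((coarse - fine).norm(dim=1).mean())
```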
Dynamical System Inspired Adaptive Time Stepping Controller for Residual Network Families
The correspondence between residual networks and dynamical systems motivates researchers to unravel the physics of ResNets with well-developed tools from numerical methods for ODE systems. The Runge-Kutta-Fehlberg method is an adaptive time-stepping scheme that offers a good trade-off between stability and efficiency. Can we also have adaptive time stepping for ResNets to ensure both stability and performance? In this study, we analyze the effects of time stepping on the Euler method and ResNets. We establish a stability condition for ResNets in terms of step sizes and weight parameters, and point out the effects of step sizes on stability and performance. Inspired by our analyses, we develop an adaptive time stepping controller that depends on the parameters of the current step and is aware of previous steps. The controller is jointly optimized with the network training so that variable step sizes and evolution time can be adaptively adjusted. We conduct experiments on ImageNet and CIFAR to demonstrate its effectiveness. Our proposed method improves both stability and accuracy without introducing additional overhead in the inference phase.
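The ResNet-dynamical systems correspondence underlying this abstract identifies a residual block with one forward Euler step, x_{t+1} = x_t + h_t F(x_t). Below is a minimal sketch, assuming PyTorch, of such a block with a learnable per-block step size standing in for the adaptive controller; the class and parameter names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class EulerResBlock(nn.Module):
    """Residual block viewed as one Euler step: x + h * F(x)."""
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Learnable step size, jointly optimized with the network weights,
        # echoing the joint optimization the abstract describes.
        self.log_h = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        h = self.log_h.exp()  # parameterize in log space to keep h positive
        return x + h * self.residual(x)

x = torch.randn(8, 16, 32, 32)
block = EulerResBlock(16)
print(block(x).shape)  # torch.Size([8, 16, 32, 32])
```

Because h is an ordinary parameter of the block, inference uses the same forward pass as a standard residual block, consistent with the abstract's claim of no additional overhead at inference time.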