Implicit layer deep learning techniques, like Neural Differential Equations,
have become an important modeling framework due to their ability to adapt to
new problems automatically. Training a neural differential equation is
effectively a search over a space of plausible dynamical systems. However,
controlling the computational cost of these models is difficult since it
depends on the number of steps the adaptive solver takes. Most prior works
either use higher-order methods, which reduce prediction time at the cost of
substantially longer training, or reduce both training and prediction time by
relying on specific training algorithms that are harder to adopt as drop-in
replacements because of their strict requirements on automatic
differentiation. In this manuscript, we
use internal cost heuristics of adaptive differential equation solvers at
stochastic time points to guide the training toward learning a dynamical system
that is easier to integrate. We "close the black-box" and allow the use of our
method with any adjoint technique for gradient calculations of the differential
equation solution. In experimental studies on ordinary differential equations
(ODEs) and stochastic differential equations (SDEs), we compare our method to
global regularization and show that it attains similar performance without
compromising flexibility of implementation. We develop two
sampling strategies to trade off between performance and training time. Our
method reduces the number of function evaluations to 0.556x-0.733x and
accelerates predictions by 1.3x-2x.
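
To make the core idea concrete, the following is a minimal, self-contained sketch (not the paper's implementation; the solver, function names, and hyperparameters are illustrative assumptions) of how an adaptive solver's internal local-error estimates, sampled at stochastic time points, could be added to a training loss as a regularizer.

```python
# Conceptual sketch only: reuse the local-error estimate an adaptive solver already
# computes for step-size control as a regularization term, evaluated at a random
# subset of the accepted steps. All names and constants here are hypothetical.
import numpy as np

def heun_euler_adaptive(f, y0, t0, t1, rtol=1e-3, atol=1e-6):
    """Embedded Heun(2)/Euler(1) pair; returns the trajectory and the per-step
    local error estimates used internally for step-size control."""
    t, y, dt = t0, np.asarray(y0, dtype=float), (t1 - t0) / 100.0
    ts, ys, err_estimates = [t], [y], []
    while t < t1:
        dt = min(dt, t1 - t)
        k1 = f(t, y)
        k2 = f(t + dt, y + dt * k1)
        y_high = y + dt * (k1 + k2) / 2.0           # 2nd-order (Heun) solution
        y_low = y + dt * k1                          # 1st-order (Euler) solution
        scale = atol + rtol * np.maximum(np.abs(y), np.abs(y_high))
        err = np.sqrt(np.mean(((y_high - y_low) / scale) ** 2))
        if err <= 1.0:                               # accept the step
            t, y = t + dt, y_high
            ts.append(t); ys.append(y); err_estimates.append(err)
        dt *= min(5.0, max(0.2, 0.9 * err ** -0.5))  # standard step-size controller
    return np.array(ts), np.array(ys), np.array(err_estimates)

def regularized_loss(f, y0, t_span, target, reg_weight=1e-2, sample_frac=0.25, rng=None):
    """Data-fitting loss plus the solver's error estimates at randomly sampled steps."""
    rng = rng or np.random.default_rng()
    ts, ys, errs = heun_euler_adaptive(f, y0, *t_span)
    fit = np.mean((ys[-1] - target) ** 2)
    # Stochastic time-point sampling: only a random subset of the internal error
    # estimates contributes to the penalty, trading penalty accuracy for cost.
    idx = rng.random(len(errs)) < sample_frac
    penalty = errs[idx].sum() if idx.any() else 0.0
    return fit + reg_weight * penalty

# Toy usage: a linear vector field standing in for a parameterized neural network.
theta = -0.5
f = lambda t, y: theta * y
print(regularized_loss(f, y0=[1.0], t_span=(0.0, 1.0), target=np.array([0.6])))
```

Minimizing this combined objective over the vector field's parameters penalizes dynamics that force the adaptive solver to take small steps, which is the mechanism by which the learned system becomes cheaper to integrate.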