Optimizing differential equations to fit data and predict outcomes
Many scientific problems focus on observed patterns of change or on how to
design a system to achieve particular dynamics. Those problems often require
fitting differential equation models to target trajectories. Fitting such
models can be difficult because each evaluation of the fit must calculate the
distance between the model and target patterns at numerous points along a
trajectory. The gradient of the fit with respect to the model parameters can be
challenging to compute. Recent technical advances in automatic differentiation through
numerical differential equation solvers potentially change the fitting process
into a relatively easy problem, opening up new possibilities to study dynamics.
However, application of the new tools to real data may fail to achieve a good
fit. This article illustrates how to overcome a variety of common challenges,
using the classic ecological data for oscillations in hare and lynx
populations. Models include simple ordinary differential equations (ODEs) and
neural ordinary differential equations (NODEs), which use artificial neural
networks to estimate the derivatives of differential equation systems.
Comparing the fits obtained with ODEs versus NODEs, representing small and
large parameter spaces, and changing the number of variable dimensions provide
insight into the geometry of the observed and model trajectories. To analyze
the quality of the models for predicting future observations, a
Bayesian-inspired preconditioned stochastic gradient Langevin dynamics (pSGLD)
calculation of the posterior distribution of predicted model trajectories
clarifies the tendency for various models to underfit or overfit the data.
Coupling fitted differential equation systems with pSGLD sampling provides a
powerful way to study the properties of optimization surfaces, suggesting an
analogy with mutation-selection dynamics on fitness landscapes.
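The fitting loop this abstract describes can be sketched in a toy form. Everything below is illustrative, not the paper's implementation: a fixed-step Euler integrator stands in for an adaptive solver, forward finite differences stand in for automatic differentiation through the solver, and the parameter values are arbitrary.

```python
import numpy as np

def lotka_volterra(state, params):
    # classic hare/lynx predator-prey derivatives
    h, l = state
    a, b, c, d = params
    return np.array([a * h - b * h * l, c * h * l - d * l])

def integrate(params, y0, dt, n_steps):
    # fixed-step Euler solver; real workflows use adaptive solvers
    traj = [np.array(y0, dtype=float)]
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * lotka_volterra(traj[-1], params))
    return np.array(traj)

def loss(params, target, y0, dt):
    # distance between model and target at every point along the trajectory
    traj = integrate(params, y0, dt, len(target) - 1)
    return float(np.mean((traj - target) ** 2))

def fit(params, target, y0, dt, lr=1e-3, iters=100, eps=1e-6):
    # gradient descent with forward-difference gradients standing in for
    # autodiff through the solver; the step is halved when it overshoots
    params = np.array(params, dtype=float)
    cur = loss(params, target, y0, dt)
    for _ in range(iters):
        grad = np.zeros_like(params)
        for i in range(len(params)):
            p = params.copy()
            p[i] += eps
            grad[i] = (loss(p, target, y0, dt) - cur) / eps
        cand = params - lr * grad
        cand_loss = loss(cand, target, y0, dt)
        if cand_loss < cur:
            params, cur = cand, cand_loss
        else:
            lr *= 0.5  # backtrack when a step made the fit worse
    return params

# synthetic "observations" from known parameters, recovered from a perturbed start
true_params = np.array([1.0, 0.5, 0.3, 0.8])
y0, dt, n_steps = [2.0, 1.0], 0.05, 100
target = integrate(true_params, y0, dt, n_steps)
start = true_params + 0.1
fitted = fit(start, target, y0, dt)
```

The backtracking step makes the toy optimizer robust to the trajectory divergence that, as the abstract notes, often derails naive fits of oscillatory systems.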
Hybrid Process Models in Electrochemical Syntheses under Deep Uncertainty
Chemical process engineering and machine learning are merging rapidly, and hybrid process models have shown promising results in process analysis and process design. However, uncertainties in first-principles process models have an adverse effect on extrapolations and inferences based on hybrid process models. Parameter sensitivities are an essential tool for better understanding the underlying uncertainty propagation and hybrid system identification challenges. Still, standard parameter sensitivity concepts may fail to address comprehensive parameter uncertainty problems, i.e., deep uncertainty with aleatoric and epistemic contributions. This work presents a highly effective and reproducible sampling strategy to calculate simulation uncertainties and global parameter sensitivities for hybrid process models under deep uncertainty. We demonstrate the workflow with two electrochemical synthesis simulation studies, including the synthesis of furfuryl alcohol and 4-aminophenol. Compared with Monte Carlo reference simulations, the CPU time was significantly reduced. The general findings of the hybrid model sensitivity studies under deep uncertainty are twofold. First, epistemic uncertainty has a significant effect on uncertainty analysis. Second, the predicted parameter sensitivities of the hybrid process models add value to the interpretation and analysis of the hybrid models themselves, but are not suitable for predicting the sensitivities of the real process or the full first-principles process model.
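The global parameter sensitivities mentioned above are commonly computed as Sobol indices. The sketch below is a generic pick-freeze Monte Carlo estimator applied to a toy three-parameter function; it is not the paper's sampling strategy, and the model and parameter ranges are invented for illustration.

```python
import numpy as np

def model(x):
    # toy stand-in for a hybrid process model with three uncertain parameters;
    # the second parameter dominates the output variance by construction
    return x[:, 0] + 2.0 * x[:, 1] ** 2 + 0.1 * x[:, 2]

def sobol_first_order(f, d, n=20000, seed=0):
    # pick-freeze Monte Carlo estimator of first-order Sobol indices
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n, d))
    b = rng.uniform(size=(n, d))
    ya, yb = f(a), f(b)
    var = ya.var()
    s = np.empty(d)
    for i in range(d):
        ab = b.copy()
        ab[:, i] = a[:, i]  # freeze parameter i at the values from sample a
        # Saltelli-style estimator of Var(E[Y | X_i]) / Var(Y)
        s[i] = np.mean(ya * (f(ab) - yb)) / var
    return s

s = sobol_first_order(model, d=3)
```

For this toy model the quadratic term accounts for roughly 81% of the output variance, which the estimator recovers; the abstract's point is that naive Monte Carlo schemes like this one become expensive, motivating the more efficient sampling strategy the work proposes.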
Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed!
Implicit layer deep learning techniques, like Neural Differential Equations,
have become an important modeling framework due to their ability to adapt to
new problems automatically. Training a neural differential equation is
effectively a search over a space of plausible dynamical systems. However,
controlling the computational cost for these models is difficult since it
relies on the number of steps the adaptive solver takes. Most prior works have
either used higher-order methods that reduce prediction time at the cost of
greatly increased training time, or reduced both training and prediction time by
relying on specific training algorithms, which are harder to use as drop-in
replacements due to strict requirements on automatic differentiation. In this manuscript, we
use internal cost heuristics of adaptive differential equation solvers at
stochastic time points to guide the training toward learning a dynamical system
that is easier to integrate. We "close the black-box" and allow the use of our
method with any adjoint technique for gradient calculations of the differential
equation solution. We perform experimental studies to compare our method to
global regularization to show that we attain similar performance numbers
without compromising the flexibility of implementation on ordinary differential
equations (ODEs) and stochastic differential equations (SDEs). We develop two
sampling strategies to trade off between performance and training time. Our
method reduces the number of function evaluations to 0.556-0.733x and
accelerates predictions by 1.3-2x.
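The "internal cost heuristic" idea can be illustrated with a minimal adaptive solver. The sketch below is not the paper's method: it uses a hypothetical embedded Euler/Heun pair on scalar linear dynamics to show (a) that a stiffer vector field costs more function evaluations, and (b) how a local error estimate at one stochastically sampled time point could serve as a regularization penalty.

```python
import numpy as np

def adaptive_heun(f, y0, t0, t1, tol=1e-4):
    # embedded Euler(1)/Heun(2) pair with a standard step-size controller;
    # nfe (number of function evaluations) is the solver cost heuristic
    t, y, dt, nfe = t0, float(y0), 0.01, 0
    accepted = []  # (time, local error estimate) for each accepted step
    while t < t1 - 1e-12:
        dt = min(dt, t1 - t)
        k1 = f(t, y)
        y_euler = y + dt * k1
        k2 = f(t + dt, y_euler)
        y_heun = y + 0.5 * dt * (k1 + k2)
        err = abs(y_heun - y_euler)  # local truncation error estimate
        nfe += 2
        if err <= tol:  # accept the step, keep the higher-order solution
            t, y = t + dt, y_heun
            accepted.append((t, err))
        # step-size update: shrink on rejection, grow cautiously on acceptance
        dt *= 0.9 * min(2.0, max(0.2, (tol / max(err, 1e-12)) ** 0.5))
    return y, nfe, accepted

# a stiffer vector field costs many more function evaluations to integrate
smooth = lambda t, y: -y
stiff = lambda t, y: -50.0 * y
_, nfe_smooth, _ = adaptive_heun(smooth, 1.0, 0.0, 1.0)
_, nfe_stiff, errs = adaptive_heun(stiff, 1.0, 0.0, 1.0)

# local regularization in spirit: sample one stochastic accepted time point
# and use its error estimate as a penalty term added to the training loss
rng = np.random.default_rng(0)
t_sample, penalty = errs[rng.integers(len(errs))]
```

Because the penalty is read off the solver's own error estimate rather than computed by a special training algorithm, it is compatible with any gradient pathway, which mirrors the drop-in property the abstract claims for the actual method.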
ChronoMID: Cross-modal neural networks for 3-D temporal medical imaging data.
ChronoMID (neural networks for temporally varying, hence Chrono, Medical Imaging Data) makes a novel application of cross-modal convolutional neural networks (X-CNNs) to the medical domain. In this paper, we present multiple approaches for incorporating temporal information into X-CNNs and compare their performance in a case study on the classification of abnormal bone remodelling in mice. Previous work developing medical models has predominantly focused on either spatial or temporal aspects, but rarely both. Our models seek to unify these complementary sources of information and derive insights in a bottom-up, data-driven approach. As with many medical datasets, the case study herein exhibits deep rather than wide data; we apply various techniques, including extensive regularisation, to account for this. After training on a balanced set of approximately 70,000 images, two of the models (those using difference maps from known reference points) outperformed a state-of-the-art convolutional neural network baseline by over 30 percentage points (>99% vs. 68.26%) on an unseen, balanced validation set comprising around 20,000 images. These models are expected to perform well with sparse datasets, based on both previous findings with X-CNNs and the representations of time used, which permit arbitrarily large and irregular gaps between data points. Our results highlight the importance of identifying a suitable description of time for a problem domain, as unsuitable descriptors may not only fail to improve a model, they may in fact confound it.
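The difference-map encoding of time described above can be sketched generically. This is an illustrative guess at the idea, not ChronoMID's actual preprocessing: each scan is subtracted from a known reference scan and paired with its elapsed time, so arbitrarily large and irregular gaps are representable.

```python
import numpy as np

def difference_maps(scans, times, ref_index=0):
    # encode time by subtracting a reference scan from every scan and
    # pairing each difference map with its (possibly irregular) time gap
    ref = scans[ref_index]
    maps = np.stack([scan - ref for scan in scans])
    gaps = np.array(times, dtype=float) - times[ref_index]
    return maps, gaps

# irregularly sampled toy "scans": 8x8 images at weeks 0, 1, and 5
rng = np.random.default_rng(0)
scans = [rng.random((8, 8)) for _ in range(3)]
maps, gaps = difference_maps(scans, times=[0.0, 1.0, 5.0])
```

The (difference map, gap) pairs would then feed the separate modality streams of an X-CNN, with the gap values carrying the temporal description whose choice the abstract identifies as critical.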