Machine learning methods are increasingly used to build computationally
inexpensive surrogates for complex physical models. The predictive capability
of these surrogates suffers when data are noisy, sparse, or time-dependent.
Because we seek a surrogate that provides valid predictions for any potential
future model evaluation, we introduce an online learning method driven by
optimizer-directed sampling. The method has two advantages over current
approaches. First, it ensures that all turning points (local extrema) on the
model response surface are included in the training data. Second, whenever new
model evaluations become available, the surrogate is scored against them and
retrained (updated) if its score drops below a validity threshold. Tests on
benchmark functions reveal that
optimizer-directed sampling generally outperforms traditional sampling methods
in terms of accuracy around local extrema, even when the scoring metric favors
overall accuracy. We apply our method to simulations of nuclear matter to
demonstrate that highly accurate surrogates for the nuclear equation of state
can be reliably auto-generated from expensive calculations using only a few
model evaluations.

Comment: 13 pages, 6 figures, submitted to Nature Machine Intelligence
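The online loop summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. The 1-D test function `model`, the compass-search optimizer `pattern_search`, the degree-12 polynomial surrogate, and the R²-style `score` with a 0.99 threshold are all stand-ins chosen for brevity. Each batch of evaluations comes from local optimizers run toward both minima and maxima, so turning points land in the training data. The current surrogate is scored on each new batch and refit on all logged data only when the score falls below the threshold.

```python
import numpy as np


def model(x):
    """Hypothetical stand-in for an expensive model (1-D benchmark)."""
    return np.sin(3.0 * x) + 0.5 * x


class Logged:
    """Wraps a model and records every (x, y) evaluation it receives."""

    def __init__(self, fn):
        self.fn, self.x, self.y = fn, [], []

    def __call__(self, x):
        y = float(self.fn(x))
        self.x.append(float(x))
        self.y.append(y)
        return y


def pattern_search(f, x0, lo, hi, step=0.5, tol=1e-4, max_iter=200):
    """Minimal 1-D compass search that minimizes f on [lo, hi] (assumed optimizer)."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        for cand in (max(x - step, lo), min(x + step, hi)):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step *= 0.5  # no improving move in either direction: shrink the step
    return x


def sample_extrema(logged, starts, lo, hi):
    """Optimizer-directed sampling: seek local minima and maxima from each start."""
    n0 = len(logged.x)
    for x0 in starts:
        pattern_search(logged, x0, lo, hi)                # toward a local minimum
        pattern_search(lambda x: -logged(x), x0, lo, hi)  # toward a local maximum
    # Every point the optimizers evaluated becomes new training/test data.
    return np.array(logged.x[n0:]), np.array(logged.y[n0:])


def fit(x, y, deg=12):
    """Assumed surrogate: least-squares polynomial fit."""
    return np.polynomial.Polynomial.fit(x, y, deg)


def score(surrogate, x, y):
    """Assumed validity metric: coefficient of determination (R^2)."""
    ss_res = np.sum((y - surrogate(x)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0


def online_learn(n_batches=5, threshold=0.99, lo=-2.0, hi=2.0, seed=0):
    """Score the surrogate on each new batch; retrain on all data if it fails."""
    rng = np.random.default_rng(seed)
    logged = Logged(model)
    x, y = sample_extrema(logged, rng.uniform(lo, hi, 4), lo, hi)
    surrogate = fit(x, y)
    history = []
    for _ in range(n_batches):
        xb, yb = sample_extrema(logged, rng.uniform(lo, hi, 2), lo, hi)
        s = float(score(surrogate, xb, yb))  # test on unseen evaluations first
        retrained = s < threshold
        if retrained:
            surrogate = fit(np.array(logged.x), np.array(logged.y))
        history.append((s, retrained))
    return surrogate, history
```

Calling `online_learn()` returns the final surrogate and a per-batch history of `(score, retrained)` pairs. Scoring each batch before it is added to the fit means the surrogate is always tested on evaluations it has not seen, and refitting is skipped whenever the existing surrogate already generalizes to the new data.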