Hydrogen diffusion in metals and alloys plays an important role in the
discovery of new materials for fuel cell and energy storage technology. While
analytic models use hand-selected features that have clear physical ties to
hydrogen diffusion, they often lack accuracy when making quantitative
predictions. Machine learning models are capable of making accurate
predictions, but their inner workings are obscured, rendering it unclear which
physical features are truly important. To develop interpretable machine
learning models to predict the activation energies of hydrogen diffusion in
metals and random binary alloys, we create a database for physical and chemical
properties of the species and use it to fit six machine learning models. Our
models achieve root-mean-squared errors between 98 and 119 meV on the testing data
and accurately predict that elemental Ru has a large activation energy, while
elemental Cr and Fe have small activation energies. By analyzing the feature
importances of these fitted models, we identify relevant physical properties
for predicting hydrogen diffusivity. While metrics for measuring the individual
feature importances for machine learning models exist, correlations between the
features lead to disagreement between models and limit the conclusions that can
be drawn. Instead, grouped feature importances, formed by combining the features
via their correlations, agree across the six models and reveal that the two
groups containing the packing factor and electronic specific heat are
particularly significant for predicting hydrogen diffusion in metals and random
binary alloys. This framework allows us to interpret machine learning models
and enables rapid screening of new materials with the desired rates of hydrogen
diffusion.

Comment: 36 pages, 8 figures, supplemental materia
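
The idea of grouped feature importances can be illustrated with a minimal sketch. The abstract does not specify how the groups are built or which importance metric is used, so everything below is an assumption made for illustration: features are merged into a group when their absolute Pearson correlation exceeds a hypothetical threshold of 0.8, and a group's importance is taken to be the mean increase in RMSE when all features in the group are permuted together. The feature names, the toy `predict` function, and the synthetic data are likewise hypothetical and are not the database or the six models described above.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_groups(columns, threshold=0.8):
    """Merge features whose |r| >= threshold (assumed rule) via union-find."""
    names = list(columns)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def rmse(pred, target):
    """Root-mean-squared error between predictions and targets."""
    return mean((p - t) ** 2 for p, t in zip(pred, target)) ** 0.5


def grouped_permutation_importance(predict, columns, target, group,
                                   n_repeats=20, seed=0):
    """Mean RMSE increase when every feature in `group` is shuffled jointly."""
    rng = random.Random(seed)
    n = len(target)
    baseline = rmse(predict(columns), target)
    increases = []
    for _ in range(n_repeats):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = dict(columns)
        # One shared permutation keeps correlations inside the group intact.
        for name in group:
            shuffled[name] = [columns[name][i] for i in perm]
        increases.append(rmse(predict(shuffled), target) - baseline)
    return mean(increases)


# Synthetic toy data (hypothetical): "a_copy" is nearly identical to "a",
# while "b" is independent of both and unused by the model.
rng = random.Random(42)
a = [rng.random() for _ in range(200)]
columns = {
    "a": a,
    "a_copy": [x + rng.gauss(0.0, 0.05) for x in a],
    "b": [rng.random() for _ in range(200)],
}
target = [2.0 * x for x in a]


def predict(cols):
    """Toy stand-in for a fitted model; it ignores feature "b"."""
    return [x + y for x, y in zip(cols["a"], cols["a_copy"])]


groups = correlation_groups(columns)
importances = {
    tuple(g): grouped_permutation_importance(predict, columns, target, g)
    for g in groups
}
for g, value in importances.items():
    print(g, round(value, 3))
```

In this toy setting the two correlated features fall into one group and are scored jointly, which avoids the arbitrary split of importance between them that individual-feature metrics can produce.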