The modeling of jet substructure significantly differs between Parton Shower
Monte Carlo (PSMC) programs. Despite this, we observe that machine learning
classifiers trained on different PSMCs learn nearly the same function. This
means that, when these classifiers are evaluated on a common PSMC test sample,
they yield nearly the same performance. This classifier universality
indicates that a machine learning model trained on one simulation and tested on
another simulation (or data) will likely be optimal. Our observations are based
on detailed studies of shallow and deep neural networks applied to simulated
Lorentz boosted Higgs jet tagging at the LHC.
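
As a rough illustration of the cross-generator protocol described above, here is a minimal sketch, not the paper's actual setup: two identical networks are trained on samples from two different simulators (mocked up as toy Gaussians standing in for jet-substructure observables from two PSMCs), and both are then evaluated on a common test sample. All function names, network sizes, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def toy_psmc_sample(n, shift):
    """Toy stand-in for one PSMC: 5 observables, signal shifted from background.

    `shift` mimics generator-dependent mis-modeling of the signal observables.
    """
    x_sig = rng.normal(loc=1.0 + shift, scale=1.0, size=(n, 5))
    x_bkg = rng.normal(loc=0.0, scale=1.0, size=(n, 5))
    X = np.vstack([x_sig, x_bkg])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return X, y

# Two "generators" that model the same process slightly differently.
X_a, y_a = toy_psmc_sample(5000, shift=0.0)  # PSMC A
X_b, y_b = toy_psmc_sample(5000, shift=0.2)  # PSMC B

clf_a = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_a, y_a)
clf_b = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_b, y_b)

# Universality check: both classifiers are scored on the *same* test sample
# (drawn here from PSMC A); similar AUCs would indicate they learned nearly
# the same function despite being trained on different simulations.
X_test, y_test = toy_psmc_sample(5000, shift=0.0)
for name, clf in [("trained on A", clf_a), ("trained on B", clf_b)]:
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC on PSMC A test set = {auc:.3f}")
```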