Deep learning is becoming an increasingly promising pathway to improving the
accuracy of sub-grid scale (SGS) turbulence closure models for large eddy
simulations (LES). We leverage the concept of differentiable turbulence,
whereby an end-to-end differentiable solver is used in combination with
physics-inspired choices of deep learning architectures to learn highly
effective and versatile SGS models for two-dimensional turbulent flow. We
perform an in-depth analysis of the inductive biases in the chosen
architectures, finding that the inclusion of small-scale non-local features is
most critical to effective SGS modeling, while large-scale features can improve
the pointwise accuracy of the a-posteriori solution field. We also find that the
filtered velocity gradient tensor can be mapped directly to the SGS stress via
decomposition of the inputs and outputs into isotropic, deviatoric, and
anti-symmetric components. The learned model generalizes to a variety of flow
configurations, including higher and lower Reynolds numbers and different
forcing conditions. We show that the differentiable physics paradigm is more
successful than offline, a-priori learning, and that hybrid solver-in-the-loop
approaches to deep learning offer an ideal balance between computational
efficiency, accuracy, and generalization. Our experiments provide physics-based
recommendations for deep-learning-based SGS modeling aimed at generalizable
turbulence closure.
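The decomposition of a tensor into isotropic, deviatoric, and anti-symmetric components can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes 2D tensors stored with shape (..., 2, 2) (for example, a velocity gradient A_ij = du_i/dx_j at every grid point), and the helper name `decompose_tensor` is hypothetical.

```python
import numpy as np


def decompose_tensor(a):
    """Split 2x2 tensors of shape (..., 2, 2) into isotropic,
    deviatoric (symmetric, trace-free) and anti-symmetric parts.

    Hypothetical helper for illustration; the three parts sum to `a`.
    """
    a = np.asarray(a, dtype=float)
    dim = a.shape[-1]
    a_t = np.swapaxes(a, -1, -2)                   # transpose of each 2x2 block
    trace = np.asarray(np.trace(a, axis1=-2, axis2=-1))
    iso = (trace / dim)[..., None, None] * np.eye(dim)
    sym = 0.5 * (a + a_t)                          # symmetric part
    dev = sym - iso                                # symmetric and trace-free
    anti = 0.5 * (a - a_t)                         # anti-symmetric part
    return iso, dev, anti


# Example: a single (hypothetical) velocity gradient tensor
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
iso, dev, anti = decompose_tensor(A)
# iso = [[2, 0], [0, 2]], dev = [[-1, 1], [1, 1]], anti = [[0, 1], [-1, 0]]
print(np.allclose(iso + dev + anti, A))  # True
```

Because the three parts sum back to the original tensor, a model can treat each component separately without discarding information. For incompressible flow the isotropic part of the filtered velocity gradient vanishes, since its trace is the divergence.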
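The idea of training a closure with the solver in the loop (a-posteriori learning) can be illustrated with a deliberately tiny Python sketch under strong assumptions. A scalar decay equation du/dt = -(k0 + c) u, integrated with explicit Euler, stands in for the LES solver, and a single coefficient c stands in for the neural-network closure. The loss compares a full rolled-out trajectory with a reference trajectory, and the gradient is propagated through every solver step using forward sensitivities, a hand-written substitute for the automatic differentiation a real differentiable solver would provide. All names and parameter values are hypothetical.

```python
import numpy as np


def rollout(c, k0, u0, dt, n_steps):
    """Integrate du/dt = -(k0 + c) u with explicit Euler.

    Returns the trajectory u_n and its sensitivity s_n = du_n/dc,
    propagated step by step alongside the state (forward-mode
    differentiation through the solver).
    """
    u = np.empty(n_steps + 1)
    s = np.empty(n_steps + 1)
    u[0], s[0] = u0, 0.0
    amp = 1.0 - dt * (k0 + c)              # Euler amplification factor
    for n in range(n_steps):
        u[n + 1] = amp * u[n]
        # d/dc [amp * u_n] = amp * s_n + (d amp/dc) * u_n, with d amp/dc = -dt
        s[n + 1] = amp * s[n] - dt * u[n]
    return u, s


# Hypothetical setup: the "true" dynamics decay at rate 1.0, while the
# coarse solver only resolves k0 = 0.5, so the ideal closure is c = 0.5.
dt, n_steps, u0 = 0.1, 50, 1.0
k_true, k0 = 1.0, 0.5
u_ref, _ = rollout(0.0, k_true, u0, dt, n_steps)   # reference trajectory

c, lr = 0.0, 1.0
for _ in range(300):
    u, s = rollout(c, k0, u0, dt, n_steps)
    # Trajectory loss L = mean((u - u_ref)^2); its gradient uses the
    # chain rule through every solver step via the sensitivities s_n.
    grad = np.mean(2.0 * (u - u_ref) * s)
    c -= lr * grad

print(round(c, 6))  # 0.5
```

An a-priori approach would instead fit c against instantaneous targets without ever running the coarse solver; the sketch differs only in that the loss is evaluated on the solver's own output, so errors that accumulate over the rollout contribute to the gradient.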