Finite element analysis of solid mechanics is a foundational tool of modern
engineering, with low-order finite element methods and assembled sparse
matrices representing the industry standard for implicit analysis. We use
performance models and numerical experiments to demonstrate that high-order
methods greatly reduce the cost of reaching engineering tolerances while enabling
effective use of GPUs. We demonstrate the reliability, efficiency, and
scalability of matrix-free p-multigrid methods with algebraic multigrid
coarse solvers through large-deformation hyperelastic simulations of multiscale
structures. We investigate accuracy, cost, and execution time on multi-node CPU
and GPU systems for moderate to large models using AMD MI250X (OLCF Crusher),
NVIDIA A100 (NERSC Perlmutter), and V100 (LLNL Lassen and OLCF Summit), and
observe order-of-magnitude efficiency improvements over a broad range of
model properties and scales. We discuss efficient matrix-free representation of
Jacobians and demonstrate how automatic differentiation enables rapid
development of nonlinear material models without impacting debuggability and
workflows targeting GPUs.