This paper introduces an efficient and generic framework for finite-element
simulations under an implicit time integration scheme. A fast matrix assembly
method, compatible with generic constitutive models, exploits the fact that the
system matrices are created in a deterministic way as long as the mesh topology
remains constant. Reusing the sparsity pattern of the assembled system yields
significant optimizations in the assembly stage. As a result, established
GPU-based parallelization techniques can be applied directly to the assembled
system. Moreover, an asynchronous Cholesky preconditioning scheme is used to
improve the convergence of the system solver. On this basis, a GPU-based
Cholesky preconditioner is developed, significantly reducing the data transfer
between the CPU and the GPU during the solving stage. We evaluate the
performance of our method with different mesh elements and hyperelastic models
and compare it with typical approaches on the CPU and the GPU.
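The fixed-topology assembly idea can be illustrated with a minimal sketch. It does not reproduce the paper's implementation, and it rests on several assumptions: one scalar DOF per node (hyperelastic solids would use 3x3 blocks), CSR storage, and the hypothetical helper names `build_pattern` and `assemble`. Because the pattern depends only on element connectivity, it is computed once. Each later assembly, for example at every Newton iteration of an implicit step, only scatter-adds element matrices into a flat value array through a precomputed index map.

```python
# Minimal sketch (assumption: one scalar DOF per node, CSR storage).
# Helper names are hypothetical, not taken from the paper.

def build_pattern(elements, n_nodes):
    """Precompute the CSR sparsity pattern and a scatter map for a fixed mesh.

    elements: sequence of tuples of node indices.
    Returns (indptr, indices, scatter), where scatter[e][a][b] is the slot in
    the CSR value array that receives local entry (a, b) of element e.
    """
    # Collect the column set of each row from element connectivity.
    cols = [set() for _ in range(n_nodes)]
    for elem in elements:
        for i in elem:
            cols[i].update(elem)

    # Build CSR row pointers and sorted column indices.
    indptr, indices = [0], []
    for row in cols:
        indices.extend(sorted(row))
        indptr.append(len(indices))

    # Map each (row, col) pair to its position in the value array.
    slot = {}
    for i in range(n_nodes):
        for k in range(indptr[i], indptr[i + 1]):
            slot[(i, indices[k])] = k

    # For each element, record where every local entry lands.
    scatter = [[[slot[(i, j)] for j in elem] for i in elem] for elem in elements]
    return indptr, indices, scatter


def assemble(nnz, scatter, element_matrix):
    """Re-assemble CSR values; only the element matrices change between calls."""
    values = [0.0] * nnz
    for e, slots in enumerate(scatter):
        ke = element_matrix(e)  # local matrix, e.g. from a constitutive model
        for a, row_slots in enumerate(slots):
            for b, k in enumerate(row_slots):
                values[k] += ke[a][b]
    return values


if __name__ == "__main__":
    # Two 1D bar elements on three nodes.
    elements = [(0, 1), (1, 2)]
    indptr, indices, scatter = build_pattern(elements, n_nodes=3)
    print(indptr, indices)  # [0, 2, 5, 7] [0, 1, 0, 1, 2, 1, 2]

    # The pattern is reused; only the element stiffness changes per iteration.
    for stiffness in (1.0, 2.0):
        ke = lambda e, s=stiffness: [[s, -s], [-s, s]]
        print(assemble(len(indices), scatter, ke))
```

The costly part (pattern construction and `(row, col)` lookups) runs only once per topology, so the per-iteration work reduces to element evaluation plus indexed additions. On a GPU, that scatter loop could map to one thread per element entry with atomic adds. This is our illustration of why the fixed pattern helps parallelization, not necessarily the authors' scheme.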