With the availability of user-oriented software tools and dedicated architectures, such as CUDA
(Compute Unified Device Architecture), the parallel computing platform and programming model released
by NVIDIA, one of the main producers of graphics cards, and of improved, high-performance GPU (Graphics
Processing Unit) boards, GPGPU (General-Purpose programming on GPUs) is attracting increasing interest
in the engineering community for the development of analysis tools suitable for validation/verification
and virtual reality applications. Owing to their inherently explicit and decoupled structure, explicit
dynamics finite element formulations appear particularly attractive for implementation on hybrid
CPU/GPU or pure GPU architectures. This paper addresses the optimized, double-precision GPU
implementation of an explicit dynamics finite element solver for elastic shell problems in small strains
and large displacements and rotations, using unstructured meshes. The conceptual
difference between a GPU implementation directly adapted from a standard CPU approach and a new
optimized formulation, specifically conceived for GPUs, is discussed and comparatively assessed. It is
shown that a speedup factor of about 5 over the directly adapted implementation can be achieved through
algorithm reformulation and careful memory management. A speedup of more than 40 is achieved with
respect to state-of-the-art commercial codes running on CPUs, with real-time simulations obtained in
some cases on commodity hardware.
When a latest-generation GPU board is used, a problem with more than 16 million degrees of freedom is
shown to be solvable in just a few hours of computing time, opening the way to virtualization
approaches for real large-scale engineering problems.
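As a minimal illustration of why explicit dynamics maps well onto GPUs, the following Python/NumPy sketch integrates M a = f_ext - f_int(u) with the central-difference scheme and a lumped (diagonal) mass matrix. It is not the shell formulation of this work: it assumes a one-dimensional, fixed-free chain of identical linear springs, and the names `internal_forces` and `central_difference` are hypothetical. The point it shows is that element forces are evaluated independently and that, with a diagonal mass, each degree of freedom is advanced by a component-wise division with no linear solve.

```python
import numpy as np


def internal_forces(u, k):
    """Internal nodal forces of a fixed-free chain of identical linear springs.

    Hypothetical toy model: u[i] is the displacement of free node i, spring i
    joins node i-1 to node i, and spring 0 joins node 0 to a fixed wall.
    Each spring force is computed independently of the others.
    """
    ext = np.diff(np.concatenate(([0.0], u)))  # elongation of each spring
    f_elem = k * ext                            # axial force in each spring
    f = f_elem.copy()                           # spring i acting on its right node i
    f[:-1] -= f_elem[1:]                        # spring i+1 acting on its left node i
    return f


def central_difference(m_lumped, k, f_ext, u0, v0, dt, n_steps):
    """Explicit central-difference integration of M a = f_ext - f_int(u).

    With a lumped (diagonal) mass, accelerations come from a component-wise
    division, so every degree of freedom is updated on its own.
    """
    u = np.array(u0, dtype=float)
    v = np.array(v0, dtype=float)  # velocity at t = 0, then at half steps
    for step in range(n_steps):
        a = (f_ext - internal_forces(u, k)) / m_lumped  # no linear solve needed
        # The first update moves the velocity from t = 0 to t = dt/2;
        # later updates move it by a full step between half-step instants.
        v = v + (0.5 * dt if step == 0 else dt) * a
        u = u + dt * v
    return u


if __name__ == "__main__":
    n = 4
    m = np.ones(n)
    k = 1.0
    # Central-difference stability requires dt < 2 / omega_max; for this chain
    # omega_max <= 2 * sqrt(k / m), so any dt < 1 is stable here.
    dt = 0.1
    f_ext = np.zeros(n)
    f_ext[-1] = 1.0  # constant pull on the free end
    u = central_difference(m, k, f_ext, np.zeros(n), np.zeros(n), dt, 1000)
    print(u)
```

In this toy model the only coupling between degrees of freedom is the scatter of element forces to shared nodes (the `f[:-1] -= f_elem[1:]` line). On a GPU, that gather/scatter step is where concurrent writes to the same node arise, so it typically calls for atomic operations or a reorganized data layout, while the per-element force evaluation and the per-node update remain fully parallel.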