Particle transport simulations are a cornerstone of high-energy physics
(HEP) and account for a substantial part of the field's computing workload.
To boost simulation throughput and energy efficiency, GPUs have been explored
as accelerators in recent years, further driven by the increasing use of GPUs
in HPC systems. The Accelerated demonstrator of electromagnetic
Particle Transport (AdePT) is an advanced prototype for offloading the
simulation of electromagnetic showers in Geant4 to GPUs and is under
continuous development and optimization. Improving memory layout and data
access is vital to using modern, massively parallel GPU hardware efficiently,
which adds to the challenge of migrating traditional CPU-based data
structures to GPUs in AdePT. The Low-Level Abstraction of Memory Access (LLAMA)
is a C++ library that provides a zero-runtime-overhead data structure
abstraction layer, focusing on multidimensional arrays of nested, structured
data. It provides a framework for defining custom memory mappings and switching
between them at compile time, both to specify data layouts and to instrument
data access, making LLAMA an ideal tool to tackle the memory-related
optimization challenges in AdePT.
Our contribution shares insights gained with LLAMA when instrumenting data
access inside AdePT, complementing traditional GPU profiler outputs. We
show traces of read and write counts for individual data structure elements as
well as memory heatmaps. The acquired knowledge enabled subsequent data layout
optimizations.
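
The idea of switching memory mappings at compile time and counting accesses per field can be illustrated with a minimal C++ sketch. It deliberately does not use LLAMA's API: all names (Field, AoSMapping, SoAMapping, View, depositEnergy, runDemo) are hypothetical, the two-field record is a stand-in rather than AdePT's track structure, and the code runs on the CPU for simplicity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-field track record; not AdePT's actual track type.
enum Field : std::size_t { Energy = 0, PosX = 1, FieldCount = 2 };

// Array-of-structs: the fields of one record sit next to each other.
struct AoSMapping {
    static constexpr std::size_t slot(std::size_t record, std::size_t field,
                                      std::size_t /*records*/) {
        return record * FieldCount + field;
    }
};

// Struct-of-arrays: all values of one field are contiguous.
struct SoAMapping {
    static constexpr std::size_t slot(std::size_t record, std::size_t field,
                                      std::size_t records) {
        return field * records + record;
    }
};

// View whose layout is chosen by the Mapping type at compile time.
// All accesses go through read()/write(), which also count accesses per field.
template <typename Mapping>
class View {
public:
    explicit View(std::size_t records)
        : records_(records), data_(records * FieldCount, 0.0) {}

    std::size_t size() const { return records_; }

    double read(std::size_t record, Field f) const {
        assert(record < records_);
        ++reads_[f];
        return data_[Mapping::slot(record, f, records_)];
    }

    void write(std::size_t record, Field f, double value) {
        assert(record < records_);
        ++writes_[f];
        data_[Mapping::slot(record, f, records_)] = value;
    }

    std::size_t reads(Field f) const { return reads_[f]; }
    std::size_t writes(Field f) const { return writes_[f]; }

private:
    std::size_t records_;
    std::vector<double> data_;
    mutable std::array<std::size_t, FieldCount> reads_{};
    std::array<std::size_t, FieldCount> writes_{};
};

// Layout-agnostic loop: subtracts up to `loss` from each record's energy
// and returns the total amount removed.
template <typename V>
double depositEnergy(V& view, double loss) {
    double total = 0.0;
    for (std::size_t i = 0; i < view.size(); ++i) {
        const double e = view.read(i, Energy);
        const double dep = e < loss ? e : loss;
        view.write(i, Energy, e - dep);
        total += dep;
    }
    return total;
}

struct Counts {
    std::size_t energyReads;
    std::size_t energyWrites;
    std::size_t posXReads;
    double deposited;
};

// Fills n records with an energy of 10, runs the loop once, reports counters.
template <typename Mapping>
Counts runDemo(std::size_t n) {
    View<Mapping> view(n);
    for (std::size_t i = 0; i < n; ++i) view.write(i, Energy, 10.0);
    const double deposited = depositEnergy(view, 2.5);
    return {view.reads(Energy), view.writes(Energy), view.reads(PosX), deposited};
}
```

Calling runDemo<AoSMapping>(n) and runDemo<SoAMapping>(n) runs the same loop on both layouts; only the template argument changes. In this sketch the counters show that PosX is never touched by the loop, which is the kind of observation that could motivate separating frequently and rarely accessed fields.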