This report highlights our work on improving GPU parallelization by
supporting compute nodes with multiple GPUs. Because OpenACC's default support
for multiple GPUs is limited [6], the current implementation allows each
MPI process to access only a single GPU. Thus, the only way to take full
advantage of multi-GPU nodes in the current version is to launch multiple
processes, which increases resource contention. We investigated the benefits of
having only one process offload to all available GPU devices.
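
As an illustration only, and not the implementation described in this report, the following C sketch shows one way a single process could offload to every GPU on a node through the OpenACC runtime API. The helper names chunk_bounds and scale_all_devices are hypothetical, the device type acc_device_nvidia is an assumption about the target hardware, and the array-scaling loop merely stands in for real work. Built without OpenACC support, the directives are ignored and the code runs serially on the host.

```c
#include <assert.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: half-open range [*lo, *hi) of the d-th of ndev
   near-equal chunks of n items. */
static void chunk_bounds(int n, int ndev, int d, int *lo, int *hi) {
    assert(ndev > 0 && d >= 0 && d < ndev);
    int base = n / ndev, rem = n % ndev;
    *lo = d * base + (d < rem ? d : rem);
    *hi = *lo + base + (d < rem ? 1 : 0);
}

/* Hypothetical example: scale x[0..n) by a, with one process spreading
   the work over all visible GPUs. */
static void scale_all_devices(double *x, int n, double a) {
    int ngpu = 0;
#ifdef _OPENACC
    /* Assumption: the node's accelerators are NVIDIA devices. */
    ngpu = acc_get_num_devices(acc_device_nvidia);
#endif
    int nparts = ngpu > 0 ? ngpu : 1;

    /* Launch one asynchronous chunk per device. */
    for (int d = 0; d < nparts; ++d) {
        int lo, hi;
        chunk_bounds(n, nparts, d, &lo, &hi);
#ifdef _OPENACC
        if (ngpu > 0) acc_set_device_num(d, acc_device_nvidia);
#endif
        #pragma acc parallel loop copy(x[lo:hi-lo]) async(1)
        for (int i = lo; i < hi; ++i) {
            x[i] *= a;
        }
    }

    /* Revisit each device and wait for its queued work to finish. */
    for (int d = 0; d < nparts; ++d) {
#ifdef _OPENACC
        if (ngpu > 0) acc_set_device_num(d, acc_device_nvidia);
#endif
        #pragma acc wait
    }
}
```

In this sketch the asynchronous launches let the devices work concurrently while a single host thread issues the work; assigning one host thread per device (for example with OpenMP) is a common alternative.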