39 research outputs found
Performance Measurements of BLACS Routines on CRAY T3E
The ScaLAPACK library is based on the BLACS (Basic Linear Algebra Communication Subprograms) library. Unfortunately, the optimized SHMEM-based CRAY T3E BLACS library contained in libcomm.a has a bug that causes problems when sub-grids are created. It is therefore sometimes necessary to use an MPI-based public-domain BLACS library instead, by linking the libraries libblacs.a and libblacsF77init.a. This report presents performance measurements of BLACS routines from both libraries in order to gain more information about the differences between them. Communication routines and a global combine operation are compared.
SUPRENUM software for the symmetric eigenvalue problem
The efficient use of the SUPRENUM computer calls for a careful choice of adequate algorithms and an implementation that takes into account the special characteristics of a parallel computer with distributed memory. To demonstrate this, well-known algorithms for solving the symmetric eigenvalue problem are presented, parallelized in particular for the SUPRENUM machine. The main problems arise from the number of communication calls, and ways are shown to reduce this number by using block algorithms rather than the usual unblocked ones.
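Why blocking reduces the number of communication calls can be illustrated with the common latency-bandwidth cost model, in which every message costs a fixed startup latency plus a per-byte transfer time. The sketch below is not taken from the report: the function name, the column-wise communication pattern, and all machine parameters are hypothetical and chosen only for illustration.

```python
import math


def communication_cost(n_columns, block_size, bytes_per_column,
                       latency, time_per_byte):
    """Estimate messages and time for sending n_columns matrix columns.

    Assumed model (not from the report): one message per block of
    `block_size` columns, each message costing `latency` seconds plus
    `time_per_byte` seconds for every byte it carries.
    """
    n_messages = math.ceil(n_columns / block_size)
    total_bytes = n_columns * bytes_per_column
    total_time = n_messages * latency + total_bytes * time_per_byte
    return n_messages, total_time


if __name__ == "__main__":
    # Hypothetical parameters: 1000 columns of 1000 doubles each,
    # 100 microseconds latency, 10 nanoseconds per byte.
    for block_size in (1, 8, 32):
        messages, seconds = communication_cost(
            n_columns=1000, block_size=block_size,
            bytes_per_column=8000, latency=1e-4, time_per_byte=1e-8)
        print(f"block size {block_size:3d}: {messages:5d} messages, "
              f"{seconds:.4f} s")
```

In this model the transferred volume is the same for every block size; only the latency term shrinks as columns are grouped into fewer, larger messages.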
Performance Benchmark of Standard Eigensolver in KNL Systems
With the advent of many-core systems like the Intel KNL, standard eigensolver libraries have to be adapted to those architectures. The pure MPI parallelization strategy may no longer be suited to these new architectures because too many MPI processes need too much memory for buffers. In this talk we will present the first performance evaluation results of the eigensolver libraries ELPA and EigenExa on the JURECA booster KNL nodes. Both libraries are tuned for KNL usage and offer a hybrid parallelization with MPI in combination with OpenMP. The ELPA 2-step eigensolver provides special kernels with KNL intrinsics for the back transformation of eigenvectors, and their usage indeed leads to better performance on KNL than using just AVX2 kernels.