MILC staggered conjugate gradient performance on Intel KNL
We review our work to optimize the staggered conjugate gradient (CG)
algorithm in the MILC code for use with the Intel Knights Landing (KNL)
architecture. KNL is the second-generation Intel Xeon Phi processor. It is
capable of massive thread parallelism, data parallelism, and high on-board
memory bandwidth and is being adopted in supercomputing centers for scientific
research. The CG solver consumes the majority of the time in production runs, so
we have spent most of our effort on it. We compare performance of an MPI+OpenMP
baseline version of the MILC code with a version incorporating the QPhiX
staggered CG solver, for both one-node and multi-node runs.
Comment: 8 pages, 4 figures
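For readers unfamiliar with the solver named in this abstract, here is a minimal sketch of the textbook conjugate gradient iteration for a symmetric positive-definite system A x = b. It is not the MILC or QPhiX implementation: the function names `conjugate_gradient` and `apply_example` and the 2x2 test matrix are hypothetical, chosen only for illustration, and a production staggered solver would apply a lattice Dirac operator in place of the small matrix used here.

```python
def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                       # residual r = b - A x, with x = 0
    p = list(r)                       # first search direction is the residual
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:       # stop once the residual norm is small
            break
        ap = apply_a(p)               # the one operator application per iteration
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def apply_example(v):
    # Hypothetical stand-in operator: the SPD matrix [[4, 1], [1, 3]].
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]


solution = conjugate_gradient(apply_example, [1.0, 2.0])
```

Each iteration applies the operator once and otherwise performs only vector updates and dot products, which is why the operator application and memory bandwidth tend to dominate the cost of such solvers.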
Distributed-Memory Breadth-First Search on Massive Graphs
This chapter studies the problem of traversing large graphs using the
breadth-first search order on distributed-memory supercomputers. We consider
both the traditional level-synchronous top-down algorithm and the
recently discovered direction-optimizing algorithm. We analyze the performance
and scalability trade-offs in using different local data structures such as CSR
and DCSC, enabling in-node multithreading, and graph decompositions such as 1D
and 2D decomposition.
Comment: arXiv admin note: text overlap with arXiv:1104.451
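As a rough illustration of the level-synchronous top-down approach mentioned in this abstract, the following single-process sketch runs breadth-first search over a graph stored in CSR form. It is not the chapter's distributed implementation: the function name `bfs_levels_csr` and the five-vertex example graph are hypothetical, and a distributed version would additionally exchange frontier vertices between processes at the end of each level.

```python
def bfs_levels_csr(row_ptr, col_idx, source):
    """Return the BFS level of every vertex (-1 if unreachable from source)."""
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Top-down step: every frontier vertex scans its outgoing edges.
        for u in frontier:
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if level[v] == -1:    # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        # The whole level is finished before the next one starts.
        frontier = next_frontier
    return level


# Hypothetical undirected graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
row_ptr = [0, 2, 4, 6, 9, 10]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
levels = bfs_levels_csr(row_ptr, col_idx, 0)
```

In CSR, the neighbors of vertex `u` occupy the contiguous slice `col_idx[row_ptr[u]:row_ptr[u + 1]]`, so each frontier expansion is a sequence of linear scans.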