Performance Analysis and Optimization of the Tiled Cholesky Factorization on NUMA Machines
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access (NUMA) shared-memory machines. We show how to optimize thread placement and data placement to achieve performance gains of up to 50% over state-of-the-art libraries such as PLASMA or MKL.
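
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sequential sketch of a right-looking tiled Cholesky factorization in plain Python. It is not the authors' implementation and does not model thread or data placement; it only shows the tile-level task structure (a diagonal-tile factorization, panel triangular solves, and trailing updates) that such placement strategies would schedule. The function names `potrf`, `trsm`, `update` and `tiled_cholesky`, and the tile-size parameter `nb`, are assumptions chosen for illustration, loosely following the roles of the corresponding BLAS/LAPACK kernels.

```python
import math


def potrf(a, k0, k1):
    """Factor the diagonal tile a[k0:k1, k0:k1] in place (lower triangle)."""
    for j in range(k0, k1):
        s = a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j))
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, k1):
            t = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k0, j))
            a[i][j] = t / a[j][j]


def trsm(a, i0, i1, k0, k1):
    """Solve X * L_kk^T = A_ik in place for the tile rows i0..i1."""
    for r in range(i0, i1):
        for j in range(k0, k1):
            t = a[r][j] - sum(a[r][p] * a[j][p] for p in range(k0, j))
            a[r][j] = t / a[j][j]


def update(a, i0, i1, j0, j1, k0, k1):
    """Trailing update A_ij -= L_ik * L_jk^T, touching only the lower part."""
    for r in range(i0, i1):
        for c in range(j0, min(j1, r + 1)):
            a[r][c] -= sum(a[r][p] * a[c][p] for p in range(k0, k1))


def tiled_cholesky(a, nb):
    """Return the lower Cholesky factor of the SPD matrix a, using nb x nb tiles.

    Hypothetical helper for illustration: a is a list of rows, nb is the tile size.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    bounds = [(s, min(s + nb, n)) for s in range(0, n, nb)]
    nt = len(bounds)
    for k in range(nt):
        k0, k1 = bounds[k]
        potrf(a, k0, k1)                      # factor diagonal tile
        for i in range(k + 1, nt):
            i0, i1 = bounds[i]
            trsm(a, i0, i1, k0, k1)           # panel below the diagonal tile
        for i in range(k + 1, nt):
            i0, i1 = bounds[i]
            for j in range(k + 1, i + 1):
                j0, j1 = bounds[j]
                update(a, i0, i1, j0, j1, k0, k1)  # trailing submatrix
    for r in range(n):                        # clear the unused upper triangle
        for c in range(r + 1, n):
            a[r][c] = 0.0
    return a
```

For the 3×3 matrix [[4, 2, 2], [2, 5, 3], [2, 3, 6]] with nb = 2, this sketch returns the factor [[2, 0, 0], [1, 2, 0], [1, 1, 2]]. In a parallel runtime, each `potrf`, `trsm` and `update` call would be a task operating on one or more tiles, which is where the choice of executing thread and of the memory node holding each tile becomes relevant on a NUMA machine.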