1 research output found

    Performance Analysis and Optimization of the Tiled Cholesky Factorization on NUMA Machines

    We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access (NUMA) shared-memory machines. We show how to optimize thread placement and data placement to achieve performance gains of up to 50% compared to state-of-the-art libraries such as PLASMA or MKL.