
    JArena: Partitioned Shared Memory for NUMA-awareness in Multi-threaded Scientific Applications

    The distributed shared memory (DSM) architecture is widely used in today's computer design to mitigate the ever-widening processing-memory gap, and it inevitably exposes non-uniform memory access (NUMA) to shared-memory parallel applications. Failure to achieve full NUMA-awareness can significantly degrade application performance, especially on today's manycore platforms with tens to hundreds of cores. Yet traditional approaches such as first-touch and memory policy fall short with respect to false page-sharing, fragmentation, or ease of use. In this paper, we propose a partitioned shared memory approach that allows multi-threaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager that eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach achieves true NUMA-awareness and improves the performance of typical multi-threaded scientific applications by up to 4.3-fold as more cores are used.
    Comment: 12 pages, 3 figures, submitted to Euro-Par 201

    JSweep: A Patch-centric Data-driven Approach for Parallel Sweeps on Large-scale Meshes

    In mesh-based numerical simulations, the sweep is an important computation pattern. When sweeping a mesh, computations on cells are strictly ordered by data dependencies in given directions. Because of this serial order, parallelizing sweeps is challenging, especially for unstructured and deforming structured meshes. Meanwhile, recent high-fidelity multi-physics simulations of particle transport, including nuclear reactor and inertial confinement fusion, require sweeps on large-scale meshes with billions of cells and hundreds of directions. In this paper, we present JSweep, a parallel data-driven computational framework integrated in the JAxMIN infrastructure. The essence of JSweep is a general patch-centric data-driven abstraction, coupled with a high-performance runtime system that leverages hybrid MPI+threads parallelism and achieves dynamic communication on contemporary multi-core clusters. Built on JSweep, we implement a representative data-driven algorithm, Sn transport, featuring optimizations of vertex clustering, a multi-level priority strategy, and patch-angle parallelism. Experimental evaluation with two real-world applications, on structured and unstructured meshes respectively, demonstrates that JSweep can scale to tens of thousands of processor cores with reasonable parallel efficiency.
    Comment: 10 pages, 17 figure
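
The ordering constraint described in the abstract above (a cell can be computed only after its upwind neighbours in the sweep direction) can be illustrated with a small serial sketch. This is not JSweep's implementation or API: it is a plain topological ordering (Kahn's algorithm), and the names `sweep_order` and `grid_upwind`, as well as the dependency-list input format, are hypothetical choices made for illustration only.

```python
from collections import deque


def sweep_order(num_cells, upwind):
    """Return one valid order for computing cells in a single sweep direction.

    upwind[c] lists the cells whose results cell c needs first
    (hypothetical input format, assumed for this sketch).
    """
    downwind = [[] for _ in range(num_cells)]
    pending = [0] * num_cells  # number of unfinished upwind cells per cell
    for cell, deps in enumerate(upwind):
        pending[cell] = len(deps)
        for dep in deps:
            downwind[dep].append(cell)

    # Cells with no upwind dependencies are ready immediately.
    ready = deque(c for c in range(num_cells) if pending[c] == 0)
    order = []
    while ready:
        cell = ready.popleft()
        order.append(cell)
        # Finishing a cell may release its downwind neighbours.
        for nxt in downwind[cell]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != num_cells:
        raise ValueError("cyclic dependencies: no valid sweep order")
    return order


def grid_upwind(nx, ny):
    """Upwind lists on an nx-by-ny structured grid for a direction with
    positive x and y components (cells numbered row by row)."""
    upwind = []
    for j in range(ny):
        for i in range(nx):
            deps = []
            if i > 0:
                deps.append(j * nx + (i - 1))  # left neighbour
            if j > 0:
                deps.append((j - 1) * nx + i)  # lower neighbour
            upwind.append(deps)
    return upwind


order = sweep_order(6, grid_upwind(3, 2))
```

At any moment, every cell in the `ready` queue is independent of the others, which is where a parallel, data-driven runtime could find concurrent work; the sketch itself stays serial and covers a single direction.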