2 research outputs found

    Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Memories that exploit three-dimensional (3D) stacking, which integrates memory and logic dies in a single stack, are becoming popular. These memories, such as the Hybrid Memory Cube (HMC), use a network-on-chip (NoC) design to connect their internal structural organization. This novel use of a NoC, in addition to enabling processing-in-memory capabilities, provides benefits such as high bandwidth and memory-level parallelism. However, the implications of NoCs for the characteristics of 3D-stacked memories, in terms of memory access latency and bandwidth, have not been fully explored. This paper addresses this knowledge gap by (i) characterizing an HMC prototype on the AC-510 accelerator board and revealing its access latency behaviors, and (ii) investigating the implications of these behaviors for system and software design.

    Near-memory primitive support and infrastructure for sparse algorithms

    This thesis introduces an approach to mitigating the memory-latency penalties incurred by traditional accelerators. By introducing simple near-data-processing (NDP) accelerators for primitives such as SpMV (sparse matrix-vector multiplication) and DGEMM (double-precision dense matrix-matrix multiplication) kernels, applications can achieve a considerable performance boost. We evaluate our work on the SuperLU application for the HPC community. Thesis statement: reevaluating core primitives such as DGEMM, SCATTER, and GATHER for 3D-stacked PIM architectures that incorporate reconfigurable fabrics can deliver multi-fold performance improvements for SuperLU and other sparse algorithms. (M.S. thesis)
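    To make the SpMV primitive concrete, here is a minimal sketch of sparse matrix-vector multiplication (y = A·x) over the common CSR storage layout. This is illustrative only; it is not the thesis's near-data-processing implementation, and the function name and CSR-array arguments are assumptions for the example.

    ```python
    def spmv_csr(indptr, indices, data, x):
        """Multiply a CSR-format sparse matrix by a dense vector x.

        indptr[row]..indptr[row+1] delimits the stored entries of each row;
        indices holds their column positions, data their values.
        """
        y = [0.0] * (len(indptr) - 1)
        for row in range(len(indptr) - 1):
            # Accumulate only the stored (nonzero) entries of this row.
            for k in range(indptr[row], indptr[row + 1]):
                y[row] += data[k] * x[indices[k]]
        return y

    # Example: the 2x2 matrix [[1, 0], [2, 3]] times x = [1, 1]
    print(spmv_csr([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0], [1.0, 1.0]))  # -> [1.0, 5.0]
    ```

    The indirect access `x[indices[k]]` is exactly the irregular, latency-bound memory pattern that motivates moving such kernels near the data.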