3 research outputs found

    Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Full text link
    Memories that exploit three-dimensional (3D)-stacking technology, which integrate memory and logic dies in a single stack, are becoming popular. These memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC) design for connecting their internal structural organizations. This novel usage of NoC, in addition to aiding processing-in-memory capabilities, enables numerous benefits such as high bandwidth and memory-level parallelism. However, the implications of NoCs on the characteristics of 3D-stacked memories in terms of memory access latency and bandwidth have not been fully explored. This paper addresses this knowledge gap by (i) characterizing an HMC prototype on the AC-510 accelerator board and revealing its access latency behaviors, and (ii) by investigating the implications of such behaviors on system and software designs

    Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube

    Full text link
    Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally associated with memories. A practical 3D-stacked memory is Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small area. Although several studies have taken advantage of the novel architecture of HMC, its characteristics in terms of latency and bandwidth or their correlation with temperature and power consumption have not been fully explored. This paper is the first, to the best of our knowledge, to characterize the thermal behavior of HMC in a real environment using the AC-510 accelerator and to identify temperature as a new limitation for this state-of-the-art design space. Moreover, besides bandwidth studies, we deconstruct factors that contribute to latency and reveal their sources for high- and low-load accesses. The results of this paper demonstrates essential behaviors and performance bottlenecks for future explorations of packet-switched and 3D-stacked memories.Comment: EEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-

    Accelerating Checkpoint/Restart Application Performance in Large-Scale Systems with Network Attached Memory

    Get PDF
    Technology scaling and a continual increase in operating frequency have been the main driver of processor performance for several decades. A recent slowdown in this evolution is compensated by multi-core architectures, which challenge application developers and also increase the disparity between the processor and memory performance. The increasing core count and growing scale of computing systems furthermore turn attention to communication as a significant contributor on application run-times. Larger systems also comprise many more components which are subject to failures. In order to mitigate the effects of these failures, fault tolerance techniques such as Checkpoint/Restart are used. These techniques often rely on message-based communication and data transport stresses the local memory interface. In order to reduce communication overhead it is desirable to either decrease the number of messages, or otherwise to accelerate the execution of commonly used global operations. Finally, power consumption of large-scale systems has become a major concern and the efficiency of such systems must considerably improve to allow future Exascale systems to operate within a reasonable power budget. This work addresses the topics memory interface, communication, fault tolerance, and energy efficiency in large-scale systems. It presents Network Attached Memory (NAM), an FPGA-based hardware prototype that can be directly connected to a common high-performance interconnection network in large-scale systems. It provides access to the emerging memory technology Hybrid Memory Cube (HMC) as shared memory resource, tightly integrated with processing elements. The first part introduces the HMC memory architecture and serial interface, and thoroughly evaluates it in an FPGA using a custom-developed host controller, which has become an open-source initiative. The next part describes the hardware architecture of the NAM design and prototype, and theoretically evaluates the expected performance and bottlenecks. The NAM design was fully prototyped in an FPGA and the contribution also comprises a corresponding software stack. As a first use case NAM serves as Checkpoint/Restart target, aiming to reduce inter-node communication and to accelerate the creation of checkpoint parity information. Reducing checkpointing overhead improves application run-times and energy efficiency likewise. The final part of this work evaluates the NAM performance in a 16 node test system. It shows a good read/write scaling behavior for an increasing number of nodes. For Checkpoint/Restart with a real application, a 2.1X improvement over a standard approach is a remarkable result. It proves the successful concept of a dedicated hardware component to reduce communication and fault tolerance overhead for current and future large-scale systems
    corecore