186 research outputs found

    Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube

    Full text link
    Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally associated with memories. A practical 3D-stacked memory is Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small area. Although several studies have taken advantage of the novel architecture of HMC, its characteristics in terms of latency and bandwidth or their correlation with temperature and power consumption have not been fully explored. This paper is the first, to the best of our knowledge, to characterize the thermal behavior of HMC in a real environment using the AC-510 accelerator and to identify temperature as a new limitation for this state-of-the-art design space. Moreover, besides bandwidth studies, we deconstruct factors that contribute to latency and reveal their sources for high- and low-load accesses. The results of this paper demonstrates essential behaviors and performance bottlenecks for future explorations of packet-switched and 3D-stacked memories.Comment: EEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-

    Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

    Full text link
    Memories that exploit three-dimensional (3D)-stacking technology, which integrate memory and logic dies in a single stack, are becoming popular. These memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC) design for connecting their internal structural organizations. This novel usage of NoC, in addition to aiding processing-in-memory capabilities, enables numerous benefits such as high bandwidth and memory-level parallelism. However, the implications of NoCs on the characteristics of 3D-stacked memories in terms of memory access latency and bandwidth have not been fully explored. This paper addresses this knowledge gap by (i) characterizing an HMC prototype on the AC-510 accelerator board and revealing its access latency behaviors, and (ii) by investigating the implications of such behaviors on system and software designs

    Genome sequence alignment in processing-In-memory architectures

    Get PDF
    Finalmente, también realizamos un estudio experimental de varias arquitecturas con diferentes tecnologías de memoria (DDR y HBM) y núcleos de procesamiento de distintos tipos, explotando, en algunos casos, procesamiento en la memoria (PIM). La aplicación de referencia es Bowtie2, una aplicación completa para el alineamiento de secuencias en el genoma. La implementación y evaluación de estas arquitecturas se realiza utilizando un simulador arquitectural basado en gem5.La combinación de la aparición de un cuello de botella en el acceso a los datos y la creciente importancia de las aplicaciones de procesamiento intensivo de datos, muy limitadas por el sistema de memoria, crea un importante problema que debe ser abordado. Por ello, en esta tesis nos proponemos afrontar este problema e intentar reducir su efecto en la medida de lo posible. El principal objetivo de esta tesis es el diseño de nuevas soluciones arquitecturales y algorítmicas para superar el problema del cuello de botella conocido como memory-wall y mejorar el rendimiento de aplicaciones con gran uso de memoria que no son capaces de beneficiarse lo suficiente de las jerarquías de memoria actuales. Además, creemos que es esencial centrarse en la eficiencia energética, un factor cuya importancia crece cada día y uno de los factores más limitantes en la computación de alto rendimiento. Las principales contribuciones de esta tesis son: Primero, analizamos el comportamiento de aplicaciones con accesos de memoria aleatorios, que no aprovechan correctamente las nuevas arquitecturas de memoria con jerarquías cache profundas. Específicamente, analizamos la estructura de datos FM-index y un algoritmo de búsqueda de secuencias basado en esa estructura, ampliamente usado en el alineamiento de secuencias en el genoma. Después de este análisis y de obtener un conocimiento más detallado del cuello de botella de la memoria, proponemos una nueva versión de FM-index que permite reducir el consumo de ancho de banda de memoria, de forma que mejora significativamente el rendimiento computacional. Posteriormente, proponemos una nueva arquitectura energéticamente eficiente, basada en un cubo de memoria en 3D (3D-Stacked) al que añadimos unos núcleos sencillos de bajo consumo en su capa lógica. Esta arquitectura permite la ejecución cerca de los datos (near-data-processing

    A Modern Primer on Processing in Memory

    Full text link
    Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are especially severely-felt in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, proliferation of different main memory standards and chips, specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are an evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated.Comment: arXiv admin note: substantial text overlap with arXiv:1903.0398

    Memory Hierarchy Design for Next Generation Scalable Many-core Platforms

    Get PDF
    Performance and energy consumption in modern computing platforms is largely dominated by the memory hierarchy. The increasing computational power in the multiprocessors and accelerators, and the emergence of the data-intensive workloads (e.g. large-scale graph traversal and scientific algorithms) requiring fast transfer of large volumes of data, are two main trends which intensify this problem by putting even higher pressure on the memory hierarchy. This increasing gap between computation speed and data transfer speed is commonly referred as the “memory wall” problem. With the emergence of heterogeneous Three Dimensional (3D) Integration based on through-silicon-vias (TSV), this situation has started to recover in the past years. On one hand, it is now possible to improve memory access bandwidth and/or latency by either stacking memories directly on top of processors or through abstracted memory interfaces such as Micron’s Hybrid Memory Cube (HMC). On the other hand, near memory computation has become worthy of revisiting due to the cost-effective integration of logic and memory in 3D stacks. These two directions bring about several interesting opportunities including performance improvement, energy and cost reduction, product miniaturization, and modular design for improved time to market. In this research, we study the effectiveness of the 3D integration technology and the optimization opportunities which it can provide in the different layers of the memory hierarchy in cluster-based many-core platforms ranging from intra-cluster L1 to inter-cluster L2 scratchpad memories (SPMs), as well as the main memory. In addition, by moving a part of the computation to where data resides, in the 3D-stacked memory context, we demonstrate further energy and performance improvement opportunities

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends
    corecore