16 research outputs found

    DESTINY: A Comprehensive Tool with 3D and Multi-Level Cell Memory Modeling Capability

    Get PDF
    To enable the design of large capacity memory structures, novel memory technologies such as non-volatile memory (NVM) and novel fabrication approaches, e.g., 3D stacking and multi-level cell (MLC) design have been explored. The existing modeling tools, however, cover only a few memory technologies, technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 2D/3D memories designed using SRAM, resistive RAM (ReRAM), spin transfer torque RAM (STT-RAM), phase change RAM (PCM) and embedded DRAM (eDRAM) and 2D memories designed using spin orbit torque RAM (SOT-RAM), domain wall memory (DWM) and Flash memory. In addition to single-level cell (SLC) designs for all of these memories, DESTINY also supports modeling MLC designs for NVMs. We have extensively validated DESTINY against commercial and research prototypes of these memories. DESTINY is very useful for performing design-space exploration across several dimensions, such as optimizing for a target (e.g., latency, area or energy-delay product) for a given memory technology, choosing the suitable memory technology or fabrication method (i.e., 2D v/s 3D) for a given optimization target, etc. We believe that DESTINY will boost studies of next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers. The latest source-code of DESTINY is available from the following git repository: https://bitbucket.org/sparsh_mittal/destiny_v2

    Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0

    Full text link

    Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0

    Get PDF
    Journal ArticleA significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor performance, power dissipation, and thermal management strategies. There are a number of interconnect design considerations that influence power/performance/area characteristics of large caches, such as wire models (width/spacing/repeaters), signaling strategy (RC/differential/transmission), router design, etc. Yet, to date, there exists no analytical tool that takes all of these parameters into account to carry out a design space exploration for large caches and estimate an optimal organization. In this work, we implement two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache. First, we add the ability to model different types of wires, such as RC-based wires with different power/delay characteristics and differential low-swing buses. Second, we add the ability to model Non-uniform Cache Access (NUCA). We not only adopt state-of-the-art design space exploration strategies for NUCA, we also enhance this exploration by considering on-chip network contention and a wider spectrum of wiring and routing choices. We present a validation analysis of the new tool (to be released as CACTI 6.0) and present a case study to showcase how the tool can improve architecture research methodologies

    Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0

    Get PDF
    Journal ArticleA significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor performance, power dissipation, and thermal management strategies. There are a number of interconnect design considerations that influence power/performance/area characteristics of large caches, such as wire models (width/spacing/repeaters), signaling strategy (RC/differential/transmission), router design, etc. Yet, to date, there exists no analytical tool that takes all of these parameters into account to carry out a design space exploration for large caches and estimate an optimal organization. In this work, we implement two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache. First, we add the ability to model different types of wires, such as RC-based wires with different power/delay characteristics and differential low-swing buses. Second, we add the ability to model Non-uniform Cache Access (NUCA). We not only adopt state-of-the-art design space exploration strategies for NUCA, we also enhance this exploration by considering on-chip network contention and a wider spectrum of wiring and routing choices. We present a validation analysis of the new tool (to be released as CACTI 6.0) and present a case study to showcase how the tool can improve architecture research methodologies

    Understanding the impact of 3D stacked layouts on ILP

    Get PDF
    Journal Article3D die-stacked chips can alleviate the penalties imposed by long wires within micro-processor circuits. Many recent studies have attempted to partition each microprocessor structure across three dimensions to reduce their access times. In this paper, we implement each microprocessor structure on a single 2D die and leverage 3D to reduce the lengths of wires that communicate data between microprocessor structures within a single core. We begin with a criticality analysis of inter-structure wire delays and show that for most tra- ditional simple superscalar cores, 2D floorplans are already very efficient at minimizing critical wire delays. For an aggressive wire-constrained clustered superscalar architecture, an exploration of the design space reveals that 3D can yield higher benefit. However, this benefit may be negated by the higher power density and temperature entailed by 3D integration. Overall, we report a negative result and argue against leveraging 3D for higher ILP

    Thermal modeling and analysis of 3D multi-processor chips

    Get PDF
    As 3D chip multi-processors (3D-CMPs) become the main trend in processor development, various thermal management strategies have been recently proposed to optimize system performance while controlling the temperature of the system to stay below a threshold. These thermal-aware policies require the envision of high-level models that capture the complex thermal behavior of (nano)structures that build the 3D stack. Moreover, the floorplanning of the chip strongly determines the thermal profile of the system and a quick exploration of the design space is required to minimize the damage of the thermal effects

    Leveraging 3D technology for improved reliability

    Get PDF
    Journal ArticleAggressive technology scaling over the years has helped improve processor performance but has caused a reduction in processor reliability. Shrinking transistor sizes and lower supply voltages have increased the vulnerability of computer systems towards transient faults. An increase in within-die and die-to-die parameter variations has also led to a greater number of dynamic timing errors. A potential solution to mitigate the impact of such errors is redundancy via an in-order checker processor. Emerging 3D performance as well as reduced power consumption because of shorter on-chip wires. In this paper, we leverage the "snap-on" functionality provided by 3D integration and propose implementing the redundant checker processor on a second die. This allows manufacturers to easily create a family of "reliable processors" without significantly impacting the cost or performance for customers that care less about reliability. We comprehensively evaluate design choices for this second die, including the effects of L2 cache organization, deep pipelining, and frequency. An interesting feature made possible by 3D integration is the incorporation of heterogeneous process technologies within a single chip

    Technology Implications for Large Last-Level Caches

    Get PDF
    Large last-level cache (L3C) is efficient for bridging the performance and power gap between processor and memory. Several memory technologies, including SRAM, STT-RAM (MRAM), and embedded DRAM (eDRAM), have been used or considered as the technology to implement L3Cs. However, each of them has inherent weaknesses: SRAM is relatively low density and dissipates high leakage; STT-RAM has long write latency and requires high write energy; eDRAM requires refresh. As future processors are expected to have larger last-level caches, the objective of this dissertation is to study the tradeoffs associated with using each of these technologies to implement L3Cs. In order to make useful comparisons between L3Cs built with SRAM, STT-RAM, and eDRAM, we consider and implement several levels of details. First, to obtain unbiased cache performance and power properties (i.e., read/write access latency, read/write access energy, leakage power, refresh power, area), we prototype caches based on realistic memory and device models. Second, we present simplistic analytical models that enable us to quickly examine different memory technologies under various scenarios. Third, we review power-optimization techniques for each of the technologies, and propose using a low-cost dead-line prediction scheme for eDRAM-based L3Cs to eliminate unnecessary refreshes. Finally, the highlight of this dissertation is the comparison and analysis of low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM. We report system performance, last-level cache energy breakdown, and memory hierarchy energy breakdown, using an augmented full-system simulator with the execution of a range of workloads and input sets. From the insights gained through simulation results, STT-RAM has the highest potential to save energy in future L3C designs. For contemporary processors, SRAM-based L3C results in the fastest system performance, whereas eDRAM consumes the lowest energy

    Hardware/Software Co-design for Multicore Architectures

    Get PDF
    Siirretty Doriast
    corecore