43 research outputs found

    Design and Analysis of Low-power SRAMs

    The explosive growth of battery-operated devices has made low-power design a priority in recent years. Moreover, embedded SRAM units have become an important block in modern SoCs. The increasing transistor count in SRAM units and the surging leakage current of MOS transistors in scaled technologies have made the SRAM unit a power-hungry block from both dynamic and static perspectives. Owing to the high bitline voltage swing during write operations, write power dominates the dynamic power consumption. The static power consumption is mainly due to the leakage current associated with the SRAM cells distributed in the array. Moreover, as the supply voltage decreases to tackle power consumption, the data stability of SRAM cells has become a major concern in recent years. To reduce the write power consumption, several schemes such as row-based sense amplifying cell (SAC) and hierarchical bitline sense amplification (HBLSA) have been proposed. However, these schemes impose architectural limitations on the design in terms of the number of words on a row. Besides, the effectiveness of these methods is limited to the dynamic power consumption. Conventionally, reducing the cell supply voltage and exploiting the body effect have been suggested to reduce the cell leakage current. However, varying the supply voltage of the cell is associated with higher dynamic power consumption and reduced cell data stability. Conventionally qualified by the Static Noise Margin (SNM), the ability of the cell to retain data is reduced under lower supply voltage conditions. In this thesis, we revisit the concept of data stability from the dynamic perspective. A new criterion for the data stability of the SRAM cell is defined. The new criterion suggests that the access time and non-access time (recovery time) of the cell can influence the data stability in an SRAM cell. The speed vs. stability trade-off opens new opportunities for aggressive power reduction in low-power applications. Experimental results from a test chip implemented in a 130 nm CMOS technology confirmed the concept and opened the ground for the introduction of a new operational mode for SRAM cells. We introduced a new architecture, Segmented Virtual Grounding (SVGND), to reduce the dynamic and static power consumption in SRAM units at the same time. Thanks to the new concept of data stability in SRAM cells, we introduced a new operational mode, Accessed Retention Mode (AR-Mode), for the SRAM cell. In this mode, the accessed SRAM cell can retain its data; however, it does not discharge the bitline. The new architecture outperforms recently reported low-power schemes in terms of dynamic power consumption, thanks to the exclusive discharge of the bitline and the cell virtual ground. In addition, the architecture reduces the leakage current significantly since it uses back body biasing in both load and drive transistors. A 40Kb SRAM unit based on the SVGND architecture is implemented in a 130 nm CMOS technology. Experimental results exhibit a remarkable static and dynamic power reduction compared to the conventional and previously reported low-power schemes, as expected from the simulation results

    Design Of High Performance Comparator Using Mixed Logic Line Decorder

    This paper presents a mixed-logic design method for line decoders, combining pass transistor dual-value logic, transmission gate logic and static complementary metal-oxide semiconductor (CMOS) logic. Two new topologies are presented for the 2-4 decoder: a 14-transistor topology aimed at reducing transistor count and power dissipation, and a 15-transistor topology aimed at high power-delay performance. In each case both normal and inverting decoders are implemented, yielding a total of four new designs. Moreover, four new 4-16 decoders are designed by combining mixed-logic 2-4 decoders with a standard CMOS post-decoder. All proposed decoders have full-swing capability and a reduced transistor count compared to their conventional CMOS counterparts. Finally, a variety of comparative EZwave simulations at 130nm (PYXIS GDK) show that the proposed circuits provide a significant improvement in power and delay, outperforming CMOS in almost all cases
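
    For readers unfamiliar with line decoders, the following Python sketch models only the logical behavior that the abstract refers to: a 2-4 decoder (normal or inverting) and a 4-16 decoder built from two 2-4 decoders followed by a post-decoding stage. It does not model the 14- or 15-transistor mixed-logic topologies. The function names and the AND-style post-decoder are illustrative assumptions, not taken from the paper.

```python
def decode_2to4(a, b, inverting=False):
    """Behavioral 2-4 line decoder. a is the LSB, b is the MSB (assumed order).

    Returns a list [D0, D1, D2, D3]. In the normal form exactly one output
    is 1; in the inverting form exactly one output is 0.
    """
    index = (b << 1) | a
    outputs = [1 if i == index else 0 for i in range(4)]
    if inverting:
        outputs = [1 - bit for bit in outputs]
    return outputs


def decode_4to16(a, b, c, d):
    """Behavioral 4-16 decoder from two 2-4 decoders plus a post-decoder.

    Inputs a (LSB) .. d (MSB). The post-decoder is modeled as a logical AND
    of one line from each 2-4 decoder (an assumption about function only,
    not about the gate type used in the paper).
    """
    low = decode_2to4(a, b)    # decodes the two low-order bits
    high = decode_2to4(c, d)   # decodes the two high-order bits
    return [high[i // 4] & low[i % 4] for i in range(16)]


if __name__ == "__main__":
    print(decode_2to4(1, 0))
    print(decode_4to16(1, 0, 1, 1).index(1))
```

    Splitting the 4-bit address into two 2-bit groups keeps each first-stage decoder small, which is why the 2-4 block is the natural unit to optimize.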

    Re-designing Main Memory Subsystems with Emerging Monolithic 3D (M3D) Integration and Phase Change Memory Technologies

    Over the past two decades, Dynamic Random-Access Memory (DRAM) has emerged as the dominant technology for implementing the main memory subsystems of all types of computing systems. However, inferring from several recent trends, computer architects in both industry and academia have widely accepted that the density (memory capacity per chip area) and latency of DRAM-based main memory subsystems cannot sufficiently scale in the future to meet the requirements of future data-centric workloads related to Artificial Intelligence (AI), Big Data, and Internet-of-Things (IoT). In fact, the achievable density and access latency in main memory subsystems present a very fundamental trade-off. Pushing for a higher density inevitably increases access latency, and pushing for a reduced access latency often leads to a decreased density. This trade-off is so fundamental in DRAM-based main memory subsystems that merely looking to re-architect DRAM subsystems cannot improve this trade-off, unless disruptive technological advancements are realized for implementing main memory subsystems. In this thesis, we focus on two key contributions to overcome the density (represented as the total chip area for the given capacity) and access latency related challenges in main memory subsystems. First, we show that the fundamental area-latency trade-offs in DRAM can be significantly improved by redesigning the DRAM cell-array structure using the emerging monolithic 3D (M3D) integration technology. A DRAM bank structure can be split across two or more M3D-integrated tiers on the same DRAM chip, which significantly reduces the total on-chip area occupancy of the DRAM bank and its access peripherals. This approach is fundamentally different from the well-known approach of through-silicon via (TSV)-based 3D stacking of DRAM tiers. 
This is because the M3D integration based approach does not require a separate DRAM chip per tier, whereas the 3D-stacking based approach does. Our evaluation results for PARSEC benchmarks show that our designed M3D DRAM cell-array organizations can yield up to 9.56% less latency and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to the conventional 2D DDR4 DRAM. Second, we demonstrate a pathway for eliminating the write disturbance errors in single-level-cell PCM, thereby positioning the PCM technology, which has an inherently more relaxed density and latency trade-off compared to DRAM, as a more viable option for replacing the DRAM technology. We introduce low-temperature partial-RESET operations for writing ‘0’s in PCM cells. Compared to traditional operations that write ‘0’s in PCM cells, partial-RESET operations do not cause disturbance errors in neighboring cells during PCM writes. The overarching theme that connects the two individual contributions into this single thesis is the density versus latency argument. The existing PCM technology has 3 to 4× higher write latency compared to DRAM; nevertheless, the existing PCM technology can store 2 to 4 bits in a single cell compared to the one bit per cell storage capacity of DRAM. Therefore, unlike DRAM, it becomes possible to increase the density of PCM without consequently increasing PCM latency. In other words, PCM exhibits an inherently improved (more relaxed) density and latency trade-off. Thus, both of our contributions in this thesis, the first contribution of re-designing DRAM with M3D integration technology and the second contribution of making the PCM technology a more viable replacement of DRAM by eliminating the write disturbance errors in PCM, connect to the common overarching goal of improving the density and latency trade-off in main memory subsystems. 
In addition, we also discuss in this thesis possible future research directions that are aimed at extending the impacts of our proposed ideas so that they can transform the performance of main memory subsystems of the future

    myCACTI: A new cache design tool for pipelined nanometer caches

    The presence of caches in microprocessors has always been one of the most important techniques in bridging the memory wall, or the speed gap between the microprocessor and main memory. This importance is continuously increasing, especially as we enter the regime of nanometer process technologies (i.e. 90nm and below), as industry has favored investing a larger and larger fraction of a chip's transistor budget in improving the on-chip cache. This is the case in practice, as it has proven to be an efficient way to utilize the increasing number of transistors available with each succeeding technology. Consequently, it becomes even more important to have cache design tools that give accurate representations of designs that exist in actual microprocessors. The cache design tools most widely used in academe are CACTI [Wilton1996] and eCACTI [Mamidipaka2004], and these have proven to be very useful tools not just for cache designers, but also for computer architects. This dissertation will show that both CACTI and eCACTI still contain major limitations and even flaws in their design, making them unsuitable for use in very-deep submicron and nanometer caches, especially pipelined designs. These limitations and flaws will be discussed in detail. This dissertation then introduces a new tool, called myCACTI, that addresses all these limitations and, in addition, introduces major enhancements to the simulation framework. This dissertation then demonstrates the use of myCACTI in the cache design process. Detailed design space explorations are done on multiple cache configurations to produce Pareto-optimal curves of the caches to show optimal implementations. Detailed studies are also performed to characterize the delay and power dissipation of different cache configurations and implementations. 
Finally, future directions to the development of myCACTI are identified to show possible ways that the tool can be improved in such a way as to allow even more different kinds of studies to be performed
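
    The Pareto-optimal curves mentioned in this abstract can be illustrated with a short Python sketch. It assumes each cache configuration is summarized as a (delay, power) pair where lower is better for both; the function name `pareto_front` and the sample values are hypothetical and are not taken from myCACTI.

```python
def pareto_front(points):
    """Return the non-dominated (delay, power) points, sorted by delay.

    A point is dominated if another point is no worse in both delay and
    power and strictly better in at least one of them.
    """
    front = []
    best_power = float("inf")
    # After sorting by delay (then power), a point is on the front only if
    # its power is strictly lower than that of every faster point seen so far.
    for delay, power in sorted(set(points)):
        if power < best_power:
            front.append((delay, power))
            best_power = power
    return front


if __name__ == "__main__":
    # Hypothetical (delay in ns, power in mW) results for six configurations.
    configs = [(1.0, 9.0), (1.2, 7.0), (1.5, 8.0),
               (2.0, 4.0), (2.0, 5.0), (2.5, 4.0)]
    print(pareto_front(configs))
```

    Configurations off the front can be discarded during design space exploration, since some other configuration is at least as good on both metrics.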

    A low-power cache system for high-performance processors

    Degree system: New ; Report number: Kō (甲) No. 3439 ; Degree type: Doctor of Engineering ; Date conferred: 12-Sep-11 ; Waseda University diploma number: New 576

    Doctor of Philosophy

    The computing landscape is undergoing a major change, primarily enabled by ubiquitous wireless networks and the rapid increase in the use of mobile devices which access a web-based information infrastructure. It is expected that most intensive computing may either happen in servers housed in large datacenters (warehouse-scale computers), e.g., cloud computing and other web services, or in many-core high-performance computing (HPC) platforms in scientific labs. It is clear that the primary challenge to scaling such computing systems into the exascale realm is the efficient supply of large amounts of data to hundreds or thousands of compute cores, i.e., building an efficient memory system. Main memory systems are at an inflection point, due to the convergence of several major application and technology trends. Examples include the increasing importance of energy consumption, reduced access stream locality, increasing failure rates, limited pin counts, increasing heterogeneity and complexity, and the diminished importance of cost-per-bit. In light of these trends, the memory system requires a major overhaul. The key to architecting the next generation of memory systems is a combination of the prudent incorporation of novel technologies, and a fundamental rethinking of certain conventional design decisions. In this dissertation, we study every major element of the memory system - the memory chip, the processor-memory channel, the memory access mechanism, and memory reliability, and identify the key bottlenecks to efficiency. Based on this, we propose a novel main memory system with the following innovative features: (i) overfetch-aware re-organized chips, (ii) low-cost silicon photonic memory channels, (iii) largely autonomous memory modules with a packet-based interface to the processor, and (iv) a RAID-based reliability mechanism. 
Such a system is energy-efficient, high-performance, low-complexity, reliable, and cost-effective, making it ideally suited to meet the requirements of future large-scale computing systems

    Doctor of Philosophy in Computing


    Circuit and Architecture Co-Design of STT-RAM for High Performance and Low Energy

    Spin-Transfer Torque Random Access Memory (STT-RAM) has proven to be a promising emerging nonvolatile memory technology suitable for many applications such as CPU cache memory. Compared with other conventional memory technologies, STT-RAM offers many attractive features such as nonvolatility, fast random access speed and extremely low leakage power. However, STT-RAM still faces many challenges. First of all, programming STT-RAM is a stochastic process due to random thermal fluctuations, so write errors are hard to avoid. Secondly, the existing STT-RAM cell designs can be used for only single-port accesses, which limits the memory access bandwidth and constrains the system performance. Finally, while other memory technologies support multi-level cell (MLC) design to boost the storage density, adopting MLC in STT-RAM brings many disadvantages such as the requirement for a large transistor and low access speed. In this work, we proposed solutions at both the circuit and architecture levels to address these challenges. For the write error issues, we proposed two probabilistic methods, namely write-verify-rewrite with adaptive period (WRAP) and verify-one-while-writing (VOW), for performance improvement and write failure reduction. For the dual-port solution, we propose design methods to support dual-port accesses for STT-RAM. The area increment from introducing an additional port is reduced by leveraging the shared source-line structure. Detailed analysis of the performance/reliability degradation caused by dual-port accesses is performed, and the corresponding design optimization is provided. To unleash the potential of MLC STT-RAM cache, we proposed a new design through a cross-layer co-optimization. The memory cell structure integrated the reversed stacking of the magnetic tunnel junction (MTJ) for a more balanced device and design trade-off. 
In architecture development, we presented an adaptive mode switching mechanism: based on the application’s memory access behavior, the MLC STT-RAM cache can dynamically change between low-latency SLC mode and high-capacity MLC mode. Finally, we present a 4Kb test chip design which can support different types and sizes of MTJs. A configurable sensing solution is used in the test chip so that it can support a wide range of MTJ resistance. Such a test chip design can help to evaluate various types of MTJs in the future
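
    To make the write-error problem concrete, the Python sketch below shows a generic write-verify-rewrite loop over a cell whose write succeeds only with some probability. It is not the WRAP or VOW method from the abstract (their adaptive period and concurrent verification are not described here); the function name, the fixed retry limit, and the error model are all assumptions made for illustration.

```python
import random


def write_verify_rewrite(target_bit, write_error_rate, max_attempts=8, rng=None):
    """Generic write-verify-rewrite loop (illustrative, not WRAP or VOW).

    Each write attempt fails independently with probability write_error_rate
    (an assumed error model). Returns (success, attempts_used).
    """
    rng = rng or random.Random()
    stored = 1 - target_bit  # assume the cell starts in the opposite state
    for attempt in range(1, max_attempts + 1):
        # Stochastic write: the cell switches only if this attempt succeeds.
        if rng.random() >= write_error_rate:
            stored = target_bit
        # Verify by reading back; stop as soon as the data is correct.
        if stored == target_bit:
            return True, attempt
    return False, max_attempts


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs behave the same
    print(write_verify_rewrite(1, write_error_rate=0.1, rng=rng))
```

    Under this assumed model, the chance that all attempts fail is the per-write error rate raised to the power of `max_attempts`, so retries trade write latency for reliability.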

    Scavenger: A New Last Level Cache Architecture with Global Block Priority
