64 research outputs found

    Design of Address Decoder and Sense Amplifier for SRAM

    The address decoder and the sense amplifier are important components of SRAM memory. Selection of the storage cell depends on the decoder, and the read operation depends on the sense amplifier; hence, SRAM performance depends on these components. This work surveys address decoders and sense amplifiers for SRAM memory, concentrating on delay optimization and power-efficient circuit techniques. We concentrate on an optimal decoder structure with the least number of transistors to reduce SRAM area. Among static decoders, we start with a simple AND-gate decoder and examine its results. Such simple decoders are neither area efficient nor fast, because AND/OR gates are not natural gates: they are made up of a combination of NAND/NOR and NOT gates. Decoders built only from NOR/NAND gates are both area efficient and fast. Therefore, a universal decoding scheme with alternating NAND-NOR stages is taken and examined. The universal decoding scheme has a serious issue: differing path delays, which may result in false decoding as well as extra power dissipation. To overcome this issue, a novel address decoding scheme is implemented, and its results are compared with those of the simple AND decoder and the universal decoder. The novel address decoder circuit is presented and analyzed; it uses alternating NAND-NOR stages with a pre-decoder and a replica inverter chain circuit, and it is implemented successfully. A current-mirror sense amplifier and a latch-type sense amplifier are also implemented for the SRAM. These two amplifiers are the basic ones and have a tremendous advantage due to their small size. They are fast enough and can fit below the SRAM cell. We have implemented and tested a 1Kb, 8-bit, 1.25GHz SRAM memory in Cadence using UMC 90nm technology, in which the decoder and sense amplifiers are deployed.

    Designing Method of Compact n-to-2^n Decoders

    Decoders are well-known circuits. The paper presents a fast and efficient method for designing layouts of n-to-2^n-line decoders. Two layout arrangement scenarios are proposed and described. Using only a few specially prepared building blocks and an appropriate placement procedure, a decoder of any size can be built. Layouts of all required fundamental blocks were designed in CMOS technology as a standard library. Moreover, important parameters such as area, power dissipation and delay were assessed and compared for decoders designed with the proposed method and with the traditional one. Power consumption was considered under an extended model, which takes into account changes of input vectors, not only the switching activity factor. All designs were done in UMC 180 CMOS technology.

    Low-Power High-Performance Ternary Content Addressable Memory Circuits

    Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability. They are attractive for applications such as packet forwarding and classification in network routers. Despite the attractive features of TCAMs, high power consumption is one of the most critical challenges faced by TCAM designers. This work proposes circuit techniques for reducing TCAM power consumption. The main contribution of this work is divided in two parts: (i) reduction in match line (ML) sensing energy, and (ii) static-power reduction techniques. The ML sensing energy is reduced by employing (i) positive-feedback ML sense amplifiers (MLSAs), (ii) low-capacitance comparison logic, and (iii) low-power ML-segmentation techniques. The positive-feedback MLSAs include both resistive and active feedback to reduce the ML sensing energy. A body-bias technique can further improve the feedback action at the expense of additional area and ML capacitance. The measurement results of the active-feedback MLSA show 50-56% reduction in ML sensing energy. The measurement results of the proposed low-capacitance comparison logic show 25% and 42% reductions in ML sensing energy and time, respectively, which can further be improved by careful layout. The low-power ML-segmentation techniques include dual ML TCAM and charge-shared ML. Simulation results of the dual ML TCAM that connects two sides of the comparison logic to two ML segments for sequential sensing show 43% power savings for a small (4%) trade-off in the search speed. The charge-shared ML scheme achieves power savings by partial recycling of the charge stored in the first ML segment. Chip measurement results show that the charge-shared ML scheme results in 11% and 9% reductions in ML sensing time and energy, respectively, which can be improved to 19-25% by using a digitally controlled charge sharing time-window and a slightly modified MLSA. 
The static power reduction is achieved by a dual-VDD technique and low-leakage TCAM cells. The dual-VDD technique trades off the excess noise margin of the MLSA for smaller cell leakage by applying a smaller VDD to TCAM cells and a larger VDD to the peripheral circuits. The low-leakage TCAM cells trade off the speed of READ and WRITE operations for smaller cell area and leakage. Finally, design and testing of a complete TCAM chip are presented and compared with other published designs.

    Design and Analysis of Low-power SRAMs

    The explosive growth of battery-operated devices has made low-power design a priority in recent years. Moreover, embedded SRAM units have become an important block in modern SoCs. The increasing transistor count in SRAM units and the surging leakage current of MOS transistors in scaled technologies have made the SRAM unit a power-hungry block from both dynamic and static perspectives. Owing to the high bitline voltage swing during write operations, write power dominates the dynamic power consumption. The static power consumption is mainly due to the leakage current of the SRAM cells distributed in the array. Moreover, as the supply voltage decreases to tackle power consumption, the data stability of SRAM cells has become a major concern in recent years. To reduce the write power consumption, several schemes such as the row-based sense amplifying cell (SAC) and hierarchical bitline sense amplification (HBLSA) have been proposed. However, these schemes impose architectural limitations on the design in terms of the number of words in a row. Besides, the effectiveness of these methods is limited to the dynamic power consumption. Conventionally, reducing the cell supply voltage and exploiting the body effect have been suggested to reduce the cell leakage current. However, varying the supply voltage of the cell is associated with higher dynamic power consumption and reduced cell data stability. Conventionally qualified by the Static Noise Margin (SNM), the ability of the cell to retain data is reduced under lower supply voltage conditions. In this thesis, we revisit the concept of data stability from the dynamic perspective. A new criterion for the data stability of the SRAM cell is defined. The new criterion suggests that the access time and non-access time (recovery time) of the cell can influence the data stability of an SRAM cell. The speed vs. stability trade-off opens new opportunities for aggressive power reduction in low-power applications. Experimental results from a test chip implemented in a 130 nm CMOS technology confirmed the concept and opened the ground for the introduction of a new operational mode for SRAM cells. We introduced a new architecture, Segmented Virtual Grounding (SVGND), to reduce the dynamic and static power in SRAM units at the same time. Thanks to the new concept of data stability in SRAM cells, we introduced a new operational mode, Accessed Retention Mode (AR-Mode), for the SRAM cell. In this mode, the accessed SRAM cell retains its data but does not discharge the bitline. The new architecture outperforms recently reported low-power schemes in terms of dynamic power consumption, thanks to the exclusive discharge of the bitline and the cell virtual ground. In addition, the architecture reduces the leakage current significantly since it uses back body biasing in both load and drive transistors. A 40Kb SRAM unit based on the SVGND architecture is implemented in a 130 nm CMOS technology. Experimental results exhibit a remarkable static and dynamic power reduction compared to the conventional and previously reported low-power schemes, as expected from the simulation results.

    Re-designing Main Memory Subsystems with Emerging Monolithic 3D (M3D) Integration and Phase Change Memory Technologies

    Over the past two decades, Dynamic Random-Access Memory (DRAM) has emerged as the dominant technology for implementing the main memory subsystems of all types of computing systems. However, inferring from several recent trends, computer architects in both industry and academia have widely accepted that the density (memory capacity per chip area) and latency of DRAM-based main memory subsystems cannot scale sufficiently to meet the requirements of future data-centric workloads related to Artificial Intelligence (AI), Big Data, and Internet-of-Things (IoT). In fact, the achievable density and access latency in main memory subsystems present a very fundamental trade-off. Pushing for a higher density inevitably increases access latency, and pushing for a reduced access latency often leads to a decreased density. This trade-off is so fundamental in DRAM-based main memory subsystems that merely re-architecting DRAM subsystems cannot improve it, unless disruptive technological advancements are realized for implementing main memory subsystems. In this thesis, we focus on two key contributions to overcome the density (represented as the total chip area for the given capacity) and access latency related challenges in main memory subsystems. First, we show that the fundamental area-latency trade-offs in DRAM can be significantly improved by redesigning the DRAM cell-array structure using the emerging monolithic 3D (M3D) integration technology. A DRAM bank structure can be split across two or more M3D-integrated tiers on the same DRAM chip, which significantly reduces the total on-chip area occupancy of the DRAM bank and its access peripherals. This approach is fundamentally different from the well-known approach of through-silicon via (TSV)-based 3D stacking of DRAM tiers. 
This is because the M3D integration based approach does not require a separate DRAM chip per tier, whereas the 3D-stacking based approach does. Our evaluation results for PARSEC benchmarks show that our designed M3D DRAM cell-array organizations can yield up to 9.56% less latency and up to 21.21% less energy-delay product (EDP), with up to 14% less DRAM die area, compared to the conventional 2D DDR4 DRAM. Second, we demonstrate a pathway for eliminating the write disturbance errors in single-level-cell PCM, thereby positioning the PCM technology, which has inherently more relaxed density and latency trade-off compared to DRAM, as a more viable option for replacing the DRAM technology. We introduce low-temperature partial-RESET operations for writing ‘0’s in PCM cells. Compared to traditional operations that write ‘0’s in PCM cells, partial-RESET operations do not cause disturbance errors in neighboring cells during PCM writes. The overarching theme that connects the two individual contributions into this single thesis is the density versus latency argument. The existing PCM technology has 3 to 4× higher write latency compared to DRAM; nevertheless, the existing PCM technology can store 2 to 4 bits in a single cell compared to one bit per cell storage capacity of DRAM. Therefore, unlike DRAM, it becomes possible to increase the density of PCM without consequently increasing PCM latency. In other words, PCM exhibits inherently improved (more relaxed) density and latency trade-off. Thus, both of our contributions in this thesis, the first contribution of re-designing DRAM with M3D integration technology and the second contribution of making the PCM technology a more viable replacement of DRAM by eliminating the write disturbance errors in PCM, connect to the common overarching goal of improving the density and latency trade-off in main memory subsystems. 
In addition, we discuss possible future research directions aimed at extending the impact of our proposed ideas so that they can transform the performance of future main memory subsystems.

    Design Of High Performance Comparator Using Mixed Logic Line Decorder

    This paper presents a mixed-logic design method for line decoders, combining pass-transistor dual-value logic, transmission gate logic and static complementary metal-oxide semiconductor (CMOS) logic. Two new topologies are presented for the 2-4 decoder: a 14-transistor topology aimed at reducing transistor count and power dissipation, and a 15-transistor topology aimed at high power-delay performance. In each case both normal and inverting decoders are implemented, yielding a total of four new designs. Moreover, four new 4-16 decoders are designed by using mixed-logic 2-4 decoders combined with a standard CMOS post-decoder. All proposed decoders have full-swing capability and a reduced transistor count compared to their conventional CMOS counterparts. Finally, a variety of comparative EZwave simulations at 130nm (PYXIS GDK) show that the proposed circuits provide a significant improvement in power and delay, outperforming CMOS in almost all cases.

    SRAM Read-Assist Scheme for Low Power High Performance Applications

    Semiconductor technology scaling has resulted in a considerable reduction in transistor cost and an astonishing enhancement in the performance of VLSI (very large scale integration) systems. These nanoscale technologies have facilitated the integration of large SRAMs, which are now very popular in both processors and system-on-chip (SoC) designs. The density of SRAM arrays has increased quadratically with each generation of CMOS technology. However, these nanoscale technologies have unveiled a few significant challenges to the design of high-performance and low-power embedded memories. First, process variation has become more significant in these technologies, which threatens the reliability of the sensing circuitry. To alleviate this problem, larger signal swings are needed on the bitlines (BLs), which degrades speed as well as power dissipation. The second challenge is the variation in the cell current, which reduces the worst-case cell current. Since this cell current is responsible for discharging the BLs, this problem translates to a longer activation time for the wordlines (WLs). The longer the WL pulse width, the more likely the cell is to be unstable. A long WL pulse width can also degrade the noise margin. Furthermore, as a result of the continuous increase in the size of SRAMs, the BL capacitance has increased significantly, which deteriorates speed as well as power dissipation. The aforementioned problems require additional techniques and treatment, such as read-assist techniques, to ensure fast, low-power and reliable read operation in nanoscale SRAMs. In this research we address these concerns and propose a read-assist sense amplifier (SA) in 65nm CMOS technology that expedites the development of the differential voltage to be sensed by the sense amplifier while reducing the voltage swing on the BLs, which results in increased sensing speed, lower power and a shorter WL activation time. 
A complete comparison is made between the proposed scheme, a conventional SA and a state-of-the-art design, which shows a speed improvement and a power reduction of 56.1% and 25.9%, respectively, over the conventional scheme at the expense of negligible area overhead. Also, the proposed scheme makes it possible to reduce the cell VDD while keeping the same sensing speed, which results in a considerable reduction in leakage power dissipation.

    myCACTI: A new cache design tool for pipelined nanometer caches

    The presence of caches in microprocessors has always been one of the most important techniques in bridging the memory wall, or the speed gap between the microprocessor and main memory. This importance is continuously increasing, especially as we enter the regime of nanometer process technologies (i.e. 90nm and below), as industry has favored investing a larger and larger fraction of a chip's transistor budget in improving the on-chip cache. This is the case in practice, as it has proven to be an efficient way to utilize the increasing number of transistors available with each succeeding technology. Consequently, it becomes even more important to have cache design tools that give accurate representations of designs that exist in actual microprocessors. The cache design tools most widely used in academe are CACTI [Wilton1996] and eCACTI [Mamidipaka2004], and these have proven to be very useful tools not just for cache designers, but also for computer architects. This dissertation will show that both CACTI and eCACTI still contain major limitations and even flaws in their design, making them unsuitable for use in very-deep submicron and nanometer caches, especially pipelined designs. These limitations and flaws will be discussed in detail. This dissertation then introduces a new tool, called myCACTI, that addresses all these limitations and, in addition, introduces major enhancements to the simulation framework. This dissertation then demonstrates the use of myCACTI in the cache design process. Detailed design space explorations are done on multiple cache configurations to produce Pareto-optimal curves of the caches to show optimal implementations. Detailed studies are also performed to characterize the delay and power dissipation of different cache configurations and implementations. 
Finally, future directions for the development of myCACTI are identified to show possible ways in which the tool can be improved to allow an even wider range of studies to be performed.

    A low-power cache system for high-performance processors

    System: New ; Report number: Kō No. 3439 ; Degree type: Doctor of Engineering ; Date conferred: 12-Sep-11 ; Waseda University degree number: New 576