53 research outputs found

    Low-power and high-performance SRAM design in high variability advanced CMOS technology

    Get PDF
    As process technologies shrink, the size and number of memories on a chip are exponentially increasing. Embedded SRAMs are a critical component in modern digital systems, and they strongly impact the overall power, performance, and area. To promote memory-related research in academia, this dissertation introduces OpenRAM, a flexible, portable and open-source memory compiler and characterization methodology for generating and verifying memory designs across different technologies.In addition, SRAM designs, focusing on improving power consumption, access time and bitcell stability are explored in high variability advanced CMOS technologies. To have a stable read/write operation for SRAM in high variability process nodes, a differential-ended single-port 8T bitcell is proposed that improves the read noise margin, write noise margin and readout bitcell current by 45%, 48% and 21%, respectively, compared to a conventional 6T bitcell. Also, a differential-ended single-port 12T bitcell for subthreshold operation is proposed that solves the half-select disturbance and allows efficient bit-interleaving. 12T bitcell has a leakage control mechanism which helps to reduce the power consumption and provides operation down to 0.3 V. Both 8T and 12T bitcells are analyzed in a 64 kb SRAM array using 32 nm technology. Besides, to further improve the access time and power consumption, two tracking circuits (multi replica bitline delay and reconfigurable replica bitline delay techniques) are proposed to aid the generation of accurate and optimum sense amplifier set time.An error tolerant SRAM architecture suitable for low voltage video application with dynamic power-quality management is also proposed in this dissertation. This memory uses three power supplies to improve the SRAM stability in low voltages. The proposed triple-supply approach achieves 63% improvement in image quality and 69% reduction in power consumption compared to a single-supply 64 kb SRAM array at 0.70 V

    Subthreshold SRAM Design for Energy Efficient Applications in Nanometric CMOS Technologies

    Get PDF
    Embedded SRAM circuits are vital components in a modern system on chip (SOC) that can occupy up to 90% of the total area. Therefore, SRAM circuits heavily affect SOC performance, reliability, and yield. In addition, most of the SRAM bitcells are in standby mode and significantly contribute to the total leakage current and leakage power consumption. The aggressive demand in portable devices and billions of connected sensor networks requires long battery life. Therefore, careful design of SRAM circuits with minimal power consumption is in high demand. Reducing the power consumption is mainly achieved by reducing the power supply voltage in the idle mode. However, simply reducing the supply voltage imposes practical limitations on SRAM circuits such as reduced static noise margin, poor write margin, reduced number of cells per bitline, and reduced bitline sensing margin that might cause read/write failures. In addition, the SRAM bitcell has contradictory requirements for read stability and writability. Improving the read stability can cause difficulties in a write operation or vice versa. In this thesis, various techniques for designing subthreshold energy-efficient SRAM circuits are proposed. The proposed techniques include improvement in read margin and write margin, speed improvement, energy consumption reduction, new bitcell architecture and utilizing programmable wordline boosting. A programmable wordline boosting technique is exploited on a conventional 6T SRAM bitcell to improve the operational speed. In addition, wordline boosting can reduce the supply voltage while maintaining the operational frequency. The reduction of the supply voltage allows the memory macro to operate with reduced power consumption. To verify the design, a 16-kb SRAM was fabricated using the TSMC 65 nm CMOS technology. Measurement results show that the maximum operational frequency increases up to 33.3% when wordline boosting is applied. Besides, the supply voltage can be reduced while maintaining the same frequency. This allows reducing the energy consumption to be reduced by 22.2%. The minimum energy consumption achieved is 0.536 fJ/b at 400 mV. Moreover, to improve the read margin, a 6T bitcell SRAM with a PMOS access transistor is proposed. Utilizing a PMOS access transistor results in lower zero level degradation, and hence higher read stability. In addition, the access transistor connected to the internal node holding V DD acts as a stabilizer and counterbalances the effect of zero level degradation. In order to improve the writability, wordline boosting is exploited. Wordline boosting also helps to compensate for the lower speed of the PMOS access transistor compared to a NMOS transistor. To verify our design, a 2kb SRAM is fabricated in the TSMC 65 nm CMOS technology. Measurement results show that the maximum operating frequency of the test chip is at 3.34 MHz at 290 mV. The minimum energy consumption is measured as 1.1 fJ/b at 400 mV

    Ultra low-power fault-tolerant SRAM design in 90nm CMOS technology

    Get PDF
    With the increment of mobile, biomedical and space applications, digital systems with low-power consumption are required. As a main part in digital systems, low-power memories are especially desired. Reducing the power supply voltages to sub-threshold region is one of the effective approaches for ultra low-power applications. However, the reduced Static Noise Margin (SNM) of Static Random Access Memory (SRAM) imposes great challenges to the subthreshold SRAM design. The conventional 6-transistor SRAM cell does not function properly at sub-threshold supply voltage range because it has no enough noise margin for reliable operation. In order to achieve ultra low-power at sub-threshold operation, previous research work has demonstrated that the read and write decoupled scheme is a good solution to the reduced SNM problem. A Dual Interlocked Storage Cell (DICE) based SRAM cell was proposed to eliminate the drawback of conventional DICE cell during read operation. This cell can mitigate the singleevent effects, improve the stability and also maintain the low-power characteristic of subthreshold SRAM, In order to make the proposed SRAM cell work under different power supply voltages from 0.3 V to 0.6 V, an improved replica sense scheme was applied to produce a reference control signal, with which the optimal read time could be achieved. In this thesis, a 2K ~8 bits SRAM test chip was designed, simulated and fabricated in 90nm CMOS technology provided by ST Microelectronics. Simulation results suggest that the operating frequency at VDD = 0.3 V is up to 4.7 MHz with power dissipation 6.0 ƒÊW, while it is 45.5 MHz at VDD = 0.6 V dissipating 140 ƒÊW. However, the area occupied by a single cell is larger than that by conventional SRAM due to additional transistors used. The main contribution of this thesis project is that we proposed a new design that could simultaneously solve the ultra low-power and radiation-tolerance problem in large capacity memory design

    Challenges and Directions for Low-Voltage SRAM

    Full text link

    Design of Variation-Tolerant Circuits for Nanometer CMOS Technology: Circuits and Architecture Co-Design

    Get PDF
    Aggressive scaling of CMOS technology in sub-90nm nodes has created huge challenges. Variations due to fundamental physical limits, such as random dopants fluctuation (RDF) and line edge roughness (LER) are increasing significantly with technology scaling. In addition, manufacturing tolerances in process technology are not scaling at the same pace as transistor's channel length due to process control limitations (e.g., sub-wavelength lithography). Therefore, within-die process variations worsen with successive technology generations. These variations have a strong impact on the maximum clock frequency and leakage power for any digital circuit, and can also result in functional yield losses in variation-sensitive digital circuits (such as SRAM). Moreover, in nanometer technologies, digital circuits show an increased sensitivity to process variations due to low-voltage operation requirements, which are aggravated by the strong demand for lower power consumption and cost while achieving higher performance and density. It is therefore not surprising that the International Technology Roadmap for Semiconductors (ITRS) lists variability as one of the most challenging obstacles for IC design in nanometer regime. To facilitate variation-tolerant design, we study the impact of random variations on the delay variability of a logic gate and derive simple and scalable statistical models to evaluate delay variations in the presence of within-die variations. This work provides new design insight and highlights the importance of accounting for the effect of input slew on delay variations, especially at lower supply voltages. The derived models are simple, scalable, bias dependent and only require the knowledge of easily measurable parameters. This makes them useful in early design exploration, circuit/architecture optimization as well as technology prediction (especially in low-power and low-voltage operation). The derived models are verified using Monte Carlo SPICE simulations using industrial 90nm technology. Random variations in nanometer technologies are considered one of the largest design considerations. This is especially true for SRAM, due to the large variations in bitcell characteristics. Typically, SRAM bitcells have the smallest device sizes on a chip. Therefore, they show the largest sensitivity to different sources of variations. With the drastic increase in memory densities, lower supply voltages and higher variations, statistical simulation methodologies become imperative to estimate memory yield and optimize performance and power. In this research, we present a methodology for statistical simulation of SRAM read access yield, which is tightly related to SRAM performance and power consumption. The proposed flow accounts for the impact of bitcell read current variation, sense amplifier offset distribution, timing window variation and leakage variation on functional yield. The methodology overcomes the pessimism existing in conventional worst-case design techniques that are used in SRAM design. The proposed statistical yield estimation methodology allows early yield prediction in the design cycle, which can be used to trade off performance and power requirements for SRAM. The methodology is verified using measured silicon yield data from a 1Mb memory fabricated in an industrial 45nm technology. Embedded SRAM dominates modern SoCs and there is a strong demand for SRAM with lower power consumption while achieving high performance and high density. However, in the presence of large process variations, SRAMs are expected to consume larger power to ensure correct read operation and meet yield targets. We propose a new architecture that significantly reduces array switching power for SRAM. The proposed architecture combines built-in self-test (BIST) and digitally controlled delay elements to reduce the wordline pulse width for memories while ensuring correct read operation; hence, reducing switching power. A new statistical simulation flow was developed to evaluate the power savings for the proposed architecture. Monte Carlo simulations using a 1Mb SRAM macro from an industrial 45nm technology was used to examine the power reduction achieved by the system. The proposed architecture can reduce the array switching power significantly and shows large power saving - especially as the chip level memory density increases. For a 48Mb memory density, a 27% reduction in array switching power can be achieved for a read access yield target of 95%. In addition, the proposed system can provide larger power saving as process variations increase, which makes it a very attractive solution for 45nm and below technologies. In addition to its impact on bitcell read current, the increase of local variations in nanometer technologies strongly affect SRAM cell stability. In this research, we propose a novel single supply voltage read assist technique to improve SRAM static noise margin (SNM). The proposed technique allows precharging different parts of the bitlines to VDD and GND and uses charge sharing to precisely control the bitline voltage, which improves the bitcell stability. In addition to improving SNM, the proposed technique also reduces memory access time. Moreover, it only requires one supply voltage, hence, eliminates the need of large area voltage shifters. The proposed technique has been implemented in the design of a 512kb memory fabricated in 45nm technology. Results show improvements in SNM and read operation window which confirms the effectiveness and robustness of this technique

    Digital Timing Control in SRAMs for Yield Enhancement and Graceful Aging Degradation

    Get PDF
    Embedded SRAMs can occupy the majority of the chip area in SOCs. The increase in process variation and aging degradation due to technology scaling can severely compromise the integrity of SRAM memory cells, hence resulting in cell failures. Enough cell failures in a memory can lead to it being rejected during initial testing, and hence decrease the manufacturing yield. Or, as a result of long-term applied stress, lead to in-field system failures. Certain types of cell failures can be mitigated through improved timing control. Post-fabrication programmable timing can allow for after-the-fact calibration of timing signals on a per die basis. This allows for a SRAM's timing signals to be generated based on the characteristics specific to the individual chip, thus allowing for an increase in yield and reduction in in-field system failures. In this thesis, a delay line based SRAM timing block with digitally programmable timing signals has been implemented in a 180 nm CMOS technology. Various timing-related cell failure mechanisms including: 1). Operational Read Failures, 2). Cell Stability Failures, and 3). Power Envelope Failures are investigated. Additionally, the major contributing factors for process variation and device aging degradation are discussed in the context of SRAMs. Simulations show that programmable timing can be used to reduce cell failure rates by over 50%

    Design and Analysis of Low-power SRAMs

    Get PDF
    The explosive growth of battery operated devices has made low-power design a priority in recent years. Moreover, embedded SRAM units have become an important block in modern SoCs. The increasing number of transistor count in the SRAM units and the surging leakage current of the MOS transistors in the scaled technologies have made the SRAM unit a power hungry block from both dynamic and static perspectives. Owing to high bitline voltage swing during write operation, the write power consumption is dominated the dynamic power consumption. The static power consumption is mainly due to the leakage current associated with the SRAM cells distributed in the array. Moreover, as supply voltage decreases to tackle the power consumption, the data stability of the SRAM cells have become a major concern in recent years. To reduce the write power consumption, several schemes such as row based sense amplifying cell (SAC) and hierarchical bitline sense amplification (HBLSA) have been proposed. However, these schemes impose architectural limitations on the design in terms of the number of words on a row. Beside, the effectiveness of these methods is limited to the dynamic power consumption. Conventionally, reduction of the cell supply voltage and exploiting the body effect has been suggested to reduce the cell leakage current. However, variation of the supply voltage of the cell associates with a higher dynamic power consumption and reduced cell data stability. Conventionally qualified by Static Noise Margin (SNM), the ability of the cell to retain the data is reduced under a lower supply voltage conditions. In this thesis, we revisit the concept of data stability from the dynamic perspective. A new criteria for the data stability of the SRAM cell is defined. The new criteria suggests that the access time and non-access time (recovery time) of the cell can influence the data stability in a SRAM cell. The speed vs. stability trade-off opens new opportunities for aggressive power reduction for low-power applications. Experimental results of a test chip implemented in a 130 nm CMOS technology confirmed the concept and opened a ground for introduction of a new operational mode for the SRAM cells. We introduced a new architecture; Segmented Virtual Grounding (SVGND) to reduce the dynamic and static power reduction in SRAM units at the same time. Thanks to the new concept for the data stability in SRAM cells, we introduced the new operational mode of Accessed Retention Mode (AR-Mode) to the SRAM cell. In this mode, the accessed SRAM cell can retain the data, however, it does not discharge the bitline. The new architecture outperforms the recently reported low-power schemes in terms of dynamic power consumption, thanks to the exclusive discharge of the bitline and the cell virtual ground. In addition, the architecture reduces the leakage current significantly since it uses the back body biasing in both load and drive transistors. A 40Kb SRAM unit based on SVGND architecture is implemented in a 130 nm CMOS technology. Experimental results exhibit a remarkable static and dynamic power reduction compared to the conventional and previously reported low-power schemes as expect from the simulation results

    Circuit design for embedded memory in low-power integrated circuits

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 141-152).This thesis explores the challenges for integrating embedded static random access memory (SRAM) and non-volatile memory-based on ferroelectric capacitor technology-into lowpower integrated circuits. First considered is the impact of process variation in deep-submicron technologies on SRAM, which must exhibit higher density and performance at increased levels of integration with every new semiconductor generation. Techniques to speed up the statistical analysis of physical memory designs by a factor of 100 to 10,000 relative to the conventional Monte Carlo Method are developed. The proposed methods build upon the Importance Sampling simulation algorithm and efficiently explore the sample space of transistor parameter fluctuation. Process variation in SRAM at low-voltage is further investigated experimentally with a 512kb 8T SRAM test chip in 45nm SOI CMOS technology. For active operation, an AC coupled sense amplifier and regenerative global bitline scheme are designed to operate at the limit of on current and off current separation on a single-ended SRAM bitline. The SRAM operates from 1.2 V down to 0.57 V with access times from 400ps to 3.4ns. For standby power, a data retention voltage sensor predicts the mismatch-limited minimum supply voltage without corrupting the contents of the memory. The leakage power of SRAM forces the chip designer to seek non-volatile memory in applications such as portable electronics that retain significant quantities of data over long durations. In this scenario, the energy cost of accessing data must be minimized. This thesis presents a ferroelectric random access memory (FRAM) prototype that addresses the challenges of sensing diminishingly small charge under conditions favorable to low access energy with a time-to-digital sensing scheme. The 1 Mb IT1C FRAM fabricated in 130 nm CMOS operates from 1.5 V to 1.0 V with corresponding access energy from 19.2 pJ to 9.8 pJ per bit. Finally, the computational state of sequential elements interspersed in CMOS logic, also restricts the ability to power gate. To enable simple and fast turn-on, ferroelectric capacitors are integrated into the design of a standard cell register, whose non-volatile operation is made compatible with the digital design flow. A test-case circuit containing ferroelectric registers exhibits non-volatile operation and consumes less than 1.3 pJ per bit of state information and less than 10 clock cycles to save or restore with no minimum standby power requirement in-between active periods.by Masood Qazi.Ph.D

    ULTRALOW-POWER, LOW-VOLTAGE DIGITAL CIRCUITS FOR BIOMEDICAL SENSOR NODES

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Deep in-memory computing

    Get PDF
    There is much interest in embedding data analytics into sensor-rich platforms such as wearables, biomedical devices, autonomous vehicles, robots, and Internet-of-Things to provide these with decision-making capabilities. Such platforms often need to implement machine learning (ML) algorithms under stringent energy constraints with battery-powered electronics. Especially, energy consumption in memory subsystems dominates such a system's energy efficiency. In addition, the memory access latency is a major bottleneck for overall system throughput. To address these issues in memory-intensive inference applications, this dissertation proposes deep in-memory accelerator (DIMA), which deeply embeds computation into the memory array, employing two key principles: (1) accessing and processing multiple rows of memory array at a time, and (2) embedding pitch-matched low-swing analog processing at the periphery of bitcell array. The signal-to-noise ratio (SNR) is budgeted by employing low-swing operations in both memory read and processing to exploit the application level's error immunity for aggressive energy efficiency. This dissertation first describes the system rationale underlying the DIMA's processing stages by identifying the common functional flow across a diverse set of inference algorithms. Based on the analysis, this dissertation presents a multi-functional DIMA to support four algorithms: support vector machine (SVM), template matching (TM), k-nearest neighbor (k-NN), and matched filter. The circuit and architectural level design techniques and guidelines are provided to address the challenges in achieving multi-functionality. A prototype integrated circuit (IC) of a multi-functional DIMA was fabricated with a 16 KB SRAM array in a 65 nm CMOS process. Measurement results show up to 5.6X and 5.8X energy and delay reductions leading to 31X energy delay product (EDP) reduction with negligible (<1%) accuracy degradation as compared to the conventional 8-b fixed-point digital implementation optimally designed for each algorithm. Then, DIMA also has been applied to more complex algorithms: (1) convolutional neural network (CNN), (2) sparse distributed memory (SDM), and (3) random forest (RF). System-level simulations of CNN using circuit behavioral models in a 45 nm SOI CMOS demonstrate that high probability (>0.99) of handwritten digit recognition can be achieved using the MNIST database, along with a 24.5X reduced EDP, a 5.0X reduced energy, and a 4.9X higher throughput as compared to the conventional system. The DIMA-based SDM architecture also achieves up to 25X and 12X delay and energy reductions, respectively, over conventional SDM with negligible accuracy degradation (within 0.4%) for 16X16 binary-pixel image classification. A DIMA-based RF was realized as a prototype IC with a 16 KB SRAM array in a 65 nm process. To the best of our knowledge, this is the first IC realization of an RF algorithm. The measurement results show that the prototype achieves a 6.8X lower EDP compared to a conventional design at the same accuracy (94%) for an eight-class traffic sign recognition problem. The multi-functional DIMA and extension to other algorithms naturally motivated us to consider a programmable DIMA instruction set architecture (ISA), namely MATI. This dissertation explores a synergistic combination of the instruction set, architecture and circuit design to achieve the programmability without losing DIMA's energy and throughput benefits. Employing silicon-validated energy, delay and behavioral models of deep in-memory components, we demonstrate that MATI is able to realize nine ML benchmarks while incurring negligible overhead in energy (< 0.1%), and area (4.5%), and in throughput, over a fixed four-function DIMA. In this process, MATI is able to simultaneously achieve enhancements in both energy (2.5X to 5.5X) and throughput (1.4X to 3.4X) for an overall EDP improvement of up to 12.6X over fixed-function digital architectures
    corecore