5 research outputs found

    Low-power and application-specific SRAM design for energy-efficient motion estimation

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 181-189).Video content is expected to account for 70% of total mobile data traffic in 2015. High efficiency video coding, in this context, is crucial for lowering the transmission and storage costs for portable electronics. However, modern video coding standards impose a large hardware complexity. Hence, energy-efficiency of these hardware blocks is becoming more critical than ever before for mobile devices. SRAMs are critical components in almost all SoCs affecting the overall energy-efficiency. This thesis focuses on algorithm and architecture development as well as low-power and application-specific SRAM design targeting motion estimation. First, a motion estimation design is considered for the next generation video standard, HEVC. Hardware cost and coding efficiency trade-offs are quantified and an optimum design choice between hardware complexity and coding efficiency is proposed. Hardware-efficient search algorithm, shared search range across CU engines and pixel pre-fetching algorithms provide 4.3x area, 56x on-chip bandwidth and 151 x off-chip bandwidth reduction. Second, a highly-parallel motion estimation design targeting ultra-low voltage operation and supporting AVC/H.264 and VC-1 standards are considered. Hardware reconfigurability along with frame and macro-block parallel processing are implemented for this engine to maximize hardware sharing between multiple standards and to meet throughput constraints. Third, in the context of low-power SRAMs, a 6T and an 8T SRAM are designed in 28nm and 45nm CMOS technologies targeting low voltage operation. The 6T design achieves operation down to 0.6V and the 8T design achieves operation down to 0.5V providing ~ 2.8x and ~ 4.8x reduction in energy/access respectively. Finally, an application-specific SRAM design targeted for motion estimation is developed. Utilizing the correlation of pixel data to reduce bit-line switching activity, this SRAM achieves up to 1.9x energy savings compared to a similar conventional 8T design. These savings demonstrate that application-specific SRAM design can introduce a new dimension and can be combined with voltage scaling to maximize energy-efficiency.by Mahmut Ersin Sinangil.Ph.D

    Subthreshold SRAM Design for Energy Efficient Applications in Nanometric CMOS Technologies

    Get PDF
    Embedded SRAM circuits are vital components in a modern system on chip (SOC) that can occupy up to 90% of the total area. Therefore, SRAM circuits heavily affect SOC performance, reliability, and yield. In addition, most of the SRAM bitcells are in standby mode and significantly contribute to the total leakage current and leakage power consumption. The aggressive demand in portable devices and billions of connected sensor networks requires long battery life. Therefore, careful design of SRAM circuits with minimal power consumption is in high demand. Reducing the power consumption is mainly achieved by reducing the power supply voltage in the idle mode. However, simply reducing the supply voltage imposes practical limitations on SRAM circuits such as reduced static noise margin, poor write margin, reduced number of cells per bitline, and reduced bitline sensing margin that might cause read/write failures. In addition, the SRAM bitcell has contradictory requirements for read stability and writability. Improving the read stability can cause difficulties in a write operation or vice versa. In this thesis, various techniques for designing subthreshold energy-efficient SRAM circuits are proposed. The proposed techniques include improvement in read margin and write margin, speed improvement, energy consumption reduction, new bitcell architecture and utilizing programmable wordline boosting. A programmable wordline boosting technique is exploited on a conventional 6T SRAM bitcell to improve the operational speed. In addition, wordline boosting can reduce the supply voltage while maintaining the operational frequency. The reduction of the supply voltage allows the memory macro to operate with reduced power consumption. To verify the design, a 16-kb SRAM was fabricated using the TSMC 65 nm CMOS technology. Measurement results show that the maximum operational frequency increases up to 33.3% when wordline boosting is applied. Besides, the supply voltage can be reduced while maintaining the same frequency. This allows reducing the energy consumption to be reduced by 22.2%. The minimum energy consumption achieved is 0.536 fJ/b at 400 mV. Moreover, to improve the read margin, a 6T bitcell SRAM with a PMOS access transistor is proposed. Utilizing a PMOS access transistor results in lower zero level degradation, and hence higher read stability. In addition, the access transistor connected to the internal node holding V DD acts as a stabilizer and counterbalances the effect of zero level degradation. In order to improve the writability, wordline boosting is exploited. Wordline boosting also helps to compensate for the lower speed of the PMOS access transistor compared to a NMOS transistor. To verify our design, a 2kb SRAM is fabricated in the TSMC 65 nm CMOS technology. Measurement results show that the maximum operating frequency of the test chip is at 3.34 MHz at 290 mV. The minimum energy consumption is measured as 1.1 fJ/b at 400 mV

    Low-Power, Low-Voltage SRAM Circuits Design For Nanometric CMOS Technologies

    Get PDF
    Embedded SRAM memory is a vital component in modern SoCs. More than 80% of the System-on-Chip (SoC) die area is often occupied by SRAM arrays. As such, system reliability and yield is largely governed by the SRAM's performance and robustness. The aggressive scaling trend in CMOS device minimum feature size, coupled with the growing demand in high-capacity memory integration, has imposed the use of minimal size devices to realize a memory bitcell. The smallest 6T SRAM bitcell to date occupies a 0.1um2 in silicon area. SRAM bitcells continue to benefit from an aggressive scaling trend in CMOS technologies. Unfortunately, other system components, such as interconnects, experience a slower scaling trend. This has resulted in dramatic deterioration in a cell's ability to drive a heavily-loaded interconnects. Moreover, the growing fluctuation in device properties due to Process, Voltage, and Temperature (PVT) variations has added more uncertainty to SRAM operation. Thus ensuring the ability of a miniaturized cell to drive heavily-loaded bitlines and to generate adequate voltage swing is becoming challenging. A large percentage of state-of-the-art SoC system failures are attributed to the inability of SRAM cells to generate the targeted bitline voltage swing within a given access time. The use of read-assist mechanisms and current mode sense amplifiers are the two key strategies used to surmount bitline loading effects. On the other hand, new bitcell topologies and cell supply voltage management are used to overcome fluctuations in device properties. In this research we tackled conventional 6T SRAM bitcell limited drivability by introducing new integrated voltage sensing schemes and current-mode sense amplifiers. The proposed schemes feature a read-assist mechanism. The proposed schemes' functionality and superiority over existing schemes are verified using transient and statistical SPICE simulations. Post-layout extracted views of the devices are used for realistic simulation results. Low-voltage operated SRAM reliability and yield enhancement is investigated and a wordline boost technique is proposed as a means to manage the cell's WL operating voltage. The proposed wordline driver design shows a significant improvement in reliability and yield in a 400-mV 6T SRAM cell. The proposed wordline driver design exploit the cell's Dynamic Noise Margin (DNM), therefore boost peak level and boost decay rate programmability features are added. SPICE transient and statistical simulations are used to verify the proposed design's functionality. Finally, at a bitcell-level, we proposed a new five-transistor (5T) SRAM bitcell which shows competitive performance and reliability figures of merit compared to the conventional 6T bitcell. The functionality of the proposed cell is verified by post-layout SPICE simulations. The proposed bitcell topology is designed, implemented and fabricated in a standard ST CMOS 65nm technology process. A 1.2_ 1.2 mm2 multi-design project test chip consisting of four 32-Kbit (256-row x 128-column) SRAM macros with the required peripheral and timing control units is fabricated. Two of the designed SRAM macros are dedicated for this work, namely, a 32-Kbit 5T macro and a 32-Kbit 6T macro which is used as a comparison reference. Other macros belong to other projects and are not discussed in this document

    Robust Design of Variation-Sensitive Digital Circuits

    Get PDF
    The nano-age has already begun, where typical feature dimensions are smaller than 100nm. The operating frequency is expected to increase up to 12 GHz, and a single chip will contain over 12 billion transistors in 2020, as given by the International Technology Roadmap for Semiconductors (ITRS) initiative. ITRS also predicts that the scaling of CMOS devices and process technology, as it is known today, will become much more difficult as the industry advances towards the 16nm technology node and further. This aggressive scaling of CMOS technology has pushed the devices to their physical limits. Design goals are governed by several factors other than power, performance and area such as process variations, radiation induced soft errors, and aging degradation mechanisms. These new design challenges have a strong impact on the parametric yield of nanometer digital circuits and also result in functional yield losses in variation-sensitive digital circuits such as Static Random Access Memory (SRAM) and flip-flops. Moreover, sub-threshold SRAM and flip-flops circuits, which are aggravated by the strong demand for lower power consumption, show larger sensitivity to these challenges which reduces their robustness and yield. Accordingly, it is not surprising that the ITRS considers variability and reliability as the most challenging obstacles for nanometer digital circuits robust design. Soft errors are considered one of the main reliability and robustness concerns in SRAM arrays in sub-100nm technologies due to low operating voltage, small node capacitance, and high packing density. The SRAM arrays soft errors immunity is also affected by process variations. We develop statistical design-oriented soft errors immunity variations models for super-threshold and sub-threshold SRAM cells accounting for die-to-die variations and within-die variations. This work provides new design insights and highlights the important design knobs that can be used to reduce the SRAM cells soft errors immunity variations. The developed models are scalable, bias dependent, and only require the knowledge of easily measurable parameters. This makes them useful in early design exploration, circuit optimization as well as technology prediction. The derived models are verified using Monte Carlo SPICE simulations, referring to an industrial hardware-calibrated 65nm CMOS technology. The demand for higher performance leads to very deep pipelining which means that hundreds of thousands of flip-flops are required to control the data flow under strict timing constraints. A violation of the timing constraints at a flip-flop can result in latching incorrect data causing the overall system to malfunction. In addition, the flip-flops power dissipation represents a considerable fraction of the total power dissipation. Sub-threshold flip-flops are considered the most energy efficient solution for low power applications in which, performance is of secondary importance. Accordingly, statistical gate sizing is conducted to different flip-flops topologies for timing yield improvement of super-threshold flip-flops and power yield improvement of sub-threshold flip-flops. Following that, a comparative analysis between these flip-flops topologies considering the required overhead for yield improvement is performed. This comparative analysis provides useful recommendations that help flip-flops designers on selecting the best flip-flops topology that satisfies their system specifications while taking the process variations impact and robustness requirements into account. Adaptive Body Bias (ABB) allows the tuning of the transistor threshold voltage, Vt, by controlling the transistor body voltage. A forward body bias reduces Vt, increasing the device speed at the expense of increased leakage power. Alternatively, a reverse body bias increases Vt, reducing the leakage power but slowing the device. Therefore, the impact of process variations is mitigated by speeding up slow and less leaky devices or slowing down devices that are fast and highly leaky. Practically, the implementation of the ABB is desirable to bias each device in a design independently, to mitigate within-die variations. However, supplying so many separate voltages inside a die results in a large area overhead. On the other hand, using the same body bias for all devices on the same die limits its capability to compensate for within-die variations. Thus, the granularity level of the ABB scheme is a trade-off between the within-die variations compensation capability and the associated area overhead. This work introduces new ABB circuits that exhibit lower area overhead by a factor of 143X than that of previous ABB circuits. In addition, these ABB circuits are resolution free since no digital-to-analog converters or analog-to-digital converters are required on their implementations. These ABB circuits are adopted to high performance critical paths, emulating a real microprocessor architecture, for process variations compensation and also adopted to SRAM arrays, for Negative Bias Temperature Instability (NBTI) aging and process variations compensation. The effectiveness of the new ABB circuits is verified by post layout simulation results and test chip measurements using triple-well 65nm CMOS technology. The highly capacitive nodes of wide fan-in dynamic circuits and SRAM bitlines limit the performance of these circuits. In addition, process variations mitigation by statistical gate sizing increases this capacitance further and fails in achieving the target yield improvement. We propose new negative capacitance circuits that reduce the overall parasitic capacitance of these highly capacitive nodes. These negative capacitance circuits are adopted to wide fan-in dynamic circuits for timing yield improvement up to 99.87% and to SRAM arrays for read access yield improvement up to 100%. The area and power overheads of these new negative capacitance circuits are amortized over the large die area of the microprocessor and the SRAM array. The effectiveness of the new negative capacitance circuits is verified by post layout simulation results and test chip measurements using 65nm CMOS technology

    Low-frequency noise in downscaled silicon transistors: Trends, theory and practice

    Get PDF
    By the continuing downscaling of sub-micron transistors in the range of few to one deca-nanometers, we focus on the increasing relative level of the low-frequency noise in these devices. Large amount of published data and models are reviewed and summarized, in order to capture the state-of-the-art, and to observe that the 1/area scaling of low-frequency noise holds even for carbon nanotube devices, but the noise becomes too large in order to have fully deterministic devices with area less than 10nm×10nm. The low-frequency noise models are discussed from the point of view that the noise can be both intrinsic and coupled to the charge transport in the devices, which provided a coherent picture, and more interestingly, showed that the models converge each to other, despite the many issues that one can find for the physical origin of each model. Several derivations are made to explain crossovers in noise spectra, variable random telegraph amplitudes, duality between energy and distance of charge traps, behaviors and trends for figures of merit by device downscaling, practical constraints for micropower amplifiers and dependence of phase noise on the harmonics in the oscillation signal, uncertainty and techniques of averaging by noise characterization. We have also shown how the unavoidable statistical variations by fabrication is embedded in the devices as a spatial “frozen noise”, which also follows 1/area scaling law and limits the production yield, from one side, and from other side, the “frozen noise” contributes generically to temporal 1/f noise by randomly probing the embedded variations during device operation, owing to the purely statistical accumulation of variance that follows from cause-consequence principle, and irrespectively of the actual physical process. The accumulation of variance is known as statistics of “innovation variance”, which explains the nearly log-normal distributions in the values for low-frequency noise parameters gathered from different devices, bias and other conditions, thus, the origin of geometric averaging in low-frequency noise characterizations. At present, the many models generally coincide each with other, and what makes the difference, are the values, which, however, scatter prominently in nanodevices. Perhaps, one should make some changes in the approach to the low-frequency noise in electronic devices, to emphasize the “statistics behind the numbers”, because the general physical assumptions in each model always fail at some point by the device downscaling, but irrespectively of that, the statistics works, since the low-frequency noise scales consistently with the 1/area law
    corecore