7 research outputs found
Low-power and high-performance SRAM design in high variability advanced CMOS technology
As process technologies shrink, the size and number of memories on a chip are exponentially increasing. Embedded SRAMs are a critical component in modern digital systems, and they strongly impact the overall power, performance, and area. To promote memory-related research in academia, this dissertation introduces OpenRAM, a flexible, portable and open-source memory compiler and characterization methodology for generating and verifying memory designs across different technologies.In addition, SRAM designs, focusing on improving power consumption, access time and bitcell stability are explored in high variability advanced CMOS technologies. To have a stable read/write operation for SRAM in high variability process nodes, a differential-ended single-port 8T bitcell is proposed that improves the read noise margin, write noise margin and readout bitcell current by 45%, 48% and 21%, respectively, compared to a conventional 6T bitcell. Also, a differential-ended single-port 12T bitcell for subthreshold operation is proposed that solves the half-select disturbance and allows efficient bit-interleaving. 12T bitcell has a leakage control mechanism which helps to reduce the power consumption and provides operation down to 0.3 V. Both 8T and 12T bitcells are analyzed in a 64 kb SRAM array using 32 nm technology. Besides, to further improve the access time and power consumption, two tracking circuits (multi replica bitline delay and reconfigurable replica bitline delay techniques) are proposed to aid the generation of accurate and optimum sense amplifier set time.An error tolerant SRAM architecture suitable for low voltage video application with dynamic power-quality management is also proposed in this dissertation. This memory uses three power supplies to improve the SRAM stability in low voltages. The proposed triple-supply approach achieves 63% improvement in image quality and 69% reduction in power consumption compared to a single-supply 64 kb SRAM array at 0.70 V
Circuit design for embedded memory in low-power integrated circuits
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 141-152).This thesis explores the challenges for integrating embedded static random access memory (SRAM) and non-volatile memory-based on ferroelectric capacitor technology-into lowpower integrated circuits. First considered is the impact of process variation in deep-submicron technologies on SRAM, which must exhibit higher density and performance at increased levels of integration with every new semiconductor generation. Techniques to speed up the statistical analysis of physical memory designs by a factor of 100 to 10,000 relative to the conventional Monte Carlo Method are developed. The proposed methods build upon the Importance Sampling simulation algorithm and efficiently explore the sample space of transistor parameter fluctuation. Process variation in SRAM at low-voltage is further investigated experimentally with a 512kb 8T SRAM test chip in 45nm SOI CMOS technology. For active operation, an AC coupled sense amplifier and regenerative global bitline scheme are designed to operate at the limit of on current and off current separation on a single-ended SRAM bitline. The SRAM operates from 1.2 V down to 0.57 V with access times from 400ps to 3.4ns. For standby power, a data retention voltage sensor predicts the mismatch-limited minimum supply voltage without corrupting the contents of the memory. The leakage power of SRAM forces the chip designer to seek non-volatile memory in applications such as portable electronics that retain significant quantities of data over long durations. In this scenario, the energy cost of accessing data must be minimized. This thesis presents a ferroelectric random access memory (FRAM) prototype that addresses the challenges of sensing diminishingly small charge under conditions favorable to low access energy with a time-to-digital sensing scheme. The 1 Mb IT1C FRAM fabricated in 130 nm CMOS operates from 1.5 V to 1.0 V with corresponding access energy from 19.2 pJ to 9.8 pJ per bit. Finally, the computational state of sequential elements interspersed in CMOS logic, also restricts the ability to power gate. To enable simple and fast turn-on, ferroelectric capacitors are integrated into the design of a standard cell register, whose non-volatile operation is made compatible with the digital design flow. A test-case circuit containing ferroelectric registers exhibits non-volatile operation and consumes less than 1.3 pJ per bit of state information and less than 10 clock cycles to save or restore with no minimum standby power requirement in-between active periods.by Masood Qazi.Ph.D
Power Management and SRAM for Energy-Autonomous and Low-Power Systems
We demonstrate the two first-known, complete, self-powered millimeter-scale computer systems.
These microsystems achieve zero-net-energy operation using solar energy harvesting and
ultra-low-power circuits. A medical implant for monitoring intraocular pressure (IOP) is presented
as part of a treatment for glaucoma. The 1.5mm3 IOP monitor is easily implantable because of its
small size and measures IOP with 0.5mmHg accuracy. It wirelessly transmits data to an external
wand while consuming 4.7nJ/bit. This provides rapid feedback about treatment efficacies to decrease
physician response time and potentially prevent unnecessary vision loss. A nearly-perpetual
temperature sensor is presented that processes data using a 2.1μW near-threshold ARM°R Cortex-
M3TM μP that provides a widely-used and trusted programming platform.
Energy harvesting and power management techniques for these two microsystems enable energy-autonomous
operation. The IOP monitor harvests 80nW of solar power while consuming only
5.3nW, extending lifetime indefinitely. This allows the device to provide medical information for
extended periods of time, giving doctors time to converge upon the best glaucoma treatment. The
temperature sensor uses on-demand power delivery to improve low-load dc-dc voltage conversion
efficiency by 4.75x. It also performs linear regulation to deliver power with low noise, improved
load regulation, and tight line regulation.
Low-power high-throughput SRAM techniques help millimeter-scale microsystems meet stringent
power budgets. VDD scaling in memory decreases energy per access, but also decreases stability
margins. These margins can be improved using sizing, VTH selection, and assist circuits,
as well as new bitcell designs. Adaptive Crosshairs modulation of SRAM power supplies fixes
70% of parametric failures. Half-differential SRAM design improves stability, reducing VMIN by
72mV.
The circuit techniques for energy autonomy presented in this dissertation enable millimeter-scale
microsystems for medical implants, such as blood pressure and glucose sensors, as well as
non-medical applications, such as supply chain and infrastructure monitoring. These pervasive
sensors represent the continuation of Bell’s Law, which accurately traces the evolution of computers
as they become smaller, more numerous, and more powerful. The development of
millimeter-scale massively-deployed ubiquitous computers ensures the continued expansion and
profitability of the semiconductor industry. NanoWatt circuit techniques will allow us to meet this
next frontier in IC design.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86387/1/grgkchen_1.pd
Design and Implementation of Low Power SRAM Using Highly Effective Lever Shifters
The explosive growth of battery-operated devices has made low-power design a priority in recent years. In high-performance Systems-on-Chip, leakage power consumption has become comparable to the dynamic component, and its relevance increases as technology scales. These trends are even more evident for SRAM memory devices since they are a dominant source of standby power consumption in low-power application processors. The on-die SRAM power consumption is particularly important for increasingly pervasive mobile and handheld applications where battery life is a key design and technology attribute. In the SRAM-memory design, SRAM cells also comprise the most significant portion of the total chip. Moreover, the increasing number of transistors in the SRAM memories and the MOSs\u27 increasing leakage current in the scaled technologies have turned the SRAM unit into a power-hungry block for both dynamic and static viewpoints. Although the scaling of the supply voltage enables low-power consumption, the SRAM cells\u27 data stability becomes a major concern. Thus, the reduction of SRAM leakage power has become a critical research concern.
To address the leakage power consumption in high-performance cache memories, a stream of novel integrated circuit and architectural level techniques are proposed by researchers including leakage-current management techniques, cell array leakage reduction techniques, bitline leakage reduction techniques, and leakage current compensation techniques. The main goal of this work was to improve the cell array leakage reduction techniques in order to minimize the leakage power for SRAM memory design in low-power applications.
This study performs the body biasing application to reduce leakage current as well. To adjust the NMOSs\u27 threshold voltage and consequently leakage current, a negative DC voltage could be applied to their body terminal as a second gate. As a result, in order to generate a negative DC voltage, this study proposes a negative voltage reference that includes a trimming circuit and a negative level shifter. These enhancements are employed to a 10kb SRAM memory operating at 0.3V in a 65nm CMOS process
Recommended from our members
In-situ and In-field temperature and transistor BTI sensing techniques with microprocessor level implementation
In modern deep-scaled CMOS technologies, various silicon-related pitfalls present challenges to the long-term performance of microprocessors. Such challenges include (1) local hot spots, which breach the thermal limitations of a microprocessor, and (2) transistor aging, especially NBTI, which degrades transistor threshold voltage, ultimately threatening the reliability of the entire memory block. In previous systems, the dummy circuit was placed next to the subject, where the dummy was frequently analyzed, and the readout was used to infer the condition of the target. Due to rapidly changing ambient conditions (e.g., temperature and voltage) and the potential scale of the target dimensions, such metrics may not accurately represent the condition of the target. Moreover, such temperature sensors and canary circuits occupy a significant area.
Therefore, it would be highly preferable to monitor the target circuit in-situ, i.e., to sense the precise transistor at operation. It is also important to achieve an accurate sensing metric. When the temperature is analyzed, the readout should account for voltage and process variations. While sensing the aging degradation, the readout should account for voltage and temperature fluctuations. This would allow testing during in-field operation, while the circuits achieve area-efficiency.
This research had two stages. One result of the first stage was a silicon test chip that was a compact temperature sensor. It involved a family of PTAT+CTAT sensor front-ends that unitized only 6 to 8 conventional CMOS logic devices, yielding a smaller sized chip. The sensor demonstrates accuracy within the target and achieves a 14.3x smaller foot print than preceding published designs. The second product of the first stage was a PMOS aging sensor used in 6T SRAM circuits. The test chip has a real SRAM array, integrated with the proposed PMOS NBTI sensor. It can sense real PMOS NBTI effects in any bit cell (in-situ) and provide robust readings of temperature and voltage (in-field). Intensive aging tests validated the proposed sensing technique.
The second stage was focused on implementing the in-situ and in-field sensing techniques in a real processor. The MIPS microprocessor had a modified instruction cache (I$) and instruction set architecture. With the addition of new instruction aging sensing and minor modification of the circuits, the processor can execute aging sensing opportunistically to evaluate the aging level of its instruction cache. A software framework was developed and verified to estimate the retention voltage of the instruction cache over the lifetime of the chip.
An area-efficient SoC was developed that could transform the instruction cache into an ambient temperature sensor. It had a physically unclonable function (PUF), and it was built with an area-saving technique similar to the earlier work.
This thesis has four chapters. They are presented in chronological and they are aligned with the research described above
Recommended from our members
Very-Large-Scale-Integration Circuit Techniques in Internet-of-Things Applications
Heading towards the era of Internet-of-things (IoT) means both opportunity and challenge for the circuit-design community. In a system where billions of devices are equipped with the ability to sense, compute, communicate with each other and perform tasks in a coordinated manner, security and power management are among the most critical challenges.
Physically unclonable function (PUF) emerges as an important security primitive in hardware-security applications; it provides an object-specific physical identifier hidden within the intrinsic device variations, which is hard to expose and reproduce by adversaries. Yet, designing a compact PUF robust to noise, temperature and voltage remains a challenge.
This thesis presents a novel PUF design approach based on a pair of ultra-compact analog circuits whose output is proportional to absolute temperature. The proposed approach is demonstrated through two works: (1) an ultra-compact and robust PUF based on voltage-compensated proportional-to-absolute-temperature voltage generators that occupies 8.3× less area than the previous work with the similar robustness and twice the robustness of the previously most compact PUF design and (2) a technique to transform a 6T-SRAM array into a robust analog PUF with minimal overhead. In this work, similar circuit topology is used to transform a preexisting on-chip SRAM into a PUF, which further reduces the area in (1) with no robustness penalty.
In this thesis, we also explore techniques for power management circuit design.
Energy harvesting is an essential functionality in an IoT sensor node, where battery replacement is cost-prohibitive or impractical. Yet, existing energy-harvesting power management units (EH PMU) suffer from efficiency loss in the two-step voltage conversion: harvester-to-battery and battery-to-load. We propose an EH PMU architecture with hybrid energy storage, where a capacitor is introduced in addition to the battery to serve as an intermediate energy buffer to minimize the battery involvement in the system energy flow. Test-case measurements show as much as a 2.2× improvement in the end-to-end energy efficiency.
In contrast, with the drastically reduced power consumption of IoT nodes that operates in the sub-threshold regime, adaptive dynamic voltage scaling (DVS) for supply-voltage margin removal, fully on-chip integration and high power conversion efficiency (PCE) are required in PMU designs. We present a PMU–load co-design based on a fully integrated switched-capacitor DC-DC converter (SC-DC) and hybrid error/replica-based regulation for a fully digital PMU control. The PMU is integrated with a neural spike processor (NSP) that achieves a record-low power consumption of 0.61 µW for 96 channels. A tunable replica circuit is added to assist the error regulation and prevent loss of regulation. With automatic energy-robustness co-optimization, the PMU can set the SC-DC’s optimal conversion ratio and switching frequency. The PMU achieves a PCE of 77.7% (72.2%) at VIN = 0.6 V (1 V) and at the NSP’s margin-free operating point
Low-voltage embedded biomedical processor design
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 180-190).Advances in mobile electronics are fueling new possibilities in a variety of applications, one of which is ambulatory medical monitoring with body-worn or implanted sensors. Digital processors on such sensors serve to analyze signals in real-time and extract key features for transmission or storage. To support diverse and evolving applications, the processor should be flexible, and to extend sensor operating lifetime, the processor should be energy-efficient. This thesis focuses on architectures and circuits for low power biomedical signal processing. A general-purpose processor is extended with custom hardware accelerators to reduce the cycle count and energy for common tasks, including FIR and median filtering as well as computing FFTs and mathematical functions. Improvements to classic architectures are proposed to reduce power and improve versatility: an FFT accelerator demonstrates a new control scheme to reduce datapath switching activity, and a modified CORDIC engine features increased input range and decreased quantization error over conventional designs. At the system level, the addition of accelerators increases leakage power and bus loading; strategies to mitigate these costs are analyzed in this thesis. A key strategy for improving energy efficiency is to aggressively scale the power supply voltage according to application performance demands. However, increased sensitivity to variation at low voltages must be mitigated in logic and SRAM design. For logic circuits, a design flow and a hold time verification methodology addressing local variation are proposed and demonstrated in a 65nm microcontroller functioning at 0.3V. For SRAMs, a model for the weak-cell read current is presented for near-V supply voltages, and a self-timed scheme for reducing internal bus glitches is employed with low leakage overhead. The above techniques are demonstrated in a 0.5-1.OV biomedical signal processing platform in 0.13p-Lm CMOS. The use of accelerators for key signal processing enabled greater than 10x energy reduction in two complete EEG and EKG analysis applications, as compared to implementations on a conventional processor.by Joyce Y. S. Kwong.Ph.D