On-chip Voltage Regulator – Circuit Design and Automation
Title from PDF of title page, viewed May 24, 2021. Dissertation advisors: Masud H Chowdhury and Yugyung Lee. Vita. Includes bibliographical references (pages 106-121). Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2021.
With the increase in density and complexity of high-performance integrated circuits and systems, including many-core chips and systems-on-chip (SoCs), it is becoming difficult to meet power delivery and regulation requirements with off-chip regulators. Off-chip regulators become a less attractive choice because of the higher overhead and complexity imposed by the additional wires, pins, and pads. The increased I²R loss makes it challenging to maintain the integrity of different voltage domains under the lower supply voltages of smaller technology nodes. Fully integrated on-chip voltage regulators have proven to be an effective solution to mitigate power delivery and integrity issues. Two types of regulators are considered most promising for on-chip implementation: (i) the low-dropout (LDO) regulator and (ii) the switched-capacitor (SC) regulator. The first part of our research focused mainly on the LDO regulator. Inspired by the recent surge of interest in capacitor-less voltage regulators, we presented two fully on-chip, external-capacitor-less low-dropout voltage regulator designs.
The second part of this proposal explores the complexity of designing each block of the regulator/analog circuit and proposes a design methodology for analog circuit synthesis using a simulation- and learning-based approach. As the complexity of analog circuits increases, a hierarchical flow is mostly used for design automation. In this work, we focused mainly on the circuit level, one of the significant steps in the flow. We presented a novel, efficient circuit synthesis flow based on simulation and learning-based optimization methods. The proposed methodology has two phases: the learning phase and the evaluation phase. Random forest, a supervised learning method, is used to reduce the number of sample points in the design space and the number of iterations during the learning phase. Additionally, symmetric constraints are used to further reduce the number of iterations during the sizing process. We introduced a three-step circuit synthesis flow to automate analog circuit design. We used HSPICE as the simulation tool during the evaluation phase of the proposed methodology. Three of the most common analog circuits were chosen to verify the algorithm: a single-stage differential amplifier, an operational transconductance amplifier, and a two-stage differential amplifier. The tool is developed in Python, and the technology we used is 0.6 µm. We also verified the optimized result in Cadence Virtuoso.
Introduction -- On-chip power delivery system -- Fundamentals of on-chip voltage regulator -- LDO design in 45 nm technology -- LDO design in technology -- Analog design automation -- Proposed analog design methodology -- Energy efficient FDSOI and FINFET based power gating circuit using data retention transistor -- Conclusion and future work
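The effect of the symmetric constraints mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical five-transistor differential amplifier in which matched devices share one width, so the sizing loop searches over fewer free variables. The device names, symmetry groups, and width grid are illustrative assumptions and are not taken from the dissertation; the random-forest learning phase and the simulator are not modeled here.

```python
from itertools import product

# Hypothetical five-transistor differential amplifier (names are assumptions):
# M1/M2 form the input pair, M3/M4 the current-mirror load, M5 the tail source.
DEVICES = ["M1", "M2", "M3", "M4", "M5"]

# Symmetric constraints: devices in the same group must share one width.
SYMMETRY_GROUPS = [("M1", "M2"), ("M3", "M4"), ("M5",)]

# Illustrative discrete width grid in micrometers (assumed, not from the source).
WIDTH_GRID_UM = [1.5, 3.0, 6.0, 12.0]


def unconstrained_points(devices, grid):
    """Number of candidate sizings when every device is sized independently."""
    return len(grid) ** len(devices)


def constrained_sizings(groups, grid):
    """Yield complete sizings while choosing only one width per symmetry group."""
    for widths in product(grid, repeat=len(groups)):
        sizing = {}
        for group, width in zip(groups, widths):
            for device in group:
                sizing[device] = width
        yield sizing


if __name__ == "__main__":
    full = unconstrained_points(DEVICES, WIDTH_GRID_UM)
    reduced = sum(1 for _ in constrained_sizings(SYMMETRY_GROUPS, WIDTH_GRID_UM))
    print(f"independent sizing: {full} candidates")
    print(f"with symmetry:      {reduced} candidates")
```

Each candidate produced this way would still have to be scored by a circuit simulator; the sketch only shows why tying matched devices together leaves fewer points to evaluate.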
Modeling, Design and Optimization of IC Power Delivery with On-Chip Regulation
As IC technology continues to follow Moore’s Law, IC designers have been constantly challenged with power delivery issues. While useful power must be reliably delivered to the on-die functional circuits to fulfill the desired functionality and performance, additional power overheads arise due to the loss associated with voltage conversion and parasitic resistance in the metal wires. Hence, one of the key IC power delivery design challenges is to develop voltage conversion/regulation circuits and the corresponding design strategies to provide a guaranteed level of power integrity while achieving high power efficiency and low area overhead.
On-chip voltage regulation, a significant ongoing design trend, offers appealing active supply noise suppression close to the loads and is well positioned to address many power delivery challenges. However, realizing the full potential of on-chip voltage regulation requires systemic optimization of, and tradeoffs among, settling time, steady-state error, power supply noise, power efficiency, stability and area overhead, which are the key focuses of this dissertation. First, we develop new low-dropout voltage regulators (LDOs) that are well optimized for low power applications. To this end, dropout voltage, bias current and speed are important competing design objectives. This dissertation presents new flipped voltage follower (FVF) based topologies of on-chip voltage regulators that handle ultra-fast load transients in nanoseconds while achieving significant improvement in bias current consumption. Active frequency compensation is embedded to achieve high area efficiency by employing smaller compensation capacitors, the major contributor to silicon area. Furthermore, in one of the proposed topologies an auxiliary digital feedback loop is employed in order to lower quiescent power consumption further.
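The competition between dropout voltage and bias current can be made concrete with the standard efficiency expression for a linear regulator, efficiency = (Vout · Iload) / (Vin · (Iload + Iq)). The Python sketch below evaluates it at a few operating points. All voltages and currents are hypothetical values chosen for illustration and are not results from the dissertation.

```python
def ldo_efficiency(v_in, v_out, i_load, i_q):
    """Power efficiency of a linear (LDO) regulator.

    The load current flows from v_in to v_out through the pass device, and the
    quiescent (bias) current i_q is drawn from v_in without reaching the load.
    """
    if v_out > v_in:
        raise ValueError("a linear regulator cannot step the voltage up")
    return (v_out * i_load) / (v_in * (i_load + i_q))


if __name__ == "__main__":
    # Hypothetical operating points: 1.2 V input, 1.0 V output, 50 uA bias.
    for i_load in (1e-3, 10e-3, 100e-3):
        eta = ldo_efficiency(v_in=1.2, v_out=1.0, i_load=i_load, i_q=50e-6)
        print(f"I_load = {i_load * 1e3:6.1f} mA -> efficiency = {eta:.3f}")
```

The ratio Vout/Vin caps the efficiency, which is why a small dropout voltage matters, and the bias current matters most at light load, where it is no longer negligible next to the load current.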
Second, coping with supply noise is becoming increasingly difficult as design complexity grows, which leads to increased spatial and temporal load heterogeneity and hence larger voltage variations in a given power domain. Addressing this challenge through a distributed methodology, wherein multiple voltage regulators are placed across the same voltage domain, is particularly promising. This distributed nature allows for even faster suppression of multiple hot spots by the nearby regulators within the power domain and can significantly boost power integrity. Nevertheless, reasoning about the stability of such distributively regulated power networks becomes rather complicated as a result of complex interactions between multiple active regulators and the large passive subnetwork. Coping with this stability challenge requires new theory and stability-ensuring design practice, as targeted by this dissertation. For the first time, we adopt and develop a hybrid stability framework for large power delivery networks with distributed voltage regulation. This framework is local in the sense that both the checking and assurance of network stability can be dealt with on the basis of each individual voltage regulator, leading to feasible design of large power delivery networks that would otherwise be computationally impossible. Accordingly, we propose a new hybrid stability margin concept, examine its tradeoffs with power efficiency, supply noise and silicon area, and demonstrate the resulting key design implications pertaining to new stability-ensuring LDO circuit design techniques and circuit topologies. Finally, we develop an automated hybrid stability design flow that is computationally efficient and provides a practical guarantee of network stability.
Distributed IC Power Delivery: Stability-Constrained Design Optimization and Workload-Aware Power Management
Power delivery presents key design challenges in today’s systems, ranging from high-performance microprocessors to mobile systems-on-chip (SoCs). A robust power delivery system is essential to ensure reliable operation of on-die devices. It has become an important design trend to place multiple voltage regulators on-chip in a distributed manner to cope with power supply noise. However, stability concerns arise because of the complex interactions between multiple voltage regulators and the large network of surrounding passive parasitics. The recently developed hybrid stability theorem (HST) is a promising way to deal with the stability of such systems because it efficiently captures the effects of all interactions. However, the intrinsic conservativeness of the underlying HST framework causes large overdesign and hence severe performance degradation. To address this challenge, this dissertation first extends the HST by proposing a frequency-dependent system partitioning technique that substantially reduces the pessimism in stability evaluation. By systematically exploring the theoretical foundation of the HST framework, we identify all the critical constraints under which the partitioning technique can be performed rigorously to remove conservativeness while maintaining key theoretical properties of the partitioned subsystems. Based on that, we develop an efficient stability-ensuring automatic design flow for large power delivery systems with distributed on-chip regulation. Using the proposed approach, we further discover new design insights for circuit designers, such as how regulator topology, on-chip decoupling capacitance, and the number of integrated voltage regulators can be optimized for improved system tradeoffs between stability and performance.
Besides stability, power efficiency must be improved in every possible way while maintaining high power quality. It can be argued that the ultimate power integrity and efficiency may be best achieved via a heterogeneous chain of voltage processing, starting from on-board switching voltage regulators (VRs), continuing to on-chip switching VRs, and ending with networks of distributed on-chip linear VRs. As such, we propose a heterogeneous voltage regulation (HVR) architecture encompassing regulators with complementary characteristics in response time, size, and efficiency. By exploring the rich heterogeneity and tunability in HVR, we develop systematic workload-aware control policies that adapt heterogeneous VRs to workload changes at multiple temporal scales, significantly improving system power efficiency while providing a guarantee for power integrity. The proposed techniques are further supported by hardware-accelerated machine learning prediction of non-uniform spatial workload distributions for more accurate HVR adaptation at fine time granularity. Our evaluations based on the PARSEC benchmark suite show that the proposed adaptive 3-stage HVR reduces the total system energy dissipation by up to 23.9%, and by 15.7% on average, compared with the conventional static two-stage voltage regulation using off- and on-chip switching VRs. Compared with the 3-stage static HVR, our runtime control reduces system energy by up to 17.9%, and by 12.2% on average. Furthermore, the proposed machine learning prediction offers up to a 4.1% reduction in system energy.
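For a chain of regulators like the one described in this abstract, the end-to-end efficiency is the product of the per-stage efficiencies, which is why the operating point of every stage matters. The Python sketch below shows that relation and how a relative energy saving is computed. The stage efficiencies are hypothetical placeholders and not values from the dissertation, and the workload-aware control policies themselves are not modeled.

```python
from math import prod


def chain_efficiency(stage_efficiencies):
    """End-to-end efficiency of cascaded regulator stages (product of stages)."""
    for eta in stage_efficiencies:
        if not 0.0 < eta <= 1.0:
            raise ValueError("each stage efficiency must be in (0, 1]")
    return prod(stage_efficiencies)


def input_energy(load_energy_j, stage_efficiencies):
    """Energy drawn from the source to deliver load_energy_j to the load."""
    return load_energy_j / chain_efficiency(stage_efficiencies)


def relative_saving(baseline_energy_j, new_energy_j):
    """Fractional energy reduction of a new configuration versus a baseline."""
    return 1.0 - new_energy_j / baseline_energy_j


if __name__ == "__main__":
    # Hypothetical efficiencies: on-board switching, on-chip switching, linear.
    static_chain = [0.90, 0.80, 0.85]
    adapted_chain = [0.90, 0.86, 0.90]  # assumed effect of runtime adaptation
    e_static = input_energy(1.0, static_chain)
    e_adapted = input_energy(1.0, adapted_chain)
    print(f"static chain efficiency:  {chain_efficiency(static_chain):.3f}")
    print(f"adapted chain efficiency: {chain_efficiency(adapted_chain):.3f}")
    print(f"relative energy saving:   {relative_saving(e_static, e_adapted):.1%}")
```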
Toward realizing power scalable and energy proportional high-speed wireline links
Growing computational demand and the proliferation of cloud computing have placed high-speed
serial links at the center stage. Due to saturating energy efficiency improvements over the
last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as
output drivers, receiver, or clock generation and distribution. However, this approach yields
very limited efficiency improvement. This dissertation takes an alternative approach toward
reducing the serial link power. Instead of optimizing the power of individual building blocks,
power of the entire serial link is reduced by exploiting serial link usage by the applications.
It has been demonstrated that serial links in servers are underutilized. On average, they
are used only 15% of the time, i.e. these links are idle for approximately 85% of the time.
Conventional links consume power during idle periods to maintain synchronization between
the transmitter and the receiver. However, by powering-off the link when idle and powering
it back when needed, power consumption of the serial link can be scaled proportionally to
its utilization. This approach of rapid power state transitioning is known as the rapid-on/off
approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power,
and power state transition energy must all be close to zero. However, in practice, it is very
difficult to achieve these ideal conditions. Work presented in this dissertation addresses these
challenges.
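The argument for rapid-on/off can be sketched with a simple average-power model: the link draws its on-state power only while carrying data, its off-state power otherwise, and pays a fixed energy for each power-state transition. The Python sketch below is a simplified model under those assumptions. It ignores the power-on time itself, and every numeric value is hypothetical rather than taken from the dissertation.

```python
def average_link_power(utilization, p_on_w, p_off_w=0.0,
                       e_transition_j=0.0, transitions_per_s=0.0):
    """Average power of a link that is powered on only while it carries data.

    With p_off_w, e_transition_j and transitions_per_s all zero, this is the
    ideal energy-proportional case: power scales linearly with utilization.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    active = utilization * p_on_w
    idle = (1.0 - utilization) * p_off_w
    switching = transitions_per_s * e_transition_j
    return active + idle + switching


if __name__ == "__main__":
    P_ON_W = 0.060  # hypothetical on-state power, not a measured value
    for u in (1.0, 0.5, 0.15):
        ideal = average_link_power(u, P_ON_W)
        practical = average_link_power(u, P_ON_W, p_off_w=0.001,
                                       e_transition_j=1e-9,
                                       transitions_per_s=1e5)
        print(f"utilization {u:4.0%}: always-on {P_ON_W * 1e3:5.1f} mW, "
              f"ideal {ideal * 1e3:5.1f} mW, practical {practical * 1e3:5.1f} mW")
```

The gap between the ideal and practical columns comes entirely from the off-state power and the transition energy, the quantities the text says must be driven close to zero.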
When this research work was started (2011-12), there were only a couple of research papers
available in the area of rapid-on/off links. Systematic study or design of a rapid power state
transitioning in serial links was not available in the literature. Since rapid-on/off with
nanosecond granularity is not a standard in any wireline communication, even popular
test equipment does not support testing any such feature, nor was any formal measurement methodology available. All these circumstances made the beginning difficult. However,
these challenges provided a unique opportunity to explore new architectural techniques and
identify trade-offs. The key contributions of this dissertation are as follows.
The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to
find alternative ways to reduce the serial link power.
The second contribution is to identify potential power saving techniques and evaluate the
challenges they pose and the opportunities they present.
The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature.
The transmitter achieves rapid-on/off capability in voltage mode output driver by using
a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and
periodic reference insertion. To ease timing requirements, an improved edge replacement
logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time
as a function of various circuit parameters is also discussed. The proposed transmitter
demonstrates energy proportional operation over wide variations of link utilization, and is,
therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the
voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns,
respectively. This dissertation highlights a key trade-off in the clock multiplier architecture:
fast power-on-lock capability is achieved at the cost of jitter performance.
The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multiplier.
The phase locked loop (PLL) based multiplier was developed to overcome the limitations
of the MDLL based approach. The proposed temperature-compensated LC-PLL achieves
power-on-lock in 1ns.
The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded
clock transceiver, which achieves rapid-on/off capability in the LC-PLL, current-mode transmitter,
and receiver. It was the first reported design of a complete transceiver, with an embedded
clock architecture, having rapid-on/off capability. A background phase calibration technique in
the PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on.
The proposed transceiver demonstrates power scalability over a wide range of link utilization
and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver
achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by
only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes
by 100x (7Gb/s-to-70Mb/s).
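The reported scaling figures can be cross-checked with the relation energy per bit = power / effective data rate. The short Python sketch below recomputes them from the numbers quoted above; the function name is an illustrative choice.

```python
def energy_per_bit_pj(power_mw, data_rate_bps):
    """Energy per bit in picojoules from power in milliwatts and rate in bit/s."""
    return power_mw * 1e-3 / data_rate_bps * 1e12


if __name__ == "__main__":
    full_rate = energy_per_bit_pj(63.7, 7e9)    # full utilization, 7 Gb/s
    low_rate = energy_per_bit_pj(1.43, 70e6)    # 1% utilization, 70 Mb/s
    print(f"power scaling:        {63.7 / 1.43:.1f}x")
    print(f"energy/bit at 7 Gb/s:  {full_rate:.2f} pJ/bit")
    print(f"energy/bit at 70 Mb/s: {low_rate:.2f} pJ/bit")
    print(f"efficiency degradation: {low_rate / full_rate:.2f}x")
```

The recomputed values are roughly 44.5x for power scaling and about 9.1 and 20.4 pJ/bit, which agrees with the quoted figures to within rounding of the reported power numbers.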
The sixth and final contribution is the design of a temperature sensor to compensate
the frequency drifts due to temperature variations, during long power-off periods, in the
fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor
is designed with all digital logic gates and achieves low supply sensitivity. This sensor is
suitable for integration in processor and DRAM environments. The proposed sensor works
on the principle of directly converting temperature information to frequency and finally
to digital bits. A novel sensing technique is proposed in which temperature information
is acquired by creating a threshold voltage difference between the transistors used in the
oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and
the overhead of voltage regulators and an external ideal reference frequency is avoided. The
effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated
in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V
to 1.1V, and it achieves a supply sensitivity of 0.034°C/mV and an inaccuracy of ±0.9°C
and ±2.3°C from 0 to 100°C after 2-point calibration, with and without static nonlinearity
correction, respectively. It achieves a resolution of 0.3°C, a resolution FoM of 0.3 (nJ/conv)·res²,
and a measurement (conversion) time of 6.5μs.
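The 2-point calibration mentioned above can be sketched as fitting a straight line through two reference measurements and using it to convert later digital codes to temperature. The Python sketch below assumes a purely linear code-to-temperature map. The codes and function names are hypothetical, and the static nonlinearity correction described in the abstract is not modeled.

```python
def two_point_calibration(code_lo, temp_lo_c, code_hi, temp_hi_c):
    """Return a function mapping a sensor output code to degrees Celsius.

    The map is the straight line through the two calibration points
    (code_lo, temp_lo_c) and (code_hi, temp_hi_c).
    """
    if code_hi == code_lo:
        raise ValueError("calibration codes must differ")
    slope = (temp_hi_c - temp_lo_c) / (code_hi - code_lo)
    offset = temp_lo_c - slope * code_lo

    def to_celsius(code):
        return slope * code + offset

    return to_celsius


if __name__ == "__main__":
    # Hypothetical codes read at two known reference temperatures.
    convert = two_point_calibration(code_lo=1000, temp_lo_c=0.0,
                                    code_hi=3000, temp_hi_c=100.0)
    for code in (1000, 2000, 2600, 3000):
        print(f"code {code} -> {convert(code):.1f} degC")
```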
Efficient and Scalable Computing for Resource-Constrained Cyber-Physical Systems: A Layered Approach
With the evolution of computing and communication technology, cyber-physical systems such as self-driving cars, unmanned aerial vehicles, and mobile cognitive robots are achieving increasing levels of multifunctionality and miniaturization, enabling them to execute versatile tasks in a resource-constrained environment. Therefore, the computing systems that power these resource-constrained cyber-physical systems (RCCPSs) have to achieve high efficiency and scalability. First of all, given a fixed amount of onboard energy, these computing systems should not only be power-efficient but also exhibit sufficiently high performance to gracefully handle complex algorithms for learning-based perception and AI-driven decision-making. Meanwhile, scalability requires that the current computing system and its components can be extended both horizontally, with more resources, and vertically, with emerging advanced technology. To achieve efficient and scalable computing systems in RCCPSs, my research broadly investigates a set of techniques and solutions via a bottom-up layered approach. This layered approach leverages the characteristics of each system layer (e.g., the circuit, architecture, and operating system layers) and their interactions to discover and explore the optimal system tradeoffs among performance, efficiency, and scalability. At the circuit layer, we investigate the benefits of novel power delivery and management schemes enabled by integrated voltage regulators (IVRs). Then, between the circuit and microarchitecture/architecture layers, we present a voltage-stacked power delivery system that offers best-in-class power delivery efficiency for many-core systems. After this, using Graphics Processing Units (GPUs) as a case study, we develop a real-time resource scheduling framework at the architecture and operating system layers for heterogeneous computing platforms with guaranteed task deadlines. 
Finally, fast dynamic voltage and frequency scaling (DVFS) based power management across the circuit, architecture, and operating system layers is studied through a learning-based hierarchical power management strategy for multi-/many-core systems.
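The appeal of DVFS follows from the standard first-order relation for CMOS dynamic power, P = a · C · V² · f, where a is the activity factor, C the effective switched capacitance, V the supply voltage, and f the clock frequency. The Python sketch below evaluates it at two operating points. The capacitance, voltages, and frequencies are hypothetical, and leakage power and the longer runtime at lower frequency are deliberately ignored.

```python
def dynamic_power_w(c_eff_f, v_dd, freq_hz, activity=1.0):
    """First-order CMOS dynamic power: activity * C * V^2 * f."""
    return activity * c_eff_f * v_dd ** 2 * freq_hz


def dynamic_energy_j(c_eff_f, v_dd, cycles, activity=1.0):
    """Dynamic energy for a fixed number of cycles.

    Energy is power times runtime (cycles / f), so the frequency cancels and
    only the supply voltage sets the energy per cycle.
    """
    return activity * c_eff_f * v_dd ** 2 * cycles


if __name__ == "__main__":
    C_EFF_F = 1e-9   # hypothetical effective switched capacitance
    CYCLES = 1e9     # hypothetical amount of work
    for v_dd, freq_hz in ((1.0, 2.0e9), (0.8, 1.2e9)):
        power = dynamic_power_w(C_EFF_F, v_dd, freq_hz)
        energy = dynamic_energy_j(C_EFF_F, v_dd, CYCLES)
        print(f"V = {v_dd:.1f} V, f = {freq_hz / 1e9:.1f} GHz: "
              f"P = {power:.2f} W, E = {energy:.2f} J")
```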
Addressing the RRAM Reliability and Radiation Soft-Errors in the Memory Systems
With continuous and aggressive technology scaling, the design of memory systems becomes very challenging. The desire for high-capacity, reliable, and energy-efficient memory arrays is rising rapidly. However, on the technology side, increasing leakage power and the restrictions resulting from manufacturing limitations complicate the design of memory systems. In addition, with new machine learning applications that require a tremendous number of mathematical operations to be completed in a timely manner, interest in neuromorphic systems has increased in recent years. Emerging Non-Volatile Memory (NVM) devices have been suggested for incorporation in memory arrays because of their small size and their ability to reduce leakage power, since they can retain their data even in the absence of a power supply.
Compared to other novel NVM devices, the Resistive Random Access Memory (RRAM) device has many advantages, including its low programming requirements, the large ratio between its high and low resistive states, and its compatibility with the Complementary Metal Oxide Semiconductor (CMOS) fabrication process. The RRAM device also suffers from disadvantages, including instability in its switching dynamics and sensitivity to process variations. Yet one of the well-known issues hindering the deployment of RRAM arrays in products is RRAM reliability and radiation soft-errors. The RRAM reliability soft-errors result from the diffusion of oxygen vacancies out of the conductive channels within the oxide material of the device. On the other hand, the radiation soft-errors are caused by highly energetic cosmic rays incident on the junction of the MOS device used as a selector for the RRAM cell. Both kinds of soft-errors cause an unintentional change of the resistive state of the RRAM device. While there is research work in the literature that addresses some of the RRAM disadvantages, such as the switching dynamic instability, there is no dedicated work discussing the impact of RRAM soft-errors on the various designs into which the RRAM device is integrated, or how the soft-errors can be automatically detected and fixed.
In this thesis, we draw attention to the need to consider RRAM soft-errors to avoid degradation in design performance. In addition, using previously reported, experimentally verified SPICE models and widely adopted system-level simulators and test benches, various solutions are provided to automatically detect and fix the degradation in design performance due to RRAM soft-errors. The main focus of this work is to propose methodologies that solve, or improve the robustness of memory systems to, the RRAM soft-errors. These memories are expected to be incorporated in current and future platforms running advanced machine learning applications. In more detail, the main contributions of this thesis can be summarized as:
- Provide an in-depth analysis of the impact of RRAM soft-errors on the performance of RRAM-based designs.
- Provide a new SRAM cell that uses the RRAM device to reduce SRAM leakage power with minimal impact on its read and write operations. This new SRAM cell can be incorporated in the Graphics Processing Unit (GPU) designs currently used to implement machine learning platforms.
- Provide circuit and system solutions to resolve the reliability and radiation soft-errors in RRAM arrays. These solutions can automatically detect and fix the soft-errors with minimal impact on the delay and energy consumption of the memory array.
- Develop a framework to estimate the effect of RRAM soft-errors on the performance of RRAM-based neuromorphic systems. This provides, for the first time, a generic methodology through which device-level RRAM soft-errors are mapped to the overall performance of neuromorphic systems. Our analysis shows that the accuracy of an RRAM-based neuromorphic system can degrade by more than 48% due to RRAM soft-errors.
- Provide two algorithms to automatically detect and restore the degradation in RRAM-based neuromorphic systems due to RRAM soft-errors. The system- and circuit-level techniques to implement these algorithms are also explained in this work.
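The idea of mapping device-level soft-errors to system-level performance can be illustrated with a toy fault-injection experiment in Python. The sketch assumes binary (+1/-1) weights standing in for the two resistive states, flips each stored state independently with a given probability, and measures how often the signs of the layer outputs still match the fault-free result. This is an illustrative model under those assumptions only. It does not reproduce the framework, error models, or detection algorithms of the thesis.

```python
import random


def random_pm1_matrix(rows, cols, rng):
    """Random matrix of +1/-1 entries, standing in for stored resistive states."""
    return [[rng.choice((-1, 1)) for _ in range(cols)] for _ in range(rows)]


def flip_states(weights, error_rate, rng):
    """Return a copy in which each cell flips state with probability error_rate."""
    return [[-w if rng.random() < error_rate else w for w in row]
            for row in weights]


def matvec_sign(weights, x):
    """Sign of each row's dot product with x (one binary neuron per row)."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) >= 0 else -1
            for row in weights]


def sign_agreement(weights, faulty, inputs):
    """Fraction of neuron outputs that match between clean and faulty weights."""
    agree = total = 0
    for x in inputs:
        ref = matvec_sign(weights, x)
        out = matvec_sign(faulty, x)
        agree += sum(r == o for r, o in zip(ref, out))
        total += len(ref)
    return agree / total


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    weights = random_pm1_matrix(32, 15, rng)
    inputs = random_pm1_matrix(200, 15, rng)
    for rate in (0.0, 0.01, 0.05, 0.2):
        faulty = flip_states(weights, rate, rng)
        print(f"flip probability {rate:4.2f}: "
              f"output agreement {sign_agreement(weights, faulty, inputs):.3f}")
```

An odd number of inputs per neuron is used so that a dot product of +1/-1 values can never be exactly zero, which keeps the sign well defined.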
In conclusion, this work offers initial steps toward enabling the use of RRAM devices in products by tackling one of their best-known challenges: RRAM reliability and radiation soft-errors. Despite the use of experimentally verified SPICE models and widely used system simulators and test benches, the solutions provided in this thesis need to be verified through fabrication in future work, to study the impact of other RRAM technology shortcomings, including: a) the instability in its switching dynamics due to the stochastic nature of oxygen vacancy movement, and b) its sensitivity to process variations.