
    On-chip Voltage Regulator – Circuit Design and Automation

    Thesis (Ph.D.), School of Computing and Engineering, University of Missouri--Kansas City, 2021. Dissertation advisors: Masud H. Chowdhury and Yugyung Lee. Includes bibliographical references (pages 106-121).
    With the increasing density and complexity of high-performance integrated circuits and systems, including many-core chips and systems-on-chip (SoC), it is becoming difficult to meet power delivery and regulation requirements with off-chip regulators. Off-chip regulators are a less attractive choice because of the higher overheads and complexity imposed by the additional wires, pins, and pads. The increased I²R loss makes it challenging to maintain the integrity of different voltage domains under a lower supply voltage environment in the smaller technology nodes. Fully integrated on-chip voltage regulators have proven to be an effective solution to mitigate power delivery and integrity issues. Two types of regulators are considered most promising for on-chip implementation: (i) the low-dropout (LDO) regulator and (ii) the switched-capacitor (SC) regulator. The first part of our research focuses mainly on the LDO regulator. Inspired by the recent surge of interest in capacitor-less voltage regulators, we present two fully on-chip, external-capacitor-less low-dropout voltage regulator designs. The second part of this work explores the complexity of designing each block of the regulator/analog circuit and proposes a design methodology for analog circuit synthesis using a simulation- and learning-based approach. As analog circuit complexity keeps increasing, design automation mostly relies on a hierarchical flow. In this work, we focus mainly on the circuit level, one of the significant steps in that flow, and present a novel, efficient circuit synthesis flow based on simulation and learning-based optimization methods. The proposed methodology has two phases: a learning phase and an evaluation phase. Random forest, a supervised learning method, is used to reduce the number of sample points in the design space and the iteration count during the learning phase. Additionally, symmetry constraints are used to further reduce the iteration count during the sizing process. We introduce a three-step circuit synthesis flow to automate analog circuit design, using HSPICE as the simulation tool during the evaluation phase. Three common analog circuits are chosen to verify the algorithm: a single-stage differential amplifier, an operational transconductance amplifier, and a two-stage differential amplifier. The tool is developed in Python in a 0.6um technology, and the optimized results are also verified in Cadence Virtuoso.
    Contents: Introduction -- On-chip power delivery system -- Fundamentals of on-chip voltage regulator -- LDO design in 45nm technology -- LDO design in technology -- Analog design automation -- Proposed analog design methodology -- Energy efficient FDSOI and FinFET based power gating circuit using data retention transistor -- Conclusion and future work
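    As an illustration of the two-phase flow described above, here is a minimal sketch. The `run_hspice` function is a hypothetical stand-in for an HSPICE netlist evaluation, and the two-variable design space, figure of merit, bounds, and sample counts are illustrative assumptions, not the dissertation's actual setup.

```python
# Minimal sketch of a learning phase (random forest on a small simulated seed
# set) followed by an evaluation phase (simulate only the top-ranked points).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Design variables: one symmetric (W, L) pair; tying symmetric devices
# together shrinks the search space, as the symmetry-constraint step suggests.
BOUNDS = np.array([[1.0, 50.0], [0.6, 5.0]])  # um ranges for W and L

def sample(n):
    u = rng.random((n, 2))
    return BOUNDS[:, 0] + u * (BOUNDS[:, 1] - BOUNDS[:, 0])

def run_hspice(x):
    """Hypothetical simulator call returning a scalar figure of merit.
    In a real flow this would write a netlist and invoke HSPICE."""
    w, l = x
    return (w / l) - 0.01 * w * l  # toy surrogate for illustration only

# Learning phase: simulate a small seed set, fit a cheap random-forest model.
seed = sample(40)
fom = np.array([run_hspice(x) for x in seed])
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(seed, fom)

# Evaluation phase: rank a large candidate pool with the model, then spend
# expensive simulator calls only on the most promising few candidates.
pool = sample(5000)
best = pool[np.argsort(forest.predict(pool))[-5:]]
verified = [(x, run_hspice(x)) for x in best]
print(max(verified, key=lambda t: t[1]))
```

    The design point is that the forest, trained on a handful of simulations, filters thousands of candidate sizings so that only a few reach the simulator, which is where the iteration-count reduction comes from.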

    Modeling, Design and Optimization of IC Power Delivery with On-Chip Regulation

    As IC technology continues to follow Moore's Law, IC designers have been constantly challenged with power delivery issues. While useful power must be reliably delivered to the on-die functional circuits to fulfill the desired functionality and performance, additional power overheads arise from the loss associated with voltage conversion and parasitic resistance in the metal wires. Hence, one of the key IC power delivery design challenges is to develop voltage conversion/regulation circuits, and the corresponding design strategies, that provide a guaranteed level of power integrity while achieving high power efficiency and low area overhead. On-chip voltage regulation, a significant ongoing design trend, offers appealing active supply noise suppression close to the loads and is well positioned to address many power delivery challenges. However, realizing the full potential of on-chip voltage regulation requires systematic optimization of, and tradeoffs among, settling time, steady-state error, power supply noise, power efficiency, stability, and area overhead, which are the key focuses of this dissertation. First, we develop new low-dropout voltage regulators (LDOs) that are well optimized for low-power applications. To this end, dropout voltage, bias current, and speed are important competing design objectives. This dissertation presents new flipped voltage follower (FVF) based topologies of on-chip voltage regulators that handle ultra-fast load transients in nanoseconds while achieving significant improvement in bias current consumption. An active frequency compensation is embedded to achieve high area efficiency by employing less compensation capacitance, the major silicon area contributor. Furthermore, in one of the proposed topologies an auxiliary digital feedback loop is employed to further lower quiescent power consumption. Second, coping with supply noise is becoming increasingly difficult as design complexity grows, which leads to increased spatial and temporal load heterogeneity and hence larger voltage variations in a given power domain. Addressing this challenge through a distributed methodology, wherein multiple voltage regulators are placed across the same voltage domain, is particularly promising. This distributed nature allows for even faster suppression of multiple hot spots by the nearby regulators within the power domain and can significantly boost power integrity. Nevertheless, reasoning about the stability of such distributively regulated power networks becomes rather complicated as a result of complex interactions between multiple active regulators and the large passive subnetwork. Coping with this stability challenge requires new theory and stability-ensuring design practice, as targeted by this dissertation. For the first time, we adopt and develop a hybrid stability framework for large power delivery networks with distributed voltage regulation. This framework is local in the sense that both the checking and the assurance of network stability can be handled on the basis of each individual voltage regulator, enabling feasible design of large power delivery networks that would be computationally impossible otherwise. Accordingly, we propose a new hybrid stability margin concept, examine its tradeoffs with power efficiency, supply noise, and silicon area, and demonstrate the resulting key design implications pertaining to new stability-ensuring LDO circuit design techniques and circuit topologies.
    Finally, we develop an automated hybrid stability design flow that is computationally efficient and provides a practical guarantee of network stability.
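    To make the "local checking implies network stability" idea concrete, the following is an illustrative sketch, not the dissertation's actual criterion: a small-gain-style per-regulator margin check swept over frequency. The `regulator_gain` and `network_gain` responses and the required margin threshold are all hypothetical placeholders.

```python
# Per-regulator stability check in the spirit of a local hybrid framework:
# if every regulator's local interaction loop keeps sufficient margin against
# the passive subnetwork it sees, the network is declared stable.
import numpy as np

freqs = np.logspace(3, 9, 400)  # sweep 1 kHz .. 1 GHz

def regulator_gain(idx, f):
    """Hypothetical |T_i(jw)| of regulator idx's interaction loop."""
    return 1.0 / np.sqrt(1.0 + (f / (2e6 * (idx + 1))) ** 2)

def network_gain(f):
    """Hypothetical worst-case gain of the passive subnetwork seen locally."""
    return 0.8 / np.sqrt(1.0 + (f / 5e7) ** 2)

def local_margin(idx):
    # Margin: how far the local loop product stays below unity across frequency.
    loop = regulator_gain(idx, freqs) * network_gain(freqs)
    return 1.0 - loop.max()

margins = [local_margin(i) for i in range(8)]
stable = all(m > 0.1 for m in margins)  # 0.1 = assumed required margin
print(f"network declared stable: {stable}, worst margin: {min(margins):.3f}")
```

    The appeal of such a local formulation is scalability: each check involves only one regulator and its local environment, so the cost grows linearly with the number of regulators rather than with the whole-network system order.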

    Distributed IC Power Delivery: Stability-Constrained Design Optimization and Workload-Aware Power Management

    Power delivery presents key design challenges in today's systems, ranging from high-performance microprocessors to mobile systems-on-chip (SoCs). A robust power delivery system is essential to ensure reliable operation of on-die devices. Nowadays it has become an important design trend to place multiple voltage regulators on-chip in a distributed manner to cope with power supply noise. However, a stability concern arises because of the complex interactions between multiple voltage regulators and the bulky network of surrounding passive parasitics. The recently developed hybrid stability theorem (HST) is promising for dealing with the stability of such systems by efficiently capturing the effects of all interactions; however, the intrinsic conservativeness of the underlying HST framework causes large overdesign and hence severe performance degradation. To address this challenge, this dissertation first extends the HST by proposing a frequency-dependent system partitioning technique to substantially reduce the pessimism in stability evaluation. By systematically exploring the theoretical foundation of the HST framework, we recognize all the critical constraints under which the partitioning technique can be performed rigorously to remove conservativeness while maintaining key theoretical properties of the partitioned subsystems. Based on that, we develop an efficient stability-ensuring automatic design flow for large power delivery systems with distributed on-chip regulation. Using the proposed approach, we further discover new design insights for circuit designers, such as how regulator topology, on-chip decoupling capacitance, and the number of integrated voltage regulators can be optimized for improved system tradeoffs between stability and performance. Besides stability, power efficiency must be improved in every possible way while maintaining high power quality. It can be argued that the ultimate power integrity and efficiency may be best achieved via a heterogeneous chain of voltage processing, starting from on-board switching voltage regulators (VRs), to on-chip switching VRs, and finally to networks of distributed on-chip linear VRs. As such, we propose a heterogeneous voltage regulation (HVR) architecture encompassing regulators with complementary characteristics in response time, size, and efficiency. By exploring the rich heterogeneity and tunability in HVR, we develop systematic workload-aware control policies to adapt heterogeneous VRs to workload changes at multiple temporal scales, significantly improving system power efficiency while providing a guarantee of power integrity. The proposed techniques are further supported by hardware-accelerated machine learning prediction of non-uniform spatial workload distributions for more accurate HVR adaptation at fine time granularity. Our evaluations based on the PARSEC benchmark suite show that the proposed adaptive 3-stage HVR reduces total system energy dissipation by up to 23.9%, and by 15.7% on average, compared with conventional static two-stage voltage regulation using off- and on-chip switching VRs. Compared with the 3-stage static HVR, our runtime control reduces system energy by up to 17.9%, and by 12.2% on average. Furthermore, the proposed machine learning prediction offers up to a 4.1% reduction in system energy.
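    The multi-timescale adaptation idea can be sketched as follows. The controller structure, names (`HVRState`, `adapt`), thresholds, and voltage values are illustrative assumptions, not the dissertation's actual control policy.

```python
# Schematic sketch of a workload-aware policy over a 3-stage HVR chain:
# per-region linear VRs are gated at fine granularity using predicted load,
# while the intermediate switching-VR rail adapts at a coarser time scale.
from dataclasses import dataclass

@dataclass
class HVRState:
    board_vr_v: float    # on-board switching VR output
    onchip_vr_v: float   # on-chip switching VR output (intermediate rail)
    ldo_enabled: list    # per-region distributed linear VRs

def adapt(state: HVRState, predicted_load_ma: list) -> HVRState:
    # Fine time scale: gate idle regions' LDOs to cut quiescent loss;
    # predicted_load_ma stands in for the ML workload predictor's output.
    state.ldo_enabled = [load > 5.0 for load in predicted_load_ma]

    # Coarser time scale: lower the intermediate rail toward the minimum
    # dropout the active LDOs need, trading efficiency against headroom.
    peak = max(predicted_load_ma)
    dropout = 0.15 if peak > 500 else 0.10   # assumed headroom policy
    state.onchip_vr_v = 0.9 + dropout        # assumed 0.9 V nominal core rail
    return state

state = HVRState(board_vr_v=1.8, onchip_vr_v=1.05, ldo_enabled=[True] * 4)
state = adapt(state, predicted_load_ma=[620.0, 3.2, 48.0, 0.5])
print(state)
```

    The efficiency gain comes from matching each stage to what the workload currently needs: idle regions stop burning LDO quiescent current, and the intermediate rail carries no more headroom than the worst-case active region requires.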

    Toward realizing power scalable and energy proportional high-speed wireline links

    Growing computational demand and the proliferation of cloud computing have placed high-speed serial links at center stage. Due to saturating energy efficiency improvements over the last five years, increasing data throughput comes at the cost of increased power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receivers, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing serial link power. Instead of optimizing the power of individual building blocks, the power of the entire serial link is reduced by exploiting how applications use the link. It has been demonstrated that serial links in servers are underutilized: on average, they are used only 15% of the time, i.e., these links are idle approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering off the link when idle and powering it back on when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power-state transitioning is known as the rapid-on/off approach. For rapid-on/off to be effective, ideally the power-on time, off-state power, and power-state transition energy must all be close to zero. In practice, however, it is very difficult to achieve these ideal conditions. The work presented in this dissertation addresses these challenges. When this research was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. No systematic study or design of rapid power-state transitioning in serial links was available in the literature. Since rapid-on/off with nanosecond granularity is not standard in any wireline communication, even popular test equipment does not support testing such a feature, nor was any formal measurement methodology available. All these circumstances made the beginning difficult, but they provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations behind saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce serial link power. The second contribution is identifying potential power-saving techniques and evaluating the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in the voltage-mode output driver by using a fast digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy-proportional operation over wide variations in link utilization and is, therefore, suitable for energy-efficient links. Fabricated in 90nm CMOS technology, the voltage-mode driver and the clock multiplier achieve power-on times of only 2ns and 10ns, respectively.
    This dissertation highlights a key trade-off in the clock multiplier architecture: fast power-on-lock capability comes at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multiplier. The phase-locked loop (PLL) based multiplier was developed to overcome the limitations of the MDLL-based approach. The proposed temperature-compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded-clock transceiver, which achieves rapid-on/off capability in the LC-PLL, current-mode transmitter, and receiver. It was the first reported design of a complete transceiver with an embedded clock architecture having rapid-on/off capability. A background phase calibration technique in the PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability over a wide range of link utilization and, therefore, helps improve overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW to 1.43mW) with energy efficiency degradation of only 2.2x (9.1pJ/bit to 20.5pJ/bit) when the effective data rate (link utilization) changes by 100x (7Gb/s to 70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate for frequency drift due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all-digital logic gates and achieves low supply sensitivity, making it suitable for integration in processor and DRAM environments. The sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in a 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034°C/mV and an inaccuracy of ±0.9°C and ±2.3°C from 0 to 100°C after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3°C, a resolution FoM of 0.3 (nJ/conv)·res², and a measurement (conversion) time of 6.5μs.
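    A back-of-the-envelope model makes the energy-proportionality argument concrete: average link power is the utilization-weighted sum of active and off-state power plus the transition overhead. The function below is a minimal sketch; only the 63.7mW active power and 7Gb/s line rate come from the abstract, while the off-state power, transition energy, and burst rate are assumed values.

```python
# Average power of a rapid-on/off link versus utilization:
#   P_avg = u * P_active + (1 - u) * P_off + (burst rate) * E_transition
def avg_power_mw(util, p_active_mw=63.7, p_off_mw=0.1,
                 e_transition_nj=5.0, bursts_per_sec=1e5):
    """util: fraction of time the link carries data (0..1)."""
    idle = 1.0 - util
    transition_mw = e_transition_nj * 1e-9 * bursts_per_sec * 1e3
    return util * p_active_mw + idle * p_off_mw + transition_mw

for util in (1.0, 0.15, 0.01):
    p = avg_power_mw(util)
    bits_per_sec = util * 7e9  # effective throughput at a 7 Gb/s line rate
    print(f"util={util:4.2f}  P={p:6.2f} mW  "
          f"{p * 1e-3 / bits_per_sec * 1e12:5.1f} pJ/bit")
```

    At full utilization this reproduces the 9.1 pJ/bit figure (63.7mW / 7Gb/s), and it shows why efficiency degrades only modestly at low utilization: the off-state and transition terms, not the active power, dominate the residual energy per bit.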

    Efficient and Scalable Computing for Resource-Constrained Cyber-Physical Systems: A Layered Approach

    With the evolution of computing and communication technology, cyber-physical systems such as self-driving cars, unmanned aerial vehicles, and mobile cognitive robots are achieving increasing levels of multifunctionality and miniaturization, enabling them to execute versatile tasks in resource-constrained environments. Therefore, the computing systems that power these resource-constrained cyber-physical systems (RCCPSs) have to achieve high efficiency and scalability. First of all, given a fixed amount of onboard energy, these computing systems should not only be power-efficient but also exhibit sufficiently high performance to gracefully handle complex algorithms for learning-based perception and AI-driven decision-making. Meanwhile, scalability requires that the computing system and its components can be extended both horizontally, with more resources, and vertically, with emerging advanced technology. To achieve efficient and scalable computing systems for RCCPSs, my research broadly investigates a set of techniques and solutions via a bottom-up layered approach. This layered approach leverages the characteristics of each system layer (e.g., the circuit, architecture, and operating system layers) and their interactions to discover and explore the optimal system tradeoffs among performance, efficiency, and scalability. At the circuit layer, we investigate the benefits of novel power delivery and management schemes enabled by integrated voltage regulators (IVRs). Then, between the circuit and microarchitecture/architecture layers, we present a voltage-stacked power delivery system that offers best-in-class power delivery efficiency for many-core systems. After this, using Graphics Processing Units (GPUs) as a case study, we develop a real-time resource scheduling framework at the architecture and operating system layers for heterogeneous computing platforms with guaranteed task deadlines. Finally, fast dynamic voltage and frequency scaling (DVFS) based power management across the circuit, architecture, and operating system layers is studied through a learning-based hierarchical power management strategy for multi-/many-core systems.
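    As a toy illustration of a hierarchical DVFS policy, the sketch below lets a global layer split a chip power budget by predicted per-core utilization (standing in for the learned predictor at the coarser time scale), while a per-core layer picks the fastest V/f point that fits its share. The V/f table, budget, and utilization inputs are assumed values, not results from the dissertation.

```python
# Two-layer DVFS sketch: global budget allocation + per-core V/f selection.
# (voltage V, frequency GHz, estimated power W), fastest point first
VF_TABLE = [(1.0, 3.0, 9.0), (0.9, 2.4, 6.0), (0.8, 1.8, 3.8), (0.7, 1.2, 2.2)]

def per_core_dvfs(budget_w, util):
    """Pick the fastest point whose utilization-scaled power fits the budget."""
    for v, f, p in VF_TABLE:
        if p * max(util, 0.1) <= budget_w:
            return (v, f)
    return VF_TABLE[-1][:2]  # fall back to the floor point if nothing fits

def global_layer(chip_budget_w, utils):
    # Global layer: weight each core's budget share by its predicted
    # utilization, so busy cores receive the headroom to run fast.
    total = sum(utils) or 1.0
    return [per_core_dvfs(chip_budget_w * u / total, u) for u in utils]

print(global_layer(20.0, utils=[0.9, 0.2, 0.6, 0.05]))
```

    Splitting the decision this way mirrors the layered theme above: the slow global loop enforces a chip-level constraint, while the fast per-core loops react to workload changes without global coordination on every step.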

    Addressing the RRAM Reliability and Radiation Soft-Errors in the Memory Systems

    With continuous and aggressive technology scaling, the design of memory systems becomes very challenging. The desire for high-capacity, reliable, and energy-efficient memory arrays is rising rapidly. However, on the technology side, increasing leakage power and the restrictions resulting from manufacturing limitations complicate the design of memory systems. In addition, with new machine learning applications, which require a tremendous number of mathematical operations to be completed in a timely manner, interest in neuromorphic systems has increased in recent years. Emerging Non-Volatile Memory (NVM) devices have been suggested for incorporation in the design of memory arrays due to their small size and their ability to reduce leakage power, since they retain their data even in the absence of a power supply. Compared to other novel NVM devices, the Resistive Random Access Memory (RRAM) device has many advantages, including its low programming requirements, the large ratio between its high and low resistive states, and its compatibility with the Complementary Metal Oxide Semiconductor (CMOS) fabrication process. The RRAM device also suffers from disadvantages, including instability in its switching dynamics and sensitivity to process variations. Yet, two of the main issues hindering the deployment of RRAM arrays in products are RRAM reliability and radiation soft-errors. RRAM reliability soft-errors result from the diffusion of oxygen vacancies out of the conductive channels within the oxide material of the device. Radiation soft-errors, on the other hand, are caused by highly energetic cosmic rays incident on the junction of the MOS device used as a selector for the RRAM cell. Both types of soft-errors cause an unintentional change of the resistive state of the RRAM device. While there is research work in the literature addressing some of the RRAM disadvantages, such as the switching dynamic instability, there is no dedicated work discussing the impact of RRAM soft-errors on the various designs into which the RRAM device is integrated and how these soft-errors can be automatically detected and fixed. In this thesis, we draw attention to the need to consider RRAM soft-errors in order to avoid degradation in design performance. In addition, using previously reported and experimentally verified SPICE models, together with widely adopted system-level simulators and test benches, various solutions are provided to automatically detect and fix the degradation in design performance due to RRAM soft-errors. The main focus of this work is to propose methodologies which improve the robustness of memory systems against RRAM soft-errors; these memories are expected to be incorporated in current and future platforms running advanced machine learning applications. In more detail, the main contributions of this thesis can be summarized as:
    - Provide an in-depth analysis of the impact of RRAM soft-errors on the performance of RRAM-based designs.
    - Provide a new SRAM cell which uses the RRAM device to reduce SRAM leakage power with minimal impact on its read and write operations. This new SRAM cell can be incorporated in the Graphics Processing Unit (GPU) designs currently used in the implementation of machine learning platforms.
    - Provide circuit- and system-level solutions to resolve the reliability and radiation soft-errors in RRAM arrays. These solutions can automatically detect and fix the soft-errors with minimum impact on the delay and energy consumption of the memory array (a conceptual sketch of such a scrub-and-restore loop follows this abstract).
    - Develop a framework to estimate the effect of RRAM soft-errors on the performance of RRAM-based neuromorphic systems. This provides, for the first time, a very generic methodology through which device-level RRAM soft-errors are mapped to the overall performance of neuromorphic systems. Our analysis shows that the accuracy of an RRAM-based neuromorphic system can degrade by more than 48% due to RRAM soft-errors.
    - Provide two algorithms to automatically detect and recover from the degradation in RRAM-based neuromorphic systems due to RRAM soft-errors. The system- and circuit-level techniques to implement these algorithms are also explained in this work.
    In conclusion, this work offers initial steps toward enabling the use of RRAM devices in products by tackling one of their best-known challenges: RRAM reliability and radiation soft-errors. Despite using experimentally verified SPICE models and widely popular system simulators and test benches, the solutions provided in this thesis need to be verified in future work through fabrication, to study the impact of other RRAM technology shortcomings including: a) instability in switching dynamics due to the stochastic nature of oxygen vacancy movement, and b) sensitivity to process variations.
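    The following is a conceptual sketch of the scrub-and-restore idea referenced in the contributions above. The resistance levels, the single-bit parity check, and the victim-selection heuristic are illustrative assumptions, not the thesis's detection circuits.

```python
# Conceptual scrub loop for one RRAM word: detect a soft-error via a parity
# mismatch, then reprogram the most likely drifted cell back to a full state.
R_LOW, R_HIGH = 1e3, 1e6   # assumed LRS/HRS nominal resistances (ohms)
R_THRESHOLD = 3e4          # assumed read decision boundary

def read_state(r_ohm):
    return 0 if r_ohm < R_THRESHOLD else 1

def scrub(array_r, golden_parity):
    """Detect a flipped cell via parity mismatch, then reprogram it."""
    bits = [read_state(r) for r in array_r]
    if sum(bits) % 2 == golden_parity:
        return array_r  # no detectable soft-error this pass
    # Restore: reprogram the cell whose resistance drifted closest to the
    # decision boundary, the most likely victim of vacancy diffusion.
    victim = min(range(len(array_r)),
                 key=lambda i: abs(array_r[i] - R_THRESHOLD))
    flipped = 1 - bits[victim]
    array_r[victim] = R_HIGH if flipped else R_LOW
    return array_r

# Example: cell 2 should hold HRS (bit 1) but has drifted toward LRS.
word = [R_LOW, R_HIGH, 2e4, R_HIGH]
parity = (0 + 1 + 1 + 1) % 2  # parity of the intended data
print([read_state(r) for r in scrub(word, parity)])  # -> [0, 1, 1, 1]
```

    Periodically running such a scrub pass bounds how long a vacancy-diffusion or radiation-induced state flip can persist, which is the essence of automatic detection and repair with small delay and energy overhead.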