Integrated circuits for efficient power delivery using pulse-width-modulation
Circuits and architectures for efficient power delivery have become crucial in emerging smart systems. Switching power amplifiers (PAs) are attractive for such applications because their saturated operation yields better efficiency than linear PA designs. Switching PAs can also be built in deep-submicron CMOS technologies, so they integrate easily with digital circuits and benefit from process scaling in both performance and area.
Pulse-width-modulation (PWM) is commonly used with switching PAs. A PWM signal typically employs a high-frequency switching pulse waveform as a carrier signal, wherein the pulse-width or duty-cycle of each pulse is modulated by a given low-frequency input signal. The carrier frequency can vary from several kHz to GHz, and is typically determined by the target application.
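As a minimal illustration of the modulation described above, the following Python sketch produces a PWM waveform by comparing a slow input against a triangular carrier, the comparator-based approach typically used for PWM generation. The function names, the sample counts, and the example input are assumptions made for illustration and do not come from the thesis.

```python
import math

def triangle(phase):
    """Triangular carrier in [0, 1] for a phase given in carrier cycles."""
    frac = phase % 1.0
    return 2.0 * frac if frac < 0.5 else 2.0 * (1.0 - frac)

def pwm_samples(signal, carrier_cycles, samples_per_cycle):
    """Return 0/1 PWM samples: the output is high while signal(t) exceeds the carrier.

    signal maps normalized time t in [0, 1) to a value in [0, 1].
    """
    total = carrier_cycles * samples_per_cycle
    out = []
    for k in range(total):
        t = k / total
        carrier = triangle(k / samples_per_cycle)
        out.append(1 if signal(t) > carrier else 0)
    return out

def duty_per_cycle(samples, samples_per_cycle):
    """Average each carrier cycle to recover the duty cycle of every pulse."""
    return [
        sum(samples[i:i + samples_per_cycle]) / samples_per_cycle
        for i in range(0, len(samples), samples_per_cycle)
    ]

# Hypothetical input: one period of a slow sine, offset into [0.1, 0.9].
def slow_sine(t):
    return 0.5 + 0.4 * math.sin(2.0 * math.pi * t)

pwm = pwm_samples(slow_sine, carrier_cycles=50, samples_per_cycle=200)
duties = duty_per_cycle(pwm, samples_per_cycle=200)
```

Averaging each carrier cycle recovers a duty cycle that tracks the low-frequency input, which is the property a class-D stage followed by a low-pass filter relies on.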
In this thesis, efficient power-delivery circuits that use PWM with switching class-D stages are presented. Advanced circuit techniques, as well as architectures for PWM are proposed to enhance efficiency and circumvent the limitations of conventional architectures.
A digitally-intensive transmitter using RF-PWM with a class-D PA is described in the first part of the thesis. The use of carrier switching for alleviating the dynamic range limitation that can be observed in classical RF-PWM implementations is introduced. The approach employs the full carrier frequency for half of the amplitude range, and the second harmonic of half of the carrier frequency, for the remainder of the amplitude range. This concept not only allows the transmitter to drive modulated signals with large peak-to-average power ratio (PAPR), but also improves the back-off efficiency due to reduced switching losses in the half carrier-frequency mode. A glitch-free phase selector is proposed that removes the deleterious glitches that can occur at the input data transitions. The phase-selector also prevents D flip-flop setup-and-hold time violations. The transmitter has been implemented in a 130-nm CMOS process. The measured peak output power and power-added-efficiency (PAE) are 25.6 dBm and 34%, respectively. While driving 802.11g 20-MHz 64-QAM OFDM signals, the average measured output power is 18.3 dBm and the PAE is 16%, with an EVM of -25.5 dB.
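The amplitude split between the two carrier modes can be motivated with the textbook Fourier series of an ideal rectangular pulse train. The sketch below is an idealized check, assuming 0/1 pulses with no rise time or switching loss; it is not the derivation used in the thesis, and the function names are hypothetical.

```python
import math

def harmonic_amplitude(n, duty):
    """Amplitude of the n-th harmonic of an ideal 0/1 rectangular pulse train."""
    return (2.0 / (n * math.pi)) * abs(math.sin(n * math.pi * duty))

def numeric_harmonic_amplitude(n, duty, samples=20000):
    """Numerical cross-check using a discrete Fourier sum over one period."""
    re = im = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples
        x = 1.0 if t < duty else 0.0
        re += x * math.cos(2.0 * math.pi * n * t)
        im += x * math.sin(2.0 * math.pi * n * t)
    return 2.0 * math.hypot(re, im) / samples

# Full-rate carrier: the fundamental (n = 1) peaks at 50% duty cycle.
full_rate_peak = harmonic_amplitude(1, 0.5)
# Half-rate carrier: its second harmonic (n = 2) falls on the carrier
# frequency and peaks at 25% duty cycle.
half_rate_peak = harmonic_amplitude(2, 0.25)
```

Under these assumptions the half-rate mode reaches at most half the output amplitude of the full-rate mode, which is consistent with each mode covering half of the amplitude range.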
The second part of the thesis describes a high-speed driver that provides a PWM output using a class-D PA. A PLL-based architecture is employed which eliminates the requirement for a precise ramp or triangular signal generator, and a high-speed comparator, which are typically used for PWM generation. Multi-level signaling is proposed to enhance back-off as well as peak efficiency, which is critical for signals with high PAPR. A differential, folded PWM scheme is introduced to achieve highly linear operation. 3-level operation is achieved without the requirement for additional supply source or sink paths, while 5-level operation is achieved with additional supply source and sink paths, compared to 2-level operation. The PWM driver has been implemented in a 130-nm CMOS process and can operate with a switching frequency of 40-to-170 MHz. For 2/3/5-level PA operation, with a 500 kHz sinusoidal input and 60 MHz switching frequency, the measured THD is -61/-62/-53 dB and corresponding efficiency is 71/83/86% with 175/200/220 mW output power level, respectively. Performance has also been verified for 2/3-level PA operation with a high PAPR signal with 500 kHz bandwidth. While intended as a general purpose amplifier, the approach is well-suited for applications such as power-line communications (PLC).
The final part of the thesis introduces an efficient buck/buck-boost reconfigurable LED driver that supports PWM and PFM operation. The driver is based on peak current control. Rectified sin as well as sin² functions are employed in the reference signal to improve the power factor (PF) and total harmonic distortion (THD) of the buck and buck-boost converters. The design ensures that the peak of the inductor current maintains a constant level that is invariant for different AC line voltages. The operating mode of the design can be changed between PWM and PFM. The LED driver has been implemented in a 130-nm CMOS process. PF and THD are improved when the proposed reference is employed, and peak PF and lowest THD are 0.995/0.983/0.996 and 7.8/6.2/3.5% for the buck (PWM), buck (PFM), and buck-boost (PFM) cases, respectively. The corresponding peak efficiency for the three cases is 88/92/91%, respectively.
Electrical and Computer Engineerin
High-Performance, Energy-Efficient CMOS Arithmetic Circuits
In a modern microprocessor, datapath/arithmetic circuits are an important building block for high-performance, energy-efficient computing, because arithmetic operations such as addition and binary number comparison are two of the most commonly used computing instructions. Besides the CMOS manufacturing process, the two most critical design considerations for arithmetic circuits are the logic style and the micro-architecture. In this thesis, a constant-delay (CD) logic style is proposed targeting full-custom high-speed applications. The constant-delay characteristic of this logic style (regardless of the logic type) makes it suitable for implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature enables a performance advantage over static and dynamic domino logic styles in a single-cycle, multi-stage circuit block. Several design considerations, including timing window width adjustment and clock distribution, are discussed. Using a 65-nm general-purpose CMOS technology, the proposed logic style demonstrates an average speedup of 94% and 56% over static and dynamic domino logic, respectively, across five different logic gates. Simulations of 8-bit ripple-carry adders show that CD logic is 39% and 23% faster than the static and dynamic-based adders, respectively. CD logic also demonstrates a 39% speedup and a 64% (22%) energy-delay product reduction from static logic at 100% (10%) data activity in 32-bit carry-lookahead adders. To confirm CD logic's potential, a 148-ps, single-cycle 64-bit adder with CD logic implemented in the critical path is fabricated in a 65-nm, 1-V CMOS process. A new 64-bit Ling adder micro-architecture, which utilizes both inversion and absorption properties to minimize the number of CD logic gates and the number of logic stages in the critical path, is also proposed.
At 1-V supply, this adder's measured worst-case power and leakage power are 135 mW and 0.22 mW, respectively. A single-cycle 64-bit binary comparator utilizing a radix-2 tree structure is also proposed. This comparator architecture is specifically designed for static logic to achieve both low-power and high-performance operation, especially in low input data activity environments. In 65-nm technology with 25% (10%) data activity, the proposed design demonstrates 2.3x (3.5x) and 3.7x (5.8x) power and energy-delay product efficiency, respectively. This comparator is also 2.7x faster at iso-energy (80 fJ) or 3.3x more energy-efficient at iso-delay (200 ps) than existing designs. An improved comparator, where CD logic is utilized in the critical path to achieve high performance without sacrificing the overall energy efficiency, is also realized in a 65-nm 1-V CMOS process. At 1-V supply, the proposed comparator's measured delay is 167 ps, and it has an average power and a leakage power of 2.34 mW and 0.06 mW, respectively. At a 0.3-pJ iso-energy or 250-ps iso-delay budget, the proposed comparator with CD logic is 20% faster or 17% more energy-efficient than a comparator implemented with static logic alone.
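To make the radix-2 tree idea concrete, the following behavioral sketch compares two unsigned integers by merging per-bit (greater, equal) pairs two at a time. It is a generic textbook tree comparator written under the assumption that the width is a power of two; the function name and merge encoding are illustrative and are not taken from the thesis, which describes a circuit-level design.

```python
def compare_tree(a, b, width=64):
    """Return (greater, equal) for unsigned a and b using a radix-2 merge tree."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    # Leaf level: one (greater, equal) pair per bit, most significant bit first.
    nodes = []
    for i in reversed(range(width)):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        nodes.append((abit & (bbit ^ 1), (abit ^ bbit) ^ 1))
    # Merge adjacent pairs until one node remains: log2(width) levels.
    while len(nodes) > 1:
        merged = []
        for j in range(0, len(nodes), 2):
            g_hi, e_hi = nodes[j]
            g_lo, e_lo = nodes[j + 1]
            # The high half decides unless it is equal, in which case
            # the low half decides.
            merged.append((g_hi | (e_hi & g_lo), e_hi & e_lo))
        nodes = merged
    return nodes[0]
```

A 64-bit comparison therefore needs six merge levels. When the most significant bits already differ, the lower subtrees do not affect the result, which hints at why such a structure can save power at low input data activity.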
Design of a Voltage- and Temperature-Insensitive Clock Path and Phase Error Corrector for High-Speed DRAM Interfaces
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2021. 2. Deog-Kyoon Jeong.
To cope with problems caused by the high-speed operation of the dynamic random access memory (DRAM) interface, several approaches are proposed that focus on the clock path of the DRAM. Two delay-locked loop (DLL) based schemes, a forwarded-clock (FC) receiver (RX) with a self-tracking loop and a quadrature error corrector, are proposed. Moreover, an open-loop scheme is presented for drift compensation in the clock distribution. The open-loop scheme consumes less power and reduces design complexity.
The FC RX uses DLLs to compensate for voltage and temperature (VT) drift in unmatched memory interfaces. The self-tracking loop consists of two-stage cascaded DLLs to operate in a DRAM environment. With the write training and the proposed DLL, the timing relationship between the data and the sampling clock is always optimal. The proposed scheme compensates for delay drift without relying on data transitions or re-training. The proposed FC RX is fabricated in a 65-nm CMOS process and has an active area of 0.0329 mm², which contains four data lanes. After the write training is completed at a supply voltage of 1 V, the measured timing margin remains larger than 0.31 unit interval (UI) when the supply voltage drifts between 0.94 V and 1.06 V. At a data rate of 6.4 Gb/s, the proposed FC RX achieves an energy efficiency of 0.45 pJ/bit.
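The reported figures can be converted into absolute units with a short calculation. The sketch below assumes that the 0.31-UI timing margin was measured at the same 6.4 Gb/s data rate and that the per-bit energy applies at that rate; the abstract does not state whether the energy figure is per lane or for the whole receiver, so the resulting power is per 6.4 Gb/s of throughput. The helper names are hypothetical.

```python
DATA_RATE_BPS = 6.4e9        # data rate from the abstract
ENERGY_PER_BIT_J = 0.45e-12  # energy efficiency from the abstract
MARGIN_UI = 0.31             # lower bound on the measured timing margin

def unit_interval_ps(data_rate_bps):
    """Duration of one unit interval (one bit time) in picoseconds."""
    return 1e12 / data_rate_bps

def margin_ps(margin_ui, data_rate_bps):
    """Timing margin converted from unit intervals to picoseconds."""
    return margin_ui * unit_interval_ps(data_rate_bps)

def power_mw(energy_per_bit_j, data_rate_bps):
    """Power implied by an energy-per-bit figure at a given data rate."""
    return energy_per_bit_j * data_rate_bps * 1e3

ui_ps = unit_interval_ps(DATA_RATE_BPS)          # about 156.25 ps
margin = margin_ps(MARGIN_UI, DATA_RATE_BPS)     # about 48.4 ps
power = power_mw(ENERGY_PER_BIT_J, DATA_RATE_BPS)  # about 2.88 mW
```

Under these assumptions, the receiver keeps roughly 48 ps of timing margin across the 0.94 V to 1.06 V supply range.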
In contrast to the aforementioned scheme, an open-loop voltage drift compensation method is proposed to minimize power consumption and occupied area. The overall clock distribution is composed of a current mode logic (CML) path and a CMOS path. In the proposed scheme, the architecture of the CML-to-CMOS converter (C2C) and the inverter is changed to compensate for supply voltage drift. The bias generator provides bias voltages to the C2C and inverters according to the supply voltage for delay adjustment. The proposed clock tree is fabricated in a 40-nm CMOS process and the active area is 0.004 mm². When the supply voltage is modulated by a 1-MHz sinusoid with 100 mV peak-to-peak swing centered at 1.1 V, applying the proposed scheme reduces the measured root-mean-square (RMS) jitter from 3.77 ps RMS to 1.61 ps RMS. At a 6 GHz output clock, the power consumption of the proposed scheme is 11.02 mW.
A DLL-based quadrature error corrector (QEC) with a wide correction range is proposed for DRAM whose clocks are distributed over several millimeters. The quadrature error is corrected by adjusting delay lines using information from the phase error detector. The proposed error correction method minimizes the jitter added by phase error correction by setting at least one of the delay lines in the quadrature clock path to the minimum delay. In addition, the asynchronous calibration on-off scheme reduces power consumption after calibration is complete. The proposed QEC is fabricated in a 40-nm CMOS process and has an active area of 0.048 mm². The proposed QEC exhibits a wide correctable error range of 101.6 ps, and the remaining phase errors are less than 2.18° for clocks from 0.8 GHz to 2.3 GHz. At 2.3 GHz, the QEC contributes 0.53 ps RMS jitter. Also, at 2.3 GHz, the power consumption is reduced from 8.89 mW to 3.39 mW when the calibration is off.
In this thesis, three circuits are proposed to cope with problems that can arise in the clock path as the speed of dynamic random access memory (DRAM) increases. Two of the proposed circuits use a delay-locked loop (DLL), and the remaining one uses an open-loop approach to reduce area and power consumption. In the unmatched receiver architecture of DRAM, the delay mismatch between the data path and the clock path reduces the setup and hold time margins as voltage and temperature vary, and a DLL is used to solve this problem. The proposed DLL circuit is divided into two DLLs so that it operates in a DRAM environment. In addition, initial write training places the data and the clock at the optimal position in terms of timing margin. The proposed scheme therefore does not require data transition information. The chip fabricated in a 65-nm CMOS process achieves an energy efficiency of 0.45 pJ/bit at 6.4 Gb/s. Also, with the write training and the DLL locked at 1 V, the timing margin remained larger than 0.31 UI when the supply voltage changed from 0.94 V to 1.06 V.
The next proposed circuit compensates for the change in clock path delay caused by supply voltage variation in the clock distribution tree and, unlike the previous scheme, does so in an open-loop manner. The structures of the inverter and the CML-to-CMOS converter in the conventional clock path are modified so that the delay can be adjusted by bias voltages that the bias generator varies according to the supply voltage. For the chip fabricated in a 40-nm CMOS process, the measured power consumption at a 6 GHz clock is 11.02 mW. When the supply voltage is modulated by a 1 MHz sinusoid with 100 mV peak-to-peak swing centered at 1.1 V, the jitter is reduced from 3.77 ps RMS with the conventional scheme to 1.61 ps RMS with the proposed scheme.
In the DRAM transmitter architecture, phase errors among multi-phase clocks reduce the data valid window of the transmitted data. Introducing a DLL to solve this increases the jitter of the phase-corrected clock because of the added delay. To minimize the added jitter, this thesis presents a phase correction circuit that minimizes the delay added by phase correction. A method for turning off the phase error correction circuit asynchronously with respect to the input clock is also proposed to reduce power consumption in the idle state. For the chip fabricated in a 40-nm CMOS process, the phase correction range is 101.6 ps, and the phase error of the corrector's output clock is less than 2.18° over the operating frequency range of 0.8 GHz to 2.3 GHz. The jitter added by the proposed phase correction circuit is 0.53 ps RMS at 2.3 GHz, and turning the correction circuit off reduces the power consumption from 8.89 mW to 3.39 mW.
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background on DRAM Interface
2.1 Overview
2.2 Memory Interface
Chapter 3 Background on DLL
3.1 Overview
3.2 Building Blocks
3.2.1 Delay Line
3.2.2 Phase Detector
3.2.3 Charge Pump
3.2.4 Loop Filter
Chapter 4 Forwarded-Clock Receiver with DLL-based Self-tracking Loop for Unmatched Memory Interfaces
4.1 Overview
4.2 Proposed Separated DLL
4.2.1 Operation of the Proposed Separated DLL
4.2.2 Operation of the Digital Loop Filter in DLL
4.3 Circuit Implementation
4.4 Measurement Results
4.4.1 Measurement Setup and Sequence
4.4.2 VT Drift Measurement and Simulation
Chapter 5 Open-loop-based Voltage Drift Compensation in Clock Distribution
5.1 Overview
5.2 Prior Works
5.3 Voltage Drift Compensation Method
5.4 Circuit Implementation
5.5 Measurement Results
Chapter 6 Quadrature Error Corrector with Minimum Total Delay Tracking
6.1 Overview
6.2 Prior Works
6.3 Quadrature Error Correction Method
6.4 Circuit Implementation
6.5 Measurement Results
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)
Docto