



**Master's Thesis** 

# Design of a Quadrature Error Corrector for DQS in HBM3


by

Seoyeong Jo

February 2022

Department of Electrical and Computer Engineering College of Engineering Seoul National University

# Design of a Quadrature Error Corrector for DQS in HBM3

Thesis Advisor: Prof. Jaeha Kim

A thesis submitted to the department of electrical and computer engineering

February 2022

Department of Electrical and Computer Engineering

College of Engineering

Seoul National University

Seoyeong Jo

Confirming the master's thesis written by

Seoyeong Jo

February 2022

| Chair      | (Seal) |
|------------|--------|
| Vice Chair | (Seal) |
| Examiner   | (Seal) |

### Abstract

As the speed of high bandwidth memory (HBM) has increased, the skew among the quadrature data strobe (DQS) signals has started to affect the internal operation of HBM. Previously, only the skew of the quadrature clock signals sent from the memory needed to be corrected. Existing quadrature error correctors are applicable only to periodic clock signals and not to aperiodic DQS signals. Therefore, a new circuit for correcting the phase skew of DQS signals is needed.

This thesis presents a design methodology for a quadrature error corrector (QEC) for HBM3 that can correct the phase skew of DQS signals. The proposed QEC can correct aperiodic signals by using a clock signal of the same frequency as a reference and detecting the 1/4 point of the clock period with a capacitor-charging method. It uses capacitors with a 4:1 ratio to detect whether the phase difference between adjacent DQS signals is 1/4 of the clock period. The quadrature error is corrected by adjusting delay lines according to the output of the phase error detector. After calibration, the feedback loop is turned off to save power. Implemented in 40-nm CMOS, the design operates from 1.0 to 2.0 GHz in post-layout simulation and achieves a corrected phase error of less than 8.69 ps for the DQS signals while consuming a maximum power of 2.42 mW at 1.6 GHz from a 1.1-V supply.

**Keywords** : quadrature error corrector (QEC), clock skew, DQS, capacitor charging, low-power, small-area.

#### Student Number: 2020-27297

# Contents

| Section   | Title                                                         |
|-----------|---------------------------------------------------------------|
|           | Abstract                                                      |
|           | Contents                                                      |
|           | List of Figures                                               |
|           | List of Tables                                                |
| Chapter 1 | Introduction                                                  |
| 1.1       | Motivation                                                    |
| 1.2       | Thesis Organization                                           |
| Chapter 2 | Operation and Architecture of the Quadrature Error Corrector  |
| 2.1       | Principles of Operation                                       |
| 2.2       | Overall Structure                                             |
| Chapter 3 | Circuit Implementation                                        |
| 3.1       | Pulse-width Detector                                          |
| 3.1.1     | Three Modes of Pulse-width Detector                           |
| 3.1.2     | Design Consideration                                          |
| 3.1.3     | Current Digital to Analog Converter                           |
| 3.1.4     | Switch Logics                                                 |
| 3.1.5     | Simulation of Pulse-width Detector                            |
| 3.2       | Pulse Generator                                               |
| 3.3       | Digitally Controlled Delay Lines                              |
| 3.4       | DIV8 & 3-bit Counter                                          |
| 3.5       | Digital Loop Filter                                           |
| Chapter 4 | Simulation Results                                            |
| 4.1       | Test Circuits                                                 |
| 4.1.1     | DQS Generator                                                 |
| 4.1.2     | Sampler                                                       |
| 4.1.3     | Test MUX                                                      |
| 4.2       | Chip Layout                                                   |
| 4.3       | Simulation Results                                            |
| 4.4       | Performance Summary                                           |
| Chapter 5 | Conclusion                                                    |
|           | Bibliography                                                  |
|           | Abstract in Korean                                            |

# **List of Figures**

| Figure    | Caption                                                                                                                              |
|-----------|--------------------------------------------------------------------------------------------------------------------------------------|
| Fig. 1.1  | The waveform of quadrature DQS signals (a) in seamless mode and (b), (c) in burst mode                                               |
| Fig. 2.1  | Two modes of the proposed quadrature error corrector                                                                                 |
| Fig. 2.2  | Block diagram of the proposed overall architecture                                                                                   |
| Fig. 3.1  | Three modes of pulse-width detector                                                                                                  |
| Fig. 3.2  | Detailed circuit of pulse-width detector                                                                                             |
| Fig. 3.3  | Simplified circuit of main part in pulse-width detector                                                                              |
| Fig. 3.4  | Detailed circuit of IDAC                                                                                                             |
| Fig. 3.5  | The current flowing through M<sub>11</sub> and DNL curve versus IDAC control code in (a), (b) Monte Carlo simulation (c), (d) corner simulation |
| Fig. 3.6  | Detailed circuit of switch in pulse-width detector                                                                                   |
| Fig. 3.7  | (a) The schematic using one switch in DQS mode and (b) the histogram of capacitor voltage difference in CLK and DQS mode             |
| Fig. 3.8  | (a) The schematic using four switches in DQS mode and (b) the histogram of capacitor voltage difference in CLK and DQS mode          |
| Fig. 3.9  | The diagram of pulse generator                                                                                                       |
| Fig. 3.10 | The schematic of (a) DQS pulse generator (b) CLK pulse generator                                                                     |
| Fig. 3.11 | (a) The schematic and (b) timing chart of glitch-free circuit                                                                        |
| Fig. 3.12 | The schematic of DCDL                                                                                                                |
| Fig. 3.13 | The delay range and DNL curve versus DCDL control code in (a), (b) Monte Carlo simulation (c), (d) corner simulation                 |
| Fig. 3.14 | The schematic of (a) DIV8 and (b) 3-bit counter                                                                                      |
| Fig. 3.15 | The timing diagram of DIV8 & 3-bit counter                                                                                           |
| Fig. 3.16 | Flowchart of digital loop filter                                                                                                     |
| Fig. 4.1  | (a) The schematic and (b) waveform of DQS generator                                                                                  |
| Fig. 4.2  | The schematic of sampler                                                                                                             |
| Fig. 4.3  | The diagram of test MUX                                                                                                              |
| Fig. 4.4  | Chip layout of analog part of the proposed QEC                                                                                       |
| Fig. 4.5  | Chip layout of the proposed QEC                                                                                                      |
| Fig. 4.6  | The waveforms of DQS before and after correction (a), (b) in burst mode and (c) in seamless mode                                     |
| Fig. 4.7  | Tracking behavior of IDAC and DCDL control codes                                                                                     |
| Fig. 4.8  | Phase sweep simulation of DQS (a) in burst mode and (b) in seamless mode                                                             |
| Fig. 4.9  | Maximum input phase error versus frequency of DQS                                                                                    |
| Fig. 4.10 | Power consumption of the proposed QEC in various modes                                                                               |

## **List of Tables**

| Table     | Caption                                                                                       |
|-----------|-----------------------------------------------------------------------------------------------|
| Table 4.1 | Phase error before and after correction in Fig. 4.6                                           |
| Table 4.2 | Power consumption of the proposed QEC in various modes                                        |
| Table 4.3 | Performance summary and comparison with recent state-of-the-art quadrature error correctors   |

## Chapter 1

## Introduction

#### **1.1 Motivation**

For high performance of HBM3, a quadrature error corrector (QEC) that guarantees the quadrature phase of the data strobe (DQS) signals operating at 1.6 GHz is required. As the speed of HBM improves by 15 to 20% every year [1], the requirements on signal skew become more stringent. Previously, only the quadrature clock signals sent out from the memory were corrected to meet the specifications. However, the internal timing margin has shrunk as the data (DQ) and DQS signals have become faster, and the skew of DQS, which toggles only when data is sent, now affects the read operation. The quadrature DQS signals that pass through through-silicon vias (TSVs) suffer from phase imbalance due to long travel distances and mismatch among the associated buffers. Fig. 1.1 shows the waveforms of the DQS signals in various modes. In Fig. 1.1(a), the DQS signals operate in seamless mode and are periodic like clock signals. However, when the DQS signals operate in burst mode, as in Fig. 1.1(b) and (c), the periodicity of the signals can no longer be used to correct the phase error. This thesis presents a quadrature error corrector that can eliminate the phase imbalance among quadrature DQS signals in any mode.



Fig. 1.1 The waveform of quadrature DQS signals (a) in seamless mode and (b), (c) in burst mode.

Previous works that address the phase imbalance of quadrature signals include quadrature error correctors in a shared feedback structure with digitally controlled delay lines [2],[3], replica serializers with pulse-shrinking delay lines [4], and a relaxation oscillator-based PD [5]. However, these methods can correct only periodic clock signals because they rely on the pulse of the adjacent next period for phase correction. As noted above, DQS signals operating in burst mode do not toggle continuously, so the periodicity of the signals cannot be used.

To correct the quadrature phase of the DQS signals, this thesis proposes a quadrature error corrector that sets the phase interval between adjacent DQS signals to 1/4 of the clock period. The phase intervals of the DQS signals are made uniform by charging capacitors with a 4:1 ratio. When calibration is turned off and only the main path operates, power is saved. Simulation results with post-layout netlists demonstrate a correctable input phase error range of 90° and a maximum phase error after correction of 8.69 ps while consuming 2.42 mW at 1.6 GHz.

#### **1.2 Thesis Organization**

This thesis is organized as follows. Chapter 2 explains the operation and architecture of the proposed quadrature error corrector. The two modes of the quadrature error corrector and their operation are described, along with the role of each block. Chapter 3 describes the circuit implementation of each block in the QEC. In particular, for the pulse-width detector, which is the main block, the three modes of the block and a design method that considers charge sharing, channel-length modulation, and capacitor mismatch are described. Chapter 4 presents the test circuits for generating input signals and measuring output signals, the chip layout, and the simulation results of the QEC designed in a 40-nm CMOS process. Chapter 5 summarizes the proposed work and concludes this thesis.

## Chapter 2

# **Operation and Architecture of the Quadrature Error Corrector**

#### **2.1 Principles of Operation**

The proposed quadrature error corrector (QEC) operates in two modes, CLK mode and DQS mode. A simplified diagram and the operation of the main block in the two modes are shown in Fig. 2.1. In CLK mode, the QEC stores the period information of the clock signal in the control code of the current digital-to-analog converter (IDAC) while charging four identical capacitors for one clock period ($T_{clk}$), which is the pulse width of the clk\_pulse signal. The digital filter adjusts the IDAC control code so that the capacitor voltage equals the comparator threshold. Next, in DQS mode, the QEC adjusts the phase of the DQS signals to the 1/4 point of the clock period while charging one of the four capacitors for the time ($T_{dqs}$) corresponding to the phase difference between adjacent DQS signals. $T_{dqs}$ is adjusted through the control codes of the digitally controlled delay lines (DCDLs) so that the capacitor voltage again equals the comparator threshold. Since the capacitor ratio is 4:1, $T_{dqs}$ becomes 1/4 of $T_{clk}$. In this way, aperiodic signals can be corrected by using a clock signal of the same frequency. Because the QEC stores the period information, it is not affected by the duty cycle of the signal.
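The 4:1 relation can be written out as a short charge-balance derivation. This is a sketch under idealized assumptions that the text does not state explicitly: a constant charging current $I$ set by the stored IDAC code, capacitors that start charging from zero volts, a unit capacitance $C$ per capacitor, and a comparator threshold $V_{ref}$ (the symbols $I$, $C$, and $V_{ref}$ are introduced here for illustration only).

$$\begin{aligned}
\text{CLK mode:}\quad & 4C \, V_{ref} = I \, T_{clk} \\
\text{DQS mode:}\quad & C \, V_{ref} = I \, T_{dqs} \\
\Rightarrow\quad & T_{dqs} = \frac{C \, V_{ref}}{I} = \frac{T_{clk}}{4}
\end{aligned}$$

Because $I$ and $V_{ref}$ cancel in the ratio, the result does not depend on their absolute values, only on both modes using the same current and the same threshold.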

Fig. 2.1 Two modes of the proposed quadrature error corrector.

After calibration, a calibration-off scheme is applied to reduce the power consumption [3]. When an external signal, cal\_on, goes low, the DCDL control codes are held and the feedback loop is stopped. When the quadrature error corrector is reactivated by cal\_on going high, the previous state is recovered and calibration starts again.

#### 2.2 Overall Structure

Fig. 2.2 illustrates the overall block diagram of the proposed QEC, which comprises a main path and a feedback loop. The main path is composed of one delay line with a fixed delay for the I signal and three 8-bit digitally controlled delay lines (DCDLs) that adjust the delays of the Q, IB, and QB signals, respectively. The feedback loop consists of a pulse generator, a pulse-width detector that detects the 1/4 point of the clock period, a DIV8 & 3-bit counter, and a digital loop filter.

The operation is as follows. The pulse generator alternately generates a pulse whose width corresponds to the clock period or to the phase difference between two neighboring DQS signals (I-Q, Q-IB, IB-QB). The pulse-width detector stores the clock period information in the current digital-to-analog converter (IDAC) control code by using the pulse whose width is the clock period. Using the stored clock period information, the pulse-width detector also outputs a 1-bit digital signal that indicates whether the DQS phase difference is smaller or larger than 1/4 of the clock period. To save power in the digital loop filter, the DIV8 block generates the loop-filter clock, whose frequency is at most 1/8 of the clock frequency. The 3-bit counter accumulates the 1-bit output of the pulse-width detector over eight pulses and converts it to a 3-bit output. In CLK mode, the digital loop filter controls the IDAC control code in the pulse-width detector to store the clock period information. In DQS mode, it controls the DCDL control codes by majority vote to adjust the delays of the quadrature signals.



Fig. 2.2 Block diagram of the proposed overall architecture.

## Chapter 3

## **Circuit Implementation**

#### **3.1 Pulse-width Detector**

#### 3.1.1 Three modes of pulse-width detector

The main block of the quadrature error corrector is a pulse-width detector that detects the 1/4 point of the clock period. The pulse-width detector is composed of an IDAC, four identical capacitors, and a comparator, and it operates in three modes, as shown in Fig. 3.1. In CLK mode, the four capacitors are charged for the time corresponding to the clock period, and a 1-bit output is produced by comparing the comparator threshold with the capacitor voltage, which depends on the amount of current. In DQS mode, only one capacitor is charged with the IDAC control code held fixed and, as in CLK mode, a 1-bit output is produced by comparing the comparator threshold with the capacitor voltage, which now depends on the charging time, that is, the phase difference between two neighboring DQS signals. In REST mode, the current flows through another path to reduce errors due to charge sharing, as explained in Section 3.1.2.

Fig. 3.1 Three modes of pulse-width detector.

#### **3.1.2 Design Consideration**



Fig. 3.2 Detailed circuit of pulse-width detector.

An important design issue is that the starting voltage and the charging current must be the same in CLK and DQS modes. The main part of the pulse-width detector is shown in detail in Fig. 3.2. Each capacitor is 24 fF. When the switch is turned on to charge the capacitors, the starting voltage is non-zero because of charge sharing between node A and node V<sub>c</sub>. To minimize the effect of charge sharing by keeping the voltage of node A constant, the current flows through another path when the capacitors are not being charged (REST mode), as shown in Fig. 3.1.

To make the same current flow in CLK and DQS modes despite channel-length modulation, the same number of switches is turned on in both modes. At the beginning of charging, the switch operates in the saturation region. Fig. 3.3 shows a simplified circuit of the main part of the pulse-width detector, which comprises a current mirror, a switch, and a capacitor. The current flowing into the capacitor is expressed by Equations (2.1) and (2.2): Equation (2.1) describes it in terms of the current mirror, and Equation (2.2) in terms of the switch. $V_{th,p}$ is the threshold voltage of the PMOS transistors. As Equation (2.1) shows, the current depends on the drain voltage of the current mirror, $V_X$, through channel-length modulation. Since Equations (2.1) and (2.2) describe the same current and the other parameters are fixed, $V_X$ is determined by $(W/L)_{sw}$ and the voltage of the charging capacitor, $V_C$. In other words, the current flowing into the capacitor is determined by $(W/L)_{sw}$ and $V_C$. Therefore, for the charging current to be equal at the same capacitor voltage, the width-to-length ratio of the switch must be equal in both modes. Turning on the same number of switches makes the current in CLK and DQS modes the same. To reduce the effect of capacitor mismatch, the switch between nodes V<sub>c</sub> and V<sub>c\_clk</sub> is turned on in CLK mode. While power can be saved by reducing the current and the capacitor size together, there is a trade-off with noise.

$$I_{C} = \frac{1}{2} \mu_{p} C_{ox} \left(\frac{W}{L}\right)_{cm} \left(V_{DD} - V_{CM} - V_{th,p}\right)^{2} \left(1 + \lambda (V_{DD} - V_{X})\right)$$
(2.1)

$$I_{C} = \frac{1}{2} \mu_{p} C_{ox} \left(\frac{W}{L}\right)_{sw} \left(V_{X} - V_{sw} - V_{th,p}\right)^{2} \left(1 + \lambda (V_{X} - V_{C})\right)$$
(2.2)



Fig. 3.3 Simplified circuit of main part in pulse-width detector.

#### **3.1.3** Current Digital to Analog Converter (IDAC)

As shown in Fig. 3.4, the current digital-to-analog converter (IDAC) uses a cascode current mirror to improve the linearity of the current mirror. An offset current always flows through $M_7$ and $M_8$, and the amount of additional current is controlled by a 32-bit thermometer-coded digital control code. The main current mirror ($M_{11}$, $M_{12}$), which connects to the main current source of the pulse-width detector, is a wide-swing current mirror, which offers good accuracy with a high output swing. $M_3$, $M_4$, $M_5$, and $M_6$ generate the gate voltage of the current mirror transistor $M_{12}$ when the enable signal is on.



Fig. 3.4 Detailed circuit of IDAC.



Fig. 3.5 The current flowing through M<sub>11</sub> and DNL curve versus IDAC control code in (a), (b) Monte Carlo simulation (c), (d) Corner simulation.

The post-layout simulation results of the IDAC are shown in Fig. 3.5, which plots the current flowing through $M_{11}$ and its DNL versus the IDAC control code. The results of post-layout Monte Carlo simulation are shown in Fig. 3.5(a) and (b), and the results of post-layout corner simulation are shown in Fig. 3.5(c) and (d). The ideal current that flows into the main part of the pulse-width detector is 84.48 $\mu$A, obtained from the relation $Q = CV = IT$, where $Q$ is the charge, $C$ is the capacitance, $V$ is the capacitor voltage, $I$ is the current, and $T$ is the charging time. In all cases, 84.48 $\mu$A lies within the current range and the DNL is under 1 LSB.
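The quoted ideal current can be cross-checked against the capacitor size and the clock period. The sketch below is only a sanity check and assumes conditions the text does not spell out: all four 24-fF capacitors are charged for one 1.6-GHz clock period in CLK mode, one capacitor is charged for a quarter period in DQS mode, the capacitors start from zero volts, and parasitics are ignored. The function name is hypothetical.

```python
# Sanity check of the ideal IDAC current using Q = C*V = I*T.
# Assumptions: ideal capacitors starting at 0 V, no parasitics.

C_UNIT = 24e-15      # F, size of each capacitor (Section 3.1.2)
N_CAPS_CLK = 4       # capacitors charged together in CLK mode
F_CLK = 1.6e9        # Hz, clock frequency
I_IDEAL = 84.48e-6   # A, ideal current quoted in the text


def capacitor_voltage(current, charge_time, capacitance):
    """Voltage reached by an initially empty capacitor charged at constant current."""
    return current * charge_time / capacitance


t_clk = 1.0 / F_CLK
v_clk_mode = capacitor_voltage(I_IDEAL, t_clk, N_CAPS_CLK * C_UNIT)
v_dqs_mode = capacitor_voltage(I_IDEAL, t_clk / 4, C_UNIT)

print(f"CLK mode: {v_clk_mode:.3f} V")
print(f"DQS mode: {v_dqs_mode:.3f} V")
```

Under these assumptions both modes reach about 0.55 V, half of the 1.1-V supply. This suggests, although the text does not state it, that the comparator threshold sits near mid-supply.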

#### **3.1.4 Switch Logics**

The switch logic shown in Fig. 3.6 generates the switch pulses for the switches in the pulse-width detector when an input pulse arrives. The sw signal goes to 0 when a pulse arrives and turns on the PMOS switch. The sw\_dqs and sw\_clk signals go to 0 when a pulse arrives in DQS mode and CLK mode, respectively. The dclk signal acts as the sampling clock of the flip-flop in the comparator that stores the output data. After charging finishes, the output data is stored and then the reset begins. Buffers between the sw and dclk signals and between the dclk and rst<sub>0</sub> signals provide the delays between the end of charging and data storage, and between data storage and reset.



Fig. 3.6 Detailed circuit of switch in pulse-width detector.

#### 3.1.5 Simulation of Pulse-width Detector




Fig. 3.7 (a) The schematic using one switch in DQS mode and (b) the histogram of capacitor voltage difference in CLK and DQS mode.




Fig. 3.8 (a) The schematic using four switches in DQS mode and (b) the histogram of capacitor voltage difference in CLK and DQS mode.

A Monte Carlo simulation of the pulse-width detector is performed using the PEX netlist. In CLK mode, the capacitors are charged for one period of the clock signal. In DQS mode, the capacitor is charged for one quarter of the clock period. To see the effect of the number of switches, one simulation is performed with one switch turned on in DQS mode, as shown in Fig. 3.7(a), and another with four switches turned on in DQS mode, as shown in Fig. 3.8(a). The difference in capacitor voltage (V<sub>c</sub>) between CLK and DQS modes over 200 runs is plotted in Fig. 3.7(b) and Fig. 3.8(b), respectively. Ideally, there should be no voltage difference. With one switch in DQS mode, the mean voltage difference is 35.38 mV and the standard deviation is 5.23 mV. With four switches in DQS mode, the mean voltage difference is -4.37 mV and the standard deviation is 4.2 mV. Using four switches therefore matches the two modes much more closely, as intended.
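To get a feel for what these voltage offsets mean in time, they can be converted to an equivalent timing error with a first-order estimate. The sketch assumes a linear ramp with slope I/C, using the ideal 84.48-µA current and a single 24-fF capacitor in DQS mode; the real slope varies with channel-length modulation, so the numbers are indicative only. The function name is hypothetical.

```python
# First-order conversion of a capacitor-voltage offset into an equivalent timing error.
# Assumption: constant 84.48-uA charging current into one 24-fF capacitor (linear ramp).

I_CHARGE = 84.48e-6   # A, ideal charging current
C_UNIT = 24e-15       # F, one capacitor


def offset_to_time_ps(delta_v_mv):
    """Equivalent timing error in ps for a voltage offset given in mV."""
    slope_v_per_s = I_CHARGE / C_UNIT
    return (delta_v_mv * 1e-3) / slope_v_per_s * 1e12


one_switch_ps = offset_to_time_ps(35.38)    # mean offset with one switch
four_switch_ps = offset_to_time_ps(-4.37)   # mean offset with four switches

print(f"one switch:    {one_switch_ps:.2f} ps")
print(f"four switches: {four_switch_ps:.2f} ps")
```

With these assumptions the mean offset corresponds to roughly 10 ps with one switch and roughly -1.2 ps with four switches.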

#### **3.2 Pulse Generator**

The structure of the pulse generator is shown in Fig. 3.9. The pulse generator consists of a CLK pulse generator, a DQS pulse generator, and a glitch-free circuit. The CLK pulse generator consists of a divide-by-two circuit, and the DQS pulse generator consists of two MUXs and a logic gate, as shown in Fig. 3.10. In CLK mode, the clock signal is divided by 2 to create clk\_pulse, whose pulse width is the period of the clock signal. In DQS mode, a pair of neighboring quadrature DQS signals is selected, and a dqs\_pulse with a pulse width corresponding to the phase difference between the two signals is generated. Since the delay of an inverter is added to the path of one DQS signal, a transmission gate is added to the path of the other DQS signal to match the delays.
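The two pulse types can be illustrated with a small time-domain model. This is a sketch, not the actual gate-level circuit: it assumes ideal 50%-duty square waves, a 1-ps time grid, and that the logic gate combines one signal with the inverted neighboring signal (A AND NOT B). The function names and the 150-ps example spacing are hypothetical.

```python
# Idealized model of the two pulse types on a 1-ps time grid.
# Assumption: dqs_pulse = A AND (NOT B); the actual gate is not specified in the text.

def square_wave(t_ps, period_ps, delay_ps):
    """50%-duty square wave that rises at delay_ps (modulo the period)."""
    return int(((t_ps - delay_ps) % period_ps) < period_ps / 2)


def dqs_pulse(t_ps, period_ps, delay_a_ps, delay_b_ps):
    """High from the rising edge of signal A until the rising edge of signal B."""
    a = square_wave(t_ps, period_ps, delay_a_ps)
    b = square_wave(t_ps, period_ps, delay_b_ps)
    return a & (1 - b)


def clk_pulse(t_ps, period_ps):
    """Divide-by-two output: high for one full clock period, low for the next."""
    return int((t_ps % (2 * period_ps)) < period_ps)


PERIOD_PS = 625  # clock period at 1.6 GHz

# Pulse widths in ps, measured by counting high samples on the 1-ps grid.
width_dqs = sum(dqs_pulse(t, PERIOD_PS, 0, 150) for t in range(PERIOD_PS))
width_clk = sum(clk_pulse(t, PERIOD_PS) for t in range(2 * PERIOD_PS))

print(width_dqs, width_clk)
```

In this model the dqs\_pulse width equals the 150-ps spacing between the two edges, and the clk\_pulse width equals one full clock period, independent of the duty cycle of the original clock.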



Fig. 3.9 The diagram of pulse generator.



Fig. 3.10 The schematic of (a) DQS pulse generator (b) CLK pulse generator.

The output of the pulse generator must not glitch when the mode changes between CLK and DQS, so the glitch-free circuit shown in Fig. 3.11(a) is introduced. It is a MUX whose select signal is retimed to the negative edge of each pulse. As shown in the waveforms of Fig. 3.11(b), clk\_pulse is selected when the selection signal (sel) is off. If sel turned on without the glitch-free circuit, a glitch could occur, as illustrated in the figure. With the glitch-free circuit, the output switches to the newly selected signal, dqs\_pulse, only after clk\_pulse has fallen and then dqs\_pulse has fallen.






Fig. 3.11 (a) The schematic and (b) timing chart of glitch-free circuit.

#### **3.3 Digitally Controlled Delay Lines (DCDL)**

Fig. 3.12 shows the block diagram of the DCDL. To cover a wide error correction range, the DCDL employs NAND-chain coarse delay lines merged with MOSCAP fine delay lines. The coarse delay lines use a NAND chain and a MUX. A NAND chain is used rather than an inverter chain so that the NAND gates behind the tap selected by the MUX can be turned off to save power. The coarse delay lines and the fine delay lines each use a 15-bit thermometer code, and the coarse delay lines use an additional 16-bit selection signal for the MUX.

Fig. 3.12 The schematic of DCDL.
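The control signals described above can be sketched as a decoder from an 8-bit DCDL code. The mapping here is an assumption made for illustration: the upper four bits select the coarse tap and the lower four bits set the fine delay, and the 16-bit MUX selection is one-hot. The text does not specify the bit split or the encoding, and the function names are hypothetical.

```python
# Hypothetical decoder from an 8-bit DCDL code to coarse/fine control signals.
# Assumption: upper 4 bits -> coarse tap, lower 4 bits -> fine setting.

def thermometer(value, width):
    """Thermometer code as a list of bits: `value` ones followed by zeros."""
    if not 0 <= value <= width:
        raise ValueError("value out of range")
    return [1] * value + [0] * (width - value)


def decode_dcdl_code(code):
    """Split an 8-bit code into coarse thermometer, coarse select, and fine thermometer."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in 8 bits")
    coarse, fine = code >> 4, code & 0xF
    return {
        # Enables the NAND stages up to the selected tap; later stages stay off.
        "coarse_therm": thermometer(coarse, 15),
        # One-hot select for the 16-input MUX.
        "coarse_sel": [1 if i == coarse else 0 for i in range(16)],
        # Switches in MOSCAP loads for the fine delay.
        "fine_therm": thermometer(fine, 15),
    }


signals = decode_dcdl_code(0x5A)  # coarse tap 5, fine setting 10
print(signals["coarse_sel"].index(1), sum(signals["fine_therm"]))
```

A thermometer code changes only one bit per step, which is a common reason for using it in delay lines where monotonic steps matter.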

Fig. 3.13 The delay range and DNL curve versus DCDL control code in (a), (b) Monte Carlo simulation (c), (d) Corner simulation.

The post-layout simulation results of the DCDL are shown in Fig. 3.13, which plots the total delay from input to output and the DNL versus the DCDL control code. Because the resolution of the coarse delay line and the delay range of the fine delay line cannot always be made equal, the coarse resolution is designed to be smaller than the fine delay range so that every delay within the overall range is covered. In post-layout Monte Carlo simulation, the average resolution and delay range of the DCDL are 0.91 ps and 153 ps, respectively. In post-layout corner simulation, they are 0.92 ps and 169 ps, respectively.

## 3.4 DIV8 & 3-bit Counter




Fig. 3.14 The schematic of (a) DIV8 and (b) 3-bit counter.

To reduce digital power, a divided clock is used for the digital loop filter, and the 1-bit output of the pulse-width detector is accumulated into 3-bit data. The DIV8 block and the 3-bit counter are shown in Fig. 3.14(a) and (b). The DIV8 block divides the pulse (clk\_pulse in CLK mode and dqs\_pulse in DQS mode) by 8 and generates clk\_lf, the clock of the digital loop filter. The 3-bit counter counts d, the output of the pulse-width detector, on eight rising edges of the pulse and sends the 3-bit output, dout[2:0], to the digital loop filter. Fig. 3.15 shows the timing diagram of the DIV8 block and the 3-bit counter. The reset signal of the 3-bit counter, rst\_d, is generated by the DIV8 block. The 3-bit output data is synchronized to the negative edge of the pulse to avoid setup and hold time violations with respect to clk\_lf.



Fig. 3.15 The timing diagram of DIV8 & 3-bit counter.
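The accumulate-then-decide behavior can be sketched in a few lines. The decision rule below (more than half ones means step up, fewer than half means step down, a tie means hold) is an assumption for illustration; the text only says that a majority vote is used. A plain integer count is used, so the 3-bit width of dout[2:0] is not modeled. The function names are hypothetical.

```python
# Sketch of accumulating eight 1-bit detector outputs and taking a majority vote.
# Assumption: tie -> hold; the exact decision thresholds are not given in the text.

def accumulate(decisions):
    """Count the ones in a window of eight 1-bit detector outputs."""
    if len(decisions) != 8:
        raise ValueError("expected exactly eight decisions")
    return sum(1 for d in decisions if d)


def majority_vote(count, window=8):
    """Return +1 (step up), -1 (step down), or 0 (hold) for a window of decisions."""
    if 2 * count > window:
        return 1
    if 2 * count < window:
        return -1
    return 0


count = accumulate([1, 1, 0, 1, 1, 1, 0, 1])
step = majority_vote(count)
print(count, step)
```

Deciding once per eight detector outputs lets the loop filter run at one eighth of the pulse rate and makes each update less sensitive to a single noisy comparison.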

### **3.5 Digital Loop Filter**



Fig. 3.16 Flowchart of digital loop filter.

The role of the digital loop filter is to update the IDAC control code in CLK mode and the DCDL control codes in DQS mode. The overall flowchart of the digital loop filter is shown in Fig. 3.16. Since the IDAC control code directly affects the DCDL control codes, a successive-approximation (SAR) algorithm is first applied to quickly find the optimal amount of current. Afterwards, the digital filter updates the IDAC control code by majority vote. Next, the DCDL control codes that set the delays of the Q, IB, and QB signals are updated in turn, also by majority vote. The whole process is then repeated until calibration is turned off.
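The two calibration phases can be illustrated with a noise-free behavioral model. This is a sketch under stated assumptions, not the implemented loop filter: the IDAC is modeled as an offset plus a linear 5-bit code, the comparator threshold is taken as 0.55 V, and the IDAC offset and step values are invented for illustration. Because the model has no noise, each majority vote reduces to a single detector decision. Only the clock period, the capacitor size, and the 0.91-ps DCDL step come from the text; all function names are hypothetical.

```python
# Behavioral sketch of CLK-mode SAR search followed by DQS-mode bang-bang tracking.
# Hypothetical values: V_REF, I_OFFSET, I_LSB, 5-bit IDAC code, initial pulse width.

T_CLK = 625e-12        # s, clock period at 1.6 GHz
C_UNIT = 24e-15        # F, one capacitor
V_REF = 0.55           # V, assumed comparator threshold
I_OFFSET = 60e-6       # A, assumed IDAC offset current
I_LSB = 1.6e-6         # A, assumed IDAC step
DCDL_STEP = 0.91e-12   # s, average DCDL resolution (Section 3.3)


def detector(idac_code, pulse_width, n_caps):
    """1-bit output: 1 if the capacitor voltage exceeds the threshold."""
    current = I_OFFSET + idac_code * I_LSB
    return int(current * pulse_width / (n_caps * C_UNIT) > V_REF)


def sar_idac_search(bits=5):
    """CLK mode: binary search for the largest code that stays below V_REF."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if detector(trial, T_CLK, n_caps=4) == 0:
            code = trial
    return code


def calibrate_dcdl(idac_code, initial_width, updates=100):
    """DQS mode: step one DCDL code up or down according to the detector output."""
    code = 0
    for _ in range(updates):
        width = initial_width + code * DCDL_STEP
        if detector(idac_code, width, n_caps=1):
            code = max(code - 1, 0)      # pulse too wide: reduce delay
        else:
            code = min(code + 1, 255)    # pulse too narrow: add delay
    return code


INITIAL_WIDTH = 114.38e-12   # s, example spacing with a large initial error
idac_code = sar_idac_search()
dcdl_code = calibrate_dcdl(idac_code, INITIAL_WIDTH)
final_width = INITIAL_WIDTH + dcdl_code * DCDL_STEP
print(idac_code, dcdl_code, f"{final_width * 1e12:.2f} ps")
```

In this model the DCDL code settles into a one-step limit cycle around the point where the pulse width is close to a quarter of the clock period. The residual offset comes from the finite IDAC and DCDL step sizes.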

## Chapter 4

# **Simulation Results**

## **4.1 Test Circuits**

#### 4.1.1 DQS generator

The DQS generator in Fig. 4.1(a) comprises four D flip-flops and four DCDLs. It receives a data signal and clk\_4x, a clock four times faster than the clock, from outside the chip and generates the quadrature DQS signals. As shown in Fig. 4.1(b), the four D flip-flops generate quadrature signals by sampling the input at the rising edges of clk\_4x. To apply an arbitrary phase skew, each of these four signals is delayed according to the control code of its DCDL.



Fig. 4.1 (a) The schematic and (b) waveform of DQS generator.

#### 4.1.2 Sampler

The schematic of the sampler is shown in Fig. 4.2. To measure the phase error accurately, the sampler outputs a time-magnified signal whose frequency is the beat frequency ($|f_{clk\_sampler}-f_{dqs\_in,dqs\_out}|$). The dqs\_in[3:0] and dqs\_out[3:0] signals are the DQS signals before and after correction, respectively. The enable signals, en[7:0], form a one-hot code that selects one of the eight inputs, dqs\_in[3:0] and dqs\_out[3:0], to be sampled by clk\_sampler. This method is known as equivalent-time sampling.
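The magnification provided by equivalent-time sampling follows from the ratio of the signal frequency to the beat frequency. The sketch below illustrates this with a sampler clock 100 kHz below the 1.6-GHz DQS frequency; that sampler frequency is a hypothetical value chosen for illustration, since the text does not give it. The function names are also hypothetical.

```python
# Time magnification of equivalent-time sampling.
# Assumption: the sampler clock frequency below is hypothetical, not from the thesis.

def beat_frequency(f_signal, f_sampler):
    """Frequency of the sampled output waveform."""
    return abs(f_sampler - f_signal)


def time_magnification(f_signal, f_sampler):
    """Factor by which time intervals of the input appear stretched at the output."""
    f_beat = beat_frequency(f_signal, f_sampler)
    if f_beat == 0:
        raise ValueError("sampler and signal frequencies must differ")
    return f_signal / f_beat


F_DQS = 1.6e9          # Hz
F_SAMPLER = 1.5999e9   # Hz, hypothetical sampler clock

magnification = time_magnification(F_DQS, F_SAMPLER)
stretched_s = 8.69e-12 * magnification   # how an 8.69-ps error appears at the output

print(f"magnification: {magnification:.0f}x")
print(f"8.69 ps appears as {stretched_s * 1e9:.2f} ns")
```

With this example offset, picosecond-scale phase errors are stretched to the order of a hundred nanoseconds, which is far easier to measure off-chip.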



Fig. 4.2 The schematic of sampler.

#### 4.1.3 Test MUX

The circuit diagram of the test MUX is shown in Fig. 4.3. The test MUX takes dqs\_in[3:0] and dqs\_out[3:0] as inputs and sends only one signal to the output pad according to the selection signals. Jitter can be measured at the output of the test MUX. The mismatch of MUX4 can be calibrated out by selecting clk as the input of MUX3 and comparing the outputs.



Fig. 4.3 The diagram of test MUX.

## 4.2 Chip layout

The prototype chip was fabricated in a 40-nm GP CMOS process. The layout of the analog part of the proposed QEC and the whole chip layout are shown in Fig. 4.4 and Fig. 4.5, respectively. The total chip area including pads is about 0.8 mm<sup>2</sup>. The analog part is 100.275 $\mu$m wide and 41.8 $\mu$m long, with an area of 0.004 mm<sup>2</sup>. The digital part of the QEC is spread over an area 170 $\mu$m wide and 140 $\mu$m long. The active area of the proposed QEC is 0.01 mm<sup>2</sup>. An I2C block is also included to supply control signals.



Fig. 4.4 Chip layout of analog part of the proposed QEC.



Fig. 4.5 Chip layout of the proposed QEC.

### **4.3 Simulation Results**

The simulation is performed using post-layout netlists. The waveforms of the input quadrature signals with phase errors and the corrected quadrature signals in burst mode and seamless mode at 1.6 GHz are shown in Fig. 4.6. The ideal phase difference is 156.25 ps, which is 1/4 of the clock period of 625 ps. The phase error is calculated by subtracting the phase difference measured in simulation from the ideal phase difference of 156.25 ps. The phase errors before and after correction in burst mode and seamless mode are summarized in Table 4.1.

Fig. 4.6 The waveforms of DQS before and after correction (a), (b) in burst mode and (c) in seamless mode.

|                      | I-Q       | Q-IB      | IB-QB     |
|----------------------|-----------|-----------|-----------|
| Before correction    | 41.87 ps  | -20.62 ps | -40.63 ps |
| After correction (a) | -3.326 ps | -2.108 ps | 0.575 ps  |
| After correction (b) | -2.659 ps | 1.219 ps  | 4.625 ps  |
| After correction (c) | -2.452 ps | -3.416 ps | 3.034 ps  |

Table 4.1 Phase error before and after correction in Fig. 4.6.

The tracking behavior of the IDAC and DCDL control codes is shown in Fig. 4.7. The IDAC control code is first set by the SAR algorithm and then adjusted by majority vote. The DCDL control codes for the Q, IB, and QB signals are then adjusted sequentially by majority vote to reduce the phase error of the DQS signals.



Fig. 4.7 Tracking behavior of IDAC and DCDL control codes.




Fig. 4.8 Phase sweep simulation of DQS (a) in burst mode and (b) in seamless mode.

Fig. 4.8 shows the results of the input phase sweep simulation. One of the quadrature DQS signals is swept, and the phase error of the QEC output signals is plotted. The correctable input phase range is from -75 ps to 75 ps in both modes. In burst mode, the output phase error ranges from -8.69 ps to 2.47 ps over the correctable input phase range. In seamless mode, the output phase error ranges from -7.21 ps to 6.68 ps. The proposed circuit therefore keeps the magnitude of the corrected phase error within 8.69 ps in all cases.
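Since phase errors are sometimes quoted in degrees rather than picoseconds, the figures above can be converted at the 1.6-GHz operating frequency. The helper below is a simple unit conversion; its name is hypothetical.

```python
# Convert time offsets at 1.6 GHz into phase in degrees.

def ps_to_degrees(delta_ps, f_hz):
    """Phase in degrees corresponding to a time offset in ps at frequency f_hz."""
    period_ps = 1e12 / f_hz
    return 360.0 * delta_ps / period_ps


F_DQS = 1.6e9                                  # Hz
ideal_spacing_ps = 1e12 / F_DQS / 4            # quarter of the clock period
residual_deg = ps_to_degrees(8.69, F_DQS)      # worst-case corrected error
input_range_deg = ps_to_degrees(75.0, F_DQS)   # one side of the correctable range

print(f"ideal spacing: {ideal_spacing_ps:.2f} ps")
print(f"residual error: {residual_deg:.2f} deg")
print(f"input range: +/-{input_range_deg:.1f} deg")
```

At 1.6 GHz, 8.69 ps is about 5° and ±75 ps is about ±43.2°, a total span of roughly 86°, which is in line with the approximately 90° range quoted in the introduction.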

The maximum correctable input phase error versus the frequency of the DQS signals is shown in Fig. 4.9. As the frequency of the DQS signals increases, the correctable input phase error decreases. The higher the frequency of the DQS signals, the narrower the dqs\_pulse, whose width is the phase difference between two neighboring DQS signals. When the pulse becomes too narrow, the loop-filter clock generated from dqs\_pulse may shrink or even disappear, which disturbs the operation of the QEC.



Fig. 4.9 Maximum input phase error versus frequency of DQS.

The power consumption of the proposed QEC in various modes is summarized in Table 4.2 and Fig. 4.10. When the QEC is off, only the DCDL operates, so power is saved. In burst mode, the power is lower because the DQS signals do not always toggle. With the QEC on in seamless mode, the total power consumption is 2.42 mW, of which the digital loop filter consumes 0.53 mW and the other blocks consume 1.89 mW.

| QEC state | Blocks                                                               | Seamless | Burst (bl=32) | Burst (bl=8) |
|-----------|----------------------------------------------------------------------|----------|---------------|--------------|
| ON        | DCDL + pulse generator + DIV8 & 3-bit counter + pulse-width detector | 1.887 mW | 1.106 mW      | 0.712 mW     |
| ON        | Digital loop filter                                                  | 0.532 mW | 0.438 mW      | 0.390 mW     |
| ON        | Total                                                                | 2.419 mW | 1.544 mW      | 1.102 mW     |
| OFF       | DCDL + pulse generator + DIV8 & 3-bit counter + pulse-width detector | 0.936 mW | 0.510 mW      | 0.271 mW     |
| OFF       | Digital loop filter                                                  | 0.330 mW | 0.329 mW      | 0.328 mW     |
| OFF       | Total                                                                | 1.266 mW | 0.839 mW      | 0.599 mW     |

Table 4.2 Power consumption of the proposed QEC in various modes.



Fig. 4.10 Power consumption of the proposed QEC in various modes.
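As a quick cross-check of Table 4.2, the short Python sketch below sums the two block groups for each mode and computes the relative power saving when the QEC is turned off. The power values are copied from the table. The dictionary layout and the function names are only illustrative.

```python
# Power in mW from Table 4.2: (DCDL and other blocks, digital loop filter).
POWER_MW = {
    "on": {
        "seamless": (1.887, 0.532),
        "burst_bl32": (1.106, 0.438),
        "burst_bl8": (0.712, 0.390),
    },
    "off": {
        "seamless": (0.936, 0.330),
        "burst_bl32": (0.510, 0.329),
        "burst_bl8": (0.271, 0.328),
    },
}


def total_mw(qec_state: str, mode: str) -> float:
    """Sum the two block groups for one QEC state and DQS mode."""
    return sum(POWER_MW[qec_state][mode])


def saving_percent(mode: str) -> float:
    """Relative power reduction when the QEC is switched from on to off."""
    on, off = total_mw("on", mode), total_mw("off", mode)
    return 100.0 * (on - off) / on


if __name__ == "__main__":
    for mode in POWER_MW["on"]:
        print(f"{mode}: on {total_mw('on', mode):.3f} mW, "
              f"off {total_mw('off', mode):.3f} mW, "
              f"saving {saving_percent(mode):.1f} %")
```

With the table's numbers, turning the calibration off saves a little under half of the total power in each mode.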

### **4.4 Performance Summary**

The performance of the proposed quadrature error corrector is summarized and compared with recent state-of-the-art phase correctors in Table 4.3. The proposed QEC is the only structure that can correct quadrature DQS signals. It can correct a wide input phase error range of $\pm 75$ ps. Although its maximum phase error after correction is larger than that of the other works, it is small enough to sample the data at 1.6 GHz.

|                                              | This work                      | ISSCC'20                      | TVLSI'19              | TCAS2'17    |
|----------------------------------------------|--------------------------------|-------------------------------|-----------------------|-------------|
| Technology                                   | 40nm                           | 40nm                          | 55nm                  | 65nm        |
| VDD(V)                                       | 1.1                            | 1.1                           | 1.2                   | 1           |
| Architecture                                 | Digital DLL+I&F                | Digital DLL                   | Relaxation oscillator | Digital DLL |
| DQS correction                               | O                              | X                             | X                     | X           |
| Clock frequency<br>(GHz)                     | 1.0-2.0                        | 0.8-2.3                       | 1-3                   | 1.25        |
| Correctable input phase error range          | 88° (±75 ps)                   | 84.1° @ 2.3 GHz               | <27°                  | 8.7°        |
| Maximum phase<br>error after<br>correction   | 5.01°<br>(-8.69 ps, 6.68 ps)   | <2.18°                        | 1.1°                  | 0.48°       |
| Power (mW)                                   | 2.42 (cal_on)/<br>1.26 (cal_off) | 8.89 (cal_on)/<br>3.9 (cal_off) | 2.08                | 2.27        |
| Area (mm <sup>2</sup> )                      | 0.01                           | 0.0428                        | 0.003                 | 0.01        |
| Energy<br>consumption per<br>unit pulse (pJ) | 1.513/0.788                    | 3.865/1.696                   | 0.693                 | 1.816       |

Table 4.3 Performance summary and comparison with recent state-of-the-art quadrature error correctors.
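Two of the derived rows in Table 4.3 can be reproduced with simple arithmetic. The Python sketch below converts a time error to degrees of the clock period and computes the energy per unit pulse as power divided by clock frequency. The reference frequencies used here (1.6 GHz for this work, and 2.3 GHz, 3 GHz, and 1.25 GHz for the other designs) are assumptions that happen to match the tabulated values. The function names are illustrative.

```python
def ps_to_degrees(delay_ps: float, freq_ghz: float) -> float:
    """Express a time error as a phase angle of one clock period."""
    period_ps = 1000.0 / freq_ghz
    return 360.0 * delay_ps / period_ps


def energy_per_pulse_pj(power_mw: float, freq_ghz: float) -> float:
    """Energy per clock pulse: mW / GHz = 1e-3 W / 1e9 Hz = 1e-12 J = pJ."""
    return power_mw / freq_ghz


if __name__ == "__main__":
    # This work, assuming the 1.6 GHz operating point.
    print(f"max phase error: {ps_to_degrees(8.69, 1.6):.2f} deg")
    print(f"energy (cal_on): {energy_per_pulse_pj(2.42, 1.6):.4f} pJ")
    print(f"energy (cal_off): {energy_per_pulse_pj(1.26, 1.6):.4f} pJ")
    # Other designs, assuming their maximum listed frequency.
    for name, power_mw, freq_ghz in (("ISSCC'20", 8.89, 2.3),
                                     ("TVLSI'19", 2.08, 3.0),
                                     ("TCAS2'17", 2.27, 1.25)):
        print(f"{name}: {energy_per_pulse_pj(power_mw, freq_ghz):.3f} pJ")
```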

## **Chapter 5**

# Conclusion

In this thesis, a quadrature error corrector that can correct the phase error of quadrature DQS signals is proposed, based on a capacitor charging method. The quadrature error corrector uses capacitors with a 4:1 ratio to detect the 1/4 point of the clock period. The quadrature error is corrected by adjusting delay lines using information from the phase error detector. This thesis also presents the design of a pulse-width detector that accounts for capacitor mismatch, charge sharing, and channel-length modulation. The proposed quadrature error corrector is designed in a 40-nm CMOS process and achieves a phase error within 8.69 ps while consuming a maximum power of 2.42 mW at a frequency of 1.6 GHz. After the correction is done, the quadrature error corrector can be turned off while the delay of the delay lines is held fixed. This design is the only structure that can be applied to aperiodic DQS signals.

## **Bibliography**

- [1] C. Oh, et al., "A 1.1V 16GB 640GB/s HBM2E DRAM with a Data-Bus Window-Extension Technique and a Synergetic On-Die ECC Scheme," in *IEEE ISSCC Dig. Tech. Papers*, pp. 330-332, Feb. 2020.
- [2] Y. Kim, et al., "A 2.3-mW 0.01-mm<sup>2</sup> 1.25-GHz Quadrature Signal Corrector With 1.1-ps Error for Mobile DRAM Interface in 65-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, pp. 397-401, Apr. 2017.
- [3] S. Shin, et al., "A 0.8-to-2.3GHz Quadrature Error Corrector with Correctable Error Range of 101.6ps Using Minimum Total Delay Tracking and Asynchronous Calibration On-Off Scheme for DRAM Interface," in *IEEE ISSCC Dig. Tech. Papers*, pp. 340-342, Feb. 2020.
- [4] J. Chae, et al., "A Quadrature Clock Corrector for DRAM Interfaces, With a Duty-Cycle and Quadrature Phase Detector Based on a Relaxation Oscillator," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, pp. 978-982, Apr. 2019.
- [5] H. Ko, et al., "A 3.2-GHz Quadrature Error Corrector for DRAM Transmitters, Using Replica Serializers and Pulse-Shrinking Delay Lines," *IEEE Solid-State Circuits Letters*, pp. 38-41, Feb. 2020.
- [6] H. Ko, et al., "A 370-fJ/b, 0.0056mm<sup>2</sup>/DQ, 4.8-Gb/s DQ Receiver for HBM3 with a Baud-Rate Self-Tracking Loop," in *Symp. VLSI Circuits*, pp. C94-C95, Jun. 2019.
- [7] S. Lee, et al., "A 0.83-pJ/Bit 6.4-Gb/s HBM Base Die Receiver Using a 45° Strobe Phase for Energy-Efficient Skew Compensation," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, pp. 1735-1739, Oct. 2020.
- [8] Y. Lee, T. Kang and J. Kim, "A 9-11-Bit Phase-Interpolating Digital Pulsewidth Modulator With 1000x Frequency Range," *IEEE Trans. Industry Applications*, pp. 3376-3384, Mar. 2015.

- [9] H. Liu, et al., "A Sub-mW Fractional-N ADPLL With FOM of -246 dB for IoT Applications," *IEEE J. Solid-State Circuits*, pp. 3540-3552, Nov. 2018.
- [10] M. Choi, et al., "A 110nW Resistive Frequency Locked On-Chip Oscillator with 34.3 ppm/°C Temperature Stability for System-on-Chip Designs," *IEEE J. Solid-State Circuits*, pp. 2106-2118, Jul. 2016.
- [11] J. Kim, "On-Chip Measurement of Jitter Transfer and Supply Sensitivity of PLL/DLLs," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, pp. 449-453, May 2009.
- [12] R. Ho, et al., "Applications of On-Chip Samplers for Test and Measurement of Integrated Circuits," in *Dig. Symp. VLSI Circuits*, pp. 138-139, Jun. 1998.
- [13] K. Lee, et al., "A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM With Dual-Clock System, Four-Phase Input Strobing, and Low-Jitter Fully Analog DLL," *IEEE J. Solid-State Circuits*, pp. 2369-2377, Oct. 2007.
- [14] K. Chun, et al., "A 16-GB 640-GB/s HBM2E DRAM With a Data-Bus Window Extension Technique and a Synergetic On-Die ECC Scheme," *IEEE J. Solid-State Circuits*, pp. 199-211, Oct. 2020.

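Abstract (in Korean)

As the speed of high bandwidth memory (HBM) increases, the skew of the quadrature data strobe (DQS) signals begins to affect its internal operation. Previously, only the skew of the quadrature clock signals sent out from the memory needed to be corrected. Previously proposed quadrature error correctors are applicable only to periodic clock signals and cannot be applied to aperiodic data strobe signals. Therefore, a new circuit for correcting the phase error of data strobe signals is needed.

This thesis proposes a design methodology for a quadrature error corrector that can correct the phase error of data strobe signals in HBM3. The proposed quadrature error corrector can correct aperiodic signals with a capacitor charging method, using a clock signal of the same frequency. The quadrature error corrector uses capacitors with a 4:1 ratio to detect whether the phase difference of the data strobe signals is 1/4 of the clock period. The quadrature error is corrected through digitally controlled delay lines, using the detected phase error information. After the correction, only the digitally controlled delay lines need to operate, which reduces power consumption. The design is implemented in a 40-nm process. Simulation results show that it operates from 1 to 2 GHz, consumes a maximum power of 2.42 mW when operating at 1.6 GHz from a 1.1 V supply, and leaves an error of 8.69 ps or less after correction.

Keywords: quadrature error corrector, clock skew, data strobe, capacitor charging, low power, small area.

Student Number: 2020-27297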