





Master's Thesis

# A 3.5 Gsymbol/lane Receiver Design for MIPI C-PHY Layer v2.0

Khamzat Nugmanuly

Department of Electrical Engineering

Ulsan National Institute of Science and Technology

2022







# A 3.5 Gsymbol/lane Receiver Design for MIPI C-PHY Layer v2.0

A thesis/dissertation submitted to Ulsan National Institute of Science and Technology in partial fulfillment of the requirements for the degree of Master of Science

Khamzat Nugmanuly

06.16.2022 of submission

Approved by

Advisor

Jae Joon Kim




# A 3.5 Gsymbol/lane Receiver Design for MIPI C-PHY Layer v2.0

Khamzat Nugmanuly

This certifies that the thesis/dissertation of Khamzat Nugmanuly is approved.

06.16.2022 of submission

Signature

Advisor: Jae Joon Kim

Signature


Myunghee Lee

Signature

Heein Yoon



#### Abstract

Semiconductor process technologies are the backbone of the information-driven era, in which one has access to an immense amount of data on a daily basis. As the amount of information continually increases, demand for advanced technology nodes follows a similar trend, evolving down to the 3 nm process in 2022 [1]. The need for more information directly correlates with a need for higher communication speed between data centers and clients. Higher operating speed, in turn, brings requirements for higher power and tighter precision. This power-speed trade-off can be the limiting factor in many systems, and designing a high-speed system while maintaining low or moderate power consumption requires engineers to devise more elaborate schemes that exploit the trade-off in the most efficient way. Other crucial aspects such as noise, communication efficiency, budget and area are also parameters of the power-speed trade-off that should be taken into careful consideration. Therefore, bridge systems are being developed to deliver vast amounts of information between two or more communicating systems while optimizing the aforementioned parameters.

Recently, a great deal of research has been conducted to implement interfaces that provide high throughput and performance over band-limited communication media. Due to manufacturing cost [2], the architecture of designed systems should be standardized to allow cross-compatibility across various devices. One such industry-standard interface, used in smartphones, smart watches, smart meters, video game devices, etc., is the Mobile Industry Processor Interface Display/Camera Serial Interface (MIPI DSI/CSI). MIPI DSI/CSI enables a high-performance, low-power solution while ensuring interoperability across different vendors.

This thesis presents the design of a 7.98 Gb/s front end for the C-PHY (MIPI DSI/CSI physical layer) serial interface in TSMC 28 nm HPC CMOS technology. The high-speed front end consists of a termination resistor (RT), a continuous time linear equalizer (CTLE), a high-speed receiver (HSRX), a clock and data recovery (CDR) circuit, a decoder (DEC) and a 7x21 de-serializer (DESER). The RT block employs a parallel trimming technique to ensure operation across PVT corners. The active CTLE improves signal integrity and includes a trimming option to allow operation with different channel lengths. The CDR block is designed to recover the clock embedded in the signal according to the C-PHY specification. The DEC decodes the output signals of the HSRX in a fashion consistent with the C-PHY specification. As a result, the analog front end achieves an efficiency of less than 0.2 pJ/bit with a 0.9 V supply voltage.





## Contents

| I. Introduction                         |
|-----------------------------------------|
| 1.1 Motivation and Background           |
| 1.2 Overview of SerDes High-speed Link  |
| 1.3 MIPI PHY Layer Overview             |
| 1.4 Thesis Structure                    |
| II. C-PHY High-Speed Operation Overview |
| 2.1 C-PHY Motivation and Overview       |
| 2.2 3-Wire Wire States                  |
| 2.3 Decoding                            |
| 2.4 Data Reception and Decoding         |
| 2.5 Clock recovery concept in C-PHY     |
| 2.6 HS and LP mode Line Levels          |
| III. High-Speed Receiver Lane Design    |
| 3.1 Termination Requirements and Design |
| 3.2 Signal Integrity                    |
| 3.3 Equalizer and Receiver Design       |
| 3.4 Clock Recovery                      |
| 3.5 Decoder                             |
| 3.6 Top HS-RX Lane                      |
| IV. Simulation and Experimental Results |
| 4.1 Simulation Results                  |
| 4.2 Experimental Results                |
| V. Summary                              |
| 5.1 Conclusion                          |
| 5.2 Future Works                        |
| REFERENCES                              |



## List of Figures

Figure 1.1 (a) Emerging technologies (b) IP traffic forecast (c) number of devices forecast

Figure 1.2 SerDes link general structure

Figure 1.3 (a) Transmission Line (b) wave reflection & standing waves (c) KMM cable insertion loss (d) signal attenuation & distortion

Figure 1.4 (a) Common clock (b) forwarded clock (c) embedded clock

Figure 1.5 MIPI multimedia specifications

Figure 1.6 (a) C-PHY and D-PHY example (b) C-PHY and D-PHY applications depending on resolution

Figure 2.1 C-PHY Universal Architecture

Figure 2.2 (a) Six PHY Wire States (b) A, B, C lines signal voltages (c) Receiver output depending on Wire States

Figure 2.3 Wire State diagram

Figure 2.4 Process of transmitting and receiving 16-bit data

Figure 2.5 Example of data transmission and clock recovery

Figure 2.6 HS and LP Line levels

Figure 3.1 Transmission line model

Figure 3.2 Open and Short transmission line terminations

Figure 3.3 Designed Termination Front-End

Figure 3.4 Nonidealities in transmission medium

Figure 3.5 ISI effect on channel pulse response

Figure 3.6 Eye diagram construction of ideal and non-ideal signals

Figure 3.7 Jitter Histogram and Sources

Figure 3.8 BER relation to jitter type

Figure 3.9 Comparison of channel pulse response to signal w/o FFE and w/ FFE

Figure 3.10 FFE architecture based on FIR filter

Figure 3.11 (a) DFE architecture (b) waveform equalization effect of DFE

Figure 3.12 (a) Passive CTLE (b) Frequency response

Figure 3.13 (a) Active CTLE (b) Design equations (c) Frequency response

Figure 3.14 MIPI C-PHY v2.0 reference channel differential insertion losses

Figure 3.15 High-speed front-end receiver

Figure 3.16 (a) Schematics (b) Waveform example

Figure 3.17 (a) Receiver back-end diagram (IEEE/TCAS 2020) (b) waveform examples

Figure 3.18 Wire State Decode (a) truth table (b) logic diagram



Figure 3.19 Symbol Decoder (a) truth table (b) logic diagram

Figure 3.20 Back-end of HS-RX Lane

Figure 3.21 Top diagram of HS-RX Lane

Figure 4.1 MIPI C-PHY single lane layout

Figure 4.2 (a) Recovered clock eye diagram (b) CDR waveforms (c) Decoder waveforms

Figure 4.3 (a) De-serializer clock and shift register waveforms (b) synchronization clock waveform

Figure 4.4 (a) Die photography (b) Top layout picture

Figure 4.5 PHY only chip test setup

Figure 4.6 HSRX output waveform (a) at 1.5 GS/s (b) 2.0 GS/s (c) 3.0 GS/s (d) 3.5 GS/s

Figure 4.7 (a) HSRX output at 3.5 GS/s (b) Clock output at 3.5 GS/s (c) Clock output at 4.5 GS/s



## List of Tables

- Table 1.1 Comparison of different architectures
- Table 2.1 Truth table for WS Decoding
- Table 3.1 Resistor value range
- Table 3.2 Equalizers comparison
- Table 3.3 Required Specification
- Table 3.4 CTLE and comparator design procedure
- Table 3.5 Device value and size information
- Table 3.6 FLIP, ROTATE, POLARITY symbol truth table
- Table 4.1 Termination Resistor corner simulation results
- Table 4.2 Extracted simulation results for HS-RX receiver



## Nomenclature

| IP     | Internet Protocol                       |
|--------|-----------------------------------------|
| CAGR   | Compound Annual Growth Rate             |
| EB     | Exabyte                                 |
| SerDes | Serializer-De-Serializer                |
| MIPI   | Mobile Industry Processor Interface     |
| DSI    | Display Serial Interface                |
| CSI    | Camera Serial Interface                 |
| PHY    | Physical Layer                          |
| RT     | Resistor Termination                    |
| CTLE   | Continuous Time Linear Equalizer        |
| HSRX   | High Speed Receiver                     |
| HSTX   | High Speed Transmitter                  |
| CDR    | Clock Data Recovery                     |
| PVT    | Process Voltage Temperature             |
| DEC    | Decoder                                 |
| DESER  | De-serializer                           |
| EM     | Electro Magnetic                        |
| VESA   | Video Electronics Standards Association |
| EDP    | Embedded DisplayPort                    |
| RX     | Receiver                                |
| TX     | Transmitter                             |
| PLL    | Phase Locked Loop                       |
| EMI    | Electro-Magnetic Interference           |
| LP     | Low Power                               |
| CD     | Contention Detector                     |
| WS     | Wire State                              |
| SLVS   | Scalable Low Voltage Signaling          |
| UI     | Unit Interval                           |
| ENC    | Encoder                                 |
| HS     | High Speed                              |
| UB     | Unit Block                              |



| SI     | Signal Integrity                        |
|--------|-----------------------------------------|
| ISI    | Intersymbol Interference                |
| RJ     | Random Jitter                           |
| DJ     | Deterministic Jitter                    |
| BER    | Bit Error Rate                          |
| FFE    | Feed-Forward Equalization               |
| FIR    | Finite Impulse Response                 |
| DFE    | Decision Feedback Equalizer             |
| ADC    | Analog-to-Digital Converter             |
|        | Input Common Mode                       |
|        | Flip-Rotate-Polarity                    |
| PTAT   | Proportional to Absolute Temperature    |



### I. Introduction

#### **1.1 Motivation and Background**

The expansion of communication networks has influenced human life to the point that exchanging information between a person and the whole world has become the norm in every aspect of life. The demand for more information gives rise to new technologies such as smartwatches, virtual reality (VR) and augmented reality (AR), shown in Figure 1.1(a). The forecast in Figure 1.1(c) implies that the number of devices will reach approximately 28 billion in 2022 and is estimated to keep growing at a rate of 10%. As the demand for information and advanced gadgets rises, the required resolution and data rate tend to increase as well. According to Figure 1.1(b), from the Cisco Visual Networking Index, Internet Protocol (IP) traffic grows at a Compound Annual Growth Rate (CAGR) of 26% between 2017 and 2022, from 122 to 396 EB per month (EB: exabyte = $10^{18}$ bytes) [3].



Figure 1.1 (a) Emerging technologies (b) IP traffic forecast (c) number of devices forecast

To accommodate the demand for high speed and high data rates, Serializer/De-serializer (SerDes) links are often employed between communicating modules. For example, USB4 requires a communication speed of 40 Gbit/s [4], HDMI 2.1 streams at data rates of up to 48 Gbit/s [5], and Ethernet data transfer rates can reach 400 Gbit/s. Figure 1.2 illustrates the physical layer of such a SerDes link. The peripheral devices are usually main processors and sensors that send and receive data of a specific kind, ranging from audio/video streams to Ethernet communication. The high-speed transmitter (HSTX) and HSRX pair is the central element that allows high-speed serial communication to take place through a non-ideal channel.





Figure 1.2 SerDes link general structure

Even though advances in semiconductor technology allow high-speed signalling [6], careful measures still have to be taken when dealing with frequencies above 1 GHz. The major obstacle at such high frequencies is transmission line effects, since electromagnetic (EM) waves above 1 GHz have a maximum wavelength of 30 cm, which is comparable to the length of modern cables. Therefore, a lumped circuit model fails to fully represent the circuit operation, and transmission line theory should be used to accurately meet the design specifications. Reflection due to load mismatch causes overshoot and standing waves, while insertion loss attenuates and distorts the signal. These consequences are illustrated in Figure 1.3. The HSTX, HSRX and front-end blocks therefore have to be designed in a way that alleviates these issues.



Figure 1.3 (a) Transmission Line (b) wave reflection & standing waves (c) KMM cable insertion loss (d) signal attenuation & distortion



Manufacturing processes are becoming more complex and require advanced technologies, so an increase in cost is unavoidable. Cross-compatibility is necessary to take full advantage of the manufacturing process and keep the budget as low as possible. Since SerDes links are used extensively, ensuring their interoperability is a must. To achieve this, several renowned industry standards exist, which give engineers a set of rules and design specifications to follow for correct operation across different vendors. Examples are the Video Electronics Standards Association (VESA) [7], which defines technical standards for the Embedded DisplayPort (EDP) used in computer displays, and the Mobile Industry Processor Interface (MIPI) [8], which provides camera and display interfaces (CSI/DSI) using various PHY layers. The focus of this thesis is the C-PHY layer standardized by MIPI, which enables a low-power, high-throughput SerDes serial link by alleviating the abovementioned issues.

#### **1.2 Overview of SerDes High-speed Link**

A SerDes link is a widely used design block for high-speed communication through band-limited channels. The general architecture consists of Ser and Des functional blocks at the physical layer peripheries and RX and TX blocks for serial data transmission. Termination resistors are required to minimize wave reflection at both ends of the channel. Other necessary blocks are employed depending on data rate, channel length and application. Ignoring the protocol layers and their application, SerDes structures can be divided into three main categories depending on the clocking scheme. The common clock architecture [9] uses the same reference clock for both ends, after which Phase Locked Loops (PLL) are used at each side for further clock generation. An important drawback of this architecture is its sensitivity to the delays $t_1$ and $t_2$; therefore, tight control of the line lengths, and hence of the delays, is required. Despite that, the common clock is widely used in on-chip communication, where a single clock is used to sample the data. The problem of differing propagation delays is avoided by the forwarded clock architecture [10]-[12], in which the clock is transmitted along with the data, enabling high-speed chip-to-chip communication. There are still skew problems between the clock and data signals due to loading or interconnect length mismatches, and de-skewing techniques [13] are used to align data and clock in the correct phase relation. Furthermore, the forwarded clock architecture requires one more lane for clock transmission. Embedding the clock [14] into the data by means of a coding scheme further increases data throughput and eliminates both the clock lane emission and the additional lane. Examples of such data codes are 8b/10b encoding and Manchester encoding. In both cases, the original data is modified in such a way that there is no DC component and clock recovery is possible. At the receiver side, a CDR [15]-[17] circuit is used to extract the clock from the data, and a DEC block restores the data to its original state. Simplified block diagrams of the three architectures are illustrated in Figure 1.4.



| Type       | Common clock                         | Forwarded clock                                                   | Embedded clock                                           |
|------------|--------------------------------------|-------------------------------------------------------------------|----------------------------------------------------------|
| Operation  | Same clock                           | Clock lane usage                                                  | Clock is coded into serial data                          |
| Usage      | - Low speed (~100 Mb/s)<br>- On chip | - Processor-memory<br>- Camera<br>- Display<br>- Drones, robots   | - Ethernet<br>- Camera<br>- Displays<br>- Drones, robots |
| Advantages | - Simple circuitry                   | - High-speed                                                      | - High-speed<br>- Low EMI                                |
| Problems   | - Delay mismatch<br>- Limited speed  | - Requires clock lane<br>- Clock lane EMI<br>- De-skewing circuit | - CDR circuitry<br>- Coding (if required)                |

Table 1.1 Comparison of different architectures



Figure 1.4 (a) Common clock (b) forwarded clock (c) embedded clock



#### **1.3 MIPI PHY Layer Overview**

The PHY layer is the lowest hardware layer in a communication network; hence the SerDes circuit, the cable and the other electrical and mechanical interfaces are part of this layer. Standardized electrical specifications for the PHY layer help companies and vendors create their own unique designs while maintaining cross-compatibility and high-volume, low-cost production. MIPI offers several types of PHY interface layer, depending on channel bandwidth, data throughput, power consumption range and targeted mobile-driven devices. A-PHY is specifically targeted at driver assistance systems, autonomous cars and related in-vehicle sensors such as lidar, radar, cameras or displays, and can provide a 4 Gb/s data rate with a reach of 15 meters and high noise immunity. The M-PHY interface is intended for accessing the latest generation of flash memory storage. The widely used D-PHY and C-PHY are high-speed layers for interfacing smartphone cameras, displays, smartwatches, robots and surveillance device sensors, with data rates of up to 9 and 13.68 Gb/s respectively. Owing to their electrical specifications, pin counts and applications, D-PHY and C-PHY can be used in combination with each other, providing increased flexibility. Figure 1.5 shows the MIPI Alliance multimedia ecosystem. Despite the difference in clocking, D-PHY and C-PHY share some design blocks and can be designed to have the same interface, making a combined design possible [18].



Figure 1.5 MIPI multimedia specifications



D-PHY belongs to the forwarded clock link type and therefore uses an additional lane for clock signalling. C-PHY, on the other hand, is specified as an embedded clock architecture. A complex PLL-based CDR architecture is avoided in C-PHY thanks to the use of 3 wires for high-speed serial data transmission together with the data coding scheme. Therefore, a relatively straightforward CDR circuit is enough to ensure clock recovery, which makes the clock lane obsolete and solves the clock EMI problem. The use of 3 wires not only helps with clock recovery; thanks to the coding scheme, more bits of information can also be sent per unit time, decreasing the data toggle rate while improving the data efficiency (mW/Gb/s). For example, to achieve a 4 Gb/s data rate, D-PHY might use 4 data lanes at 1 Gb/s each plus 1 clock lane. For the same 4 Gb/s data rate, C-PHY can use only 2 lanes, each toggling at 0.875 GS/s; thanks to the coding scheme, the full data rate is still equal to 4 Gb/s. Overall, C-PHY reaches the same result with 60% fewer lanes and 40% fewer data lines, while alleviating clock EMI and reducing the data toggle rate. Figure 1.6(a) illustrates the above example and Figure 1.6(b) shows C-PHY and D-PHY applications for different resolution settings.



Figure 1.6 (a) C-PHY and D-PHY example (b) C-PHY and D-PHY applications depending on resolution
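The lane-count comparison in the 4 Gb/s example can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions that are not spelled out in the text: each D-PHY lane is taken to be a differential pair (2 wires), each C-PHY lane a trio (3 wires), the 0.875 figure is read as the per-lane C-PHY symbol rate in GS/s, and the coding gain is taken as 16 bits per 7 symbols as described in Chapter II.

```python
# Worked check of the 4 Gb/s lane-count example (illustrative only).
# Assumptions: a D-PHY lane is a differential pair (2 wires), a C-PHY lane
# is a trio (3 wires), and 0.875 is the C-PHY symbol rate per lane in GS/s.
BITS_PER_SYMBOL = 16 / 7  # 16 bits mapped onto 7 symbols (Chapter II)

# D-PHY: 4 data lanes at 1 Gb/s each, plus 1 forwarded-clock lane.
dphy_lanes = 4 + 1
dphy_wires = 2 * dphy_lanes
dphy_rate_gbps = 4 * 1.0

# C-PHY: 2 lanes, clock embedded in the data.
cphy_lanes = 2
cphy_wires = 3 * cphy_lanes
cphy_rate_gbps = cphy_lanes * 0.875 * BITS_PER_SYMBOL

lane_saving = 1 - cphy_lanes / dphy_lanes
wire_saving = 1 - cphy_wires / dphy_wires

print(f"D-PHY: {dphy_lanes} lanes, {dphy_wires} wires, {dphy_rate_gbps:.2f} Gb/s")
print(f"C-PHY: {cphy_lanes} lanes, {cphy_wires} wires, {cphy_rate_gbps:.2f} Gb/s")
print(f"Savings: {lane_saving:.0%} fewer lanes, {wire_saving:.0%} fewer wires")
```

Under these assumptions the two configurations deliver the same 4 Gb/s, and the 60% and 40% savings quoted in the text correspond to 5 versus 2 lanes and 10 versus 6 wires.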



#### **1.4 Thesis Structure**

The main objective of this thesis is to design a 3.5 GS/s C-PHY layer oriented toward high power efficiency, following the electrical and timing specifications given in the MIPI C-PHY standard version 2.0. The thesis is divided into 5 chapters, the first being this introduction.

- Chapter II. C-PHY High-Speed Operation Overview. A thorough explanation of C-PHY high-speed operation, 3-wire data transmission, and clock recovery with subsequent data decoding.
- Chapter III. High-Speed Receiver Lane Design. The approach to designing the C-PHY high-speed lane and the key building blocks for high-speed data reception. The design and functions of the RT, CTLE, HSRX, CDR, DEC and DESER are described.
- Chapter IV. Simulation and Experimental Results. Presents the simulation and prototype chip test results, with description, comparison, and explanation of the observed deviations.
- Chapter V. Summary. Concludes the thesis and gives additional insights into future improvements.



### **II. C-PHY High-Speed Operation Overview**

#### 2.1 C-PHY Motivation and Overview

The C-PHY layer is designed to provide flexible, high-throughput performance over band-limited channels with low power consumption and low EMI for interfacing cameras and displays in mobile applications, especially for the new generation of pixel cell-based displays. The embedded clock architecture gives C-PHY flexibility through more optimized lane placement and layout. Communication via wire trios reduces the actual toggle rate while increasing the amount of data transferred. Hence, further development and research of C-PHY is inevitable. As shown in Figure 2.1, the C-PHY architecture consists of a Protocol side, a Control side and a Lane side [19]. The Protocol side carries raw data from/to sensors, processors or software. The Control side is hardware that handles the control operations and signals needed for PHY lane communication. At the periphery, the Lane side transmits and receives the data via the channel link. The Lane side includes an LP transmitter/receiver pair for rail-to-rail, low-frequency control signaling. The contention detector (CD) block inspects the line voltages for false voltage levels. At the core, the HSTX and/or HSRX are responsible for high-speed, efficient data transmission over a long wire.



Figure 2.1 C-PHY Universal Architecture



#### 2.2 3-Wire Wire States

First, it is important to define the unit interval (UI) and the wire state (WS) in C-PHY high-speed data transmission. The UI is the time interval during which one data value is defined. For example, 1 UI at a 1 Gb/s data rate is 1 ns, meaning that each 1 ns interval carries one bit. Three wires A, B and C are involved in data transmission and three voltage levels are used, so during each UI every wire sits at a different level: one line is high, one line is low and one line is at the common (middle) level. WS +x is defined as line A high, line B low and line C at the common level. Similarly, WS -x is the state in which line A is driven low, line B is driven high and line C is driven to the common level. Depending on the A, B and C line levels, there are six WS in total, namely +x, -x, +y, -y, +z and -z, which are defined in the same manner. Note that the high, middle and low levels are compatible with scalable low voltage signaling (SLVS), where the common voltage level is 200 mV. The receiver then converts the 3-level signals into 2-level signals by comparing A with B, B with C and C with A. These states are illustrated in Figure 2.2.



Figure 2.2 (a) Six PHY Wire States (b) A, B, C lines signal voltages (c) Receiver output depending on Wire States
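The mapping from wire states to the three comparator outputs can be sketched in a few lines. Only the +x and -x definitions are given explicitly in the text; the other four states are assumed here to follow by cyclic rotation of the wires, and the comparator sign convention (output 1 when the first wire is higher) is likewise an assumption. Both choices reproduce the (AB, BC, CA) codes listed in Table 2.1.

```python
# Ordinal line levels; only their ordering matters for the comparators.
HIGH, MID, LOW = 2, 1, 0

# Line levels (A, B, C) per wire state. +x and -x follow the text; the
# remaining states are assumed by rotating the roles of the wires.
WIRE_STATES = {
    "+x": (HIGH, LOW, MID),
    "-x": (LOW, HIGH, MID),
    "+y": (MID, HIGH, LOW),
    "-y": (MID, LOW, HIGH),
    "+z": (LOW, MID, HIGH),
    "-z": (HIGH, MID, LOW),
}

def receiver_outputs(state):
    """Return the 2-level comparator outputs (AB, BC, CA) for a wire state."""
    a, b, c = WIRE_STATES[state]
    return (int(a > b), int(b > c), int(c > a))

for name in WIRE_STATES:
    print(name, receiver_outputs(name))
```

Each of the six states yields a distinct 3-bit code, and the all-zero and all-one codes never occur, because the three lines always sit at three different levels.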



#### 2.3 Decoding

The encoding rule of C-PHY is that a transition exists between every two consecutive WS, and each such transition is defined as a symbol (S). Hence there are 5 symbols, one per transition from the current WS to each of the other 5 WS. One symbol can encode $\log_2 5 = 2.3219$ bits, so the data throughput is multiplied by up to 2.3219: a lane running at 1 GS/s (giga symbols per second) can carry up to 2.3219 Gb/s. Since there are 5 symbols, 3 bits are required to represent 1 symbol electrically; these are the Flip, Rotate and Polarity bits. The Flip bit indicates whether the previous and present WS are complements of each other. The Rotate bit indicates the direction of rotation from the previous to the present WS, where clockwise follows alphabetical order (x $\rightarrow$ y $\rightarrow$ z $\rightarrow$ x). A sign change from the previous to the present WS is represented by the Polarity bit. The 5 symbols represented by the 3 bits are 000, 001, 010, 011 and 100. The symbol 000 indicates no Flip and no Polarity change with counterclockwise rotation, i.e. +(-)y $\rightarrow$ +(-)x, +(-)z $\rightarrow$ +(-)y or +(-)x $\rightarrow$ +(-)z. The value 001 indicates counterclockwise rotation with a Polarity change, i.e. +(-)z $\rightarrow$ -(+)y, +(-)x $\rightarrow$ -(+)z or +(-)y $\rightarrow$ -(+)x. The value 010 corresponds to clockwise rotation only, i.e. +(-)x $\rightarrow$ +(-)y, +(-)y $\rightarrow$ +(-)z or +(-)z $\rightarrow$ +(-)x. Clockwise rotation with a Polarity change is designated by 011, i.e. +(-)x $\rightarrow$ -(+)y, +(-)y $\rightarrow$ -(+)z or +(-)z $\rightarrow$ -(+)x. Lastly, 100 denotes the Flip operation, i.e. +(-)x $\rightarrow$ -(+)x, +(-)y $\rightarrow$ -(+)y or +(-)z $\rightarrow$ -(+)z. This process is illustrated graphically in Figure 2.3. From a circuit perspective, an Encoder (ENC) block converts the required symbols into the corresponding previous and present WS values.



Figure 2.3 Wire State diagram
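The Flip/Rotate/Polarity rule can be written as a small behavioural encoder. The sketch below is a software illustration of the rule as described in this section, not the ENC circuit itself; the function name `next_wire_state` and the string representation of wire states are hypothetical choices made for the example.

```python
AXES = "xyz"  # clockwise is taken as alphabetical order: x -> y -> z -> x

def invert(sign):
    """Swap '+' and '-'."""
    return "-" if sign == "+" else "+"

def next_wire_state(prev_ws, flip, rotate, polarity):
    """Apply one symbol (Flip, Rotate, Polarity bits) to the previous wire state."""
    sign, axis = prev_ws[0], AXES.index(prev_ws[1])
    if flip:
        # Flip: same axis, opposite sign; Rotate and Polarity are don't-care.
        return invert(sign) + AXES[axis]
    step = 1 if rotate else -1            # 1 = clockwise, 0 = counterclockwise
    new_axis = (axis + step) % 3
    new_sign = invert(sign) if polarity else sign
    return new_sign + AXES[new_axis]

# Walk through all five symbols starting from +x.
state = "+x"
trace = []
for symbol in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]:
    state = next_wire_state(state, *symbol)
    trace.append(state)
print(trace)
```

Because every symbol either changes the axis or inverts the sign, the present wire state always differs from the previous one, which is the property the clock recovery relies on.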



#### 2.4 Data Reception and Decoding

Figure 2.4 shows the processes that take place in high-speed data transmission. First, the 16-bit input data is converted into several symbols. Seven symbols, 21 bits in total, are used to represent 16-bit data, since $5^7 = 78,125$ combinations of 7 symbols are available to cover the $2^{16} = 65,536$ data range. The 16-bit to 7-symbol Mapper block performs this transformation of bits into symbols. Thereafter, the 21 parallel bits (7 symbols) are serialized into 3 parallel bits (Tx\_Flip, Tx\_Rotation, Tx\_Polarity). The symbol ENC then produces the present WS value from the previous WS and the present symbol. At the transmitter end, the 3-Wire Driver drives the channel according to the WS rule. The receiver first converts the 3-level signals into rail-to-rail 2-level digital signals, which are then used to decode the present symbol from the received present and previous WS. The 7 serial symbols are then parallelized and de-mapped back to the initial 16-bit form.



Figure 2.4 Process of transmitting and receiving 16-bit data
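The counting argument behind the 16-bit to 7-symbol mapping can be made explicit. The figures below follow directly from the numbers in the text; the efficiency ratio is an added illustration rather than a value quoted from the C-PHY specification.

$$5^6 = 15{,}625 \;<\; 2^{16} = 65{,}536 \;\le\; 5^7 = 78{,}125$$

$$\frac{16\ \text{bits}}{7\ \text{symbols}} \approx 2.2857\ \text{bits/symbol}, \qquad \frac{16/7}{\log_2 5} \approx \frac{2.2857}{2.3219} \approx 98.4\%$$

Six symbols would not cover the 16-bit range, so seven is the minimum, and the mapping uses about 98.4% of the theoretical capacity of a 5-ary symbol.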

| Present WS (AB, BC, CA), received during N<sup>th</sup> interval (rows) / Previous WS (AB, BC, CA), received during (N-1)<sup>th</sup> interval (columns) | +x [100] | -x [011] | +y [010] | -y [101] | +z [001] | -z [110] |
|----------|----------|----------|----------|----------|----------|----------|
| +x [100] | N/A      | 1xx      | 000      | 001      | 010      | 011      |
| -x [011] | 1xx      | N/A      | 001      | 000      | 011      | 010      |
| +y [010] | 010      | 011      | N/A      | 1xx      | 000      | 001      |
| -y [101] | 011      | 010      | 1xx      | N/A      | 001      | 000      |
| +z [001] | 000      | 001      | 010      | 011      | N/A      | 1xx      |
| -z [110] | 001      | 000      | 011      | 010      | 1xx      | N/A      |

Table 2.1 Truth table for WS Decoding
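The entries of Table 2.1 can be regenerated from the Flip/Rotate/Polarity rule. The following sketch is a behavioural model written for illustration, not the gate-level decoder; the function name `decode_symbol` and the string encoding of wire states are hypothetical.

```python
AXES = "xyz"  # clockwise is taken as alphabetical order: x -> y -> z -> x
STATES = ["+x", "-x", "+y", "-y", "+z", "-z"]

def decode_symbol(prev_ws, pres_ws):
    """Return the symbol for a wire-state transition as a 3-character string."""
    if prev_ws == pres_ws:
        raise ValueError("no transition: the wire state must change every UI")
    prev_sign, prev_axis = prev_ws[0], AXES.index(prev_ws[1])
    pres_sign, pres_axis = pres_ws[0], AXES.index(pres_ws[1])
    if prev_axis == pres_axis:
        return "1xx"                      # Flip; other two bits are don't-care
    rotate = 1 if (pres_axis - prev_axis) % 3 == 1 else 0
    polarity = 1 if pres_sign != prev_sign else 0
    return f"0{rotate}{polarity}"

# Print the table: rows are the present state, columns the previous state.
print("pres\\prev " + " ".join(f"{s:>4}" for s in STATES))
for pres in STATES:
    row = ["N/A" if prev == pres else decode_symbol(prev, pres) for prev in STATES]
    print(f"{pres:>9} " + " ".join(f"{cell:>4}" for cell in row))
```

Running the loop reproduces the rows of Table 2.1, which is a convenient cross-check between the verbal description of the symbols and the truth table.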



#### 2.5 Clock recovery concept in C-PHY

The previously stated encoding rule not only increases the data throughput but also facilitates clock recovery, because a transition is guaranteed in every UI. By detecting the transition in each UI, the clock can be recovered without PLL-based topologies. Figure 2.5 shows an example of data transmission and reception, with the transitions in each UI and the clock recovered from them.



Figure 2.5 Example of data transmission and clock recovery
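The idea of transition-based clock recovery can be illustrated with an idealized, UI-sampled model. The sketch below only checks that at least one comparator output toggles in every UI; it ignores analog delay, skew and jitter and is not a model of the CDR circuit designed in this thesis. The function name `transition_pulses` is hypothetical.

```python
# (AB, BC, CA) comparator codes for each wire state, as listed in Table 2.1.
WS_CODES = {
    "+x": (1, 0, 0), "-x": (0, 1, 1),
    "+y": (0, 1, 0), "-y": (1, 0, 1),
    "+z": (0, 0, 1), "-z": (1, 1, 0),
}

def transition_pulses(wire_states):
    """Return one flag per UI boundary: 1 if any comparator output toggled."""
    codes = [WS_CODES[ws] for ws in wire_states]
    return [int(prev != cur) for prev, cur in zip(codes, codes[1:])]

# A legal C-PHY sequence changes wire state every UI, so every flag is 1.
legal = ["+x", "-y", "+z", "-z", "+y", "+x"]
print(transition_pulses(legal))

# A repeated state (not allowed by the encoding rule) would leave a gap.
illegal = ["+x", "+x", "-x"]
print(transition_pulses(illegal))
```

Since the six comparator codes are all distinct, any change of wire state toggles at least one of the three outputs, so a pulse can be generated in every UI.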

#### 2.6 HS and LP mode Line Levels

The C-PHY system operates in one of two modes, which are distinguished by the voltage levels on the A, B and C lines. During HS mode, A, B and C are driven to SLVS levels with a typical common level of 200 mV. LP mode is intended for sending rail-to-rail control signals between controllers and is limited to a data rate of 200 MHz. As seen in Figure 2.6, HS mode signaling lies below the low threshold of the LP blocks.



Figure 2.6 HS and LP Line levels



### **III. High-Speed Receiver Lane Design**

#### 3.1 Termination Requirements and Design

The wavelength of a 1 GHz signal is 30 cm and that of a 3 GHz signal is 10 cm; as frequency increases further, the wavelength becomes comparable to the length of the wires used for communication between chips. In a lumped circuit model, it is assumed that the rise time $(t_r)$ of the signal is far greater than the time of flight (propagation time $t_f$), so the same voltage is seen along the whole line at any given time and the wire is modelled with parasitic capacitance only. However, when the time of flight is no longer negligible compared with the rise time $(t_r < 2.5t_f)$, transmission line theory should be used to correctly approximate the circuit operation. To understand the requirement for a termination block, the fundamentals of transmission line theory should be considered.



Figure 3.1 Transmission line model
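The wavelength figures and the $t_r < 2.5t_f$ criterion can be evaluated numerically. The sketch below assumes free-space propagation by default; the `eps_eff` parameter (effective relative permittivity) and the example rise time and line lengths are illustrative assumptions, not values from this design.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_m(freq_hz, eps_eff=1.0):
    """Wavelength in metres; eps_eff = 1.0 corresponds to free space."""
    return C0 / (freq_hz * eps_eff ** 0.5)

def time_of_flight_s(length_m, eps_eff=1.0):
    """Propagation time t_f along a line of the given length."""
    return length_m * eps_eff ** 0.5 / C0

def needs_transmission_line_model(rise_time_s, length_m, eps_eff=1.0):
    """Apply the t_r < 2.5 * t_f criterion from the text."""
    return rise_time_s < 2.5 * time_of_flight_s(length_m, eps_eff)

print(f"1 GHz: {wavelength_m(1e9) * 100:.1f} cm")
print(f"3 GHz: {wavelength_m(3e9) * 100:.1f} cm")

# Hypothetical example: a 100 ps edge on a 10 cm cable versus a 1 mm trace.
print(needs_transmission_line_model(100e-12, 0.10))
print(needs_transmission_line_model(100e-12, 0.001))
```

With these example numbers the 10 cm cable has to be treated as a transmission line, while the 1 mm on-chip trace can still be modelled as a lumped element.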

Solving the Telegrapher's equations for a unit length of transmission line gives the frequency-domain voltage and current expressions:

$$V(z) = V_0^+ e^{-\gamma z} + V_0^- e^{\gamma z}$$
(3.1)

$$I(z) = \frac{V_0^+}{Z_0} e^{-\gamma z} - \frac{V_0^-}{Z_0} e^{\gamma z}$$
(3.2)

$$Z_{0} = \sqrt{\frac{R + j\omega L}{G + j\omega C}} \quad \text{(characteristic impedance)}$$
(3.3)

$$\gamma = \sqrt{(R + j\omega L)(G + j\omega C)} \quad \text{(propagation constant)}$$
(3.4)

The negative-exponent term refers to the wave travelling toward the load, and the positive-exponent term refers to the wave travelling away from the load, which shows that wave reflection occurs at the load. The reflection coefficient is therefore introduced to describe the fraction of the wave that is reflected:

$$\Gamma = \frac{V_0^-}{V_0^+} = \frac{Z_L - Z_0}{Z_L + Z_0} \quad (3.5)$$



The impact of reflection can be seen from several observations. A non-zero reflection coefficient implies that the reflected voltage superimposes on the transmitted voltage, disturbing the observed signal. Some extreme examples are useful to understand the effect of reflection on the voltage. From equation (3.5), if the load is left open with $Z_L = \infty$, then $\Gamma = 1$, meaning that the voltage is fully reflected. On the other hand, if the load is shorted with $Z_L = 0$, then $\Gamma = -1$, meaning that the voltage is fully reflected with inverted polarity. By making the termination impedance equal to the characteristic impedance, reflection can ideally be avoided altogether. This is why termination is needed.



Figure 3.2 Open and Short transmission line terminations

Integrating the termination on the same chip makes it easy to combine the chip with different boards and avoids the parasitic effects of discrete components. Moreover, process-voltage-temperature (PVT) variation must be considered when designing the termination block, and a trimming option is highly beneficial. Parallel trimming, in which several digitally controlled resistors are connected in parallel, is well suited because it gives small steps between subsequent code values. The designed termination front-end is illustrated in Figure 3.3. The basic building block of the termination resistor is the unit block (UB), consisting of a 650 $\Omega$ resistor. Parallel combinations of 10 UBs, 4 UBs, 2 UBs and 1 UB give 65 $\Omega$, 162.5 $\Omega$, 325 $\Omega$ and 650 $\Omega$, respectively. These values were chosen through iterative calculation, with the goal of providing at least 20% variation in both directions around the optimal 50 $\Omega$ value and placing 50 $\Omega$ near the middle of the range. B2, B1 and B0 control the 4-UB, 2-UB and 1-UB branches respectively, while the 10-UB branch is controlled by the enable bit. The maximum available resistance is therefore 65 $\Omega$ at B[2:0] = 000<sub>2</sub> and the minimum is 38 $\Omega$ at B[2:0] = 111<sub>2</sub>, with 50 $\Omega$ corresponding to B[2:0] = 011<sub>2</sub>, which is almost the midpoint between 000<sub>2</sub> and 111<sub>2</sub>. As a result, a range of 50 $\Omega$ + 30% to 50 $\Omega$ - 24% is achieved, with steps of 5 to 12% (of 50 $\Omega$) between adjacent codes.





Figure 3.3 Designed Termination Front-End

| <b>B2</b> | <b>B1</b> | <b>B0</b> | 65 Ω | 162.5 Ω | 325 Ω | 650 Ω | <b>R</b><sub>TERM</sub> | % of 50 Ω |
|-----------|-----------|-----------|------|---------|-------|-------|--------------------------|-----------|
| 0         | 0         | 0         | 65 Ω | X       | X     | X     | 65 Ω                     | 130       |
| 0         | 0         | 1         | 65 Ω | X       | X     | 650 Ω | 59.1 Ω                   | 118       |
| 0         | 1         | 0         | 65 Ω | X       | 325 Ω | X     | 54.2 Ω                   | 108       |
| 0         | 1         | 1         | 65 Ω | X       | 325 Ω | 650 Ω | 50.0 Ω                   | 100       |
| 1         | 0         | 0         | 65 Ω | 162.5 Ω | X     | X     | 46.4 Ω                   | 93        |
| 1         | 0         | 1         | 65 Ω | 162.5 Ω | X     | 650 Ω | 43.3 Ω                   | 87        |
| 1         | 1         | 0         | 65 Ω | 162.5 Ω | 325 Ω | X     | 40.6 Ω                   | 81        |
| 1         | 1         | 1         | 65 Ω | 162.5 Ω | 325 Ω | 650 Ω | 38.2 Ω                   | 76        |

Table 3.1 Resistor value range
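The entries of Table 3.1 follow directly from counting the 650 Ω unit blocks that are connected in parallel. The sketch below reproduces them; the constant and function names are illustrative, and the always-on 10-UB branch is assumed to be enabled.

```python
UNIT_OHMS = 650.0           # one unit block (UB)
ALWAYS_ON_UNITS = 10        # 10-UB branch, assumed enabled
BIT_UNITS = {2: 4, 1: 2, 0: 1}  # B2 -> 4 UBs, B1 -> 2 UBs, B0 -> 1 UB


def r_term(code):
    """Termination resistance for a 3-bit trim code B[2:0]."""
    units = ALWAYS_ON_UNITS + sum(n for bit, n in BIT_UNITS.items()
                                  if (code >> bit) & 1)
    # N identical resistors in parallel give R / N.
    return UNIT_OHMS / units


for code in range(8):
    r = r_term(code)
    print(f"B[2:0]={code:03b}  R_TERM={r:5.1f} ohm  ({100 * r / 50:.0f}% of 50 ohm)")
```

Because every branch is a multiple of the same unit block, the nominal 50 Ω point falls exactly on code 011 (13 unit blocks in parallel).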

## **3.2 Signal Integrity**

The transmission line effects discussed previously, such as insertion loss, together with crosstalk and packaging and PCB parasitics, dominate at data rates above 1 Gb/s and degrade the quality of the transmitted signal. The quality of such high-speed electrical signals is referred to as signal integrity (SI). The main task of SI analysis is to assess the signal quality and to improve it as much as possible. The most dominant effect in high-speed data transmission is intersymbol interference (ISI), in which a signal spreads beyond 1 UI and interferes with other UIs, degrading the overall quality of transmission. The main cursor of a given symbol is the sample point that carries the most dominant information of that symbol. The Precursor is the part of the symbol that affects the main cursor of the previous symbol. The Postcursor deteriorates the following symbols. The effects of ISI are demonstrated in Figure 3.5, with clear distinctions between the Pre, Main and Post cursors.





Figure 3.4 Nonidealities in transmission medium



Figure 3.5. ISI effect on channel pulse response

$$y[n] = x[n] * h[n] = \sum_{m=0}^{M} x[m]\,h[n-m] = x[0]h[n] + \cdots + x[C]h[n-C] + \cdots + x[M]h[n-M] \quad (3.6)$$

$$x[n] \quad \text{(input data)} \quad (3.7)$$

$$h[n] \quad \text{(medium impulse response)} \quad (3.8)$$

$$y[n] \quad \text{(output data)} \quad (3.9)$$

If $x[C]h[n-C]$ represents the main cursor of the data $y[n]$, then the terms before it are Postcursors of previous data and the terms after it are Precursors of the following data. It is therefore mathematically clear that those cursors distort the currently received data. This mathematical view of the phenomenon aids the construction of system blocks that counteract the cursors.
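The convolution sum can be illustrated with a short numerical sketch. The pulse response values below (one Precursor, the main cursor, two Postcursors) and the symbol sequence are hypothetical and are not taken from the reference channels.

```python
def convolve(x, h):
    """Discrete convolution y[n] = sum_m x[m] * h[n - m]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for m, xm in enumerate(x):
        for k, hk in enumerate(h):
            y[m + k] += xm * hk
    return y


# Hypothetical sampled pulse response: [precursor, main, postcursor 1, postcursor 2]
h = [0.1, 0.7, 0.25, 0.1]
MAIN = 1  # index of the main cursor inside h

x = [1, -1, -1, 1, 1]  # transmitted symbols
y = convolve(x, h)


def isi_at(n):
    """Received sample of symbol n minus its own main-cursor contribution."""
    return y[n + MAIN] - x[n] * h[MAIN]


for n in range(len(x)):
    print(f"symbol {n}: sent {x[n]:+d}, received {y[n + MAIN]:+.2f}, ISI {isi_at(n):+.2f}")
```

Each received sample is the main-cursor term plus Postcursor contributions from earlier symbols and a Precursor contribution from the next symbol, which is exactly the structure of the expanded sum.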

In total, channel imperfections such as ISI, insertion loss, crosstalk and noise lead to two kinds of distortion: phase distortion and amplitude distortion. Both are grouped here under the name jitter, which is a measure of the deviation from the ideal value. In the time domain, jitter can be observed and measured with the eye-diagram method, in which the waveforms of all UIs involved in the transmission are overlaid onto a single UI. Figure 3.6 demonstrates this method for ideal and distorted signals.





Figure 3.6 Eye diagram construction of ideal and non-ideal signals

The eye diagram reveals the probabilistic nature of timing jitter once jitter histograms, as shown in Figure 3.7, are introduced. From the histogram it is visually clear that jitter can be divided into random jitter (RJ) and deterministic jitter (DJ). RJ originates from device noise and follows a Gaussian distribution with mean $\mu = 0$ and standard deviation $\sigma_{RMS}$:

$$RJ(t) = \frac{1}{\sqrt{2\pi}\sigma_{RMS}} e^{-t^2/2\sigma_{RMS}^2}$$
(3.10)

DJ, which stems from channel effects such as losses, reflections, termination mismatches and ISI, is bounded and is described mathematically as two impulses separated by $P$:

$$DJ(t) = \frac{\delta(t - P/2)}{2} + \frac{\delta(t + P/2)}{2} \quad (3.11)$$

Total jitter is the convolution of RJ and DJ in the time domain. The behavior of these models is visible in the histogram, where RJ corresponds to the unbounded Gaussian tails and DJ splits the RJ distribution into two separate peaks. Bit error rate (BER) is another key measure in SI, defined as the number of bits received in error divided by the total number of bits transferred. Intuitively, minimizing the BER maximizes the SI and hence the overall quality of the received signal. Even though BER is defined as a probabilistic quantity, it is directly related to jitter through:

$$BER(t) = \rho_T \int_{-\infty}^{+\infty} \text{Total Jitter}(t)\, dt \quad (3.12)$$

$$\rho_T \quad \text{(transition density, } 1/2\text{)} \quad (3.13)$$

Hence BER is set by the area under the jitter histogram curve, and reducing BER requires reducing the jitter itself.
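Because convolving a Gaussian with two impulses simply shifts and scales it, the total-jitter density of equations (3.10) and (3.11) is a pair of Gaussians centred at $\pm P/2$. The sketch below evaluates that density and the area in its tail beyond a given time offset; the values of $\sigma_{RMS}$ and $P$ (in UI) are hypothetical, and the tail function is an illustrative helper rather than the BER expression of the thesis.

```python
import math


def rj_pdf(t, sigma):
    """Zero-mean Gaussian density of Eq. (3.10)."""
    return math.exp(-t * t / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)


def tj_pdf(t, sigma, p):
    """RJ convolved with the dual-impulse DJ of Eq. (3.11): two shifted Gaussians."""
    return 0.5 * rj_pdf(t - p / 2, sigma) + 0.5 * rj_pdf(t + p / 2, sigma)


def tj_tail(t, sigma, p):
    """Area of the total-jitter density to the right of offset t."""
    def q(x):
        return 0.5 * math.erfc(x / math.sqrt(2))
    return 0.5 * q((t - p / 2) / sigma) + 0.5 * q((t + p / 2) / sigma)


SIGMA, P = 0.02, 0.2  # hypothetical RJ sigma and DJ separation, both in UI

for offset in (0.10, 0.15, 0.20, 0.25):
    print(f"offset {offset:.2f} UI: pdf {tj_pdf(offset, SIGMA, P):7.3f}, "
          f"tail area {tj_tail(offset, SIGMA, P):.3e}")
```

Close to the peaks at $\pm P/2$ the tail area is set by the DJ separation, while far from them it falls off with the Gaussian tails of RJ, which matches the qualitative picture of the histogram.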





Figure 3.7 Jitter Histogram and Sources

From the relationship between jitter and BER, another visual relation, shown in Figure 3.8, is obtained by considering different sampling points within the eye diagram; it shows which type of jitter has the dominant effect on BER. BER decreases as the sampling point moves toward the eye center. DJ dominates if the data is sampled near the eye edges, where BER increases accordingly, whereas RJ dominates over DJ toward the eye center.



Figure 3.8 BER relation to jitter type



## **3.3 Equalizer and Receiver Design**

The previous discussion shows that minimizing the BER of a system requires minimizing jitter. This can be done by alleviating DJ, which stems from channel nonidealities. The circuit technique used to counteract channel effects is called equalization. Equalization methods can be classified by type, as discrete or continuous, and by side, as transmitter or receiver equalization. Despite these differences, the main goal of an equalizer is to create a high-pass response that flattens the low-pass response of the overall medium. A discrete implementation can use compact standard cells, making full use of Moore's law at minimal cost. The main task of a discrete equalizer is to handle Postcursor and Precursor ISI by distorting the transmitted or received signal so as to invert the channel distortion. Discrete equalization can be implemented with feed-forward equalization (FFE) [20], [21]. FFE does two main things: it pre-distorts the signal to cancel its Precursor and Postcursor ISI, and it de-emphasizes the low-frequency content to optimize power consumption, as visualized in Figure 3.9. De-emphasis is used because creating a high-pass effect would require amplifying fast transitions, which contain the high-frequency components, yet the discrete blocks used in FFE are already driven at their maximum current capacity to meet speed requirements. Amplifying the high-frequency content beyond the allowable swing is therefore not an option, and attenuating (de-emphasizing) the low-frequency components is preferred.



Figure 3.9 Comparison of channel pulse response to signal w/o FFE and w/ FFE

The most commonly used FFE architecture is the finite impulse response (FIR) filter shown in Figure 3.10. The FIR filter uses delay blocks of value $T_{DE}$, called the tap spacing, and filter taps, which are the coefficients applied at each delay stage and are classified as Precursor taps and Postcursor taps. In each cycle, the input voltage or current samples pass through the delay blocks, are weighted by the tap coefficients, and are summed at the output to produce a transmitted waveform similar to the one in Figure 3.9.
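The delay, weight and sum operation can be sketched as a three-tap FIR filter. The tap values below are hypothetical, chosen only so that their absolute values sum to 1 (the full swing); they are not the coefficients of any design in this thesis.

```python
def fir_ffe(symbols, taps):
    """y[n] = sum_k taps[k] * symbols[n - k]; samples before the start count as 0."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, weight in enumerate(taps):
            if n - k >= 0:
                acc += weight * symbols[n - k]
        out.append(acc)
    return out


# Hypothetical taps: [precursor tap, main tap, postcursor tap]
TAPS = [-0.1, 0.7, -0.2]

symbols = [1, 1, 1, 1, -1, -1, -1, -1]
shaped = fir_ffe(symbols, TAPS)
print([round(v, 2) for v in shaped])
```

During a run of identical symbols the output settles to a reduced level of 0.4 (de-emphasis), while the first sample after a transition reaches 0.8, so the high-frequency content is emphasized relative to the low-frequency content without exceeding the full swing.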





Figure 3.10 FFE architecture based on FIR filter

Another discrete method is the decision feedback equalizer (DFE) [22], [23] shown in Figure 3.11, which is located at the receiver side and uses already detected symbols to cancel ISI from the present input. Because it relies on feedback of previously detected symbols, a DFE can only mitigate Postcursor ISI. The feedback path weights the previous decisions by the tap coefficients to estimate the error they cause in the current sample and subtracts that estimate from the current signal, thereby reducing Postcursor errors.
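A minimal behavioral sketch of this feedback loop is given below for two-level symbols. The channel taps (main cursor 1.0, Postcursors 0.6 and 0.5) are hypothetical and deliberately severe, and the DFE weights are assumed to match the Postcursors exactly.

```python
def channel(tx, h):
    """Received samples for a channel with main cursor h[0] and postcursors h[1:]."""
    return [sum(h[k] * tx[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(tx))]


def slicer(value):
    """Two-level decision."""
    return 1 if value >= 0 else -1


def dfe(rx, weights):
    """Subtract weighted previous decisions from each sample before slicing."""
    decisions = []
    for n, sample in enumerate(rx):
        feedback = sum(w * decisions[n - 1 - k]
                       for k, w in enumerate(weights) if n - 1 - k >= 0)
        decisions.append(slicer(sample - feedback))
    return decisions


H = [1.0, 0.6, 0.5]            # hypothetical channel taps
tx = [1, 1, -1, 1, -1, -1, 1]
rx = channel(tx, H)

print("no equalizer:", [slicer(v) for v in rx])
print("with DFE    :", dfe(rx, H[1:]))
```

With these values the plain slicer makes errors where two preceding symbols oppose the current one, while the DFE recovers the transmitted sequence. A wrong decision would feed back into later samples, which is the error propagation drawback of this architecture.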



Figure 3.11 (a) DFE architecture (b) waveform equalization effect of DFE

Considering the channel impairments in the frequency domain provides another way to understand equalization. As described previously, an equalizer functions as a high-pass filter, so a system designed with a suitable pole and zero satisfies the equalization requirement. This method of equalization is called continuous time linear equalization (CTLE) [24], [25]. The simple RC network in Figure 3.12 is one example of a pole/zero system. The main advantages of a CTLE are power efficiency, adaptability, and cancellation of both Precursor and Postcursor ISI.



Figure 3.12 (a) Passive CTLE (b) Frequency response

| Equalization type | Advantage | Disadvantage |
|-------------------|-----------|--------------|
| FFE | - Simple architecture<br>- High-speed<br>- Noise immunity | - Voltage swing reduction<br>- DAC requirements |
| DFE | - Noise immunity<br>- High-speed | - High complexity<br>- Error propagation<br>- ADC requirements<br>- Only Postcursor mitigation |
| CTLE | - Simple structure<br>- Both cursor ISI cancellation<br>- Low power | - Precise tuning required<br>- Noise amplification |

Table 3.2 Equalizers comparison

This thesis focuses on the active CTLE in Figure 3.13 because of its amplification and good power efficiency, which relax the requirements on the following amplifier. The design of the active CTLE is considered below in detail. A quick intuitive analysis shows that the active CTLE in Figure 3.13 indeed has a high-pass response. At low frequencies, the CTLE behaves like a simple amplifier with a degeneration resistor $R_S/2$ at the source. As frequency increases, the capacitance $2C_S$ starts to dominate, shorting out the degeneration resistor and raising the overall gain. Prior to the CTLE design, specifications such as the channel type and the input common-mode level of the C-PHY receiver must be taken from the standard. To meet the low-power and high-speed requirements of modern processes, differential low-voltage signaling is used to transmit high-frequency data. The average power and the common-mode voltage of the transmitted signal are related by $P_{TX} \propto V_{CMTX}^2$, so lowering the common-mode voltage is one of the priorities. Table 3.3 lists the most important specifications to be considered in designing the input stage of a high-speed receiver, which is the CTLE in this work.



Figure 3.13 (a) Active CTLE (b) Design equations (c) Frequency response

| Design consideration | Min.       | Nom.         | Max.        | Units |
|----------------------|------------|--------------|-------------|-------|
| V <sub>ICM</sub>     | 95         | -            | 390         | mV    |
| Data-Rate            |            | 3.5          |             | GS/s  |
| Channel -3 dB BW     | 0.8 (long) | 1 (standard) | 1.4 (short) | GHz   |
| Required Peaking     | 2          | 5.3          | 8.5         | dB    |







Figure 3.14 MIPI C-PHY v2.0 reference channel differential insertion losses

| Parameter         | Process                                   | Comments                          |
|-------------------|-------------------------------------------|-----------------------------------|
| **CTLE**          |                                           |                                   |
| Input range       | From C-PHY v2.0 specifications            |                                   |
| Peaking           | Determine from medium loss                | Reference channels insertion loss |
| $g_m$             | Approximate and iterate                   |                                   |
| I<sub>BIAS</sub>  | Choose maximum required                   | $50\ \mu\text{A}$ in this thesis  |
| R<sub>S</sub>     | $2 \times (Peaking - 1)/g_m$              | Parallel trimming option          |
| R<sub>D</sub>     | V<sub>OCM</sub>/I<sub>BIAS</sub>          |                                   |
| $f_z$             | From channel specification                | Reference channels pole           |
| $f_{p1}$          | $f_z \times Peaking$                      |                                   |
| $f_{p2}$          | $4 \times f_{p1}$                         |                                   |
| $C_S$             | $1/(2\pi R_S f_z)$                        |                                   |
| $C_p$             | $1/(2\pi R_D f_{p2})$                     | Maximum allowable parasitic       |
| **High-speed comparator** |                                   |                                   |
| Gain              | $g_{m3}/g_{m5}$                           | 5 dB in this design               |
| Bandwidth (BW)    | Output rise time $\sim 1/10$ of UI        | 12.25 GHz                         |
| C<sub>OP</sub>    | $\leq g_{m5}/(2\pi BW)$                   | Maximum allowable parasitic       |
Table 3.4 CTLE and comparator design procedure
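The CTLE rows of Table 3.4 can be followed step by step in a short script. This is a sketch under stated assumptions: "Peaking" in the formulas is taken as a linear ratio converted from dB, the $g_m$ value is hypothetical, and $V_{OCM}$ is taken as 0.19 V so that $R_D$ matches a 50 µA bias with a 3800 Ω load. The resulting numbers are illustrative and are not the final device values of the design.

```python
import math


def design_ctle(peaking_db, gm, f_z, v_ocm, i_bias):
    """Apply the CTLE formulas of Table 3.4 in order (SI units)."""
    peaking = 10 ** (peaking_db / 20)      # assumption: dB converted to a linear ratio
    r_s = 2 * (peaking - 1) / gm           # degeneration resistor
    r_d = v_ocm / i_bias                   # load resistor
    f_p1 = f_z * peaking                   # first pole
    f_p2 = 4 * f_p1                        # second pole
    c_s = 1 / (2 * math.pi * r_s * f_z)    # degeneration capacitor
    c_p = 1 / (2 * math.pi * r_d * f_p2)   # maximum allowable output parasitic
    return {"r_s": r_s, "r_d": r_d, "f_p1": f_p1, "f_p2": f_p2, "c_s": c_s, "c_p": c_p}


# Hypothetical inputs: 5.3 dB peaking, gm = 0.5 mS, zero at 1 GHz.
result = design_ctle(peaking_db=5.3, gm=0.5e-3, f_z=1e9, v_ocm=0.19, i_bias=50e-6)

print(f"R_S  = {result['r_s']:.0f} ohm")
print(f"R_D  = {result['r_d']:.0f} ohm")
print(f"f_p1 = {result['f_p1'] / 1e9:.2f} GHz, f_p2 = {result['f_p2'] / 1e9:.2f} GHz")
print(f"C_S  = {result['c_s'] * 1e15:.1f} fF, C_p <= {result['c_p'] * 1e15:.2f} fF")
```

Since $g_m$ is only approximated at first, the procedure is iterated: the computed $R_S$ and $C_S$ are simulated, $g_m$ is re-extracted, and the values are recomputed until the peaking target is met.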

Because the differential signal is attenuated, a high-speed comparator block is needed to convert the differential 3-level signal into a rail-to-rail output voltage for further processing. The design in this thesis targets mobile devices, and hence low-power, high-speed applications. High-gain amplification is therefore replaced with a high-speed amplifier of relatively moderate gain and a bandwidth above 1 GHz. These requirements can be met with an active-load amplifier, since an active load adds very little parasitic capacitance. Assuming output rise and fall times of one tenth of the UI (1/3.5G = 285.7 ps) sets the bandwidth to about 12.25 GHz. In summary, the architecture in Figure 3.15 is designed as a high-speed analog front-end for a 3.5 GS/s symbol rate.
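The 12.25 GHz figure is consistent with the common single-pole rule of thumb $BW \approx 0.35/t_r$. The text does not state which relation was used, so the rule is an assumption here:

$$t_r = \frac{UI}{10} = \frac{285.7\ \text{ps}}{10} \approx 28.57\ \text{ps}, \qquad BW \approx \frac{0.35}{t_r} = \frac{0.35}{28.57\ \text{ps}} \approx 12.25\ \text{GHz}$$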



Figure 3.15 High-speed front-end receiver

| Devices        | Values                   | Comments                               |
|----------------|--------------------------|----------------------------------------|
| M1/M2          | 4.8µ/150n                |                                        |
| R<sub>S</sub>  | 5680 Ω / 3878 Ω / 1500 Ω | Trimming implemented                   |
| $C_{S}$        | 35 fF                    |                                        |
| R<sub>D</sub>  | 3800 Ω                   |                                        |
| M3/M4          | 1700n/100n               |                                        |
| M5/M6/M7/M8    | 120n/30n                 | Chosen to provide equal rise-fall time |
| M9/M10         | 400n/30n                 | Chosen to provide equal rise-fall time |
| MB1/MB2        | 20x500n/500n             | 50 µA                                  |
| MB3            | 16x500n/500n             | 40 µA                                  |

Table 3.5 Device value and size information



## **3.4 Clock Recovery**

Most clock recovery circuits are based on a PLL topology and therefore require a long tuning time and considerable layout area for the loop filter. Chapter 2 showed that high-speed data transmission in C-PHY occurs over 3-wire trios with at least one transition per UI, which allows the clock to be recovered [26] without a complex PLL architecture. To exploit this rule, a transition detector circuit is implemented as shown in Figure 3.16, in which a programmable delay block and an XNOR gate compare the original data with a delayed version of it. The XNOR gate produces a downward pulse at each transition of the original data, with a width equal to the delay; the polarity of the pulse does not depend on whether the data changes from high to low or vice versa. In some UIs only one of AB, BC, CA has a transition, so three separate transition detectors are used. Moreover, because of jitter and skew between the three outputs, the output pulses of the transition detectors show the same jitter and skew with respect to each other. The clock transition is therefore generated after the latest downward pulse, so that the latest of AB, BC, CA to change can be sampled properly. Combining the three downward pulses with an AND gate guarantees that the AND output goes high only after the rising edge of the latest downward pulse.

The delay of the transition detector must be chosen properly. If $T_{DE}$ is smaller than the peak-to-peak jitter of AB, BC, CA, the PAB, PBC, PCA pulses do not overlap with each other, producing a glitch at the PCLK node. $T_{DE}$ is therefore designed to be larger than the total jitter for the CDR to operate properly.



Figure 3.16 (a) Schematics (b) Waveform example
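The behavior of the transition detectors and the AND gate can be sketched in discrete time, with one list element per simulation step. The waveforms, the 2-step skew and the 4-step delay below are hypothetical, and gate delays are ignored.

```python
def transition_detector(data, delay):
    """XNOR of the data with a copy delayed by `delay` steps: low pulse at each transition."""
    out = []
    for n, bit in enumerate(data):
        delayed = data[n - delay] if n >= delay else data[0]
        out.append(1 if bit == delayed else 0)
    return out


def recover_clock(ab, bc, ca, delay):
    """AND of the three detector outputs (PAB, PBC, PCA) gives PCLK."""
    pab, pbc, pca = (transition_detector(d, delay) for d in (ab, bc, ca))
    return [x & y & z for x, y, z in zip(pab, pbc, pca)]


ab = [0] * 10 + [1] * 10   # rising transition at step 10
bc = [1] * 12 + [0] * 8    # falling transition at step 12 (2 steps of skew)
ca = [0] * 20              # no transition in this UI

pclk = recover_clock(ab, bc, ca, delay=4)
print("".join(map(str, pclk)))

# With a delay shorter than the skew the pulses no longer overlap and PCLK glitches.
print("".join(map(str, recover_clock(ab, bc, ca, delay=1))))
```

With a delay of 4 steps the two pulses merge into one low pulse and PCLK rises only after the later pulse ends. With a delay of 1 step, shorter than the skew, PCLK returns high between the pulses, which is the glitch described above.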



## **3.5 Decoder**


High-speed data transmission in a C-PHY lane ends with decoding the symbols from the wire state transitions. After decoding, the symbols are deserialized into 21-bit parallel data and sent to the control lane for further processing. The previously proposed decoder and post-decoder blocks [26], [27], shown in Figure 3.17, use additional blocks such as 4-symbol shift registers, sync detectors and symbol selectors, which consume additional power. Moreover, since the block diagram of their symbol decoder is not shown, a comparison of symbol decoder designs may be ambiguous. Sync detectors are used to detect the 21-bit synchronization word and to lock the data to the correct edge of the recovered clock, so that the final 21 parallel bits are in the correct order. The architecture in Figure 3.17 also operates on the half-rate clock naturally generated by the CDR block of Section 3.4; as a result, 24 bits are generated instead of 21, and the symbol selector is used to pick the 21 required bits out of the 24.



Figure 3.17 (a) Receiver back-end diagram (IEEE/TCAS 2020) (b) waveform examples

The design in this thesis moves the synchronization process into the control lane domain. Without synchronization in the lane, direct use of the half-rate clock is not feasible, because a half-rate clock requires a symbol selector to pick the correct symbols, and that choice is determined mainly by the sync word detectors. With the sync detectors removed, the symbol selectors can be removed as well. This thesis also proposes a symbol decoder design based on the truth table in Table 3.6, taken from the C-PHY v2.0 standard. Taking the direct approach of designing the decoder from this table, however, results in the logic equations:

$$FLIP = (a + d)(b + e)(c + f)(a' + d')(b' + e')(c' + f') \quad (3.14)$$

$$R = b'c'd'e' + bcde + a'b'd'f' + abdf + a'c'e'f' + acef + a'b'e'f + ab'e'f' + a'c'd'e + a'cd'e' + b'c'df' + bc'd'f' \quad (3.15)$$

$$POLARITY = (a + e + f)(a' + e' + f')(b + d + f)(b' + d' + f')(c + e + d)(c' + e' + d') \quad (3.16)$$



| Present (AB, BC, CA) = (a, b, c) ↓ / Previous (AB, BC, CA) = (d, e, f) → | +x [100] | -x [011] | +y [010] | -y [101] | +z [001] | -z [110] |
|----------------------------|----------|----------|----------|----------|----------|----------|
| +x state [100]             | NA       | 100      | 000      | 001      | 010      | 011      |
| -x state [011]             | 100      | NA       | 001      | 000      | 011      | 010      |
| +y state [010]             | 010      | 011      | NA       | 100      | 000      | 001      |
| -y state [101]             | 011      | 010      | 100      | NA       | 001      | 000      |
| +z state [001]             | 000      | 001      | 010      | 011      | NA       | 100      |
| -z state [110]             | 001      | 000      | 011      | 010      | 100      | NA       |

Table 3.6 FLIP, ROTATE, POLARITY symbol truth table
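The direct logic equations can be checked exhaustively against Table 3.6. The sketch below encodes the table and evaluates Flip, Rotate and Polarity for all 30 valid transitions. The rotate expression is entered with $bc'd'f'$ as its final product term, which is the form Table 3.6 requires for the transition from +y to -z; the helper names are illustrative.

```python
STATES = {"+x": (1, 0, 0), "-x": (0, 1, 1), "+y": (0, 1, 0),
          "-y": (1, 0, 1), "+z": (0, 0, 1), "-z": (1, 1, 0)}
ORDER = ["+x", "-x", "+y", "-y", "+z", "-z"]

# Table 3.6: rows = present state, columns = previous state, entries = FRP bits.
TABLE = {
    "+x": [None, "100", "000", "001", "010", "011"],
    "-x": ["100", None, "001", "000", "011", "010"],
    "+y": ["010", "011", None, "100", "000", "001"],
    "-y": ["011", "010", "100", None, "001", "000"],
    "+z": ["000", "001", "010", "011", None, "100"],
    "-z": ["001", "000", "011", "010", "100", None],
}

ROTATE_TERMS = ["b'c'd'e'", "bcde", "a'b'd'f'", "abdf", "a'c'e'f'", "acef",
                "a'b'e'f", "ab'e'f'", "a'c'd'e", "a'cd'e'", "b'c'df'", "bc'd'f'"]


def product(term, bits):
    """Evaluate one product term such as "a'b'e'f" (a prime means complement)."""
    value, i = 1, 0
    while i < len(term):
        literal = bits[term[i]]
        if i + 1 < len(term) and term[i + 1] == "'":
            literal, i = 1 - literal, i + 1
        value &= literal
        i += 1
    return value


def not_all_equal(i, j, k):
    """(i + j + k)(i' + j' + k')"""
    return int(not (i == j == k))


def decode(present, previous):
    a, b, c = present
    d, e, f = previous
    bits = {"a": a, "b": b, "c": c, "d": d, "e": e, "f": f}
    flip = int(a != d and b != e and c != f)  # (a+d)(a'+d') is a XOR d, and so on
    rotate = int(any(product(t, bits) for t in ROTATE_TERMS))
    polarity = not_all_equal(a, e, f) & not_all_equal(b, d, f) & not_all_equal(c, e, d)
    return f"{flip}{rotate}{polarity}"


def mismatches():
    """List of (previous, present) pairs where the equations disagree with the table."""
    return [(prev, pres) for pres in ORDER for col, prev in enumerate(ORDER)
            if TABLE[pres][col] is not None
            and decode(STATES[pres], STATES[prev]) != TABLE[pres][col]]


print("mismatching transitions:", mismatches())
```

An empty mismatch list means the three expressions reproduce every defined entry of the table; the NA entries (no transition) are treated as don't-care.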

Although the logic above can be implemented as it is with TSMC 28 nm standard cells, direct construction of the symbol decoder results in non-uniform delay paths and requires several copies of the same block. For example, the Polarity bit reduces to three copies of an AND(OR(i, j, k), NAND(i, j, k)) block, where (i, j, k) is (a, e, f), (b, d, f) and (c, e, d), and the three outputs are combined with an AND gate to produce the Polarity bit. This duplication of blocks and the unequal delay paths between inputs can be avoided by taking a different approach. The received AB, BC, CA outputs are first converted into wire state (WS) bits, and the WS bits are then decoded into the FRP symbol bits. The reasoning is that if the WS bits explicitly indicate the state of AB, BC, CA, extracting the FRP information from them becomes simpler. The relation between AB, BC, CA and the WS bits is like that between 3-bit binary counting and 7-bit ring counting: state information is simpler to extract from ring counting than from binary counting. In the case of WS, however, 7 bits are not needed, since AB, BC, CA take only 6 values instead of 8 (2<sup>3</sup>). Moreover, because each wire state has a complementary wire state (+x and -x), assigning a sign bit to WS reduces the number of WS bits further. Only 4 WS bits, namely X, Y, Z and S, are therefore needed to represent the wire state of AB, BC, CA. Figure 3.18 illustrates the truth table and logic diagram of this procedure.



Figure 3.18 Wire State Decode (a) truth table (b) logic diagram



After the state information is extracted, symbol decoding is designed from the C-PHY state transition diagram in Figure 2.3. Equality of the present and previous states with the sign bit ignored (|+x| = |-x|) implies the Flip operation. Polarity indicates a difference between the present and previous sign bits when the state itself changes. Rotation is high if the previous and present states are in cyclic alphabetical order (x to y, y to z, z to x). With these observations, the design of the symbol decoder shown in Figure 3.19 can be understood intuitively.

| Prev<br>W.S. | Pres<br>W.S. | Prev<br>Sign | Pres<br>Sign | Flip | Rotate | Polarity |
|--------------|--------------|--------------|--------------|------|--------|----------|
| X            | X            | 0            | 0            | NA   | NA     | NA       |
| X            | X            | 0            | 1            | 1    | 0      | 0        |
| X            | X            | 1            | 0            | 1    | 0      | 0        |
| X            | X            | 1            | 1            | NA   | NA     | NA       |
| X            | Y            | 0            | 0            | 0    | 1      | 0        |
| X            | Y            | 0            | 1            | 0    | 1      | 1        |
| X            | Y            | 1            | 0            | 0    | 1      | 1        |
| X            | Y            | 1            | 1            | 0    | 1      | 0        |
| X            | Z            | 0            | 0            | 0    | 0      | 0        |
| X            | Z            | 0            | 1            | 0    | 0      | 1        |
| X            | Z            | 1            | 0            | 0    | 0      | 1        |
| X            | Z            | 1            | 1            | 0    | 0      | 0        |
| Y            | X            | 0            | 0            | 0    | 0      | 0        |
| Y            | X            | 0            | 1            | 0    | 0      | 1        |
| Y            | X            | 1            | 0            | 0    | 0      | 1        |
| Y            | X            | 1            | 1            | 0    | 0      | 0        |
| Y            | Y            | 0            | 0            | NA   | NA     | NA       |
| Y            | Y            | 0            | 1            | 1    | 0      | 0        |
| Y            | Y            | 1            | 0            | 1    | 0      | 0        |
| Y            | Y            | 1            | 1            | NA   | NA     | NA       |
| Y            | Z            | 0            | 0            | 0    | 1      | 0        |
| Y            | Z            | 0            | 1            | 0    | 1      | 1        |
| Y            | Z            | 1            | 0            | 0    | 1      | 1        |
| Y            | Z            | 1            | 1            | 0    | 1      | 0        |
| Z            | X            | 0            | 0            | 0    | 1      | 0        |
| Z            | X            | 0            | 1            | 0    | 1      | 1        |
| Z            | X            | 1            | 0            | 0    | 1      | 1        |
| Z            | X            | 1            | 1            | 0    | 1      | 0        |
| Z            | Y            | 0            | 0            | 0    | 0      | 0        |
| Z            | Y            | 0            | 1            | 0    | 0      | 1        |
| Z            | Y            | 1            | 0            | 0    | 0      | 1        |
| Z            | Y            | 1            | 1            | 0    | 0      | 0        |
| Z            | Z            | 0            | 0            | NA   | NA     | NA       |
| Z            | Z            | 0            | 1            | 1    | 0      | 0        |
| Z            | Z            | 1            | 0            | 1    | 0      | 0        |
| Z            | Z            | 1            | 1            | NA   | NA     | NA       |

(a)







(b)

Figure 3.19 Symbol Decoder (a) truth table (b) logic diagram
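The two-step decoding (wire state first, then symbol) can be sketched behaviorally as follows. The encoding of the sign bit (S = 1 for the negative states) is an assumption, since only the difference between the previous and present sign bits matters for the result; the function names are illustrative, and the gate-level structure of Figures 3.18 and 3.19 is not reproduced.

```python
# (AB, BC, CA) -> (state letter, sign bit). S = 1 for negative states is an assumption.
WIRE_STATES = {
    (1, 0, 0): ("X", 0), (0, 1, 1): ("X", 1),
    (0, 1, 0): ("Y", 0), (1, 0, 1): ("Y", 1),
    (0, 0, 1): ("Z", 0), (1, 1, 0): ("Z", 1),
}
NEXT_STATE = {"X": "Y", "Y": "Z", "Z": "X"}  # cyclic alphabetical order


def wire_state(ab, bc, ca):
    """Wire State Decode: map the three receiver outputs to (letter, sign)."""
    return WIRE_STATES[(ab, bc, ca)]


def decode_symbol(previous, present):
    """Symbol Decode: return (flip, rotate, polarity) for one wire state transition."""
    (prev_letter, prev_sign), (pres_letter, pres_sign) = previous, present
    if prev_letter == pres_letter:
        if prev_sign == pres_sign:
            raise ValueError("no transition: entry is NA in the truth table")
        return (1, 0, 0)  # same letter, opposite sign: Flip
    rotate = 1 if NEXT_STATE[prev_letter] == pres_letter else 0
    polarity = prev_sign ^ pres_sign
    return (0, rotate, polarity)


plus_x, minus_x = wire_state(1, 0, 0), wire_state(0, 1, 1)
plus_y, minus_z = wire_state(0, 1, 0), wire_state(1, 1, 0)

print(decode_symbol(plus_x, plus_y))   # X to Y, same sign
print(decode_symbol(plus_x, minus_x))  # same letter, sign change
print(decode_symbol(plus_x, minus_z))  # X to Z, sign change
```

Each output bit depends on one simple comparison of the WS bits, which is what gives the decomposed decoder its uniform delay paths compared with the direct sum-of-products form.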





Figure 3.20 Back-end of HS-RX Lane

The top-level diagram of the HS-RX lane back-end is given in Figure 3.20. It consists of the Wire State Decoder and Symbol Decoder discussed previously, with a sampling register between them. The decoded symbols are then parallelized by 1x7 shift registers clocked at twice the frequency of the recovered clock. The parallel Flip, Rotate and Polarity signals are then synchronized to a clock at one seventh of the recovered clock frequency. Moving sync word detection to the logic side and decomposing the symbol decoder into wire state and symbol decode subblocks results in a power-efficient and relatively simple back-end architecture.
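The parallelization step can be sketched as grouping every 7 decoded symbols into one 21-bit word. The bit order inside the word (Flip, Rotate, Polarity of the first symbol first) is an assumption made only for this illustration, and clocking is not modelled.

```python
def deserialize(symbols, group=7):
    """Group (flip, rotate, polarity) symbols into words of `group` symbols each."""
    words = []
    for start in range(0, len(symbols) - group + 1, group):
        bits = []
        for flip, rotate, polarity in symbols[start:start + group]:
            bits.extend((flip, rotate, polarity))  # assumed bit order inside the word
        words.append(bits)
    return words


# 14 hypothetical decoded symbols give two 21-bit words.
stream = [(0, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0)] * 2
words = deserialize(stream)
print(len(words), "words of", len(words[0]), "bits")
```

Without a sync detector in the lane, the word boundary chosen here is arbitrary; in the proposed architecture the alignment to the synchronization word is handled on the logic side.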

## **3.6 Top HS-RX Lane**

Figure 3.21 shows the complete diagram of the HS-RX lane for a C-PHY lane. The operation, motivation and design of each subblock shown were discussed in the previous subchapters.



Figure 3.21 Top diagram of HS-RX Lane



# **IV. Simulation and Experimental Results**

## **4.1 Simulation Results**

Table 4.1 lists the extracted corner simulation results of the termination resistor. With the trimming scheme, selecting the code closest to 50 $\Omega$ at each corner keeps the termination within about +4% and -2.5% of 50 $\Omega$.

| Control | -40 °C ff | -40 °C ss | -40 °C fs | -40 °C sf | 27 °C ff | 27 °C ss | 27 °C fs | 27 °C sf | 85 °C ff | 85 °C ss | 85 °C fs | 85 °C sf |
|---------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 000       | 50.01 | 73.14 | 61.23 | 62.39 | 54.91 | 80.49 | 67.35 | 68.56 | 59.32 | 87.14 | 72.89 | 74.11 |
| 001       | 45.24 | 66.26 | 55.44 | 56.49 | 49.69 | 72.95 | 61.01 | 62.10 | 53.7  | 78.99 | 66.05 | 67.15 |
| 010       | 41.58 | 60.85 | 50.93 | 51.89 | 45.66 | 66.98 | 56.03 | 57.03 | 49.33 | 72.52 | 60.66 | 61.66 |
| 011       | 38.33 | 56.12 | 46.97 | 47.85 | 42.1  | 61.78 | 51.68 | 52.59 | 45.49 | 66.89 | 55.94 | 56.87 |
| 100       | 36.38 | 52.90 | 44.36 | 45.22 | 39.88 | 58.16 | 48.77 | 49.63 | 43.03 | 62.91 | 52.72 | 53.6  |
| 101       | 33.88 | 49.30 | 41.36 | 42.13 | 37.15 | 54.21 | 45.45 | 46.25 | 40.09 | 58.65 | 49.15 | 49.96 |
| 110       | 31.96 | 46.42 | 38.98 | 39.7  | 35.03 | 51.01 | 42.81 | 43.56 | 37.78 | 55.18 | 46.28 | 47.03 |
| 111       | 30.08 | 43.69 | 36.69 | 37.36 | 32.97 | 48.02 | 40.29 | 40.99 | 35.57 | 51.93 | 43.56 | 44.26 |

Table 4.1 Termination Resistor corner simulation results
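The benefit of trimming can be quantified from Table 4.1 by picking, for each corner, the control code whose resistance is closest to 50 Ω. The sketch below does this with the tabulated values; the selection rule (minimum absolute error) is an assumption about how the trim code would be chosen.

```python
TARGET = 50.0
CORNERS = [f"{temp} {proc}" for temp in ("-40C", "27C", "85C")
           for proc in ("ff", "ss", "fs", "sf")]

# Table 4.1, one row per control code, columns in the order of CORNERS (ohms).
R_TERM = {
    "000": [50.01, 73.14, 61.23, 62.39, 54.91, 80.49, 67.35, 68.56, 59.32, 87.14, 72.89, 74.11],
    "001": [45.24, 66.26, 55.44, 56.49, 49.69, 72.95, 61.01, 62.10, 53.70, 78.99, 66.05, 67.15],
    "010": [41.58, 60.85, 50.93, 51.89, 45.66, 66.98, 56.03, 57.03, 49.33, 72.52, 60.66, 61.66],
    "011": [38.33, 56.12, 46.97, 47.85, 42.10, 61.78, 51.68, 52.59, 45.49, 66.89, 55.94, 56.87],
    "100": [36.38, 52.90, 44.36, 45.22, 39.88, 58.16, 48.77, 49.63, 43.03, 62.91, 52.72, 53.60],
    "101": [33.88, 49.30, 41.36, 42.13, 37.15, 54.21, 45.45, 46.25, 40.09, 58.65, 49.15, 49.96],
    "110": [31.96, 46.42, 38.98, 39.70, 35.03, 51.01, 42.81, 43.56, 37.78, 55.18, 46.28, 47.03],
    "111": [30.08, 43.69, 36.69, 37.36, 32.97, 48.02, 40.29, 40.99, 35.57, 51.93, 43.56, 44.26],
}


def best_trim(corner_index):
    """Return (code, resistance, deviation in %) of the code closest to 50 ohm."""
    code = min(R_TERM, key=lambda c: abs(R_TERM[c][corner_index] - TARGET))
    value = R_TERM[code][corner_index]
    return code, value, 100 * (value - TARGET) / TARGET


results = [best_trim(i) for i in range(len(CORNERS))]
for corner, (code, value, dev) in zip(CORNERS, results):
    print(f"{corner:8s} code {code}  {value:5.2f} ohm  {dev:+.2f}%")

worst_high = max(dev for _, _, dev in results)
worst_low = min(dev for _, _, dev in results)
print(f"worst case after trimming: {worst_high:+.2f}% / {worst_low:+.2f}%")
```

Without trimming (code 011 at every corner) the same table spans roughly 38 Ω to 67 Ω, so the trimming scheme is what brings every corner to within a few percent of the target.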

Figure 4.1 shows the single-lane layout, excluding power metals, pads and dummy fills. The single-lane layout consists of three termination resistors (1), common-mode capacitors (2), three HS-RXs (3), the back-end (4), a PTAT current source (5), and three LP transmitters, receivers and contention detectors (6).



Figure 4.1 MIPI C-PHY single lane layout



The extracted output eye diagrams of the designed CTLE are shown in Table 4.2 for the long and standard reference channels, when a 3-level signal with 200 mV common mode and 150 mV peak-to-peak swing is transmitted through the channels at a 3.5 GS/s symbol rate. As expected, the CTLE improves signal quality by reducing the jitter and widening the eye opening. The frequency response also shows that the designed CTLE trimming range, from 1.4 dB to 13.4 dB, covers the peaking values required for the reference channels (2 dB to 9 dB).



Table 4.2 Extracted simulation results for HS-RX receiver



Figure 4.2 shows the extracted simulation of the Clock Recovery circuit together with the simulation of the proposed DEC, using the long reference channel and the simulation setup described previously. In the DEC simulation, however, the receiver was fed with predefined patterns such as the sync word, data 0 and data 1 over the long channel. As seen, the WS (X, Y, Z, S) bits contain some glitches, which are due to jitter between AB, BC and CA. The sampling register removes those glitches, so the Flip, Rotate and Polarity bits are glitch-free.



Figure 4.2 (a) Recovered clock eye diagram (b) CDR waveforms (c) Decoder waveforms

After the correct symbols are generated, a de-serializer clock with double the frequency of the recovered clock is generated to operate the 1x7 shift registers. The timed, parallel 7 symbols (21 bits) should be sampled with a synchronization clock so that each 21-bit word is retrieved at the 7<sup>th</sup> cycle. The synchronization clock is therefore generated by dividing the recovered clock frequency by 7. The resulting waveforms of the post-decoder processing for the Flip bit are shown in Figure 4.3. The simulation setup is similar to the one used to test the extracted DEC block.
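The de-serialization step can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the implemented circuit: it works at symbol granularity (it does not model the double-frequency de-serializer clock), it packs the three decoded bits of each symbol into one integer instead of using one 1x7 shift register per bit, it assumes word alignment starts at the first symbol, and the bit ordering (first symbol in the most significant position) is chosen only for illustration. The name `deserialize` is hypothetical.

```python
from typing import Iterable, List

SYMBOLS_PER_WORD = 7  # depth of the shift register (one word = 7 symbols)
BITS_PER_SYMBOL = 3   # Flip, Rotate, Polarity bits of each decoded symbol


def deserialize(symbols: Iterable[int]) -> List[int]:
    """Shift 3-bit symbols in and latch a 21-bit word on every 7th symbol."""
    shift_reg: List[int] = []
    words: List[int] = []
    for cycle, symbol in enumerate(symbols, start=1):
        if not 0 <= symbol < (1 << BITS_PER_SYMBOL):
            raise ValueError(f"symbol {symbol} does not fit in {BITS_PER_SYMBOL} bits")
        # Shift the new symbol in and keep only the 7 most recent symbols.
        shift_reg.append(symbol)
        shift_reg = shift_reg[-SYMBOLS_PER_WORD:]
        # Synchronization clock edge: recovered clock divided by 7.
        if cycle % SYMBOLS_PER_WORD == 0:
            word = 0
            for s in shift_reg:
                word = (word << BITS_PER_SYMBOL) | s  # assumed order: first symbol is MSB
            words.append(word)
    return words


if __name__ == "__main__":
    for word in deserialize([1, 2, 3, 4, 5, 6, 7] * 2):
        print(f"{word:021b}")
```

A stream of 14 symbols yields two 21-bit words, and an incomplete trailing group of fewer than 7 symbols yields nothing, which mirrors sampling only on the divided-by-7 clock.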








Figure 4.3 (a) De-serializer clock and shift register waveforms (b) synchronization clock waveform



# **4.2 Experimental Results**

Figure 4.4 shows a photograph of the taped-out die and the top layout. The top layout includes three C-PHY lanes and four D-PHY lanes with a CLK lane. The total area of the combo chip is 1.298 $\mu$m<sup>2</sup>. A PHY-only chip with an area of 1.1 $\mu$m<sup>2</sup>, used to probe high-speed points and check HSRX lane operation, was also taped out. The PHY-only chip includes two C-PHY lanes, two D-PHY lanes, and test pads for probing the receiver and clock outputs.





Figure 4.4 (a) Die photograph (b) Top layout



The test setup is shown in Figure 4.5. The transmitted signal was generated with an SV3C-CPTX MIPI C-PHY Generator from Introspect Technology Inc. An FR-4 PCB was designed for testing the PHY-only chip. An 8 GHz, 10-bit Keysight MSO-S 804A oscilloscope was used to probe the high-speed differential signals. The test was conducted with a 0.9 V supply voltage.



Figure 4.5 PHY only chip test setup

HSRX output waveforms are shown in Figure 4.6 for 1, 2, 3 and 3.5 GS/s symbol rates. At the targeted 3.5 GS/s rate, the measured jitter is 0.53 UI, which is 0.23 UI larger than the simulation result. The deviation from simulation is attributed to the buffer stage used for probing the high-speed differential signals and to PCB trace losses.



Figure 4.6 HSRX output waveform (a) at 1.5 GS/s (b) 2.0 GS/s (c) 3.0 GS/s (d) 3.5 GS/s
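To put the measured jitter in absolute terms, the UI figures can be converted to picoseconds. The sketch below uses only the numbers quoted in the text (0.53 UI measured, 0.23 UI above simulation, 3.5 GS/s); the function names are hypothetical, and one UI is taken to be one symbol period.

```python
def unit_interval_ps(symbol_rate_gsps: float) -> float:
    """Length of one unit interval (symbol period) in picoseconds."""
    return 1e3 / symbol_rate_gsps


def jitter_ps(jitter_ui: float, symbol_rate_gsps: float) -> float:
    """Convert jitter expressed in UI to picoseconds at a given symbol rate."""
    return jitter_ui * unit_interval_ps(symbol_rate_gsps)


SYMBOL_RATE_GSPS = 3.5
MEASURED_JITTER_UI = 0.53
EXCESS_OVER_SIMULATION_UI = 0.23
simulated_jitter_ui = MEASURED_JITTER_UI - EXCESS_OVER_SIMULATION_UI  # 0.30 UI

if __name__ == "__main__":
    print(f"UI at {SYMBOL_RATE_GSPS} GS/s: {unit_interval_ps(SYMBOL_RATE_GSPS):.1f} ps")
    print(f"measured jitter:  {jitter_ps(MEASURED_JITTER_UI, SYMBOL_RATE_GSPS):.1f} ps")
    print(f"simulated jitter: {jitter_ps(simulated_jitter_ui, SYMBOL_RATE_GSPS):.1f} ps")
```

At 3.5 GS/s one UI is about 286 ps, so the measured 0.53 UI corresponds to roughly 151 ps and the simulated 0.30 UI to roughly 86 ps.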



The HSRX output waveform at the 3.5 GS/s symbol rate is revisited in Figure 4.7 (a) to compare it with the recovered clock waveform in Figure 4.7 (b) and to analyze the problem of the clock signal disappearing. The clock waveform exhibits a frequency of 500 MHz, which is the correct value for the synchronization clock. However, after a short while the clock returns to the zero state and starts toggling again only after a reset. This undesirable operation is attributed to jitter accumulating at AB, BC, and CA, which indicates that the delay range of the CDR circuit was underestimated. At a 4.5 GS/s symbol rate, however, the delay range overcomes the accumulated jitter, since the absolute value of the jitter decreases as the data rate increases. As a result, the circuit outputs the correct clock with a frequency of 643 MHz at 4.5 GS/s, as shown in Figure 4.7 (c).

Figure 4.7 (a) HSRX output at 3.5 GS/s (b) Clock output at 3.5 GS/s (c) Clock output at 4.5 GS/s
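The two clock frequencies quoted here follow from the divide-by-7 synchronization clock. The sketch below reproduces them, assuming the recovered clock toggles at the symbol rate (inferred from the 500 MHz value at 3.5 GS/s); `sync_clock_mhz` is a hypothetical helper name.

```python
def sync_clock_mhz(symbol_rate_gsps: float, symbols_per_word: int = 7) -> float:
    """Synchronization clock frequency in MHz: symbol-rate clock divided by 7."""
    return symbol_rate_gsps * 1e3 / symbols_per_word


if __name__ == "__main__":
    for rate in (3.5, 4.5):
        print(f"{rate} GS/s -> {sync_clock_mhz(rate):.0f} MHz synchronization clock")
```

The same division gives 500 MHz at 3.5 GS/s and about 643 MHz at 4.5 GS/s, matching the measured clock outputs.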



## V. Summary

## 5.1 Conclusion

This thesis focused on the design of a high-speed serial interface, specifically a MIPI C-PHY physical layer operating at a data rate of 7.98 Gb/s (3.5 GS/s), for low-power mobile applications such as smartphone or VR displays. The High-Speed Receiver lane was implemented in TSMC 28 nm HPC process technology using standard 0.9 V CMOS devices. The Low-Power transmitter, receiver, and contention detector were implemented with 1.2 V CMOS transistors. The High-Speed Receiver uses a parallel trimming method for the termination resistor to counteract PVT variations; as a result, a variance under 0.1% was achieved in simulation. To overcome ISI, an active, trimmable CTLE amplifier was designed for the C-PHY-specified reference channels. Simulation results demonstrated a maximum peaking of 13.4 dB, and measurements at the 3.5 GS/s symbol rate exhibit a jitter of 0.53 UI. Although the Clock Recovery circuit operates incorrectly below 4.5 GS/s, it outputs the correct signal at 4.5 GS/s, which shows that the architecture is functional but requires a wider delay tuning range. The proposed DEC block divides the symbol decoder into two logic blocks, providing a simpler, low-power decoder implementation with equal delay paths. Finally, a 1.1 $\mu$m<sup>2</sup> PHY-only block was taped out and tested. Because of the improper behavior of the Clock Recovery circuit, power analysis of the chip could not be performed for this tape-out. However, extracted simulation demonstrated a power efficiency of 0.19 mW/Gb/s/lane, which is almost 3 times lower than that of the most recent paper on C-PHY lane design [26].

## 5.2 Future Works

For proper measurement of power efficiency, the Clock Recovery circuit should provide a wider delay range in a future tape-out. For higher data rates, the CTLE could be implemented to allow larger gain boosting, at the expense of power consumption. Moreover, a negative capacitance converter can be incorporated after the CTLE stage to cancel the high-frequency roll-off and to shift the peak frequency above 10 GHz. Since the AB, BC, and CA outputs of the HS-RX exhibit data-dependent jitter that depends on which of the three levels the data is transitioning between, a feedback loop can be used to reduce this jitter [26]. At higher data rates, the double-rate clock operation of the back-end should be changed to half-rate clock operation for better timing margins. Overall, the C-PHY receiver lane can be improved with the above techniques without changing the proposed Decoder block or the designed CTLE amplifier, by adding new blocks and widening the trimming range.



### REFERENCES

- [1] International Roadmap for Devices and Systems: Executive Summary, 2020 edition, [Online]. Available: https://irds.ieee.org/images/files/pdf/2020/2020IRDS_ES.pdf
- [2] Heterogeneous Integration Roadmap, 2019 edition, [Online]. Available: https://eps.ieee.org/technology/heterogenous-integration-roadmap/2019-edition.html
- [3] Cisco visual networking index: global mobile data traffic forecast update, 2017–2022 White Paper, [Online]. Available: https://twiki.cern.ch/twiki/pub/HEPIX/TechwatchNetwork/HtwNetworkDocuments/white-paper-c11-741490.pdf
- [4] A. Bathini, M. Jayasimha, and D. Nagaraj, "Signal Integrity Challenges and Solutions for USB4 and TBT3 Protocols," *IEEE Xplore*, Dec. 01, 2020.
- [5] P. H. Putman, "Display Interfacing 2018: Getting Around the UHD Speed Bump," *SMPTE Motion Imaging Journal*, vol. 127, no. 7, pp. 51–55, Aug. 2018.
- [6] P. Gargini, F. Balestra, and Y. Hayashi, "Roadmapping of Nanoelectronics for the New Electronics Industry," *Applied Sciences*, vol. 12, no. 1, p. 308, Dec. 2021.
- [7] "About VESA," VESA Interface Standards for The Display Industry. https://vesa.org/about-vesa
- [8] "MIPI Overview," MIPI, Jul. 05, 2016. https://www.mipi.org/about-us
- [9] H. Q. Nguyen, E. Custovic, J. Whittington, J. Devlin, and A. Borgio, "Clock synchronisation in multi-transceiver HF radar system," 2011 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Sep. 2011.
- [10] S. Chen, H. Li, and P. Y. Chiang, "A Robust Energy/Area-Efficient Forwarded-Clock Receiver With All-Digital Clock and Data Recovery in 28-nm CMOS for High-Density Interconnects," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 2, pp. 578–586, Feb. 2016.
- [11] A. Fiedler and S. Krishnan, "A scalable 7.0-Gb/s multi-lane NRZ transceiver with a 1/10th-rate forwarded clock in 0.13um CMOS," 2016 IEEE International Symposium on Circuits and Systems (ISCAS), May 2016.
- [12] S.-H. Chung, Y.-J. Kim, Y.-H. Kim, and L.-S. Kim, "A 10-Gb/s 0.71-pJ/bit Forwarded-Clock Receiver Tolerant to High-Frequency Jitter in 65-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 63, no. 3, pp. 264–268, Mar. 2016.
- [13] A. Jose *et al.*, "A 1.5pJ/bit, 5-to-10Gbps Forwarded-Clock I/O with Per-Lane Clock De-Skew in a Low Power 28nm CMOS Process," *2019 IEEE Custom Integrated Circuits Conference (CICC)*, Apr. 2019.



- [14] Inhwa Jung, Daejung Shin, Taejin Kim, and Chulwoo Kim, "A 140-Mb/s to 1.82-Gb/s Continuous-Rate Embedded Clock Receiver for Flat-Panel Displays," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 10, pp. 773–777, Oct. 2009.
- [15] K. Park, W. Bae, J. Lee, J. Hwang, and D.-K. Jeong, "A 6.7–11.2 Gb/s, 2.25 pJ/bit, Single-Loop Referenceless CDR With Multi-Phase, Oversampling PFD in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 10, pp. 2982–2993, Oct. 2018.
- [16] K. Park *et al.*, "A 4–20-Gb/s 1.87-pJ/b Continuous-Rate Digital CDR Circuit With Unlimited Frequency Acquisition Capability in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 5, pp. 1597–1607, May 2021.
- [17] Y.-H. Moon, J.-W. Yoo, Y.-S. Ryu, S.-H. Kim, K.-S. Son, and J.-K. Kang, "A 2.41-pJ/bit 5.4-Gb/s Dual-Loop Reference-Less CDR With Fully Digital Quarter-Rate Linear Phase Detector for Embedded DisplayPort," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 66, no. 8, pp. 2907–2920, Aug. 2019.
- [18] T. Kim *et al.*, "A 14-Gb/s dual-mode receiver with MIPI D-PHY and C-PHY interfaces for mobile display drivers," *Journal of the Society for Information Display*, vol. 28, no. 6, pp. 535–547, May 2020.
- [19] *MIPI Alliance Specification for C-PHY, Version 2.0*, MIPI Alliance, Piscataway, NJ, USA, May 2019.
- [20] K. Wang, X. Gui, H. Yang, D. Li, and L. Geng, "A 112-Gb/s PAM-4 T/2-spaced 5-Tap FFE in 0.13-μm BiCMOS," 2019 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), Nov. 2019.
- [21] S. Yuan et al., "A 4×20-Gb/s 0.86pJ/b/lane 2-tap-FFE source-series-terminated transmitter with far-end crosstalk cancellation and divider-less clock generation in 65nm CMOS," 2015 IEEE Custom Integrated Circuits Conference (CICC), Sep. 2015.
- [22] A. Balachandran, Y. Chen, and C. C. Boon, "A 32-Gb/s 3.53-mW/Gb/s Adaptive Receiver AFE Employing a Hybrid CTLE, Edge-DFE and Merged Data-DFE/CDR in 65-nm CMOS," 2019 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Nov. 2019.
- [23] T. Toifl *et al.*, "A 2.6 mW/Gbps 12.5 Gbps RX With 8-Tap Switched-Capacitor DFE in 32 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 4, pp. 897–910, Apr. 2012.
- [24] D. Thulasiraman, C. G N, J. S. Gaggatur, and K. S. Sankara Reddy, "A 18.6 fJ/bit/dB Power Efficient Active Inductor-based CTLE for 20 Gb/s High Speed Serial Link," 2019 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), Jul. 2019.


- [25] Y.-M. Ying, I-Ting. Lee, and S.-I. Liu, "A 20Gb/s adaptive duobinary transceiver," 2012 IEEE Asian Solid State Circuits Conference (A-SSCC), Dec. 2012.
- [26] P.-H. Lee and Y.-C. Jang, "A 6.84 Gbps/lane MIPI C-PHY Transceiver Bridge Chip With Level-Dependent Equalization," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 67, no. 11, pp. 2672–2676, Nov. 2020.
- [27] S. Choi, P.-H. Lee, J.-W. Han, S.-D. Kim, and Y.-C. Jang, "A MIPI Receiver Bridge Chip Supporting 5-Gb/s/lane D-PHY and 3-Gsymbol/s/lane C-PHY," *JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE*, vol. 20, no. 1, pp. 29–40, Feb. 2020.



## Acknowledgements

I want to express my immense gratitude to all the people who helped me immeasurably during my Master's degree at UNIST. First of all, I am grateful to Prof. Myunghee Lee for his support, guidance, and the life-changing opportunity to work in his laboratory. He provided passionate mentorship and a rich research environment. The experience and knowledge I acquired during my research under the supervision of Prof. Myunghee Lee were tremendous and will help me carve my path in my future professional career.

I also want to thank Prof. Jae Joon Kim and Prof. Heein Yoon for being objective committee members. Thanks to their review, I can be confident in the quality of my thesis work. Their thoughtful advice on how to properly present research data and their recommendations on how to approach improvements in future designs were a great help. Moreover, I would like to thank all of the professors who shared their knowledge with me and kindled my passion for my chosen field.

I want to express my true love for my friends, who made my graduate life not only bearable but pleasant, joyful, and happy. I am grateful to Damira Rakhman, Baurzhan Salimzhanov, and Tolganay Toleuova for being my advisors, supporters, and true friends at UNIST. I cannot stress enough my thanks to my best friends Arstan Ashyrbekov, Bayan Saidolda, and Sanzhar Yeleuov for always being there for me despite the distance. I am also thankful to my other friends, whom I can call my family.

Finally, my true love goes to my parents, who did and are still doing their best to help their children and everyone else; to my little brother, who made my life meaningful; and to my older sisters, who spoiled me with their love. I am very grateful to have such a loving, caring family.

