An Energy-Efficient Reconfigurable Mobile Memory Interface for Computing Systems by Far, Majid Jalali
  
 
 
 
 
An Energy-Efficient Reconfigurable Mobile Memory 
Interface for Computing Systems 
 
 
       
 
           Majid Jalali Far 
 
Dissertation submitted to the 
     Benjamin M. Statler College of Engineering and Mineral Resources 
at West Virginia University 
in partial fulfillment of the requirements 
for the degree of 
 
 
Doctor of Philosophy 
in 
Electrical Engineering 
 
 
Gyungsu Byun, Ph.D., Chair 
Parviz Famouri, Ph.D.  
David W. Graham, Ph.D. 
Yaser P. Fallah, Ph.D. 
Ping Gui, Ph.D. 
 
 
Lane Department of Computer Science and Electrical Engineering 
 
 
Morgantown, West Virginia 
2016 
 
 
 
Keywords: Integrated Circuits, Mobile Memory Interface, Pulse Amplitude Modulation, Amplitude 
Shift Keying, Phase-Locked Loop, Off-Chip Transmission Line 
 
Copyright 2016 Majid Jalali Far 
 
  
  
 
 
Abstract 
 
 
An Energy-Efficient Reconfigurable Mobile Memory Interface for Computing Systems 
 
 
Majid Jalali Far 
Doctor of Philosophy in Electrical Engineering 
West Virginia University 
Gyungsu Byun, Ph.D., Chair 
 
 
The need for transceivers with higher power efficiency and bandwidth has grown
significantly as mobile devices, such as smartphones, laptops, tablets, and ultra-portable
personal digital assistants, continue to be built from heterogeneous intellectual property
(IP) blocks such as central processing units (CPUs), graphics processing units (GPUs), digital
signal processors, dynamic random-access memories (DRAMs), sensors, and graphics/image
processing units, and to gain enhanced graphics computing and video processing capabilities.
However, current mobile interface technologies that support CPU-to-memory
communication (e.g., baseband-only signaling) have critical limitations, particularly super-linear
energy consumption, limited bandwidth, and non-reconfigurable data access. As a consequence,
both energy efficiency and bandwidth must improve for future mobile devices.
The primary goal of this study is to design an energy-efficient reconfigurable mobile memory
interface for mobile computing systems in order to dramatically enhance circuit and system
bandwidth and power efficiency. The proposed energy-efficient mobile memory interface, which
uses advanced baseband (BB) signaling and RF-band signaling, is capable of simultaneous
bi-directional communication and reconfigurable data access. It also increases power
efficiency and bandwidth between mobile CPUs and memory subsystems on a single-ended shared
transmission line. Moreover, because multiple data streams are communicated on a single-ended
shared transmission line, the number of transmission lines between the mobile CPU and memories
is considerably reduced, resulting in significant technological innovations (e.g., more compact
devices and low-cost packaging for mobile communication interfaces) and establishing the
principles and feasibility of technologies for future mobile system applications. The operation
and performance of the proposed transceiver are analyzed, and its circuit implementation is
discussed in detail. A chip prototype of the transceiver was implemented in a 65nm CMOS
process technology. In measurement, the transceiver exhibits higher aggregate data
throughput and better energy efficiency than prior works.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
 
 
 
  
 
 
 
 
Acknowledgement 
 
 
I would like to begin by thanking my advisor, Dr. Gyungsu Byun, for the 
opportunity to perform this research and for his constant guidance and support. I am also 
pleased to thank my committee for the support they have provided me in my graduate 
study. I would like to thank my parents who supported me all throughout my life. 
Whenever I needed their support, they were always there for me with their unconditional 
love.  
Finally, I give thanks to my wife Mariam, who supported me throughout my
degree without any sign of getting tired. I love you!
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Table of Contents 
 
Acknowledgement
List of Figures
List of Tables
Chapter 1: Introduction
1.1. Research Objective
1.2. Research Challenges
1.3. Dissertation Organization
Chapter 2: Overview of RF-Interconnect Transceivers and MDBI Architecture
2.1. Introduction
2.2. Related Works
2.3. Multilevel Dual-Band Interconnect (MDBI) Architecture
Chapter 3: Phase-Locked Loop (PLL)
3.1 Introduction
3.2 PLL Structure
3.2.1 PFD, charge pump, and loop filter
3.2.2 Ring VCO
3.2.3 Frequency divider design
3.2.4 Loop characteristics
3.2.5 Sub-harmonic frequency multiplier design
3.3 Experimental Results
3.4 Conclusion
Chapter 4: RF-Band Transceiver
4.1 Introduction
4.2 RF-Band Transmitter
4.3 RF-Band Receiver
4.4 RF-Band Transceiver
4.5 Band-Selective Transformer
Chapter 5: Pulse-Amplitude Modulation (PAM) Transceiver
5.1. Introduction
5.2. PAM Transceiver Architecture
5.3. Results
5.4. Conclusion
Chapter 6: Multilevel Dual-Band Interconnect (MDBI) System
6.1 Multilevel Dual-Band Interconnect (MDBI) Transceiver Architecture
6.2 Multilevel Dual-Band Interconnect (MDBI) Transceiver Design
6.3 Off-Chip Transmission Line
6.4 Chip Measurement Results
6.5 Discussions
Chapter 7: Conclusions and Future Work
7.1. Conclusions
7.2. Future Work
Bibliography
 
 
 
 
 
  
 
 
 
 
List of Figures 
 
2.1   Conceptual diagram of RF interconnect
2.2   Multi-band RF interconnect concept
2.3   A single band BPSK on-chip interconnect link
2.4   Measurement results of 3.0Gb/s/pin dual band bi-directional FDMA-I system with
0.6Gb/s RF and 2.4Gb/s baseband signaling (a) Input and recovered
RF/baseband data (b) Measured data eye diagrams
2.5   (a) Point to point link (b) Multi-drop bus
2.6   Schematic of the RF-I implemented in a 3D 0.18µm CMOS process
2.7   Measured eye diagrams of aggregate 8.4 Gb/s (4.6 Gb/s BB and 3.8 Gb/s RF-band)
and 10 Gb/s (5 Gb/s BB and 5 Gb/s RF-band) data rate, respectively, on FR-4
and Rogers 4003 test boards
2.8   Architecture of proposed multilevel dual-band interconnect (MDBI) interface with
reliable and simultaneous communication
3.1   Block diagram of the proposed 20GHz PLL
3.2   Simplified charge pump circuit
3.3   Ripple of control voltage
3.4   (a) Four stage differential ring-based VCO circuit (b) Delay cell
3.5   Simulated phase noise of the ring VCO with different process corners
3.6   Simplified digital static frequency divider circuit
3.7   Bode plot of the open-loop transfer function
3.8   The proposed multiply-by-10 injection-locked frequency multiplier
3.9   Simulated phase noise of the multiply-by-2 and 5 ILFM with different process
corners
3.10 Simulated input sensitivity of the ILFM
3.11 Chip microphotograph
3.12 PLL FR4 PCB
3.13 Oscillation frequency of PLL after the ring VCO
3.14 Measured frequency tuning range and phase noise of the VCO
3.15 Oscillation frequency of the PLL after the ILFM
3.16 Phase noise of the PLL after the ILFM
3.17 PLL lock waveform
3.18 PLL jitter measurement
4.1   System architecture of BPSK RF-band interconnect
4.2   ASK modulator circuit used in RF-band transceiver
4.3   Simplified schematic of RF-band transmitter
4.4   Simulated input and output signals of the ASK modulator (top: the input data and
bottom: the ASK modulated signal)
4.5   Simplified schematic of RF-band receiver
4.6   Simulated power spectrum of the designed mixer
4.7   Simulated input and output signals of the differential self-mixer (top: the input
data and bottom: the mixer output signal)
4.8   Circuit configuration of the differential amplifiers
4.9   Simulated output signal of the differential amplifiers
4.10 Circuit implementation of the buffer converter
4.11 Simulated output signal of the output driver
4.12 Simplified schematic of RF-band transceiver
4.13 Simulated eye diagram of RF-band transceiver at 3Gbps in 180nm CMOS
process
4.14 Transformer design and model
4.15 Transformer layout
4.16 Simulated coupling factor (Km) of the transformer
4.17 Simulated quality factor of each coil of the transformer
4.18 MDBI working mechanism for simultaneous bidirectional data transaction
5.1   PAM memory interface architecture
5.2   Current-mode PAM transmitter design
5.3   Timing diagram of current-mode PAM transmitter startup control logic and
simulated PAM TX startup delay
5.4   PAM receiver design
5.5   Simplified comparator circuit used in PAM receiver
5.6   Simplified DFF used in the PAM receiver
5.7   Simulated four level voltage waveform
5.8   Eye diagram of PAM interface
6.1   (a) Block diagram of the proposed multilevel dual-band interconnect (MDBI)
architecture for reliable and simultaneous communication (b) Multilevel dual-band
signaling in frequency domain
6.2   Band-selective transformer used in MDBI transceiver
6.3   Multilevel dual-band interconnect transmitter
6.4   Transmitter side layout of (a) the 4-level PAM and (b) the RF-band transceivers
6.5   Multilevel dual-band interconnect receiver with a frequency band-selective
transformer
6.6   Receiver side layout of (a) the 4-level PAM and (b) the RF-band transceivers
6.7   Simultaneous bi-directional MDBI simulation waveforms for three input data
streams
6.8   Simulated data eye diagram of the MDBI transceiver (a) 4-level PAM output eye
diagram at 5.5Gb/s/pin (b) RF-band output eye diagram at 5Gb/s/pin
6.9   Simulated latency of the MDBI system
6.10 Channel modeling of off-chip transmission line with wire-bonding
6.11 Simulated signal loss of off-chip 5cm FR4 transmission line
6.12 Simultaneous bi-directional MDBI simulation waveforms for (G) the start and (H)
the end of the off-chip transmission line
6.13 Die photo of the MDBI transceiver
6.14 Test PCB layer stack-up with 2 layers
6.15 FR4 test board
6.16 Measured RF-band BER results at PRBS 2^23-1
6.17 Measured eye diagrams of aggregate data rate on FR4 PCB board
 
 
 
  
 
 
 
 
List of Tables 
 
3.1  Performance comparison of the PLL
5.1  Performance comparison of the 4-level PAM transceiver
6.1  Performance comparison of the MDBI system
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Chapter 1: Introduction 
 
 
 
Mobile devices now account for the largest share of electronic products by volume, with
billions of units shipped recently. Alongside this dramatic yearly increase in shipping
volume, more media-intensive functions, such as enhanced graphics/image and media
processing capabilities, have been added to mobile devices as communication and
computing converge.
Because of the space and power constraints in a mobile system, the mobile central processing
unit (CPU), graphics processing unit (GPU), and memory subsystems must significantly
increase both bandwidth and energy efficiency as mobile devices such as smartphones gain
enhanced graphics/image processing capabilities and render images faster and more
smoothly at higher resolutions. Currently available mobile CPU/GPU and memory system
technologies increase bandwidth by adding more pins through point-to-point links.
Memory interfaces for high-speed DDR systems have reached the 1.6 to 3.2Gbps/pin regime
in the DDR3 and DDR4 generations. Moreover, graphics GDDR5 can operate at over
5Gbps/pin with 14.4pJ/bit [1][2].
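The GDDR5 figures quoted above can be turned into a rough per-pin power estimate. The
following is a minimal sketch, assuming that average I/O power is simply the data rate
multiplied by the energy per bit; the function name pin_power_mw is illustrative and does
not come from the cited works.

```python
def pin_power_mw(rate_gbps: float, energy_pj_per_bit: float) -> float:
    """Average I/O power per pin in mW.

    (Gb/s) x (pJ/bit) = 1e9 bit/s x 1e-12 J/bit = 1e-3 W, so the product is in mW.
    """
    return rate_gbps * energy_pj_per_bit


# GDDR5 operating point quoted in the text: 5 Gb/s/pin at 14.4 pJ/bit.
gddr5_mw = pin_power_mw(5.0, 14.4)
print(f"{gddr5_mw:.1f} mW per pin")  # prints "72.0 mW per pin"
```

Under this assumption, every additional pin running at that operating point adds about
72mW, which is why widening a parallel interface raises total I/O power.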
Conventional memory interfaces may achieve the target data rate by using parallel
I/O interconnects. Although these memory interfaces can increase data bandwidth, their
baseband I/O transceivers suffer from signal integrity problems, such as crosstalk
between data and clock, and from large power consumption due to the increased number of
parallel I/O interfaces. Raising the target data rate of each I/O interface increases the
total power consumption further. Another issue in memory interfaces is the latency
between the microprocessor and memory, which increases as the required data sampling
clock frequency increases. Thus, power and latency in the I/O interfaces between CPUs and
memory are major limiting factors in future DDR memory subsystems.
The primary significance of the successful completion of this research will be the
development of an energy-efficient multilevel dual-band interconnect (MDBI)
architecture for simultaneous communication of multiple data streams, using both a
multilevel baseband and one RF band on a shared common transmission line. The proposed
MDBI architecture provides much higher bandwidth and more energy-efficient data
communication between the mobile CPU and memory, and offers transformative
technology for heterogeneously integrated chip-to-chip communications such as
3-dimensional (3D) cache/memory controller links, solid-state drive read/write
channels, and heterogeneous mobile IP (GPU/image processor and bio-sensor)
interfaces.
1.1. Research Objective 
The primary objective of this research is to develop the first multilevel dual-band
interconnect with data reconfigurable capability, which will dramatically improve the
energy efficiency of future mobile devices (which need media-intensive data processing
and improved battery life). To establish the principles and feasibility of a new
technology for multilevel dual-band interface systems, an energy-efficient multilevel
dual-band interconnect (e.g., an amplitude shift-keying (ASK) transceiver for the RF band
and a 4-level PAM transceiver for the baseband (BB)) is combined with self-calibration
schemes (e.g., impedance calibration circuits). Our rationale for this work is that its
successful completion would enable the development of reliable and reconfigurable
interface circuits and system architectures by overcoming the current technological
challenges and yield degradation in deeply scaled CMOS technologies. Moreover, the
prototyping and testing of the proposed MDBI circuits and systems is potentially
transformative and provides new opportunities for future mobile system applications.
Four specific tasks will be addressed in support of these objectives:
• Task 1: Design a low-power phase-locked loop. We will analyze and
design a low-power phase-locked loop (PLL), which is required to generate the clock
frequency for a fully synchronous MDBI interface system.

• Task 2: Design an RF-band transceiver with a superior band-selective
dual-band filter. We will analyze and design an RF-band transceiver that uses
amplitude shift-keying (ASK) for the RF-band signal. We will also design a
dual-band filter, the critical component required to achieve superior band-pass
filtering with less inter-band interference and fewer sideband spurs.

• Task 3: Design an energy-efficient four-level pulse-amplitude
modulation (4-PAM) memory I/O BB interface transceiver. We will design an
energy-efficient PAM transceiver that uses a current-mode transmitter and a dual
sampling receiver for the base-band (BB) signal.

• Task 4: Prototype a complete energy-efficient MDBI interface system
by integrating the PAM transceiver and the RF-band transceiver, whose
required carrier is generated by the PLL. We will develop and fabricate the first
energy-efficient and reconfigurable MDBI interface system for reliable and
energy-efficient communication.
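To make the 4-level PAM idea in Task 3 concrete, the following is a minimal sketch of how
two bits can be carried by one four-level symbol and recovered by slicing against three
thresholds. The Gray-coded bit mapping, the normalized levels 0 to 3, and the threshold
values are assumptions chosen for illustration; they are not taken from the transceiver
designed in this work, and the sketch does not model the current-mode transmitter or the
dual sampling receiver.

```python
# Hypothetical Gray-coded mapping from a bit pair to one of four normalized levels.
LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
BITS = {level: bits for bits, level in LEVELS.items()}


def pam4_encode(bits):
    """Pack a flat bit list (even length) into 4-level symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("bit count must be even")
    return [LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(samples, thresholds=(0.5, 1.5, 2.5)):
    """Slice each received sample against three thresholds, then unmap to bits."""
    out = []
    for s in samples:
        level = sum(s > t for t in thresholds)  # number of thresholds exceeded
        out.extend(BITS[level])
    return out


data = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = pam4_encode(data)          # four symbols carry eight bits
noisy = [0.1, 0.9, 2.2, 2.8]         # illustrative samples with small amplitude errors
recovered = pam4_decode(noisy)
print(symbols, recovered == data)
```

With a Gray mapping, a sample that lands in a neighboring level corrupts only one of the two
bits, which is a common reason for choosing it.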
1.2. Research Challenges 
The possible challenges and risks related to each task are as follows:

• Challenges and risks of Task 1, Task 2, and Task 3: The key challenges
are noise spurs from the ring-type VCO circuit and possible crosstalk coupling
between the RF and BB bands, which may degrade the bit error rate of the multilevel
dual-band interconnect communication. The risks will be mitigated by 1)
designing a robust (de)modulation scheme (e.g., ASK) that is largely insensitive
to noise and crosstalk coupling and 2) integrating a superior band-selective
dual-band filter, such as a transformer-based impedance matching and band-pass
network, into the proposed MDBI transceiver.

• Challenges and risks of Task 4: The design of each individual transceiver
circuit is not risky. However, the monolithic integration of a new reconfigurable MDBI
architecture with a 20GHz PLL has never been attempted before. We therefore plan to
overcome the challenges and risks by first designing and fabricating (1) the PLL
clocking circuit, (2) the PAM transceiver, and (3) the RF-band transceiver on silicon
separately. These silicon-verified MDBI and clocking circuits will then be
integrated and demonstrated.
1.3. Dissertation Organization 
In this dissertation, energy-efficient transceiver architectures and circuits are
developed for high-speed communications and interconnects. We will focus on the
analysis and implementation of the PLL, the 4-level PAM transceiver, and the RF-band
transceiver, and on integrating the transceivers with a superior band-selective
transformer on a shared single-ended transmission line.
In chapter 2, we will review conventional memory I/O interfaces and discuss their
design limitations in bandwidth, latency, and power consumption. In addition,
this chapter presents the critical issue of current and future memory I/O interfaces,
namely that they are non-scalable in both power consumption and latency.
 
 
 
 
In chapter 3, we will discuss the proposed PLL circuit and analyze the loop filter and
other circuit blocks used in the PLL. In addition, the simulation and measurement results
are shown and compared with prior art.
In chapter 4, we will focus on the RF-band interconnect (RF-I) transceiver. We will
first review the conventional RF-I transceiver architecture. After that, the design of the
proposed RF-I transceiver architecture and circuits will be described. We will discuss the
implementation of the RF-I transceiver circuits, including the ASK modulator, self-mixer,
differential amplifiers, buffer converter, and output driver. In addition, the band-selective
transformer used in the MDBI transceiver is characterized and described in this chapter. The
simulation results of the circuits will be presented.
Chapter 5 will focus primarily on the baseband transceiver design. We will first
review conventional 4-level PAM transceiver architectures. Then, we will design our
proposed 4-level PAM transceiver and describe its circuits, including the encoder,
current-mode driver, comparators, and differential amplifiers, in detail. After that, we will
present simulation results of the proposed 4-level PAM transceiver.
Chapter 6 will present the proposed multilevel dual-band interconnect (MDBI)
architecture, which can simultaneously transceive three data streams on a shared
single-ended transmission line. The MDBI system will be demonstrated, and simulation and
measurement results of simultaneous bi-directional operation for future memory
interfaces in a 65nm CMOS process technology will be presented. Finally, chapter 7 will
present the conclusions and future directions.
 
 
 
 
 
 
 
 
 
 
 
Chapter 2: Overview of RF-Interconnect 
Transceivers and MDBI Architecture 
 
 
 
2.1. Introduction 
One of the critical problems of a conventional memory I/O interface is satisfying both
high performance and low power consumption simultaneously. Conventional memory I/O
interfaces have design limitations including low bandwidth, high latency, and
high power consumption. They do not fulfill these requirements because their performance
scales poorly, even in deep sub-micron state-of-the-art CMOS
technology. The latency of chip-to-chip communication between shared DRAMs and
multi-core CPUs is a bottleneck of the current memory bus, as is energy efficiency.
Increasing the number of microprocessors dramatically increases the overall latency of the
Time Division Multiple Access (TDMA) communication protocol, since TDMA
communication can only perform multiple read/write operations sequentially. Another
main bottleneck is that traditional memory I/O interface architectures are not
reconfigurable: they support a fixed channel configuration between the memory controller
and shared DRAMs. To solve these issues of traditional memory I/O interfaces, a
multilevel dual-band RF-interconnect is proposed, which is described in detail in chapter 6.
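The latency argument above can be illustrated with a toy model. This is a sketch under
strong assumptions, not a model of any real memory bus: every requester issues one
transaction of equal duration, TDMA serves them one after another, and a frequency-divided
link gives each requester its own band so that all transactions proceed at once. The
function names and the 10ns slot time are hypothetical.

```python
def tdma_worst_case_latency_ns(n_requesters: int, slot_ns: float) -> float:
    """The last requester waits for every earlier slot, then uses its own."""
    return n_requesters * slot_ns


def fdma_latency_ns(slot_ns: float) -> float:
    """Each requester owns a band, so no transaction queues behind another."""
    return slot_ns


for n in (1, 2, 4, 8):
    print(n, tdma_worst_case_latency_ns(n, 10.0), fdma_latency_ns(10.0))
```

In this simplified picture the TDMA worst case grows linearly with the number of
requesters, while the concurrent case stays flat, at the cost of one modulator and
demodulator per band.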
 
 
 
 
 
2.2. Related Works 
One possibility is to use an RF interconnect (RF-I) based on frequency-division
multiple-access (FDMA) algorithms [3][4][5][6][7] to implement multi-band reconfigurable
on-chip communications. In prior art [8][9][10], a single-band binary phase-shift keying
(BPSK) on-chip interconnect link [10], a dual-band FDMA chip-to-chip RF-I [9], and a
3-dimensional integrated circuit (3D IC) RF-I [8] have been demonstrated. These
RF-interconnects can achieve a high data rate (> 4Gb/s/pin in CMOS), a low bit-error-rate
(BER) (< 10^-12), and simultaneous and reconfigurable communication between multiple I/O
users by using multiple frequency bands on shared physical transmission lines. RF-interconnect
has several advantages, including high bandwidth, low power, superior signal-to-noise ratio
(SNR), low overhead, high data rate per wire, low area per gigabit, and low latency due to
speed-of-light data transmission. Transmitting "waves" instead of binary signals is one
of the main differences between RF-interconnect signaling and baseband signaling. In
conventional baseband signaling, the entire length of the wire must be charged and
discharged to transmit either '1' or '0', since voltage signaling is used on the transmission
line. In the RF approach, an electromagnetic (EM) wave is continuously sent across the
shared channel, which is treated as a transmission line at mm-wave frequencies. Several
modulation and demodulation techniques use amplitude and/or phase changes.
Binary phase-shift keying (BPSK) is one of the simplest schemes: the binary data
switches the phase of the wave between 0° and 180°. The basic concept of the RF-I design
technique and a conceptual diagram of its multi-band spectrum are shown in Figure 2.1.
[Figure 2.1 block labels: Din, TX Mixer, RX Mixer, LPF, RX Buffer, Dout, carrier FRF;
insets plot signal power versus frequency f, with a band marked f(RF).]
Figure 2.1: Conceptual diagram of RF interconnect. 
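The BPSK scheme and the mixer-plus-LPF receive chain of Figure 2.1 can be sketched
numerically. The following is a minimal illustration, assuming an ideal noiseless channel, a
receiver carrier that is perfectly phase-aligned with the transmitter, and a per-bit average
standing in for the low-pass filter. The 20GHz carrier and 5Gb/s bit rate are illustrative
values, and the function names are hypothetical.

```python
import math


def bpsk_modulate(bits, carrier_hz, bit_rate, samples_per_bit=64):
    """Map bit 1 to phase 0 and bit 0 to phase 180 degrees on a cosine carrier."""
    wave = []
    for k, b in enumerate(bits):
        phase = 0.0 if b else math.pi
        for n in range(samples_per_bit):
            t = (k + n / samples_per_bit) / bit_rate
            wave.append(math.cos(2 * math.pi * carrier_hz * t + phase))
    return wave


def bpsk_demodulate(wave, carrier_hz, bit_rate, samples_per_bit=64):
    """Mix with the carrier, sum over each bit period (a crude LPF), then slice."""
    bits = []
    for k in range(len(wave) // samples_per_bit):
        acc = 0.0
        for n in range(samples_per_bit):
            t = (k + n / samples_per_bit) / bit_rate
            sample = wave[k * samples_per_bit + n]
            acc += sample * math.cos(2 * math.pi * carrier_hz * t)
        bits.append(1 if acc > 0 else 0)
    return bits


data = [1, 0, 1, 1, 0, 0, 1, 0]
tx = bpsk_modulate(data, carrier_hz=20e9, bit_rate=5e9)
rx = bpsk_demodulate(tx, carrier_hz=20e9, bit_rate=5e9)
print(rx == data)
```

Multiplying the received wave by the carrier yields a positive average for an in-phase
symbol and a negative average for an inverted one, so the sign of the filtered product
recovers the bit. A phase error between the two carriers shrinks that average, which is why
coherent receivers need carrier synchronization.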
 
 
 
 
The data is up-converted onto a high-frequency carrier by amplitude (or phase) 
modulation, and the modulated RF carrier is down-converted back to the original data by 
a mixer on the receiver side of the RF-interconnect. Extending the single-carrier RF-I 
idea, the channel bandwidth efficiency can be improved by using an N-channel 
multi-carrier RF-I. The conceptual multi-band RF-I is shown in Figure 2.2. The 
transmitter side of the multi-band RF-I transmits N modulated data streams over the 
same shared transmission line, so N modulation circuits are needed in the transmitter. 
The total aggregate data rate therefore equals the per-channel data rate multiplied by 
N, the number of up-converters. The multi-band RF-I can achieve an ultra-high data rate 
while reducing the number of parallel memory buses, the pin count, and the silicon area.    
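
The up-conversion and down-conversion steps described above can be illustrated with a 
minimal numerical sketch. It is not the circuit of any cited work: the function name 
bpsk_link, the sample counts, and the ideal noiseless channel are assumptions made for 
illustration only. The transmitter mixer multiplies the data by the carrier, the receiver 
mixer multiplies the line signal by its local carrier, and averaging over whole carrier 
cycles stands in for the low-pass filter.

```python
import math

def bpsk_link(bits, rx_phase=0.0, cycles_per_bit=8, samples_per_cycle=16):
    """Send bits over an ideal BPSK link and return the receiver's decisions."""
    n = cycles_per_bit * samples_per_cycle  # samples per bit period
    decisions = []
    for bit in bits:
        symbol = 1.0 if bit else -1.0  # carrier phase 0 deg for '1', 180 deg for '0'
        acc = 0.0
        for k in range(n):
            theta = 2.0 * math.pi * k / samples_per_cycle
            tx = symbol * math.cos(theta)           # TX mixer: data times carrier
            acc += tx * math.cos(theta + rx_phase)  # RX mixer: line signal times local carrier
        # Averaging over whole carrier cycles plays the role of the low-pass filter.
        decisions.append(1 if acc / n > 0.0 else 0)
    return decisions

data = [1, 0, 1, 1, 0, 0, 1]
print(bpsk_link(data))                    # receiver carrier aligned with the transmitter
print(bpsk_link(data, rx_phase=math.pi))  # receiver carrier 180 degrees out of phase
```

In this model the averaged mixer output is proportional to the cosine of the phase offset 
between the two carriers, so a 180° offset inverts every bit. This is why a coherent 
scheme such as BPSK needs the transmitter and receiver carriers to be synchronized.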
 
[Figure 2.2 graphic. Two MRF-I transceivers, each containing N TX/RX pairs, exchange DIN 1 ... DIN N and DOUT 1 ... DOUT N over a shared line using multi-band signaling; the spectrum inset plots signal power versus frequency f, with bands at f(BB) and f(RF).]
 
Figure 2.2: Multi-band RF interconnect concept. 
A single-band on-chip interconnect link using the BPSK modulation technique is 
demonstrated in [10]. Its structure consists of a pair of mixers, a 2-cm on-chip 
transmission line, and a sense amplifier, as shown in Figure 2.3.  
Figure 2.3: A single band BPSK on-chip interconnect link. 
The mixer in the transmitter modulates the input data stream onto the RF-band carrier 
and drives the on-chip transmission line. The mixer on the receiver side then 
demodulates the RF-band signal to baseband and recovers the transmitted signal. The 
RF-I link consumes 16.1mW at a data rate of 2Gbps, with a 300ps delay over the 2cm 
transmission line. However, this structure needs an off-chip phase shifter to 
synchronize the transmitter and receiver clocks manually. Furthermore, the architecture 
supports only a single modulation band, which limits the data rate. A scalable 
multi-band on-chip RF-I circuit is therefore required.  
In 2005, Ko et al. [9] implemented a dual-band chip-to-chip FDMA interconnect 
transceiver chip. The link has one RF band at 7.36GHz and one baseband. The aggregate 
data rate is 3.6Gbps/pin over a 10-cm FR4 printed circuit board (PCB) trace with 92mW 
power consumption. The RF-band transmitter uses the BPSK modulation technique to 
up-convert the input data stream. The modulated RF-band carrier is then driven onto the 
shared off-chip transmission line through an open-drain cascade buffer with capacitive 
coupling. A cascode low-noise amplifier (LNA) with an inductively peaked LC tank is 
used on the receiver side. The LNA resonates around the RF carrier frequency and acts 
as a band-selective filter that rejects the unwanted side-bands. The I/Q mixers then 
down-convert the modulated RF signal to the original data. The baseband transceiver 
uses a conventional current-mode driver on the transmitter side, while its receiver 
side is implemented together with the RF transceiver. Four of these chips are used to 
construct a dual-band FDMA interconnect system in which the RF-band and baseband 
transceivers share a single terminated PCB transmission line. The transceiver enables 
simultaneous and bi-directional signaling over a single shared off-chip interface or a 
point-to-point link.  
The FDMA-I transceiver chip is fabricated in a 0.18μm CMOS technology and operates from 
a 1.8V supply. It demonstrates multi-drop and simultaneous chip-to-chip communication 
with programmable bidirectional signaling. An aggregate data rate of 3.6Gbps/pin over a 
10cm FR4 PCB line is achieved by two chips mounted on the test board. The RF 
transceiver achieves 1.8Gbps while dissipating a total of 74mW: the transmitter 
consumes 19mW, the receiver consumes 36mW, and 19mW is consumed by the quadrature 
voltage-controlled oscillator (QVCO) that generates the carrier signal. The baseband 
transceiver achieves 1.8Gbps/pin while dissipating 12mW on the transmitter side and 6mW 
on the receiver side. Figure 2.4 shows the measured simultaneous bi-directional 
signaling with 2.4Gb/s/pin in the baseband and 0.6Gb/s/pin in the RF band, with BERs of 
10^-9 for both.  
 
Figure 2.4: Measurement results of the 3.0Gb/s/pin dual-band bi-directional FDMA-I system with 
0.6Gb/s RF and 2.4Gb/s baseband signaling: (a) input and recovered RF/baseband data; (b) measured 
data eye diagrams. 
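
The power figures quoted above for the transceiver of [9] can be cross-checked with a 
short calculation. The sketch below only re-adds the numbers given in the text and 
converts them to energy per bit; the derived pJ/bit values are not stated in [9] and are 
shown here as an illustration of the arithmetic (1 mW per Gb/s equals 1 pJ/bit).

```python
# Power and data-rate figures quoted in the text for the dual-band transceiver of [9].
RF_POWER_MW = {"transmitter": 19.0, "receiver": 36.0, "qvco": 19.0}
BB_POWER_MW = {"transmitter": 12.0, "receiver": 6.0}
RF_RATE_GBPS = 1.8
BB_RATE_GBPS = 1.8

def energy_per_bit_pj(power_mw, rate_gbps):
    """1 mW / (1 Gb/s) = 1e-3 J/s / 1e9 bit/s = 1 pJ/bit."""
    return power_mw / rate_gbps

rf_total_mw = sum(RF_POWER_MW.values())
bb_total_mw = sum(BB_POWER_MW.values())
total_mw = rf_total_mw + bb_total_mw
aggregate_gbps = RF_RATE_GBPS + BB_RATE_GBPS

print("RF band total power (mW):", rf_total_mw)
print("Baseband total power (mW):", bb_total_mw)
print("Link total power (mW):", total_mw)
print("RF band energy (pJ/bit):", round(energy_per_bit_pj(rf_total_mw, RF_RATE_GBPS), 1))
print("Baseband energy (pJ/bit):", round(energy_per_bit_pj(bb_total_mw, BB_RATE_GBPS), 1))
print("Aggregate energy (pJ/bit):", round(energy_per_bit_pj(total_mw, aggregate_gbps), 1))
```

The block totals reproduce the 74mW and 92mW figures in the text, and the per-band 
results show that the RF band accounts for most of the energy per bit.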
Although Ko et al. successfully demonstrated that off-chip FDMA RF-I is feasible, the 
most critical issue for future memory interfaces is how to minimize the power 
consumption of the high-speed parallel I/O interface. In other words, design choices 
such as the modulation scheme and the architecture should be improved. BPSK in the RF 
band is a synchronous modulation scheme, so it requires both phase and frequency 
synchronization between the transmitter and the receiver. Thus, a frequency 
synchronization scheme such as a Costas loop or a re-modulation technique is necessary 
on the receiver side. These techniques need a power-hungry and complicated 
frequency-locked loop (FLL). A Costas loop, which consumes over 20mW of power, is 
difficult to apply to future low-power memory I/O interfaces. The system power budget 
grows with the number of RF bands, so the approach becomes non-scalable. Furthermore, 
the FLL is complex to design and occupies a large Si area, which increases the overall 
system cost and makes it non-scalable for next-generation multi-band systems. Since 
each RF-band receiver needs its own FLL for its specific frequency band, a future 
multi-band FDMA-I would require several FLLs, which occupy a large layout area and 
consume much more power. 
Another emerging memory interface stacks memory chips using very short 
through-silicon vias (TSVs) (i.e., 50 – 200 μm [11], [12]) [13][14]. In this 
technology, chips are stacked on top of each other and connected by vertical TSV 
interconnects [15], [16]. 3D integration currently has technical limitations such as 
high-temperature heat cycles (hotspots) due to inefficient 3D heat transfer [17], [18], 
thermal-mechanical stresses during the 3D wafer thinning process, and the difficulty of 
high-yield die-to-die joining [15]. However, 3D stacking can provide lower data access 
latency due to the shorter vertical interconnect, and improved signal integrity since 
the large capacitive loads from inter-tier ESD structures, the off-chip package, and 
traces are eliminated. A simple transceiver can also be implemented in a smaller chip 
area, which reduces both the cost and the distance that signals must travel across the 
chip. Therefore, both latency and energy consumption decrease in 3D vertical 
chip-to-chip communication because of the reduced distance. However, using TSVs as 
shown in Figure 2.5 to connect transistors and metal tiers vertically requires a 
relatively large area, since alignment of such direct connections is difficult on a 
large scale. 
[Figure 2.5 graphic. (a) Point-to-point 3D-IC: CPU (Tier 1) and DRAM (Tier 2), each with an IO block. (b) Multi-drop 3D-IC: CPU (Tier 1) and DRAM (Tier 2 to Tier 5), each with an IO block, connected by a TSV.]
 
Figure 2.5: (a) Point to point link (b) Multi-drop bus. 
RF signaling has an advantage over standard voltage signaling for multi-drop 
communication. In RF signaling, the modulated signal is transmitted to the receiver 
side by capacitive and inductive coupling. A 3D RF-I transceiver therefore needs no 
direct coupling between tiers, which allows a simpler design with a smaller Si area, 
reducing system cost and increasing performance. A schematic of a 3D RF-I transceiver 
using capacitive coupling is shown in Figure 2.6.  
 
 
Figure 2.6: Schematic of the RF-I implemented in a 3D 0.18um CMOS process. 
The modulated signal, with a 21GHz carrier frequency, is generated by an amplitude-shift 
keying (ASK) modulator and is detected on the receiver side by a simple envelope 
detector. Capacitive coupling between tiers is formed by the top metal layers of each 
tier; the resulting capacitances are tens of femtofarads, which is sufficient for 
effective coupling. This 3D RF-I transceiver achieves a maximum data rate of 
11Gbps/pin and a very low bit error rate (BER) of 10^-14 measured at about 8Gb/s. 
Tam et al. [19] demonstrated a tri-band on-chip RF-interconnect composed of two RF 
bands and one baseband. The two concurrent RF bands are at 30GHz and 50GHz, alongside 
one traditional baseband. Each RF band uses a simple structure consisting of a VCO and 
an ASK modulator on the transmitter side and a self-mixer and baseband amplifier on the 
receiver side. The two modulated RF-band signals are directly interfaced to two 
transformers, which couple the RF signals onto the shared differential transmission 
line. On the receiver side, two transformers of the same size as those in the 
transmitter form the first block of the receiver and decouple the RF signals. The mixer 
in the RF receiver acts as an envelope detector and demodulates the mm-wave ASK signal 
into a baseband signal, which is further amplified to a full-swing digital signal. The 
baseband data is transmitted to the receiver side through both sets of transformers by 
always staying on the secondary coil side and using the common mode of the differential 
transmission line. The transformer becomes a short circuit at low frequency, and a pair 
of low-swing capacitive coupling buffers transmits and receives the baseband data. 
Because ASK is a non-coherent scheme, no synchronous clock is needed between the 
transmitter and the receiver. Therefore, no clock line connects the transceivers, and 
none of the power-hungry frequency synthesizers or extra clocking circuits used in the 
transceivers reported in [8][9] are required. The maximum data rates of each RF band 
and of the baseband (BB) are 4Gbps and 2Gbps, respectively, for a total aggregate data 
rate of 10Gbps. The measured BER across all channels is < 1x10^-9. This design achieves 
an energy per bit of 1.7mW/Gb/s, compared with BB-only signaling at 12mW/Gb/s and 
9.4mW/Gb/s [20][21]. However, the measured bit error rate (BER) is 10^3 times higher 
than the typical performance reported in prior art [20][21]. 
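
The reason a non-coherent ASK receiver needs no carrier synchronization can be shown 
with a small numerical sketch. It is not the circuit of [19]: on-off keying (100% 
modulation depth), an ideal noiseless channel, and the names ask_link and 
carrier_phase are assumptions for illustration. The self-mixer is modeled as 
squaring the received signal, and averaging over the bit period stands in for the 
baseband low-pass filter.

```python
import math

def ask_link(bits, carrier_phase=0.0, cycles_per_bit=8, samples_per_cycle=16):
    """On-off ASK: carrier on for '1', off for '0'; detection by self-mixing."""
    n = cycles_per_bit * samples_per_cycle  # samples per bit period
    decisions = []
    for bit in bits:
        amplitude = 1.0 if bit else 0.0
        acc = 0.0
        for k in range(n):
            theta = 2.0 * math.pi * k / samples_per_cycle + carrier_phase
            rx = amplitude * math.cos(theta)  # received carrier with unknown phase
            acc += rx * rx                    # self-mixer: signal multiplied by itself
        # The average is 0.5 for a '1' and 0 for a '0', whatever the carrier phase.
        decisions.append(1 if acc / n > 0.25 else 0)
    return decisions

data = [1, 0, 0, 1, 1, 0, 1]
for phase in (0.0, 1.0, math.pi / 2.0, math.pi):
    print(phase, ask_link(data, carrier_phase=phase))
```

Since the squared carrier averages to the same value for any phase, the decisions do 
not depend on carrier_phase, in contrast to a coherent BPSK receiver.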
A dual-band interconnect (DBI) transceiver composed of one RF band and one baseband is 
demonstrated in [22] and [23]. The RF transmitter consists of a 23 GHz LC-VCO and a 
non-coherent ASK modulator. The modulated signal is inductively coupled onto an 
off-chip transmission line through an on-chip differential transformer. The baseband 
transmitter amplifies the incoming data stream using buffers and transmits the data 
through the transformer. On the receiver side, an on-chip frequency-selective 
transformer splits the data streams into the baseband and the RF band. The 
band-pass-filtered RF-band data stream is then fed differentially into the receiver's 
differential mixer and down-converted to baseband data. The DBI transceiver is 
fabricated in 65nm CMOS technology to demonstrate dual-band bidirectional communication 
on a shared PCB transmission line. It achieves energy-efficient (<4pJ/b), 
high-bandwidth operation with 8.4Gb/s and 10Gb/s throughput on FR-4 and Rogers 4003 
test boards, respectively, as shown in Figure 2.7.  
 
Figure 2.7: Measured eye diagrams of aggregate 8.4 Gb/s (4.6 Gb/s BB and 3.8 Gb/s RF-band) and 10 
Gb/s (5 Gb/s BB and 5 Gb/s RF-band) data rate, respectively, on FR-4 and Rogers 4003 test boards. 
One of the main differences between this transceiver and [19] is that it communicates 
over an off-chip PCB line, which is more difficult to drive than an on-chip line. 
However, the transmission line used in this transceiver is a differential line, which 
needs more space and is not suitable for serial links. Kim et al. [24] demonstrated DBI 
communication over a single transmission line, which is promising especially for serial 
and backplane applications, since pin count is a major concern for chip designers. In 
addition, the LC-VCO used in the RF transmitter of [22][23] is replaced by a ring VCO, 
which occupies a smaller area than an LC VCO. To understand the difference between the 
differential transmission line and the single-ended line, the results can be compared 
in terms of Gb/s/pin. The transceiver in [22][23], which uses the differential T-line, 
has an aggregate data rate per pin of 4.2Gb/s/pin, while [24] obtains 8Gb/s/pin. 
Moreover, although [24] uses a power-hungry ring VCO in the RF transmitter, its energy 
per bit per pin is reduced to 4pJ/bit/pin, compared with 5pJ/bit/pin for [23]. 
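
The per-pin comparison above can be reproduced with a short calculation. The sketch 
assumes that the differential line of [22][23] occupies two pins and the single-ended 
line of [24] occupies one pin, and that the 8.4Gb/s FR-4 result is the figure behind the 
quoted 4.2Gb/s/pin; these pin counts are inferred from the text rather than quoted from 
the cited papers.

```python
def rate_per_pin(aggregate_gbps, pins):
    """Aggregate data rate divided by the number of package pins used by the link."""
    return aggregate_gbps / pins

# [22][23]: 8.4 Gb/s aggregate on FR-4 over a differential pair (assumed 2 pins).
differential_dbi = rate_per_pin(8.4, pins=2)
# [24]: 8 Gb/s/pin quoted in the text for a single-ended line (assumed 1 pin).
single_ended_dbi = rate_per_pin(8.0, pins=1)

print("Differential DBI (Gb/s/pin):", differential_dbi)
print("Single-ended DBI (Gb/s/pin):", single_ended_dbi)
print("Per-pin improvement factor:", round(single_ended_dbi / differential_dbi, 2))
```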
 
2.3. Multilevel Dual-Band Interconnect (MDBI) Architecture 
The energy-efficient MDBI interface system architecture is shown in Figure 2.8. It 
consists of a phase-locked loop (PLL), a bandgap circuit, a voltage reference 
generator, a digitally controlled impedance calibrator with on-die termination (ODT), 
and a multilevel dual-band signaling interconnect. The microprocessor can calibrate and 
control the impedance calibrator, the voltage reference generator, and the PLL by 
sending a calibration code. The voltage regulator can tune its output voltage using a 
voltage calibration code exchanged between the microprocessor and the PLL and bandgap 
reference, in order to tolerate severe PVT variations. The impedance calibrator 
corrects possible impedance mismatches between the MDBI transmitter and receiver by 
using the digitally controlled ODT. The proposed MDBI uses multilevel dual-band 
signaling for simultaneous communication of multiple data streams, using both a 
multilevel baseband and an RF band on an off-chip channel. 
[Figure 2.8 graphic. Labels: Bandgap, Voltage Regulator (VREG), PLL, Voltage Reference Generator (VREF), Impedance Calibrator, MDBI IO (Memory), MDBI IO (CPU), Data and Redundant streams, "Reconfigurable MDBI for faulty link", and a Microprocessor with system status inputs and outputs that issues CAL CODE signals for cross-layer self-healing and inter-layer self-calibration; the spectrum inset plots signal power [dB] versus frequency f for multi-level dual-band signaling (MDBI), with bands at f(BB) and f(RF).]
 
Figure 2.8: Architecture of proposed multilevel dual-band interconnect (MDBI) interface with 
reliable and simultaneous communication. 
The goal of the MDBI transceiver is not to completely replace conventional baseband 
wire traces, but rather to provide a more flexible and reconfigurable solution that 
achieves stable communication and significant yield improvement by using a simultaneous 
RF-band data stream as a redundant channel (e.g. multilevel BB for the original data 
stream + RF for an additional simultaneous data stream on a shared transmission line). 
Current technologies for improving reliability and yield use additional off-chip 
channels for redundancy (to replace faulty streams), which are usually limited by the 
IO pad/pin count of the processor and memory. The MDBI is also fully compatible with 
current technologies (because the baseband is still used for the original data stream), 
reduces overall system overhead (due to the much lower pin count), and provides 
reliable simultaneous data communication (due to RF-band redundancy and additional data 
streams). The proposed MDBI has not been attempted before. The MDBI transceiver 
architecture provides much more reliable data communication between heterogeneous 
integrated chips.  
 
Chapter 3: Phase-Locked Loop (PLL) 
 
 
 
Because the delay skew between different chips is usually unknown and a stable clock 
source is needed, a low-power, time-precise clocking circuit such as a PLL is required 
to synchronize communication between chips. In this work, a fully integrated 
phase-locked loop using a sub-harmonic injection-locked frequency multiplier for K-band 
frequency applications is proposed. The differential ring voltage-controlled oscillator 
(VCO) used in the PLL generates a signal in the 0.6~2.4GHz frequency range. The output 
signal is then multiplied by a tenth-order injection-locked frequency multiplier 
(ILFM). Using the multiply-by-10 ILFM reduces the operating frequency of the VCO to 
only one-tenth of the desired frequency and removes the power-hungry high-frequency 
dividers from the PLL feedback path, which saves power. The PLL demonstrates a measured 
tuning range of 18.7-21.2GHz and a phase noise of -91.35dBc/Hz at a 1MHz offset from a 
center frequency of 20GHz. The power consumption of the PLL is 17.2mW. The PLL is 
implemented in a standard 0.18μm CMOS process with a total area of 0.335mm^2. 
3.1 Introduction 
Phase-locked loops (PLLs) are used in a wide range of applications such as wireless 
systems, communication systems, digital circuits, and power systems. In the design of a 
radio-frequency (RF) transceiver, one of the most critical blocks is the PLL, which is 
usually power hungry. Thus, reducing the power consumption of the PLL can dramatically 
decrease the power consumption of the transceiver. The most power-hungry components of 
the PLL are the voltage-controlled oscillator (VCO) and the frequency dividers [25-29]. 
In conventional PLLs, the VCO operates at the highest frequency in the loop [27], and a 
higher VCO operating frequency results in higher PLL power consumption. PLLs whose VCOs 
run at the highest frequency also need dividers that run at that frequency, and these 
dividers consume a significant fraction of the total PLL power budget.  
The most popular frequency dividers that operate at high frequency are current-mode 
logic (CML) static dividers [30], Miller frequency dividers [31], and injection-locked 
frequency dividers (ILFDs) [32]. CML dividers and Miller dividers cover a wide 
frequency range and operate at high frequency, but they usually consume significantly 
more power (e.g. 307mW [26]). An ILFD can operate at high frequency while consuming 
less power than CML dividers and Miller dividers [33]. However, ILFDs [32-34] suffer 
from a narrow locking range and a limited division ratio, which is usually three. 
To build a high-frequency PLL, multiple stages of these dividers must be used in the 
PLL loop because of the limited frequency range of static digital frequency dividers. 
Recent PLLs [35-38] use frequency multipliers to generate the high-frequency signal and 
further reduce power consumption. However, they suffer from a limited multiplication 
ratio, and a multi-stage frequency divider operating at the highest frequency is still 
required in the PLL feedback loop. 
To overcome the aforementioned problems, we propose an energy-efficient K-band PLL with 
a novel multiply-by-10 injection-locked frequency multiplier (ILFM). We eliminate the 
conventional multistage power-hungry frequency dividers in the PLL feedback path and 
employ the ILFM to generate the desired frequency. The overall power consumption of the 
PLL is therefore significantly reduced by using a low-power frequency multiplier with a 
high multiplication ratio. 
 
3.2 PLL Structure 
The block diagram of the proposed PLL cascaded with a multiply-by-10 ILFM is shown in 
Figure 3.1. It includes a phase frequency detector (PFD), a charge pump (CP), a loop 
filter (LF), a differential ring VCO, a static frequency divider, and a multiply-by-10 
ILFM. The novel ILFM multiplies the VCO output, so the PLL output is ten times higher 
in frequency than the VCO output. 
[Figure 3.1 graphic. Labels: reference clock (fin), phase and frequency detector (PFD) with UP/DN outputs, charge pump, loop filter (R, C1, C2), VCO (fvco), divider ÷32, ILFM ×10 (fout), VDD.]
 
Figure 3.1: Block diagram of the proposed 20GHz PLL. 
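
The frequency plan implied by this structure can be sketched numerically. The sketch 
assumes that the feedback divider is driven by the VCO output (before the ILFM) and uses 
the ÷32 setting labelled in Figure 3.1; the resulting reference frequencies are derived 
values for illustration and are not figures quoted in the text.

```python
ILFM_RATIO = 10     # multiply-by-10 ILFM placed after the VCO
DIVIDER_RATIO = 32  # feedback divider setting labelled in Figure 3.1

def frequency_plan(f_out_ghz, divider_ratio=DIVIDER_RATIO):
    """Return (VCO frequency in GHz, reference frequency in MHz) for a given output."""
    f_vco_ghz = f_out_ghz / ILFM_RATIO            # the VCO runs at one-tenth of the output
    f_ref_mhz = f_vco_ghz * 1e3 / divider_ratio   # the PFD compares at f_vco / divider ratio
    return f_vco_ghz, f_ref_mhz

# Edges and center of the measured 18.7-21.2GHz tuning range.
for f_out in (18.7, 20.0, 21.2):
    f_vco, f_ref = frequency_plan(f_out)
    print(f_out, "GHz out:", round(f_vco, 3), "GHz VCO,", round(f_ref, 2), "MHz reference")
```

Under these assumptions the whole measured output range maps to VCO frequencies inside 
the 0.6~2.4GHz range of the ring VCO, so only the low-frequency static divider is 
needed in the feedback path.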
3.2.1 PFD, charge pump, and loop filter 
A standard phase frequency detector (PFD) is used in the PLL. It consists of two 
latches and a delayed reset path that reduces the dead-zone problem [37]. The PFD 
compares the reference signal with the feedback signal and generates an error signal 
that is proportional to the phase difference between them. The error signal is then fed 
into the CP. The CP amplifies the error signal, and the low-pass loop filter filters it 
to produce the control voltage that tunes the VCO frequency so as to diminish the 
error. 
The simplified current-switched CP and the passive loop filter are shown in Figure 3.2. 
The CP amplifies the error signal, and the current-mode switches (SW1 and SW2 in the 
CP) charge and discharge the loop filter in response to the phase error from the PFD. A 
CP circuit should have low power consumption, good current matching, and fast switching 
speed. One of the dominant sources of nonlinearity in a PLL is the current mismatch of 
the CP [39]. In conventional CP structures, NMOS and PMOS switches are placed at the 
source, gate, or drain of the current mirror to reduce current mismatch [40]. However, 
these switches increase the clock feed-through when they turn on and off. Because of 
the clock feed-through, the on-resistances of the up and down switches differ at peak 
current, so the output current matching between the charging and discharging operations 
is degraded [41]. Therefore, the conventional PMOS and NMOS switches are replaced in 
this CP by transmission-gate switches, which have a linear on-resistance and maximum 
dynamic range, to decrease the clock feed-through. The clock feed-through effects on 
transmission-gate switches and a more precise charge-injection model can be found in 
[42]. Moreover, this design removes the charge-injection problem because the switches 
do not affect the control voltage. 
The simulated charge and discharge current of this CP design is 0.75mA. The loop filter 
is built from on-chip passive elements. The capacitors C1 and C2 are implemented as 
metal-insulator-metal capacitors (MIMCAPs), and the loop-filter resistor R1 is a poly 
resistor.  
Figure 3.3 shows the transient simulation results for the CP. The control voltage 
ripple is 1.6mV. The ripple can be reduced further with a larger resistor in the loop 
filter; however, this increases thermal noise and chip area. The PLL loop bandwidth is 
approximately 550kHz in order to filter out the non-ideal effect of the reference 
spurs. This bandwidth can be increased if current-matching techniques are used in the 
CP design; however, these techniques increase the power consumption. The total power 
consumption of the charge pump and low-pass filter is 3.6mW. 
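[Figure 3.2 graphic. Labels: transistors M1 to M13, switches SW1 and SW2 driven by the Up and Dn signals, reference current Iref, supply VDD, loop filter components R1, C1, C2, and output Vctrl.]
 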
Figure 3.2: Simplified charge pump circuit. 
 
Figure 3.3: Ripple of control voltage. 
3.2.2 Ring VCO 
The VCO is a key building block that dominates the performance of the PLL. Two 
topologies are commonly used to design VCOs: the ring VCO and the LC VCO. An LC VCO 
oscillates at the resonance frequency of its tank circuit and uses a cross-coupled 
transistor pair to generate a negative resistance that cancels the parasitic resistive 
losses of the capacitors and inductors in the tank. Ring VCOs suffer from poorer phase 
noise performance and a lower operating frequency than LC VCOs [43]. However, LC VCOs 
occupy a larger area than ring VCOs because of their passive components, whose size 
grows further as the oscillation frequency is reduced. Moreover, LC VCOs have a 
relatively narrow tuning range, which decreases further with the supply voltage, 
whereas ring VCOs can operate over a wide frequency range. The main drawback of ring 
VCOs compared with LC VCOs is therefore their inferior phase noise. Increasing the 
output voltage swing and minimizing the amount of noise injected during output voltage 
transitions improve the phase noise of ring oscillators [44]. 
 
Figure 3.4: (a) Four stage differential ring-based VCO circuit (b) Delay cell. 
 
Figure 3.4(a) shows the voltage-controlled oscillator (VCO) circuit, which is composed 
of four stages of differential CMOS inverters. The delay cell used in the VCO is a 
modified version of the delay cell introduced in [45]. The inputs of the delay cell are 
the NMOS transistors (M1, M7) and the PMOS transistors (M2, M8), as shown in Figure 
3.4(b). The NMOS cross-coupled pair (M3, M5) and the PMOS cross-coupled pair (M4, M6) 
restore the logic levels at the inverter outputs and speed up the switching as the 
control current is varied. O1 and O2 are the differential outputs of the delay cell. 
The four-stage differential ring VCO provides eight different phases (0°, 45°, 90°, 
135°, 180°, 225°, 270°, 315°). The phase difference between the inputs (N0, N1) and the 
outputs (O1, O2) is 225°, due to the oscillation condition of the ring VCO. Also, the 
inputs (P0, P1) lead the inputs (N0, N1) by 45° in phase. The cross-coupled pairs in 
the delay cells are designed with large widths to minimize the overdrive voltage, so a 
wider output swing is generated, which minimizes the intrinsic phase noise. A tail 
current source transistor, commonly used in conventional ring VCOs, is avoided in order 
to prevent up-conversion of its flicker noise [46]. The pseudo-differential ring VCO 
offers a larger output swing, which yields more signal power. The phase noise is 
thereby reduced, since it can be approximated by the side-band noise power divided by 
the signal 
power. The oscillation frequency of the ring-VCO is given by, 
$$f_{osc} = \frac{1}{2N\tau} = \frac{I_{ctrl}}{2 N C_L V_{dd}}$$                                         (3.1) 
where N is the number of the delay stages, τ is the delay time in one stage, Vdd is the 
supply voltage, Ictrl is the control current, and CL is the load capacitance. The VCO 
frequency can be controlled through the control current injected by a voltage-to-current 
converter. The slope of the frequency versus control signal curve at the oscillation 
frequency is called the voltage-to-frequency conversion gain, KVCO,  
$$K_{VCO} = \frac{\partial \omega_{osc}}{\partial V_{ctrl}}, \qquad \omega_{osc} = 2\pi f_{osc}$$                                         (3.2) 
Phase is the integral of frequency, so the output phase of the oscillator is equal to, 
$$\phi_{out}(t) = K_{VCO} \int V_{ctrl}(t)\,dt$$                                         (3.3) 
In the frequency domain (s-domain), the VCO is modeled as, 
$$\frac{\Phi_{out}(s)}{V_{ctrl}(s)} = \frac{K_{VCO}}{s}$$                                         (3.4) 
Ideally, KVCO should be relatively constant over a large frequency range for the linear 
analysis to hold. The phase noise of the ring VCO is simulated at different PVT 
corners, as shown in Figure 3.5. The simulated phase noise is -115.8dBc/Hz at a 1MHz 
offset for the slow-slow (SS) corner at 85°C and a center frequency of 2GHz. 
 
 
Figure 3.5: Simulated phase noise of the ring VCO with different process corners. 
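
The dependence of the ring-VCO frequency on the control current can be illustrated with 
a first-order sketch. It assumes the common delay model in which each stage needs 
τ = CL·Vdd/Ictrl to charge its load, so that f = 1/(2Nτ), and an ideal linear 
voltage-to-current converter with transconductance gm_v2i. The numerical values 
(1.8V supply, 50fF load, 1.44mA control current, 2mA/V transconductance) are 
hypothetical and are not taken from this design.

```python
def ring_vco_frequency(n_stages, i_ctrl, c_load, vdd):
    """Oscillation frequency in Hz for a per-stage delay tau = C_L * Vdd / I_ctrl."""
    tau = c_load * vdd / i_ctrl          # first-order delay of one stage, in seconds
    return 1.0 / (2.0 * n_stages * tau)  # each edge passes through N stages twice per period

def vco_gain_hz_per_volt(n_stages, c_load, vdd, gm_v2i):
    """K_VCO in Hz/V, assuming I_ctrl = gm_v2i * V_ctrl (ideal V-to-I converter)."""
    return gm_v2i / (2.0 * n_stages * c_load * vdd)

# Hypothetical values for a four-stage ring.
f_osc = ring_vco_frequency(n_stages=4, i_ctrl=1.44e-3, c_load=50e-15, vdd=1.8)
k_vco = vco_gain_hz_per_volt(n_stages=4, c_load=50e-15, vdd=1.8, gm_v2i=2e-3)
print("f_osc (GHz):", round(f_osc / 1e9, 3))
print("K_VCO (GHz/V):", round(k_vco / 1e9, 3))
```

In this model the frequency is proportional to the control current, so the conversion 
gain is constant, which is the ideal case assumed by the linear loop analysis.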
 
3.2.3 Frequency divider design 
The CMOS frequency divider (FD) consists of a three-stage digital static flip-flop 
circuit. Each stage is composed of two master-slave flip-flops with a multiplexer, as 
shown in Figure 3.6. The multiplexer changes the FD division ratio between 4 and 2; 
the FD divides by 4 when the division ratio (DR) control signal is high. Thus, the FD 
output frequency is controllable externally, and the FD can divide the ring VCO output 
frequency by 8, 16, or 32. The simulated total power consumption of the divide-by-32 
frequency divider is 270µW. 
 
[Figure 3.6 graphic. Labels: flip-flop stages with clock inputs CK, internal nodes A to F, X, and Y, outputs Q, and the division-ratio control DR.]
 
Figure 3.6: Simplified digital static frequency divider circuit. 
 
3.2.4 Loop characteristics 
The dynamic behavior of the PLL is described in this section. The PLL structure has two 
integrators. The combination of the CP and C1 is an integrator that averages the up and 
down pulses. The other integrator is the VCO. The open-loop gain therefore has two 
poles at the origin, so the closed-loop system is unstable. To stabilize the system, a 
resistor R is added in series with C1 to introduce a zero,  
$$\omega_z = \frac{1}{R C_1}$$                                         (3.5) 
A linear continuous-time approximation is used to model the stability of an operating 
point. The rate at which the PFD output is refreshed is determined by the reference 
frequency. 
With this linear approximation, Vctrl is equal to, 
$$V_{ctrl}(s) = \frac{I_{CP}}{2\pi}\,F(s)\,\Phi_e(s)$$                                             (3.6) 
where Φe(s) is the phase error at the PFD input and F(s) is the transfer function of 
the loop filter. Ignoring C2 in Figure 3.1, F(s) is given by, 
$$F(s) = R + \frac{1}{sC_1} = \frac{1 + sRC_1}{sC_1}$$                                       (3.7) 
The open-loop transfer function is given by, 
$$H_{OL}(s) = K_{PFD}\,F(s)\,\frac{K_{VCO}}{s}$$                                        (3.8) 
$$K_{PFD} = \frac{I_{CP}}{2\pi}$$                                        (3.9) 
where KPFD is the PFD gain, F(s) is the loop filter transfer function, ICP is the 
charge pump current, and KVCO is the conversion gain of the VCO. Substituting F(s) into 
(3.8) and ignoring C2 in the loop filter, the open-loop transfer function is given by, 
$$H_{OL}(s) = K_{PFD}\left(\frac{1 + sRC_1}{sC_1}\right)\frac{K_{VCO}}{s}$$                                        (3.10) 
As seen in (3.10), the transfer function has two poles at the origin and one zero, 
which is used to stabilize the closed-loop PLL system. When C2 is included in the loop 
filter, one extra pole is added to the open-loop transfer function, which becomes, 
$$H_{OL}(s) = K_{PFD}\left(\frac{1 + sRC_1}{s(C_1 + C_2)}\right)\frac{K_{VCO}}{s\left(1 + \frac{s}{\omega_{p3}}\right)}$$                                        (3.11) 
Thus, the third pole is, 
$$\omega_{p3} = \frac{1}{R\,\frac{C_1 C_2}{C_1 + C_2}} = \frac{C_1 + C_2}{R C_1 C_2}$$                                        (3.12) 
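[Figure 3.7 graphic. Magnitude (dB) and phase (degree) of the open-loop response versus ω. Frequency axis marks: ωz, ωc, ωp3. Magnitude slope labels: 40dB/dec, 20dB/dec, 40dB/dec, with the 0dB crossing and the gain margin indicated. Phase axis marks: -180° and -90°, with the phase margin Φ indicated.]
 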
Figure 3.7: Bode plot of the open-loop transfer function. 
The Bode plot of the open-loop transfer function is shown in Figure 3.7, where ωc is 
the open-loop unity-gain frequency. With HOL(s) denoting the open-loop transfer 
function, the closed-loop transfer function of the PLL from input phase to output 
phase is, 
$$H_{CL}(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{H_{OL}(s)}{1 + H_{OL}(s)}$$                                        (3.13) 
Substituting the open-loop transfer function (3.10) into (3.13), 
$$H_{CL}(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{K_L\left(1 + \frac{s}{\omega_z}\right)}{s^2 + \frac{K_L}{\omega_z}s + K_L}$$                                        (3.14) 
where KL is the loop gain, 
$$K_L = \frac{K_{PFD} K_{VCO}}{C_1} = \frac{I_{CP} K_{VCO}}{2\pi C_1}$$                                        (3.15) 
As can be seen in (3.14), the transfer function is a low-pass filter, so it helps 
reject input noise at frequencies higher than the PLL bandwidth. Moreover, the 
closed-loop transfer function from the control signal to the output phase is, 
$$\frac{\Phi_{out}(s)}{V_{ctrl}(s)} = \frac{K_{VCO}\,s}{s^2 + \frac{K_L}{\omega_z}s + K_L}$$                                        (3.16) 
This transfer function is a band-pass filter, which can reject internal noise coupled 
into Vctrl. Since KVCO and C1 are typically fixed design parameters, the loop gain, KL, 
is determined by the CP current through KPFD. The natural frequency, ωn, is 
proportional to the square root of the loop gain. The natural frequency and damping 
factor are, 
$$\omega_n = \sqrt{K_L}$$                                        (3.17) 
$$\zeta = \frac{\omega_n}{2\omega_z}$$                                        (3.18) 
As seen in (3.17) and (3.18), the natural frequency and the damping factor are 
adjustable by changing the CP current and the zero frequency. Therefore, the loop 
bandwidth and peaking of the PLL can be adjusted by changing the CP current and the 
zero frequency. 
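
How the CP current and the zero frequency set the second-order loop dynamics can be 
sketched numerically. The sketch assumes the second-order model with C2 ignored, in 
which the loop gain is KL = KPFD·KVCO/C1 with KPFD = ICP/2π, the zero is 
ωz = 1/(R·C1), the natural frequency is ωn = √KL, and the damping factor is 
ζ = ωn/(2ωz). The 0.75mA CP current is the simulated value quoted earlier; the VCO 
gain, R, and C1 are hypothetical values chosen only for illustration.

```python
import math

def second_order_loop(i_cp, k_vco, r, c1):
    """Return (K_L, omega_z, omega_n, zeta) for the loop with C2 ignored."""
    k_pfd = i_cp / (2.0 * math.pi)    # PFD and charge-pump gain, A/rad
    k_l = k_pfd * k_vco / c1          # loop gain, (rad/s)^2
    omega_z = 1.0 / (r * c1)          # stabilizing zero, rad/s
    omega_n = math.sqrt(k_l)          # natural frequency, rad/s
    zeta = omega_n / (2.0 * omega_z)  # damping factor
    return k_l, omega_z, omega_n, zeta

K_VCO = 2.0 * math.pi * 100e6  # hypothetical VCO gain, rad/s per volt
for i_cp in (0.75e-3, 3.0e-3):
    k_l, omega_z, omega_n, zeta = second_order_loop(i_cp, K_VCO, r=500.0, c1=200e-12)
    print("I_CP (mA):", i_cp * 1e3,
          "natural frequency (MHz):", round(omega_n / (2.0 * math.pi) / 1e6, 2),
          "damping factor:", round(zeta, 2))
```

Quadrupling the CP current doubles both the natural frequency and the damping factor in 
this model, while changing R moves only the zero and therefore only the damping.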
By considering third pole and replacing (3.11) in (3.13), the closed-loop transfer 
function of third-order PLL is given by, 
       
    
   
    
             
                                
                      (3.19) 
To calculate the loop filter components, we first need the open-loop gain,
              (
      
     
)
    
  (  
 
   
)
                                             (3.20) 
By applying s=jω, 
                                  (
 
 
)   
               
          
 
(  
  
   
)
  
      
          
  
  
  
  
  
   
                                  
       (
 
 
)   
      
     
  
  
  
  
  
   
 
  
   
                           (3.21) 
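The phase margin of a third-order PLL is given by,
\[ \Phi(\omega) = \tan^{-1}\left(\frac{\omega}{\omega_z}\right) - \tan^{-1}\left(\frac{\omega}{\omega_{p3}}\right) \]                      (3.22)
By setting the derivative of the phase margin equal to zero,
\[ \frac{d\Phi}{d\omega} = \frac{1/\omega_z}{1 + (\omega/\omega_z)^2} - \frac{1/\omega_{p3}}{1 + (\omega/\omega_{p3})^2} = 0 \]                         (3.23)
From (3.23), the open-loop unity gain frequency (loop bandwidth), ωc, is
\[ \omega_c = \sqrt{\omega_z\,\omega_{p3}} \]                                              (3.24)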
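The maximum-phase-margin condition can be checked numerically. The short Python sketch
below evaluates the arctangent phase-margin expression of a type-II loop with one zero and
a third pole on a logarithmic frequency grid, and compares the location of the peak with
the geometric mean of the zero and pole frequencies. The zero and pole values (250kHz and
4MHz) are illustrative assumptions, not design values from this work.

```python
import math

def phase_margin_deg(w, wz, wp3):
    """Phase margin of a type-II loop with a zero at wz and a third pole at wp3."""
    return math.degrees(math.atan(w / wz) - math.atan(w / wp3))

# Illustrative (assumed) zero and third-pole frequencies in rad/s.
wz = 2 * math.pi * 250e3
wp3 = 2 * math.pi * 4e6

# Closed-form result: the phase margin peaks at the geometric mean.
wc = math.sqrt(wz * wp3)

# Brute-force check on a logarithmic grid between wz and wp3.
grid = [wz * (wp3 / wz) ** (i / 2000) for i in range(2001)]
w_best = max(grid, key=lambda w: phase_margin_deg(w, wz, wp3))

print(f"closed-form wc  = {wc / (2 * math.pi) / 1e6:.3f} MHz")
print(f"grid maximum at = {w_best / (2 * math.pi) / 1e6:.3f} MHz")
print(f"phase margin at wc = {phase_margin_deg(wc, wz, wp3):.2f} deg")
```

Moving the zero and the third pole further apart raises the peak phase margin, which is
why the ratio of the two frequencies is the main stability knob in this kind of loop.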
As seen from (3.24), the unity gain frequency of the open-loop response is located at the
point of minimum phase shift, where the phase margin is largest. This condition ensures
loop stability, as shown in Figure 3.7. Moreover, the phase margin should be at its maximum
where the magnitude of the open-loop gain equals one to ensure loop stability, so we have,
       (
 
 
)   
      
     
  
  
  
  
  
   
 
  
   
                             (3.25) 
 
   
      
   
  ‖
  
   
  
  
   
   
‖  
  
   
                                   (3.26) 
In order to calculate the loop filter components (R, C1, and C2), we should determine the
zero frequency and the third pole frequency. Therefore, if the loop bandwidth ωc and the
phase margin Φ are specified, we can calculate all the loop filter components [47],
\[ \omega_{p3} = \frac{\omega_c}{\sec\Phi - \tan\Phi} \]                                   (3.27)
\[ \omega_z = \frac{\omega_c^2}{\omega_{p3}} \]                                   (3.28)
Thus, the values of the loop filter components are provided by, 
   
      
   
  
  
   
 √
  (
  
  
)
 
  (
  
   
)
                                    (3.29) 
\[ C_1 = C_2\left(\frac{\omega_{p3}}{\omega_z} - 1\right) \]                                   (3.30)
\[ R = \frac{1}{\omega_z C_1} \]                                   (3.31)
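
The design procedure above can be illustrated with a small Python sketch. It uses the
standard textbook relations for a charge-pump PLL with a series R-C1 branch in parallel
with C2, which may differ in notation from the expressions printed in this chapter. The
names design_loop_filter and open_loop_gain are hypothetical helpers, and the 1MHz
bandwidth, 60° phase margin and 100µA charge-pump current are assumed example values. The
VCO gain of 1.2GHz/V is only the average slope implied by the measured 0.6GHz to 2.4GHz
tuning range over 1.5V, and the divider ratio of 32 is taken from the PLL block diagram.

```python
import cmath
import math

def design_loop_filter(fc_hz, pm_deg, icp_a, kvco_hz_per_v, n_div):
    """Size R, C1 (series) and C2 (shunt) for a target bandwidth and phase margin."""
    wc = 2 * math.pi * fc_hz
    pm = math.radians(pm_deg)
    wp3 = wc * (math.tan(pm) + 1 / math.cos(pm))   # third pole, above wc
    wz = wc ** 2 / wp3                             # zero, so that wc = sqrt(wz * wp3)
    k_pfd = icp_a / (2 * math.pi)                  # PFD/CP gain in A/rad (assumed form)
    k_vco = 2 * math.pi * kvco_hz_per_v            # VCO gain in rad/s/V
    # Unity open-loop gain at wc fixes the total capacitance C1 + C2.
    ratio = math.sqrt((1 + (wc / wz) ** 2) / (1 + (wc / wp3) ** 2))
    c_total = k_pfd * k_vco * ratio / (n_div * wc ** 2)
    c2 = c_total * wz / wp3
    c1 = c2 * (wp3 / wz - 1)
    r = 1 / (wz * c1)
    return r, c1, c2

def open_loop_gain(w, r, c1, c2, icp_a, kvco_hz_per_v, n_div):
    """Open-loop gain G(jw) computed from the actual filter impedance (R + C1) || C2."""
    s = 1j * w
    z_series = r + 1 / (s * c1)
    z_filter = z_series / (1 + s * c2 * z_series)
    return (icp_a / (2 * math.pi)) * z_filter * (2 * math.pi * kvco_hz_per_v) / (s * n_div)

# Assumed example targets and loop parameters (not the fabricated design values).
FC, PM, ICP, KVCO, N = 1e6, 60.0, 100e-6, 1.2e9, 32

r, c1, c2 = design_loop_filter(FC, PM, ICP, KVCO, N)
g = open_loop_gain(2 * math.pi * FC, r, c1, c2, ICP, KVCO, N)

print(f"R = {r:.0f} ohm, C1 = {c1 * 1e12:.1f} pF, C2 = {c2 * 1e12:.1f} pF")
print(f"|G(j wc)| = {abs(g):.6f}")
print(f"phase margin = {180 + math.degrees(cmath.phase(g)):.2f} deg")
```

Evaluating the open-loop gain with the sized components is a useful self-check: the
magnitude should return to one at the chosen bandwidth and the phase margin should match
the target.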
 
3.2.5 Sub-harmonic frequency multiplier design 
The proposed multiply-by-10 injection-locked frequency multiplier circuit is shown in
Figure 3.8. It cascades a second-order ILFM and a fifth-order ILFM. The multiply-by-2 ILFM
consists of a differential three-stage ring oscillator with two NMOS harmonic modulators
(M3n and M4n) inserted between two output nodes of the three delay cells. Each delay cell
of the ring oscillator, as shown in Figure 3.8, uses a fully differential design with
cross-coupled PMOS transistors connected to the outputs of the NMOS input transistors. The
benefits of this configuration are a rail-to-rail output voltage swing and a wide frequency
tuning range.
 
[Figure 3.8 schematic: a multiply-by-2 ILFM built from three delay cells (M1n, M2n, M1p,
M2p) with the harmonic modulator M3n and M4n, driven by f(VCO) at INP/INN and producing
f1 = f1(ILFM) = 2f(VCO); followed by a multiply-by-5 ILFM with transistors M1 to M8, bias
VBIAS, transformer coils L1 and L2 with coupling factor K, and buffered outputs VOUTP and
VOUTN, producing f2(ILFM) = 5f1. The non-linear devices (M3, M4) act as mixers and generate
even and odd harmonic signals, including 4f1 and 6f1.]
 
Figure 3.8: The proposed multiply-by-10 injection-locked frequency multiplier. 
Owing to the nonlinear effect of the NMOS modulators [48], the ILFM output frequency
(f1(ILFM)) is locked to the second harmonic of the ring VCO frequency, so the ILFM
oscillates at twice the input frequency (i.e., f1(ILFM)=4GHz for f(VCO)=2GHz). However, a
chain of second-order ILFMs might increase the power consumption due to the large internal
signal swing. Therefore, frequency multipliers with a high multiplication ratio are needed
to further reduce the overall power consumption [49-51]. Such multipliers still face key
design trade-offs, namely a limited frequency tuning range and a low current injection
level.
To overcome the aforementioned problems, we propose a novel sub-harmonic fifth-order
injection-locked frequency multiplier, as shown in Figure 3.8. The proposed ILFM is
realized by two bandpass filters, two injection NMOS transistors (M3, M4) acting as
sub-harmonic mixers, and two voltage buffers (M7, M8). Two NMOS cross-coupled pairs (M1,
M2, M5, and M6), together with a compact transformer and the transistor parasitic
capacitances, form the two bandpass filters. To calculate the self-oscillation frequency of
the ILFM, a T-model equivalent circuit can be applied to the transformer [52]. The
self-oscillation frequency of the ILFM tank circuit is given by:
         
 
  
√  
         ⁄   √           ⁄             ⁄         
           
      (3.32) 
where K is the coupling factor of the transformer, C1 and C2 are parasitic capacitances,
and L1 and L2 are the inductances of the primary and secondary coils of the transformer. As
seen from (3.32), K is a key parameter in setting the ILFM oscillation frequency, and the
required self-oscillation frequency can be achieved with a proper selection of the inductor
sizes and the coupling factor of the transformer. The multiply-by-5 ILFM (f2(ILFM))
oscillates at five times the multiply-by-2 ILFM output frequency (f1). Frequency components
at 4f1 and 6f1 are produced at the source of M3 by mixing the second-order ILFM output
frequency (i.e., f1) at the gate of M3 with the self-oscillation frequency of the LC-tank
ILFM (i.e., 5f1) at the drain of M3. These components at 4f1 and 6f1 mix with the other
input at f1 in M4 to generate 3f1, 5f1, and 7f1. The band-pass filters formed by the LC
tank suppress the frequency components at 3f1 and 7f1.
Thus, the multiply-by-5 ILFM is locked at five times the input frequency (f1).  
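The mixing sequence described above can be traced with a few lines of Python. This is only
bookkeeping of first-order sum and difference products in units of f1; it ignores
amplitudes and higher-order terms, and the mix helper and the idealized band-pass step are
illustrative assumptions rather than a circuit model.

```python
def mix(harmonics_a, harmonics_b):
    """Sum and difference products of two sets of tones, given as multiples of f1."""
    products = set()
    for a in harmonics_a:
        for b in harmonics_b:
            products.add(a + b)
            products.add(abs(a - b))
    return sorted(products)

F_VCO_GHZ = 2.0            # ring VCO frequency quoted in the text
f1_ghz = 2 * F_VCO_GHZ     # output of the multiply-by-2 ILFM

at_m3 = mix([1], [5])      # f1 at the gate of M3, 5*f1 tank oscillation at its drain
at_m4 = mix(at_m3, [1])    # those products mixed again with f1 in M4
locked = [h for h in at_m4 if h == 5]   # idealized LC band-pass keeps only 5*f1

print("products at M3:", [f"{h}f1" for h in at_m3])
print("products at M4:", [f"{h}f1" for h in at_m4])
print("locked output :", locked[0] * f1_ghz, "GHz")
```

With f(VCO) = 2GHz the bookkeeping gives f1 = 4GHz and a locked output of 5f1 = 20GHz,
matching the overall multiply-by-10 function.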
An expression for the locking range of an injection-locked frequency tripler is given in
[53]. This model can be extended to the multiply-by-5 ILFM by applying an injection signal,
Vin, with input frequency f1 to the gate of M3 in Figure 3.8. The locking range of the
fifth-order ILFM is given by,
       
  
 
 
  
√
(     
 )
 
          (     
 )
  
 
  
|
     
 
      
|                         (3.33) 
where fr is the resonant frequency, Q is the tank quality factor, Vout is the output
amplitude, and the coefficient a5 represents the nonlinearity of the fifth-order harmonic
frequency generator. Decreasing the LC tank quality factor or increasing the coefficient a5
widens the locking range. On the other hand, the Q factor has a direct effect on the phase
noise performance of an oscillator, according to the following equation introduced in
[54]:
 {  }       [
    
  
(  
 
    
)
 
]                               (3.34) 
where Δf is the frequency offset, L{Δf} is the phase noise measured at Δf, Q is the tank
Q factor, F is the excess noise factor, f is the resonance frequency, and Ps is the
oscillation signal power. From (3.34), the phase noise is inversely proportional to the
square of the tank Q, so decreasing the Q factor degrades the phase noise. Therefore, the
coefficient a5 should be maximized to increase the locking range of the fifth-order ILFM.
The differential injection transistors are biased in the sub-threshold region to increase
the nonlinearity of the subharmonic mixers and strengthen the fifth-order frequency
component. The simulated phase noise of the second-order and fifth-order ILFMs at different
process corners is shown in Figure 3.9. At the slow-slow (SS) corner and 85°C, the
simulated phase noise at 1MHz offset is -112.46dBc/Hz and -93.83dBc/Hz at the center
frequencies of 4GHz and 20GHz, respectively. The simulated sensitivity of the fifth-order
ILFM is shown in Figure 3.10. The locking range increases as the input power
level increases. The maximum locking range of 3.6GHz is achieved in simulation from 
18.5 to 22.1GHz with an input power of 0dBm. The overall power consumption of the 
ILFMs is 5.8mW in the simulation. 
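
The trade-off between locking range and phase noise discussed above can be made concrete
with the textbook form of Leeson's model in the 1/f² region. The sketch below is a minimal
illustration under that assumption; the exact expression of (3.34) may contain additional
terms, and the noise factor, signal power, temperature and Q values are arbitrary example
numbers rather than parameters of the fabricated ILFM.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def leeson_phase_noise_dbc(f0_hz, offset_hz, q_tank, noise_factor, p_sig_w, temp_k=300.0):
    """Textbook Leeson estimate in the 1/f^2 region, in dBc/Hz."""
    thermal = 2 * noise_factor * BOLTZMANN * temp_k / p_sig_w
    shaping = (f0_hz / (2 * q_tank * offset_hz)) ** 2
    return 10 * math.log10(thermal * shaping)

# Assumed example values: 20GHz tank, 1MHz offset, F = 4, 0dBm signal power.
base = leeson_phase_noise_dbc(20e9, 1e6, q_tank=10, noise_factor=4, p_sig_w=1e-3)
half_q = leeson_phase_noise_dbc(20e9, 1e6, q_tank=5, noise_factor=4, p_sig_w=1e-3)
penalty_db = half_q - base

print(f"Q = 10: {base:.1f} dBc/Hz")
print(f"Q = 5 : {half_q:.1f} dBc/Hz")
print(f"penalty for halving Q: {penalty_db:.2f} dB")
```

Under this model, halving the tank Q to widen the locking range costs about 6dB of phase
noise, which is why the text favors raising the coefficient a5 instead.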
 
Figure 3.9: Simulated phase noise of the multiply-by-2 and 5 ILFM with different process corners. 
 
Figure 3.10: Simulated input sensitivity of the ILFM. 
3.3 Experimental Results 
The proposed PLL is designed and fabricated in a 0.18µm CMOS technology. Figure 3.11 shows
the die microphotograph of the chip, which occupies an area of 0.61×0.55mm² including the
measurement pads. The largest share of the chip area, almost 30%, is occupied by the loop
filter (LF). The total power consumption of the PLL is 17.2mW at a 1.5V supply voltage. The
PLL was tested on an FR4 PC board, as shown in Figure 3.12, using an Agilent E4440A
spectrum analyzer and an Agilent 86100D wide-band oscilloscope. The input reference
frequency fed to the PFD can be changed from 32MHz to 250MHz, since the static frequency
divider is a block that can be controlled externally. However, the measurement results are
reported for reference frequencies from 58MHz to 67MHz and an input power of 0dBm.
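
The frequency plan implied by these reference frequencies can be sketched as follows. The
divider ratio of 32 is an assumption taken from the PLL block diagram for the 62.5MHz
reference case and is held fixed here, although the text notes that the divider is
externally controllable; pll_frequencies is a hypothetical helper name.

```python
DIVIDER = 32         # static divider ratio (assumed fixed for this illustration)
MULTIPLIER = 2 * 5   # multiply-by-2 stage followed by the multiply-by-5 ILFM

def pll_frequencies(f_ref_mhz):
    """Return (VCO frequency, multiplied output frequency) in GHz for a locked loop."""
    f_vco_ghz = f_ref_mhz * DIVIDER / 1000
    return f_vco_ghz, f_vco_ghz * MULTIPLIER

for f_ref in (58.0, 62.5, 67.0):
    f_vco, f_out = pll_frequencies(f_ref)
    print(f"ref {f_ref:5.1f} MHz -> VCO {f_vco:.3f} GHz -> output {f_out:.2f} GHz")
```

With these assumptions the 58MHz to 67MHz reference range maps to ideal outputs of roughly
18.56GHz to 21.44GHz, and a 62.5MHz reference gives exactly 20GHz.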
 
 
Figure 3.11: Chip microphotograph. 
 
 
 
Figure 3.12: PLL FR4 PCB. 
 
Figure 3.13 shows the PLL output spectrum at a center frequency of 2GHz with a power level
of -1.69dBm. The ring VCO is measured on a separate chip because of the heavy loading from
the off-chip PCB routing, wire-bonds, connectors, and cables when the PLL test is
performed. Figure 3.14 shows the frequency tuning range and phase noise of the differential
ring VCO. The oscillation frequency increases as the control voltage rises. However, the
phase noise degrades as the oscillation frequency increases.
 
 
Figure 3.13: Oscillation frequency of PLL after the ring VCO. 
 
Figure 3.14: Measured frequency tuning range and phase noise of the VCO. 
 
The oscillation frequency can be tuned from 0.6GHz to 2.4GHz as the control voltage is
changed from 0 to 1.5V. The measured output power varies from -2.84dBm to 1.67dBm. The VCO
achieves its lowest phase noise of -116.14dBc/Hz at 1MHz frequency offset. The worst phase
noise of -109.08dBc/Hz occurs at a tuning voltage of 1.5V and an oscillation frequency of
2.4GHz. The total power consumption of the ring VCO and buffers is 6.28mW. The FM
multiplies the ring VCO output frequency by 2, and the resulting signal fed to the ILFM is
then multiplied by 5. Figure 3.15 shows the power spectrum of the locked ILFM output at a
center frequency of 20GHz, with an output power of -20.73dBm.
 
 
Figure 3.15: Oscillation frequency of the PLL after the ILFM. 
 
The phase noise of the ILFM output at the center frequency is -91.35dBc/Hz at 1MHz
frequency offset, as shown in Figure 3.16. The measured total
output frequency range of the fabricated PLL is from 18.7GHz to 21.2GHz. The output power
at 18.7GHz is about 5dB lower than that at 21.2GHz. Figure 3.17 shows the locked clock
waveform at a reference frequency of 62.5MHz. The measured jitter histogram is shown in
Figure 3.18. The peak-to-peak jitter and RMS jitter are 11.73ps and 1.74ps, respectively,
at 20GHz. Table 3.1 summarizes the performance of the PLL and compares it with previously
published K-band PLLs. It can be seen that the PLL operates with lower power consumption,
at a phase noise of -91.35dBc/Hz at 1MHz offset. The total power consumption of the PLL is
lower than that of the prior art because there is no high-frequency divider in the PLL
loop. Moreover, the output frequency range can be increased if the ILFM is designed with a
wider locking range. Even so, it is still comparable with previous works [25-27].
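
As a rough sanity check on the measured numbers, the sketch below applies the ideal
frequency-multiplication rule, under which multiplying a carrier by n raises its phase
noise by 20·log10(n) dB, to the measured ring-VCO phase noise, and converts the measured
jitter into unit intervals of the 20GHz output. Comparing free-running VCO noise with the
locked PLL output is only indicative, since the loop shapes the noise at a 1MHz offset; the
helper names are hypothetical.

```python
import math

def ideal_multiplier_penalty_db(n):
    """Phase-noise increase of an ideal, noiseless frequency multiplier by n."""
    return 20 * math.log10(n)

def jitter_in_ui(jitter_ps, f_out_ghz):
    """Express a jitter value as a fraction of one output clock period."""
    return jitter_ps * 1e-12 * f_out_ghz * 1e9

vco_best, vco_worst = -116.14, -109.08   # measured VCO phase noise at 1MHz offset (dBc/Hz)
measured_out = -91.35                    # measured 20GHz output at 1MHz offset (dBc/Hz)

penalty = ideal_multiplier_penalty_db(10)
expected_window = (vco_best + penalty, vco_worst + penalty)

print(f"ideal x10 penalty: {penalty:.1f} dB")
print(f"expected window  : {expected_window[0]:.2f} to {expected_window[1]:.2f} dBc/Hz")
print(f"RMS jitter       : {jitter_in_ui(1.74, 20):.4f} UI")
print(f"pk-pk jitter     : {jitter_in_ui(11.73, 20):.4f} UI")
```

The measured -91.35dBc/Hz falls inside the window obtained by adding the ideal 20dB
multiplication penalty to the measured VCO phase noise.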
 
 
Figure 3.16: Phase noise of the PLL after the ILFM. 
 
 
Figure 3.17: PLL lock waveform. 
 
 
Figure 3.18: PLL jitter measurement. 
 
Table 3.1: Performance comparison of the PLL. 
Ref.       Technology      Output Freq.  Reference    Supply       Power  VCO and FD  Phase Noise     Chip Area  FD Type
                           (GHz)         Freq. (MHz)  Voltage (V)  (mW)   Power (mW)  (dBc/Hz)        (mm²)
[25]       CMOS 0.13µm     20.05-21      78           1.5          22.5   18          -112.1 @5MHz    0.6        Prescaler
[26]       CMOS 0.13µm     17.6-19.4     625          1.5          480    475         -113.5 @10MHz   1.7        CML
[27]       BiCMOS 0.25µm   20.4-27.6     25           2.5/3/5      680    657         -105 @1MHz      4.8        Programmable
[29]       BiCMOS 0.13µm   20.51-21.27   80.1-83.1    1.5          40     27          -97.17 @1MHz    0.48       CML & TSPC
This work  CMOS 0.18µm     18.7-21.2     58-67        1.5          17.2   6.46        -91.35 @1MHz    0.33       CMOS FD
 
3.4 Conclusion 
We have presented a fully integrated PLL with a multiply-by-10 ILFM for K-band
applications. The PLL is fabricated in a 0.18µm CMOS technology and significantly reduces
the PLL power (17.2mW) and the layout area (0.33mm²). The proposed PLL with the
multiply-by-10 ILFM dramatically reduces the total PLL power because the high-frequency
dividers (i.e., standard CML dividers, which operate at the highest frequency) are
eliminated. Compared with prior art, the proposed dual-FM/ILFM cascaded PLL with a
multiply-by-ten function exhibits a high PLL output frequency, better power efficiency, and
a compact layout area.
  
 
Chapter 4: RF-Band Transceiver 
 
 
 
The most critical factor in designing future advanced memory interfaces is achieving high
bandwidth while maintaining low power consumption. In this chapter, an RF-band transceiver
is implemented for use in multilevel dual-band interconnects (MDBI). It is therefore very
important to choose the system architecture and the modulation scheme of the RF-band
transceiver according to the target system budgets. For low power consumption in
particular, implementing a low energy-per-bit RF-band transceiver depends strongly on the
modulation and demodulation scheme of the RF-interconnect (RF-I) system. The transmitter
uses an ASK modulator to send data streams through a transmission line. The transmitter
clock is generated by the 20GHz PLL implemented in Chapter 3. The receiver employs a
differential self-mutual mixer to demodulate the data, after which differential amplifiers
boost the incoming signal. The transceiver is simulated in a 180nm CMOS technology. The
results show a data bandwidth of 3Gb/s/pin while the RF-band transceiver consumes 12mW.
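The energy-per-bit figure implied by these simulation results follows from a single
division, sketched below. The helper name energy_per_bit_pj is hypothetical, and the
calculation assumes that the full 12mW is attributed to one 3Gb/s pin.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: milliwatts divided by gigabits per second."""
    return power_mw / data_rate_gbps

rf_band = energy_per_bit_pj(12.0, 3.0)
print(f"RF-band transceiver: {rf_band:.1f} pJ/bit")
```

Under that assumption the RF-band transceiver spends about 4pJ for each transferred bit.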
4.1 Introduction 
The amount of data that can be sent over a band-limited channel using baseband signaling
such as 4-level PAM is limited, because the supply voltage is reduced as CMOS feature sizes
scale down. In fact, as the supply voltage is reduced, the eye opening is
also reduced. For instance, the supply voltage is only 1.1V in the 90nm CMOS SOI technology
used in [20], so the ideal eye opening is at most 367mV for each level. However, the
measured eye opening was only 95mV differentially, whereas roughly 100mV is likely the
recovery threshold for a signal. Integrating an RF interconnect with a BB transceiver
therefore increases the channel bandwidth. In other words, adding data-modulated RF streams
to the baseband data streams increases the number of input data streams. Moreover, the use
of an RF-band interconnect mitigates the inter-symbol interference (ISI), which is one of
the dominant issues in BB transceivers. However, the RF carrier may be attenuated by the
channel in the high-frequency region. The sources of this channel attenuation are the skin
effect and dielectric absorption.
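
The 367mV figure quoted above is simply the supply voltage divided among the three eyes of
a 4-level signal. A minimal sketch of that arithmetic, assuming an ideal full-swing
transmitter and a hypothetical helper name:

```python
def ideal_pam_eye_mv(supply_v, levels):
    """Ideal eye height per level for full-swing PAM signaling, in millivolts."""
    return supply_v / (levels - 1) * 1000

ideal = ideal_pam_eye_mv(1.1, 4)   # 4-level PAM from a 1.1V supply
measured = 95.0                    # measured differential eye opening (mV)
threshold = 100.0                  # approximate recovery threshold quoted in the text (mV)

print(f"ideal eye per level: {ideal:.0f} mV")
print(f"measured eye below threshold: {measured < threshold}")
```

The gap between the ideal value and the measured 95mV shows how little margin remains for
multilevel baseband signaling at low supply voltages.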
Binary phase shift keying (BPSK) modulation and demodulation schemes are used for design
simplicity in a conventional prototype Frequency-Division-Multiple-Access interconnect
(FDMA-I) system. A basic BPSK FDMA RF-Interconnect architecture is shown in Figure 4.1 [9].
Its transmitter consists of a TX BPSK mixer and a direct-coupled cascade output driver,
while the receiver side, reached through off-chip transmission lines, has an RX mixer and a
Costas demodulation scheme.
 
[Figure 4.1 block diagram: Din, TX mixer, transmission line (T-Line), RX mixer, RX buffer,
and Dout, with a frequency synthesizer supplying the carrier.]
 
Figure 4.1: System architecture of BPSK RF-band interconnect. 
 
The TX mixer first up-converts an input data stream onto the high-frequency RF carrier,
and the modulated carrier is capacitively coupled to the off-chip transmission line. At the
receiver side, the RX mixer down-converts the incoming modulated RF carrier to an original
baseband signal, and the RX baseband buffer then amplifies the recovered baseband signal to
a CMOS-level output. Phase and frequency synchronization is very important in a BPSK
scheme. Thus, any mismatch of phase and frequency between the transmitter input signal and
the receiver local oscillator (LO) input signal is the most critical issue of the
conventional BPSK modulation scheme, since it directly degrades the performance of the
overall system. This mismatch between the transmitter and receiver can be resolved by
using frequency and phase synchronization schemes such as a frequency-locked loop (FLL) and
a phase-locked loop (PLL). Alternatively, the frequency synthesizer can send the RF carrier
directly over a separate transmission line such that the delays of the received signal and
the carrier are matched. However, these complicated timing and synchronization schemes
require large power consumption and a large Si area, resulting in increased system cost and
a non-scalable system architecture for future memory I/O interfaces. A direct phase and
frequency locking technique such as the Costas loop in the RF-Interconnect [9] can be used,
as can a source-synchronous technique. However, the Costas-loop BPSK demodulation scheme
suffers from poor frequency acquisition when synchronizing the frequency and phase at the
same time, which makes it unsuitable for future memory I/O interfaces. Moreover, in both
techniques the synchronization circuits suffer from large silicon area, high power
consumption and a complex, non-scalable system architecture, which is not suitable for
future RF-Interconnects.
4.2 RF-Band Transmitter 
The RF-band transmitter is a key block of any communication system and is responsible for
sending the required data to the receiver through a channel. The transmitter has to meet
certain design requirements, in this case die size, power efficiency and voltage headroom.
A basic transmitter design includes a clock source (i.e. VCO or PLL), which generates the
carrier signal at the required frequency, and a modulator, which modulates the carrier with
the data before the signal is sent through the channel.
The RF transmitter architecture consists of three blocks: a phase-locked loop (PLL), an
input buffer and an ASK modulator. The PLL, which generates the 20GHz clock, is described
in Chapter 3. The PLL output is the carrier signal that is modulated by the output data of
the input buffer. The ASK modulator accepts inputs from the PLL and the input buffer and,
in a sense, allows the carrier to feed through when the input buffer signal is high. This
is the final output that is sent to the transmission line. This type of modulation is
sometimes referred to as on-off keying (OOK).
Implementing ASK modulation in CMOS is relatively straightforward. ASK modulation can be
visualized as a carrier signal (the clock generated by the 20GHz PLL) that passes to the
output when the data is high and is blocked when the data is low. Figure 4.2 shows the ASK
modulator configuration. The data stream from the input buffer modulates the carrier by
switching the current flow through transmission gates M3-M4 and M5-M6 on and off to
complete the ASK modulation. A transmission gate M7-M8 is also employed in the ASK
modulator to shut off the output quickly [55]. This switch reduces the amplitude of the
non-modulated signal. The inductor used in the ASK modulator has a center tap, which is
connected to the power supply of the modulator.
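
The on-off behaviour described above can be sketched at the signal level in Python. This
is a behavioural illustration only, not a model of the transistor-level modulator: the
carrier is an ideal unit-amplitude cosine, the sampling resolution of 160 samples per bit
is an assumed simulation setting, and ask_modulate is a hypothetical helper name.

```python
import math

CARRIER_HZ = 20e9       # carrier from the 20GHz PLL
BIT_RATE = 3e9          # data rate used in the simulations
SAMPLES_PER_BIT = 160   # assumed resolution (24 samples per carrier cycle)

def ask_modulate(bits):
    """Pass the carrier while the data bit is 1 and block it while the bit is 0."""
    fs = BIT_RATE * SAMPLES_PER_BIT
    samples = []
    for i, bit in enumerate(bits):
        for n in range(i * SAMPLES_PER_BIT, (i + 1) * SAMPLES_PER_BIT):
            carrier = math.cos(2 * math.pi * CARRIER_HZ * n / fs)
            samples.append(carrier if bit else 0.0)
    return samples

tx = ask_modulate([1, 1, 0, 1, 0])
cycles_per_bit = CARRIER_HZ / BIT_RATE

print(f"{len(tx)} samples, {cycles_per_bit:.2f} carrier cycles per bit")
```

At 3Gb/s each bit spans between six and seven cycles of the 20GHz carrier, so the envelope
is well defined within every bit period.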
[Figure 4.2 schematic: ASK modulator with supply VDD, data inputs IN_P and IN_N, clock
inputs CLK_P and CLK_N, transistors M1 to M8, and an equalizer block.]
 
Figure 4.2: ASK modulator circuit used in RF-band transceiver. 
 
Figure 4.3 shows the simplified RF-band ASK RF-interconnect transmitter. The 20GHz clock
signal generated by the PLL is fed to the ASK modulator. The differential ASK output signal
is then coupled to the off-chip transmission line via the on-chip band-selective
transformer. Owing to the inherent impedance transformation of this on-chip transformer,
the impedance matching complexity is significantly relaxed. By properly sizing the on-chip
transformer and choosing its turn ratio, the reflected wave from the off-chip transmission
line can be greatly reduced.
[Figure 4.3 schematic: PLL (62.5MHz reference clock, phase and frequency detector (PFD),
charge pump with UP/DN, loop filter R, C1 and C2, VCO at 2GHz, static frequency divider
÷32) followed by the ×10 ILFM producing 20GHz; input buffer driving IN_P and IN_N from the
input data D1(RF) IN (example pattern 1 1 0 1 0); ASK modulator; transformer; off-chip
channel with termination Vterm carrying the ASK modulated signal.]
 
Figure 4.3: Simplified schematic of RF-band transmitter. 
 
In order to minimize the dispersion of the RF carrier signal, the RF carrier to target data 
rate ratio is also extremely critical and must be chosen with the signal loss of the
off-chip transmission line in mind. With this careful consideration of impedance matching
and the reduced dispersion of the RF carrier, the power-hungry pre-emphasis circuit at the
transmitter and the complicated equalization at the receiver can be removed, which reduces
the Si area and power consumption of the transceiver.
Figure 4.4 shows the simulated input and modulated output signals of the ASK modulator in
a 180nm CMOS process technology. The input data stream, shown as the top waveform of
Figure 4.4, is randomly generated at 3Gb/s. The bottom waveform shows the ASK-modulated
random data output stream after the startup time of the PLL frequency synthesizer.
 
Figure 4.4: Simulated input and output signals of the ASK modulator (top: the input data and 
bottom: the ASK modulated signal). 
 
4.3 RF-Band Receiver 
Figure 4.5 shows the schematic of an RF-band receiver (RFRX) with signal waveforms. In the
RF receiver, the differential mutual mixer acts as an envelope detector: it amplifies the
incoming ASK-modulated RF carrier and down-converts it to the original baseband signal. In
other words, the RF receiver features a mixer that is capable of non-coherent direct
down-conversion from the carrier frequency to a baseband signal. It is followed by a buffer
converter with RC feedback for input common-mode equalization, which amplifies the
recovered baseband signal to a full-swing rail-to-rail digital signal.
The conventional single-ended passive mixer [56] suffers from substantial signal loss at
microwave frequencies and is sensitive to supply noise coupling. In contrast, this
pseudo-differential active mixer amplifies and down-converts the modulated RF carriers by
feeding the ASK-modulated signals to differential gates and drains. This active
differential mixing scheme, with the subsequent differential amplifier, generates
differential outputs for the baseband output driver. Consequently, an ultra-wide bandwidth
with enhanced RF signaling and a compact area is achieved by eliminating the inductors in
the LNA and the phase/frequency locked loops (PLL and FLL) [9]. Additionally, one of the
most important advantages of the ASK modulation scheme over BPSK is that the RF carrier can
be demodulated without a complex frequency and phase synchronization circuit such as a
Costas loop. Because the receiver of the ASK (de)modulation scheme only senses the
amplitude change of the RF carrier signal, there is no need to detect and synchronize
frequency or phase variation, which results in a much simpler receiver architecture.
As explained, a non-coherent mixer configuration is chosen because it needs no
power-hungry clocking circuits, which reduces the power consumption of the mixer. The mixer
operates on the design principles of the Gilbert cell multiplier circuit [57], where in
this instance the RF and LO inputs carry the same carrier frequency (self-mixer). The
intended result is the down-converted baseband signal that originally modulated the
carrier.
The circuit configuration of the implemented mixer is shown in Figure 4.5. The
ASK-modulated signal from the transformer is applied to the four inputs, M1 through M4,
which are traditionally considered the RF input in the Gilbert cell architecture. There is
also a high-pass path (through the capacitors C1 and C2) to the drains of the two
differential amplifiers contained within the mixer, and this signal is considered the LO
input. The resulting output is the down-converted baseband signal that is fed into the
pseudo-differential amplifier. In fact, the mixer can be regarded as an envelope detector,
since its output follows the envelope of the ASK-modulated signal. The drains of the two
differential amplifiers in the mixer are cross-coupled and loaded with resistors.
 
[Figure 4.5 schematic: off-chip channel with termination VTERM, transformer, differential
mutual-mixer (M1 to M4, capacitors C1 and C2, outputs Mixo and Mixob), differential
amplifiers, buffer converter, and output driver; the incoming RF-band signal carrying the
pattern 1 1 0 1 0 is recovered at the D1(RF) output.]
 
Figure 4.5: Simplified schematic of RF-band receiver. 
 
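The process in this mixer begins with the multiplication of the two sinusoidal input
signals. These two signals can be described as:
\[ V_{RF}(t) = A_{RF}\cos(\omega_{RF} t) \]                                          (4.1)
\[ V_{LO}(t) = A_{LO}\cos(\omega_{LO} t) \]                                          (4.2)
Using the product-to-sum trigonometric identity and rearranging factors, we get two
different frequency components:
\[ V_{RF}(t)\,V_{LO}(t) = \frac{A_{RF}A_{LO}}{2}\left[\cos\big((\omega_{RF}+\omega_{LO})t\big) + \cos\big((\omega_{RF}-\omega_{LO})t\big)\right] \]  (4.3)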
As seen from the above equation, the first term is the sum (image) frequency component and
the second is the baseband component; the first is in additive form and the second in
subtractive form. Furthermore, if the two inputs are the same modulated signal, the signal
is in principle converted directly down to baseband. In other words, the additive
component has a frequency of double the RF carrier frequency, and the subtractive component
is the down-converted baseband signal. The additive component is filtered out in the
subsequent circuit blocks, while the binary baseband data is amplified rail-to-rail for the
transmission line. The analysis and operation of the Gilbert mixer is involved and has been
the focus of entire theses and journal papers. Our aim is simply to use the conventional
structure in our system, relying on previously published analyses [57][58][59][60]. Figure
4.6 shows the output spectrum of the mixer designed for 20GHz. Simulated input and output
signals of the differential self-mixer are shown in Figure 4.7. Although the amplitude of
the received signal is reduced to 0.15V due to channel degradation, it is still sufficient
as the mixer input.
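
The self-mixing principle can be illustrated with a behavioural Python sketch: squaring
the received ASK signal produces a baseband term plus a term at twice the carrier
frequency, and averaging over one carrier cycle removes the latter. This is an idealized
signal-level model, not the Gilbert-cell circuit; the sampling resolution, the averaging
window and the decision threshold are assumptions chosen for the illustration, and only the
0.15V received amplitude is taken from the text.

```python
import math

CARRIER_HZ = 20e9
BIT_RATE = 3e9
SAMPLES_PER_BIT = 160    # assumed resolution, exactly 24 samples per carrier cycle
RX_AMPLITUDE = 0.15      # received amplitude quoted in the text (V)

def ask_modulate(bits, amplitude=RX_AMPLITUDE):
    """Ideal on-off keyed carrier as seen at the mixer input."""
    fs = BIT_RATE * SAMPLES_PER_BIT
    return [amplitude * b * math.cos(2 * math.pi * CARRIER_HZ * n / fs)
            for i, b in enumerate(bits)
            for n in range(i * SAMPLES_PER_BIT, (i + 1) * SAMPLES_PER_BIT)]

def self_mix_demodulate(rx, amplitude=RX_AMPLITUDE):
    """Multiply the signal by itself, low-pass it, and slice at mid-bit."""
    mixed = [v * v for v in rx]   # RF and LO ports see the same signal
    cycle = round(BIT_RATE * SAMPLES_PER_BIT / CARRIER_HZ)
    bits = []
    for start in range(0, len(mixed), SAMPLES_PER_BIT):
        mid = start + SAMPLES_PER_BIT // 2
        window = mixed[mid - cycle // 2: mid + cycle // 2]
        baseband = sum(window) / len(window)   # averaging removes the 2*fc term
        bits.append(1 if baseband > amplitude ** 2 / 4 else 0)
    return bits

data = [1, 1, 0, 1, 0]
recovered = self_mix_demodulate(ask_modulate(data))
print("sent     :", data)
print("recovered:", recovered)
```

No carrier phase or frequency reference is used anywhere in the demodulator, which is the
non-coherent property the receiver relies on.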
 
 
Figure 4.6: Simulated power spectrum of the mixer designed. 
 
 
 
 
Figure 4.7: Simulated input and output signals of the differential self-mixer (top: the input data 
and bottom: the mixer output signal). 
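[Figure 4.8 schematic: differential pair M1 and M2 with inputs In_p and In_n, two load
resistors R1 connected to VDD, and outputs Out_p and Out_n.]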
 
Figure 4.8: Circuit configuration of the differential amplifiers. 
 
Figure 4.8 shows the circuit configuration of the differential amplifiers used in the
system. The three differential amplifiers differ only in their load resistor values. As
explained, the differential amplifier amplifies the signal and removes
the additional high-frequency peaks present in the signal, as shown in Figure 4.9. The
differential amplifiers are arranged in decreasing order of load resistance so as to bring
down the circuit impedance.
The output of the mixer is differential. Due to the method of down-conversion used by the
mixer, the complementary output signals have two different common-mode levels, which
presents a problem for the subsequent differential stages. To solve this problem, a buffer
converter with RC feedback, which acts as a low-pass filter, is used to control the overall
gain while settling on a single common-mode voltage. The circuit configuration of the
implemented buffer converter is shown in Figure 4.10. The transient signal settles after
140ns, which means that about 140ns of data cannot be recovered; this is the initialization
period before any meaningful data can be recovered. The settling time depends on the RC
feedback network. A very large resistance of 400kΩ with a capacitor of 128fF was chosen for
the best transient response. The components following the buffer converter are an inverter
chain and an output driver. The inverter chain uses multiple inverters to restore the
signal to a rail-to-rail swing. The output driver has a 50Ω output impedance, matching the
connector at the output of the receiver, and its output signal is shown in Figure 4.11.
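
The relation between the feedback network and the initialization period can be estimated
with a first-order calculation. The sketch below treats the feedback path as a single RC
pole, which is a simplifying assumption: the actual settling also depends on the amplifier
gain around the loop, so the numbers are only indicative.

```python
import math

R_FB = 400e3        # feedback resistance from the text (ohm)
C_FB = 128e-15      # feedback capacitance from the text (farad)
T_SETTLE = 140e-9   # simulated settling time from the text (second)
BIT_RATE = 3e9      # data rate (bit/s)

tau = R_FB * C_FB                      # RC time constant
corner_hz = 1 / (2 * math.pi * tau)    # low-pass corner of the feedback path
time_constants = T_SETTLE / tau        # settling time expressed in units of tau
bits_lost = round(T_SETTLE * BIT_RATE) # bits sent during the initialization period

print(f"tau = {tau * 1e9:.1f} ns, corner = {corner_hz / 1e6:.2f} MHz")
print(f"settling = {time_constants:.2f} tau, about {bits_lost} bits at 3Gb/s")
```

The RC product is 51.2ns, so the reported 140ns settling corresponds to a little under
three time constants, or roughly 420 bit periods at 3Gb/s.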
 
Figure 4.9: Simulated output signal of the differential amplifiers. 
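[Figure 4.10 schematic: buffer converter with transistors M1 to M4, inputs In+ and In-, an
R1 and C1 feedback network on each side, supply VDD, and outputs Out+ and Out-.]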
 
Figure 4.10: Circuit implementation of the buffer converter. 
 
Figure 4.11: Simulated output signal of the output driver. 
 
4.4 RF-Band Transceiver 
Figure 4.12 shows a single RF-band transceiver architecture using ASK modulation 
scheme, which consists of a phase-locked loop (PLL), an ASK modulator, an off-chip 
transmission line with on-chip transformers, a differential mixer, amplifiers and buffers. 
[Figure 4.12 schematic: transmitter (20GHz PLL, input buffer with IN_P and IN_N, ASK
modulator, transformer) connected through the off-chip channel, terminated by Vterm and
VTERM, to the receiver (transformer, differential mutual-mixer with outputs Mixo and Mixob,
differential amplifiers, buffer converter, output driver); the input pattern 1 1 0 1 0 at
D1(RF) IN is sent as the ASK modulated signal and recovered at D1(RF) Output.]
 
Figure 4.12: Simplified schematic of RF-band transceiver. 
 
Figure 4.13 shows the simulated waveforms of the single RF-band ASK RF-interconnect
transceiver, which fully recovers the original input data stream using the ASK
(de)modulation technique without any frequency and phase synchronization. Furthermore, the
ASK RF-Interconnect does not suffer from process-induced RF-carrier variation between the
transmitter and receiver. The RF-Interconnect is also compatible with CMOS technology and
can operate with traditional digital logic circuits placed directly
under its on-chip passives, such as the on-chip transformer and inductors, which greatly
reduces the layout area. In conclusion, because it does not need power-hungry frequency and
phase synchronization, the simple ASK RF-interconnect transceiver architecture can be
power-efficient and is well suited to future advanced memory interfaces for demanding
chip-to-chip communication.
 
Figure 4.13: Simulated eye diagram of RF-band transceiver at 3Gbps in 180nm CMOS process. 
4.5 Band-Selective Transformer 
This section describes the on-chip band-selective transformer. The RF bands cannot be
resistively terminated and instead require a transformer. The transformer is of particular
interest because it acts as an impedance matching circuit that couples/decouples the RF
bands to/from the transmission line. The focus is mainly on the design and
characterization of the transformer.
In this prototype design, an on-chip transformer with a primary-to-secondary-coil turn
ratio of 2:1 is chosen. The transformer should effectively couple/decouple the RF signal
and isolate it from the baseband signal (DC blocking). Furthermore, its purpose is not just
filtering, but also impedance matching to the transmission line and the input impedance of
the receiver. Unlike baseband signaling, the RF-band signal cannot be terminated by a
simple resistor, since the parasitics around the resistor would dominate at high
frequencies. Consequently, the proposed RF transceiver uses on-chip transformers as
matching devices, with the simplified network illustrated in Figure 4.14. The transformer
is loaded by the mutual-mixer input stage, whose impedance is modeled by a series
connection of a resistor and a capacitor. The detailed step-by-step analysis is given in
[23], which shows that the impedance seen at the secondary coil in parallel with the input
impedance of the receiver matches well at resonance. In fact, the transformer can actually
boost the RF signal from a transmission line. The same analysis applies to impedance
matching between the transmitter and the channel.
 
[Figure 4.14 graphic: (A) band-selective transformer (1st coil L1, 2nd coil L2, metal 
layers M6 and M5) between the single-ended T-line and the RFRX, with input impedances 
Z1in(w) and Z2in(w); (B) lumped RLC model of the load, with Rs = 2π·fs·(L/Q). 
Annotated values: fs = 20 GHz, Q = 21, C = 110 fF, Rs = 3 Ω, R = 10 Ω.]
 
Figure 4.14: Transformer design and model. 
 
The layout of the transformer is shown in Figure 4.15. The physical parameters used 
for both transformers are a 5µm spacing between windings and a 10µm trace width. The 
primary coil has two turns and is implemented in metal 6, while metal 5 is used for the 
single-turn secondary coil. The outer radius of the transformer is 150µm and its inner 
radius is 100µm. The simulated inductances of L1 and L2 are 0.89nH and 0.42nH at 20GHz, 
respectively.  
 
Figure 4.15: Transformer layout. 
 
The simulated coupling factor (Km) of the transformer is shown in Figure 4.16. The 
coupling factor increases with operating frequency, reaching Km = 0.67 at 20GHz. This 
coupling between the coils is sufficiently strong, since the amplitude of the modulated 
signal is on the order of several hundred millivolts (>400mV). The Q factor of each 
transformer coil is shown in Figure 4.17. The Q factors of the primary and secondary 
coils are 14.6 and 16.8, respectively. Both coils therefore well exceed a quality 
factor of 10 in the frequency band for which they were designed. A quality factor of 10 
or greater at the desired frequency is a good rule of thumb to follow.  
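
As a rough cross-check of the simulated transformer figures quoted above, the mutual inductance and the equivalent series resistance of each coil can be estimated from the standard relations M = Km·sqrt(L1·L2) and Rs = 2π·f·L/Q. The sketch below is only illustrative: it plugs in the values reported in the text (L1 = 0.89nH, L2 = 0.42nH, Km = 0.67, Q = 14.6 and 16.8 at 20GHz), and the function names are hypothetical rather than part of the original design flow.

```python
import math

def mutual_inductance(k, l1, l2):
    """Mutual inductance M = k * sqrt(L1 * L2), in henries."""
    return k * math.sqrt(l1 * l2)

def series_resistance(freq_hz, inductance_h, q):
    """Equivalent series resistance of a coil, Rs = 2*pi*f*L / Q, in ohms."""
    return 2.0 * math.pi * freq_hz * inductance_h / q

F_RF = 20e9                  # RF carrier frequency quoted in the text (Hz)
L1, L2 = 0.89e-9, 0.42e-9    # simulated coil inductances at 20GHz (H)
KM = 0.67                    # simulated coupling factor at 20GHz
Q1, Q2 = 14.6, 16.8          # simulated Q of the primary and secondary coils

m = mutual_inductance(KM, L1, L2)
rs1 = series_resistance(F_RF, L1, Q1)
rs2 = series_resistance(F_RF, L2, Q2)
print(f"M   = {m * 1e9:.2f} nH")
print(f"Rs1 = {rs1:.2f} ohm, Rs2 = {rs2:.2f} ohm")
```

Under these assumptions the secondary coil comes out at roughly 3.1 Ω, which is in line with the Rs = 3 Ω annotated in Figure 4.14.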
 
Figure 4.16: Simulated coupling factor (Km) of the transformer. 
 
Figure 4.17: Simulated quality factor of each coil of the transformer. 
 
Figure 4.18 shows how the RF-band and baseband transmissions are interfaced with the 
transformers. For baseband signaling, the signal is fed into the center tap of the 
primary coil and transferred to the channel in common mode. On the receiving end, the 
baseband signal is extracted through the center tap of a secondary coil. For RF-band 
signaling, the differential signal is injected into the differential port of a primary 
coil and then coupled to the single-ended channel through a secondary coil. 
 
[Figure 4.18 graphic: controller-side and memory-side transformers (Metal6 top, Metal5) 
joined by a single-ended off-chip T-line. Baseband data (D1(BB), D2(BB)) use 
common-mode (CM) signaling through the center taps (XBB), and RF-band data (D1(RF)) 
use differential-mode (DM) signaling on XRF+ and XRF-.]
 
Figure 4.18: MDBI working mechanism for simultaneous bidirectional data transaction. 
 
 
 
 
 
 
 
 
  
Chapter 5: Pulse-Amplitude Modulation 
(PAM) Transceiver 
 
 
 
A four-level pulse amplitude modulation (4-PAM) memory I/O interface for mobile 
DRAMs is presented. The PAM transceiver uses a current-mode transmitter and a 
dual-sampling receiver to transmit and receive two data streams simultaneously through 
a shared single-ended channel. The transmitter employs a current-mode output driver 
that sends data through the channel. The receiver uses differential amplifiers to 
decode the signal levels by comparing the PAM signal with three reference voltages. 
The transceiver is simulated in 180nm CMOS technology at 1.5V. The results show that 
the data bandwidth increases to 6.4Gb/s. The energy efficiency of the mobile PAM I/O 
memory interface is 1.9pJ/bit. 
5.1. Introduction 
The scaling of CMOS feature sizes has made it possible to integrate heterogeneous 
intellectual properties (IPs) on a single die. A network-on-chip (NoC) [61] is a 
feasible approach to reliable communication between multiple IPs such as DRAMs. Key 
parameters such as latency, power consumption, bandwidth and area are substantial 
challenges when integrating IPs. Dynamic voltage/frequency scaling (DVFS) is a common 
technique to reduce the power consumption of NoCs [62-64]. In this method, switches or 
links in a NoC that are idle for a specific period of time use a lower voltage or 
frequency. However, feature size scaling and process variations are a main obstacle to 
using this technique [65]. Another approach is to use fewer hops in a channel to reduce 
the power consumption [66]. However, the benefit of router networks used to improve 
power consumption is reduced as their count increases [67]. Thus, novel techniques are 
needed to reduce the power consumption of NoCs. Unfortunately, the scaling of CMOS 
feature sizes has exacerbated latency and signal degradation because of increasing 
channel resistance: as the feature size shrinks, the cross-sectional area of the 
channel shrinks, which increases the channel resistance [68]. To mitigate these issues, 
repeater circuits are employed to improve the latency of the transmitted signal in 
specific parts of the channel [69-71]. These repeaters improve the latency, but they 
increase the power consumption. Low swing signaling (LSS) [72] is another technique to 
reduce the communication latency of NoCs. In this technique, the latency improves 
through increased bandwidth, but the signaling is very sensitive to power supply and 
ground noise. The noise problem intensifies as the number of IPs increases.  
One of the most common IPs is the DRAM used in mobile devices such as portable tablets 
and smartphones. Energy efficiency is the most important issue to address in today’s 
mobile devices, and improving their bandwidth is the second highest performance 
priority. To meet these demands, as previously mentioned, increasing the operating 
frequency and scaling down the supply voltage have become promising techniques. 
However, they can degrade overall performance [73], [74]. 
Moreover, in terms of bandwidth, a serial link could be a promising solution for the 
mobile memory I/O interface, since it provides high bandwidth, reduces cost and power 
dissipation, and requires fewer data bus lines. However, serial link transceivers 
typically require a long initialization time (~1000 clock cycles). Serial links 
therefore cannot meet the requirement of fast switching between the active, standby, 
self-refresh and power-down operation modes of mobile DRAM [73],[75],[76]. Another 
approach to achieving higher data bandwidth is a multi-channel memory interface. 
Figure 5.1 shows the memory interface architecture for multi-channel memory devices. A 
multi-channel memory interface mostly consists of multiple channels (i.e., 32 or 64 bit 
lanes) for higher data bandwidth. These structures transceive data bi-directionally and 
need a high-frequency, low-power I/O memory interface.  
A pulse amplitude modulation (PAM) parallel bus that transceives data simultaneously 
and bi-directionally has been demonstrated [77], [78] using multilevel modulated 
signaling. The primary motivation for using a higher degree of modulation is bandwidth 
efficiency: increasing the aggregate data rate for a given channel bandwidth. For 
example, switching to 4-PAM transceives twice the data at the same symbol rate 
(sampling rate) by encoding every 2 bits into one of 4 signaling levels. 
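
The doubling of throughput can be illustrated with a small sketch that merges two bit streams into one stream of 4-PAM level indices. The bit-to-level mapping (D1 as MSB, D2 as LSB) and the 3.2 GS/s symbol rate are assumptions chosen for illustration; the text only states the aggregate rate, and the thesis encoder may assign levels differently.

```python
import math

def bits_per_symbol(levels):
    """Number of bits carried by one symbol of an M-level PAM signal."""
    return int(math.log2(levels))

def pam4_encode(d1_bits, d2_bits):
    """Combine two bit streams into one stream of 4-PAM level indices (0..3).

    Assumed mapping (illustrative only): D1 is the MSB and D2 the LSB.
    """
    if len(d1_bits) != len(d2_bits):
        raise ValueError("both streams must have the same length")
    return [2 * b1 + b2 for b1, b2 in zip(d1_bits, d2_bits)]

def aggregate_rate(symbol_rate, levels):
    """Aggregate data rate in bit/s for a given symbol rate and level count."""
    return symbol_rate * bits_per_symbol(levels)

d1 = [1, 1, 0, 1, 0]
d2 = [0, 0, 1, 1, 1]
print(pam4_encode(d1, d2))          # [2, 2, 1, 3, 1]
print(aggregate_rate(3.2e9, 4))     # 6400000000.0
```

Each symbol carries one bit from each stream, so the aggregate rate is twice the symbol rate.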
 
[Figure 5.1 graphic: CPU and DRAM connected by an n-bit memory bus with TX/RX pairs on 
both sides; each lane uses PAM TX and PAM RX blocks with a forwarded clock to carry 
baseband data streams (e.g., D1(BB) and D2(BB) in, D3(BB) and D4(BB) out) as a 
multilevel PAM signal.]
 
Figure 5.1: PAM memory interface architecture. 
 
5.2. PAM Transceiver Architecture  
Figure 5.2 shows the proposed current-mode PAM transmitter. The transmitter uses a 
novel leakage-suppression encoder, DRAM mode control logic, a DAC-based output driver 
and digital impedance control logic. The data coming from the processor chip is first 
encoded with a thermometer code (shown in Figure 5.3) in order to minimize possible 
glitches during PAM level transitions.  
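
A generic thermometer code can be sketched as follows. This is a textbook illustration of why such a code limits glitches, not the thesis encoder itself: the actual polarity and assignment of the encoder outputs a, b and c are defined in Figure 5.3 and are not reproduced here.

```python
def thermometer_encode(level, n_levels=4):
    """Return the (n_levels - 1)-bit thermometer code for a level index.

    Level k is represented by k ones followed by zeros, e.g. 2 -> (1, 1, 0).
    """
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return tuple(1 if i < level else 0 for i in range(n_levels - 1))

def hamming_distance(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))

codes = [thermometer_encode(k) for k in range(4)]
for k, code in enumerate(codes):
    print(k, code)

# Adjacent levels differ in exactly one control bit, so a one-step level
# change toggles only a single driver switch.
assert all(hamming_distance(codes[k], codes[k + 1]) == 1 for k in range(3))
```

Because larger level changes only switch additional bits in the same direction, the driver output moves monotonically between levels instead of passing through unrelated codes.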
[Figure 5.2 graphic: data inputs D1 and D2 feed a bit-synchronized encoder with outputs 
a, b and c; the CMD control logic output PAM_ENC controls the encoder; the output stage 
(VDD, R, Out) drives the off-chip channel under digital impedance control logic.]
 
Figure 5.2: Current-mode PAM transmitter design. 
 
 
The PAM transmitter generates 4-level bidirectional signaling. Transistor switches in a 
conventional PAM transmitter can create a leakage current in the transmission line, so 
we added a novel leakage-suppression control logic block to reduce the leakage current 
in DRAM power-down/nap mode. The encoder outputs (a, b and c) are connected to the 
transistor switches. When the control logic output (PAM_ENC) is 1 (logical high), the 
switches are turned off in DRAM power-down mode to save leakage power. When PAM_ENC 
changes to 0, the PAM transmitter performs normal operation in DRAM active (or 
active-standby) mode. Thus, even if the memory stays in active-standby mode for a long 
time, the proposed PAM transmitter does not degrade the data read/write performance. 
Lastly, when control signals a and b are both 1 (i.e., the PMOS switches are off), c 
should be 0 to eliminate possible glitches while the a and b switches are off. With c 
at 0, the PAM output is driven to a defined high or low level (i.e., the floating-switch 
case that causes leakage from a high-Z condition is avoided).  
In deep-submicron CMOS, the leakage component of power consumption has a more 
pronounced effect than the switching component, and a larger percentage of the total 
power consumption is due to the transistors’ leakage current 
($P = CV^{2}f + VI_{\text{leak}}$). The proposed PAM transmitter can reduce this large 
leakage contribution in power-down mode. Figure 5.3 shows the simulated worst-case PAM 
startup latency when the PAM transmitter makes a transition from power-down to normal 
operation. When a command signal such as ACT (active read or write) from the 
microprocessor is asserted, a PAM startup control signal turns on the PAM interface one 
clock cycle before the data strobe (DQS) signal to give enough time for read/write. 
After the ACT signal de-asserts, the PAM control logic turns off the PAM interface 
circuit to save power. Moreover, the transmitter uses a digital impedance control logic 
circuit to avoid impedance mismatch and reduce sensitivity to process, voltage and 
temperature (PVT) variations. This circuit is composed of multiple binary-weighted 
sub-drivers and decoder logic, which provide digitally controllable driver strength by 
selectively enabling the sub-drivers [23]. 
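
The idea of binary-weighted sub-drivers can be sketched with a simple parallel-conductance model. All numbers here (a 4-bit control word, a 400 Ω unit leg, a 50 Ω target and a 20% slow corner) are hypothetical and chosen only to show how an enable code can be re-selected when the process shifts; the actual driver sizing is described in [23].

```python
def driver_resistance(code, r_unit, n_bits):
    """Output resistance of parallel binary-weighted sub-drivers.

    Bit i of `code` enables a leg of resistance r_unit / 2**i, so the total
    conductance is code / r_unit. Returns float('inf') when all legs are off.
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    if code == 0:
        return float("inf")
    return r_unit / code

def calibrate(target_ohm, r_unit, n_bits):
    """Pick the enable code whose resistance is closest to the target."""
    codes = range(1, 2 ** n_bits)
    return min(codes,
               key=lambda c: abs(driver_resistance(c, r_unit, n_bits) - target_ohm))

# Hypothetical numbers: 4-bit control, 400-ohm unit leg, 50-ohm target.
nominal = calibrate(50.0, 400.0, 4)
# A slow process corner modeled as a 20% higher unit resistance.
slow = calibrate(50.0, 480.0, 4)
print(nominal, driver_resistance(nominal, 400.0, 4))   # 8 50.0
print(slow, driver_resistance(slow, 480.0, 4))         # 10 48.0
```

In this toy model the calibration compensates the slow corner by enabling more legs, which is the behavior the digitally controllable driver strength is meant to provide.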
 
[Figure 5.3 graphic: timing diagram of the PAM startup control logic showing the 
differential clock, CMD (ACT), PAM_ENC, differential DQS and PAM DQ (D1 to D8) with the 
DRAM timing margin, plus the encoder truth table for PAM_ENC, a, b, c and the output 
levels (11, 01, 00, 10, Z). Annotated worst-case PAM startup delay from power-down to 
ACT mode: 0.15ns.]
 
Figure 5.3: Timing diagram of current-mode PAM transmitter startup control logic and 
simulated PAM TX startup delay. 
To recover the data sent by the transmitter of the controller, the receiver should have 
low offset, high resolution and low latency in both the voltage and time dimensions. 
Figure 5.4 shows the PAM receiver design. The data arriving from the channel is 
received by the comparators on the DRAM side. The comparators recover the data by 
sampling it against selected reference voltages: the incoming signal is compared with 
the reference voltages, and the comparator outputs are then decoded into binary data. 
The comparators are composed of differential amplifiers that have a stable gain over a 
wide common-mode range, as shown in Figure 5.5. The outputs of the comparators are 
connected to differential voltage buffers. Before the data is transferred to the 
decoder, a DFF circuit, shown in Figure 5.6, is used to hold it. 
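
The comparison against three reference voltages can be sketched behaviorally as below. The signal levels, reference voltages and the binary level-to-bit mapping are assumptions made for illustration; they are not the values or the decoder mapping used in the thesis.

```python
def slice_pam4(v_in, v_refs):
    """Compare the input with three reference voltages in parallel.

    Returns the three comparator outputs as a thermometer code, lowest
    reference first.
    """
    return tuple(1 if v_in > v_ref else 0 for v_ref in sorted(v_refs))

def decode(thermo):
    """Convert comparator outputs to two output bits (MSB, LSB)."""
    level = sum(thermo)            # 0..3 ones in the thermometer code
    return level >> 1, level & 1   # assumed plain binary mapping

# Hypothetical signal levels on a 1.5V supply and mid-point references.
LEVELS = [0.3, 0.6, 0.9, 1.2]
V_REFS = [0.45, 0.75, 1.05]

for v in LEVELS:
    thermo = slice_pam4(v, V_REFS)
    print(v, thermo, decode(thermo))
```

Placing each reference midway between two adjacent levels gives every comparator the same nominal voltage margin.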
 
[Figure 5.4 graphic: the input, terminated by digitally controlled ODT (Rt to VDD), 
feeds three comparators referenced to VREF1, VREF2 and VREF3 from the VREF generator 
(levels 11, 01, 10, 00); the comparator outputs pass through DFFs clocked by rxclk to 
the decoder and output driver, producing D1(out) and D2(out).]
 
Figure 5.4: PAM receiver design. 
 
 
Figure 5.5: Simplified comparator circuit used in PAM receiver. 

[Figure 5.6 graphic: two cascaded DFF stages (D, Q) clocked by complementary clocks 
Clkp/Clkn derived from CLK, with differential inputs D/Db and outputs Out/Outb.]
 
Figure 5.6: Simplified DFF used in the PAM receiver. 
A digitally controlled on-die termination (ODT) is used together with buffers that 
amplify the incoming data stream. The ODT sets the common-mode voltage and removes the 
impedance mismatch for optimal signal integrity. To implement the ODT, passive 
resistors are connected in series with transistors as shown in [23]. The maximum data 
rate can be limited by the common-mode range, voltage resolution and offset voltage of 
the comparator. One of the most important methods for increasing the maximum achievable 
data rate is parallel sampling using comparators. However, applying different 
common-mode ranges to the comparators makes them more complicated and unstable because 
of the noise at the different common modes. Thus, the sampling method increases the 
data rate but also increases power consumption and noise. In this work, the common-mode 
range of the comparator should be carefully calibrated to recover the modulated signal 
correctly [78]. The noise problem can then be solved by proper selection of the 
sampling range for each comparator. 
 
5.3. Results 
The 4-level PAM I/O memory interface is designed in 180nm CMOS technology. 
Figure 5.7 shows the waveforms at the receiver end of the channel interface. In 
conventional PAM transceivers, the output data eye diagrams are considerably degraded 
by noise on the multiple reference voltage levels. The eye diagram of the proposed PAM 
transceiver does not show such severe distortion. Figure 5.8 shows the eye diagrams at 
a 6.4Gb/s data rate. The simulated power consumption of the transmitter and receiver is 
5.12mW and 7.04mW at 6.4Gb/s, respectively. 
The simulated energy efficiency of the PAM transceiver is 1.9 pJ/bit at 6.4Gb/s. 
Therefore, the PAM I/O memory interface can be a promising solution to increase both 
bandwidth and energy efficiency. Table 5.2 compares the PAM I/O performance to that of 
conventional memory interfaces. The number of PAM levels can be extended further in the 
future, so that multiple data streams can be transceived through a shared I/O interface 
with much improved energy efficiency.  
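
The quoted energy efficiency follows directly from the simulated power and data rate, as the short calculation below shows. The helper name is illustrative; the numbers are those reported above.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ/bit; mW divided by Gb/s gives pJ/bit directly."""
    return power_mw / rate_gbps

tx_mw, rx_mw = 5.12, 7.04       # simulated TX and RX power at 6.4Gb/s
total_mw = tx_mw + rx_mw
print(f"total power = {total_mw:.2f} mW")
print(f"energy/bit  = {energy_per_bit_pj(total_mw, 6.4):.2f} pJ/bit")
```

The total of 12.16mW divided by 6.4Gb/s gives the 1.9 pJ/bit stated in the text.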
 
Figure 5.7: Simulated four level voltage waveform. 
 
Table 5.2: Performance comparison of the 4-level PAM transceiver. 
                           [70]                [79]                [80]                [81]                [82]                [83]                This work
Technology                 0.13µm              0.18µm              65nm                0.13µm              0.18µm              0.18µm              0.18µm
Supply (V)                 1.2                 1.8                 0.9~1.25            1.2                 1.8                 1.8                 1.5
Power (mW)                 6                   1.05                10                  0.627               6                   16.1                12.16
Data rate (Gb/s/pin)       3                   1                   5                   1.1                 3                   1                   6.4
E/bit efficiency (pJ/bit)  2                   1.05                2                   0.57                2                   16.1                1.9
Data communication         Non-simultaneous    Non-simultaneous    Non-simultaneous    Simultaneous (PAM)  Non-simultaneous    Non-simultaneous    Simultaneous (PAM)
Channel type               Differential        Differential        Differential        Single ended        Differential        Differential        Single ended
 
Figure 5.8: Eye diagram of PAM interface. 
5.4. Conclusion 
We analyzed and designed a PAM memory I/O interface for mobile devices in 180nm 
CMOS at a 1.5V supply to obtain an aggregate data throughput of 6.4Gb/s with a power 
consumption of 12.16mW. The PAM memory I/O interface achieves higher aggregate data 
throughput and higher energy efficiency than prior works.  
 
Chapter 6: Multilevel Dual-Band 
Interconnect (MDBI) System 
 
 
 
The key goal of this section is to analyze, design, and fabricate the proposed 
multilevel dual-band interconnect transceiver, integrating all components, such as the 
phase-locked loop (PLL), RF-band transceiver, band-selective transformer, and PAM 
transceiver, to support simultaneous data streams on a single-ended shared off-chip 
T-line using multilevel dual-band signaling. The MDBI transceiver is designed and 
fabricated in 65nm CMOS technology. The PAM and RF-band transceivers carry 
9.2Gb/s/pin and 4.2Gb/s/pin, respectively, on a 5cm shared off-chip transmission line. 
The MDBI transceiver operates at 13.4Gb/s/pin with a power efficiency of 2.8pJ/b/pin. 
 
6.1 Multilevel Dual-Band Interconnect (MDBI) Transceiver 
Architecture  
Figure 6.1(a) shows the system architecture of the proposed multilevel dual-band 
interconnect interface. Figure 6.1(b) shows the intended MDBI signaling, which contains 
both a multilevel baseband and an RF band for multiple concurrent data communications. 
The proposed MDBI system architecture includes an RF-band transceiver (RFTX and RFRX) 
and a baseband transceiver (BBTX and BBRX) with a single-ended shared off-chip 
transmission line. These transceivers provide the CPU-memory communication interface 
needed to transmit and receive data and commands from both sides. In this system, BBTX 
and BBRX communicate with each other using multilevel BB-only signaling, while RFTX and 
RFRX communicate concurrently using RF-band signaling. Figure 6.1(b) shows the 
dual-band frequency allocation for a concurrent I/O interface in the frequency domain. 
The proposed multilevel BB and RF-band signaling with a synchronous clocking scheme 
makes it possible to communicate multiple data streams through an RF band and a 
multilevel BB without any significant latency penalty. 
 
[Figure 6.1 graphic: (a) CPU-side and memory-side MDBI chips, each with BBTX/BBRX and 
RFTX/RFRX, a PLL with ILFM, a forwarded clock and band-selective transformers on a 
shared MDBI channel carrying D1(BB), D2(BB) and D3(RF); (b) signal power versus 
frequency for conventional baseband-only signaling and for the proposed multilevel 
dual-band signaling, with a multilevel signal (Level 1 to Level n) at f(BB) and an 
RF-band signal at f(RF).]
 
Figure 6.1: (a) Block diagram of the proposed multilevel dual-band interconnect (MDBI) 
architecture for reliable and simultaneous communication. (b) Multilevel dual-band 
signaling in the frequency domain. 
 
The described MDBI architecture can be readily extended to accommodate additional 
RF bands on the shared transmission line by using energy-efficient ASK modulation. 
The key challenges in designing this MDBI system are to reduce the power overhead of 
the RF-band transceiver and to design an area-efficient band-selective transformer 
(as shown in Figure 6.2), because sufficient spectral isolation between the two 
communication bands is required to achieve concurrent MDBI communication. These key 
technical challenges can be overcome by an energy-efficient MDBI circuit and 
architecture topology, such as ASK (de)modulation transceivers. As CMOS technology 
continues to advance in deep-scaled processes, the proposed multilevel modulation 
interconnect concept can be extended to “Multiple-level basebands + RF-band” in the 
future by using advanced RF (de)modulation and PAM techniques, so that multiple data 
streams can be simultaneously transceived through a shared transmission line between 
mobile CPU and memory systems. 
 
[Figure 6.2 graphic: CPU-side and memory-side band-selective transformers (Metal9 top, 
Metal8) joined by a single-ended off-chip T-line; RFTX/RFRX carry D3(RF) on the 
differential ports XRF+ and XRF-, and BBTX/BBRX carry D1(BB) and D2(BB) through the 
center taps (XBB, common mode).]

Figure 6.2: Band-selective transformer used in MDBI transceiver. 
 
 
6.2 Multilevel Dual-band Interconnect (MDBI) Transceiver Design 
The central objective of this section is to analyze and design an energy-efficient 
MDBI transceiver using the simplest ASK modulation technique. Figure 6.3 shows the 
proposed MDBI transmitter with our band-selective transformer. The RFTX contains a 
phase-locked loop (PLL), an ASK modulator and an on-chip frequency-selective 
transformer with on-die termination (ODT). In the RFTX, the PLL first generates RF 
carriers at two high frequencies (f1, f2), which are fed into transmission-gate (TG) 
transistors. The data stream (D3) modulates the carrier by switching the TG transistors 
on and off to complete ASK modulation. The modulated output is then inductively coupled 
into the shared transmission line through the proposed on-chip band-selective 
transformer to minimize inter-channel interference (ICI). The ODT is also integrated 
into the band-selective transformer to overcome impedance mismatch. The multilevel 
baseband transmitter is composed of input buffers, an encoder, and a multilevel 
transmitter. The two data streams (D1(BB) and D2(BB)) are fed into the multilevel 
transmitter through the encoder. 
[Figure 6.3 graphic: RFTX (PLL with ILFM, ASK modulator driving PRF+ and PRF- of the 
transformer) and BBTX (input buffers, bit-synchronized encoder with outputs a, b, c and 
4-level transmitter connected to the transformer center tap), with digital impedance 
control logic (EN<15:0>, ENB<15:0>, R1 to R16, Rterm). Annotated rates: 9.2Gb/s 
multilevel baseband plus 4.2Gb/s RF band, giving the 13.4Gb/s MDBI signal on the 
off-chip T-line.]
 
Figure 6.3: Multilevel dual-band interconnect transmitter. 
 
The multilevel transmitter uses a DAC-based output driver and an impedance 
controller. A multilevel control logic circuit creates the different output levels by 
selecting different push/pull currents in the baseband output driver. The level control 
logic turns on a combination of PMOS and NMOS transistors such that the total output 
impedance of the driver always equals the characteristic impedance of the shared 
transmission line (e.g., Z = 50 Ω), which provides output impedance matching. 
Figure 6.4 shows the transmitter-side layout of the 4-level PAM and RF-band 
transceivers. 
 
Figure 6.4: Transmitter side layout of (a) the 4-level PAM (b) RF-band transceivers. 
 
Figure 6.5 shows the proposed MDBI receiver with a band-selective transformer. The 
MDBI-RX first splits the data streams into the multilevel BB and the RF band using the 
proposed band-selective transformer, which suppresses inter-band interference by 
spectral separation. The band-pass-filtered RF-band data stream is then injected into 
the differential mutual-mixer. A pair of resistor-loaded switching devices and a 
class-AB amplifier with resistive feedback can be used to further filter out the 
residue of the unwanted RF carrier. A multilevel receiver receives the 4-level PAM 
signal from the off-chip transmission line. This signal is compared with three 
reference voltages produced by a digitally controlled reference voltage generator, and 
a comparator senses each signal level against its reference. Pre-amplifiers are used to 
make complementary signals from the incoming single-ended multilevel signal to provide 
higher noise immunity and maximize the multilevel voltage margins.  
 
[Figure 6.5 graphic: RFRX (differential mutual mixer, amplifiers and output driver 
recovering D3(RF)) and BBRX (voltage amplifiers, voltage comparators against VREF1 to 
VREF3 from the voltage reference generator, and decoder recovering D1(BB) and D2(BB)), 
fed from the off-chip T-line through the band-selective transformer with digitally 
controlled termination (Rterm, VTERM). Annotated rates: 9.2Gb/s baseband and 4.2Gb/s 
RF band in the 13.4Gb/s single-ended MDBI signal.]
 
Figure 6.5: Multilevel dual-band interconnect receiver with a frequency band-selective 
transformer. 
 
Finally, a decoder block converts these encoded multilevel signals to the original 
output data. Further, since the proposed MDBI receiver using ASK modulation senses only 
the amplitude of the incoming signal, the power-hungry frequency and phase 
synchronization between the RF transmitter and receiver is not required. This greatly 
simplifies the overall mobile interface design. Figure 6.6 shows the receiver-side 
layout of the 4-level PAM and RF-band transceivers.  
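
Why amplitude-only detection needs no phase synchronization can be seen in an idealized numerical sketch. Here ASK is modeled as on-off keying of a cosine carrier, and the mutual mixer is approximated by squaring the received signal and averaging over one bit (an ideal stand-in for the low-pass filtering). The 5Gb/s bit rate, 0.4V amplitude, sample count and threshold are assumptions for illustration, not circuit parameters from the thesis.

```python
import math

def ask_modulate(bits, carrier_hz, bit_rate, samples_per_bit,
                 amplitude=1.0, phase=0.0):
    """On-off keying: the carrier is passed for a 1 and blocked for a 0."""
    dt = 1.0 / (bit_rate * samples_per_bit)
    out = []
    for i, bit in enumerate(bits):
        for k in range(samples_per_bit):
            t = (i * samples_per_bit + k) * dt
            out.append(bit * amplitude
                       * math.cos(2 * math.pi * carrier_hz * t + phase))
    return out

def self_mix_demodulate(signal, samples_per_bit, threshold):
    """Square the signal (self-mixing), average over each bit, then slice."""
    bits = []
    for start in range(0, len(signal), samples_per_bit):
        chunk = signal[start:start + samples_per_bit]
        mean_power = sum(x * x for x in chunk) / len(chunk)
        bits.append(1 if mean_power > threshold else 0)
    return bits

data = [1, 1, 0, 1, 0]
# The receiver never uses the carrier phase, so any TX phase works.
for tx_phase in (0.0, 1.0, 2.5):
    rf = ask_modulate(data, 20e9, 5e9, 64, amplitude=0.4, phase=tx_phase)
    assert self_mix_demodulate(rf, 64, threshold=0.04) == data
print("recovered:", data)
```

A transmitted 1 yields a mean squared value of about A²/2 regardless of carrier phase, so a fixed threshold at A²/4 separates the two symbols without any carrier recovery.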
 
 
Figure 6.6: Receiver side layout of (a) the 4-level PAM and (b) RF-band transceivers. 
 
 
Figure 6.7 shows the post-extracted simulation waveforms of the simultaneous 
bi-directional MDBI system. The letters A-F mark the input and output results for each 
band. The input and output data streams of the RF-band transceiver are shown in 
Figure 6.7 (A) and (B). The input data (A), along with the 20GHz carrier generated by 
the PLL, is injected into the ASK modulator. This signal is then recovered by the 
self-mutual mixer, as shown in Figure 6.7 (B). The simulation waveforms of the 4-level 
PAM signals are shown in Figure 6.7 (C) to (F). The 4-level PAM has two input data 
streams, (C) and (E), which are encoded by the encoder and transmitted by the 
current-mode output driver. The recovered data streams on the PAM receiver side are 
shown in Figure 6.7 (D) and (F). The delay between the input and output data streams of 
each band is mostly due to the off-chip transmission line.  
 
Figure 6.7: Simultaneous bi-directional MDBI simulation waveforms for three input data streams. 
 
Figure 6.8 shows the data eye diagrams of both the BB (4-level PAM) and the RF band. 
The PAM data rate is 5.5Gb/s/pin for each of the two data streams, and the RF-band data 
rate is 5Gb/s/pin. Therefore, the proposed MDBI simultaneously transceives three data 
streams with an aggregate data rate of 16Gb/s/pin at a 1.2V supply voltage. The 
simulated energy efficiency of the MDBI transceiver is 2.37pJ/b/pin.  
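
The aggregate rate can be reproduced by summing the three streams, as sketched below. The implied total power is simply the product of the two quoted figures; it is a derived estimate under that assumption, not a value reported in the text.

```python
def aggregate_rate_gbps(n_pam_streams, pam_rate_gbps, rf_rate_gbps):
    """Total per-pin data rate of the PAM streams plus one RF-band stream."""
    return n_pam_streams * pam_rate_gbps + rf_rate_gbps

def implied_power_mw(energy_pj_per_bit, rate_gbps):
    """pJ/bit multiplied by Gb/s gives mW."""
    return energy_pj_per_bit * rate_gbps

total = aggregate_rate_gbps(2, 5.5, 5.0)
print(total)                                   # 16.0
print(round(implied_power_mw(2.37, total), 2))
```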
 
 
Figure 6.8: Simulated data eye diagram of the MDBI transceiver (a) 4-level PAM output eye 
diagram at 5.5Gb/s/pin (b) RF-band output eye diagram at 5Gb/s/pin. 
 
 
Figure 6.9 shows the simulated latency of the MDBI transceiver using Monte Carlo 
mismatch simulation [85]. The most important issue in PAM data transmission is the 
potential latency difference of the decoded data, because the data must be encoded and 
decoded. This packet transition can cause a latency problem on the receiver side. In 
our design, a forward-clocking scheme for source synchronization based on multiple 
D flip-flops is used, so the possible latency discrepancy at the PAM receiver side can 
be minimized. When the total timing distortion on the DRAM side is less than half a 
clock cycle (i.e., 594ps @ 5GHz & 10Gb/s/pin data rate), the clocking scheme can 
synchronize skews between PAM data channels on the DRAM side well. The major latency of 
the RF-band transceiver occurs at the transmitter side because of the PLL, which needs 
at least 130ps to lock. However, as shown in Figure 6.9, the 5cm off-chip transmission 
line contributes the most latency. 
 
[Figure 6.9 graphic: bar chart of latency (ps, 0 to 400) per MDBI block for the 
BB (PAM) and RF-band paths, with the values tabulated below.]
MDBI block      BB (PAM)   RF-band
TX              98ps       163ps
Transformer     4ps        4ps
T-line (5cm)    336ps      315ps
RX              186ps      168ps
Total latency   624ps      650ps
Figure 6.9: Simulated latency of the MDBI system. 
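
The totals in Figure 6.9 can be checked by summing the per-block values, which also shows how much of each path is spent in the transmission line. The dictionary layout and function names are illustrative.

```python
# Per-block latency in picoseconds, as listed in Figure 6.9.
LATENCY_PS = {
    "BB (PAM)": {"TX": 98, "Transformer": 4, "T-line (5cm)": 336, "RX": 186},
    "RF-band": {"TX": 163, "Transformer": 4, "T-line (5cm)": 315, "RX": 168},
}

def total_latency(blocks):
    """Sum of all block latencies along one signal path."""
    return sum(blocks.values())

def dominant_block(blocks):
    """Name of the block with the largest latency."""
    return max(blocks, key=blocks.get)

for path, blocks in LATENCY_PS.items():
    total = total_latency(blocks)
    worst = dominant_block(blocks)
    share = 100.0 * blocks[worst] / total
    print(f"{path}: total {total}ps, largest {worst} ({share:.0f}%)")
```

The sums reproduce the 624ps and 650ps totals, and in both paths the 5cm line accounts for roughly half of the latency.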
 
6.3 Off-Chip Transmission Line 
The frequency characteristic of the off-chip transmission line on an FR-4 PCB is 
analyzed. This analysis is important because the RF-band signal may degrade at the 
carrier frequency of 20GHz. The channel is modeled in HFSS [84], and the data provided 
by HFSS is used in the circuit simulations. The transmission line delays, disperses, 
and attenuates the signals. The use of differential transmission lines [22-23] for 
RF-band signals minimizes skew and can be resistant to common-mode noise. However, a 
differential transmission line needs more space, and designing and using a single-ended 
transmission line instead is challenging. In this work, a single-ended off-chip 
transmission line is employed to support both the RF carrier frequency and the baseband 
signals. 
The RF carrier generated by the PLL is at 20GHz and must propagate through the 
off-chip transmission line. Therefore, several high-frequency effects, which are 
explained in electromagnetics textbooks, must be considered. Some of these effects are: 
• Frequency-dependent dielectric loss: an energy loss caused by PCB material 
containing dipole molecules; very high frequencies cause these molecules to oscillate 
and absorb energy. This attenuates signals as they propagate along a PCB trace. 
• Return path resistance: the effect of having non-zero impedance in the return path 
of a signal. 
• Skin effect: the tendency of current to move to the outside edges of a conductor as 
the frequency of the signal increases, owing to the conductor's self-inductance. 
• Crowding effect: the situation where conductors sense each other's presence through 
the distortion of the electromagnetic field, resulting in the high-frequency current 
bunching up toward the side of the neighboring conductor. 
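
To give a sense of scale for the skin effect at the frequencies used here, the standard skin-depth expression δ = sqrt(ρ/(π·f·μ)) can be evaluated for a copper trace. The copper resistivity is a typical room-temperature textbook value and is an assumption; the thesis does not state the conductor properties.

```python
import math

MU_0 = 4e-7 * math.pi        # permeability of free space (H/m)
RHO_COPPER = 1.68e-8         # assumed copper resistivity near 20 C (ohm*m)

def skin_depth(freq_hz, resistivity=RHO_COPPER, mu_r=1.0):
    """Skin depth delta = sqrt(rho / (pi * f * mu)), in metres."""
    return math.sqrt(resistivity / (math.pi * freq_hz * mu_r * MU_0))

for f in (5e9, 20e9):
    print(f"{f / 1e9:.0f} GHz: {skin_depth(f) * 1e6:.2f} um")
```

Under this assumption the current is confined to well under a micrometre of the conductor surface at 20GHz, about half the depth available at 5GHz, which is one reason the RF band sees more loss than the baseband.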
 
As mentioned above, it is important to design the optimal geometry of the off-chip 
transmission line so that it is impedance-matched, which minimizes signal reflections 
and ringing due to impedance mismatch. After the transmission line is designed and 
simulated in the HFSS simulator, a couple of test PCBs with traces of several 
different widths are implemented and measured to correlate the HFSS simulation results 
with real test results.  
A test PCB with off-chip transmission lines of different trace widths is used to find 
the best line width. The characteristic impedance of a PCB trace depends on its width. 
As the width increases, the impedance variation of wider lines becomes much smaller than 
that of narrower lines. We also considered the PCB thickness and checked transmission 
lines for two PCB thicknesses, 16mil and 32mil. The experimental results show that the 
thinner PCB (16mil) with a wider transmission line is preferred for minimal impedance 
discontinuities 
and signal reflections. Therefore, the optimal off-chip transmission line is chosen by 
considering characteristic impedance.  
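
The width dependence described above can be illustrated with a closed-form microstrip approximation. The sketch below uses the widely quoted Hammerstad formula with zero trace thickness and an assumed FR4 relative permittivity of 4.4; it is not the HFSS model used in this work, and the function name and the trace widths in the loop are hypothetical.

```python
import math


def microstrip_z0(width, height, eps_r=4.4):
    """Approximate microstrip characteristic impedance in ohms.

    Hammerstad closed-form approximation, trace thickness ignored.
    width and height must use the same unit; eps_r=4.4 is an assumed FR4 value.
    """
    u = width / height
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 / u) ** -0.5
    if u < 1:
        # Narrow-line correction term and narrow-line impedance expression.
        eps_eff += (eps_r - 1) / 2 * 0.04 * (1 - u) ** 2
        return 60 / math.sqrt(eps_eff) * math.log(8 / u + u / 4)
    return 120 * math.pi / (
        math.sqrt(eps_eff) * (u + 1.393 + 0.667 * math.log(u + 1.444))
    )


for h_mil in (16, 32):
    for w_mil in (10, 20, 30, 40, 50):
        z0 = microstrip_z0(w_mil, h_mil)
        print(f"h={h_mil}mil  w={w_mil}mil  Z0={z0:.1f} ohm")
```

With these assumptions, a given change in width moves the impedance much less for wide traces than for narrow ones, and a trace of roughly 30mil on the 16mil board lands near 50 ohms, which is consistent with the preference stated above.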
Figure 6.10 shows the HFSS model of the off-chip transmission line with wire-bonding.  
Once the optimal transmission-line geometry is chosen, the full 3D-EM HFSS models of the 
off-chip transmission line and wire-bonding must be simulated to generate accurate 
S-parameters of the 5-cm off-chip channel between the MDBI transceivers. Figure 6.11 
shows the simulated signal loss of the off-chip transmission line. The signal loss of the 
5cm FR4 PCB trace is -0.9dB and -6.4dB at 5GHz and 20GHz, respectively. 
 
[Figure 6.10 labels: FR4 PCB, VSS, off-chip MDBI channel, wire-bonding, MDBI chip] 
 
Figure 6.10: Channel modeling of off-chip transmission line with wire-bonding. 
Figure 6.11: Simulated signal loss of off-chip 5cm FR4 transmission line. 
 
Figure 6.12 shows post-extraction simulation waveforms of the MDBI system at the start 
and end of the off-chip transmission line. The amplitude of the RF-band signal is almost 
0.4V at the start point and is attenuated by the off-chip line to 0.12V at the end point.   
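
As a quick worked check, the amplitude drop quoted above can be expressed in decibels with the usual voltage-ratio formula, 20*log10(Vout/Vin). The helper name is hypothetical.

```python
import math


def voltage_ratio_db(v_out, v_in):
    """Express a voltage amplitude ratio in decibels: 20*log10(v_out / v_in)."""
    return 20.0 * math.log10(v_out / v_in)


# Amplitudes quoted in the text for the start and end of the line.
loss_db = voltage_ratio_db(0.12, 0.4)
print(f"RF-band amplitude change along the channel: {loss_db:.1f} dB")
```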
 
[Figure 6.12 labels: CPU, Memory, MDBI chips with BB TX/RX and RF TX/RX, band-selective 
transformer, MDBI channel carrying data (multi-level BB and RF-band), observation points 
G and H; waveform panels (G) and (H)] 
 
Figure 6.12: Simultaneous bi-directional MDBI simulation waveforms for (G) the start and (H) 
the end of the off-chip transmission line. 
 
6.4 Chip Measurement Results 
A prototype off-chip MDBI transceiver chip is fabricated in a 65nm CMOS process to 
demonstrate simultaneous bi-directional data communication. Figure 6.13 shows the die 
photo of the MDBI transceiver, which occupies an active area of 0.24mm². 
 
Figure 6.13: Die photo of the MDBI transceiver. 
 
The two-layer stack-up of the test PCB is shown in Figure 6.14. The critical PCB design 
factor is the thickness between the top and bottom layers, which should be chosen 
carefully to minimize the high-frequency dielectric loss of the PCB material. The test 
PCB uses two layers with a thickness of 16mil.  
 
[Figure 6.14 labels: silkscreen (top side), soldermask (top side), 1oz., 16mil, 1oz., 
soldermask (bottom side), silkscreen (bottom side)] 
 
Figure 6.14: Test PCB layer stack-up with 2 layers. 
 
Figure 6.15 shows the FR4 demo test board. An FR4 PCB has been designed and fabricated 
to verify the capability of simultaneous bi-directional communication. The input and 
output signals and clocks are highlighted on the test board. They are routed so that 
high-speed signals are not distorted or reflected along the PCB lines.  
We followed several design guidelines when designing and laying out the high-speed 
PCB test board. Some of the useful rules are listed below: 
• Narrow spacing between power and ground also creates an excellent high-frequency 
bypass capacitance.  
• Power and ground pins should be connected to the power and ground planes, 
respectively, with wide low-impedance traces.  
• 50Ω impedance design rules should be avoided on power and ground traces so that 
they provide a low-impedance connection.  
• Ground return paths should be kept short and ground traces wide, so that the return 
path forms the smallest loop for image currents.  
• Surface-mount capacitors should be used to minimize high-frequency inductance 
effects and should be located close to the test chip pins. 
 
 
 
 
Figure 6.15: FR4 test board. 
We used several instruments to measure the MDBI chip: an Agilent HP Modular Signal 
Generator (70340A), an Agilent 86100D DCA-X Wide-Bandwidth Oscilloscope, an Agilent 
70843B Error Performance Analyzer, and an Agilent Technologies PSA Series Spectrum 
Analyzer (E4440A). To measure three concurrent data communications, two Agilent 70843 
Error Performance Analyzers generate three uncorrelated random data streams for the PAM 
and RF-band transceivers. The independent random data streams are applied to the 
baseband and RF-band input connectors to demonstrate simultaneous bi-directional data 
communication. Finally, the random data sequences are recovered and measured on the 
Agilent 86100A oscilloscope.  
Figure 6.16 shows the measured RF-band BER result at PRBS 2^23-1. The BER is measured 
by a BERT scope, and the RF-band transceiver operates error-free at 4.2Gb/s 
(PRBS 2^15-1). The BER is less than 10^-10 at 3.5Gb/s at PRBS 2^23-1, as shown in 
Figure 6.16, and is less than 10^-15 at 4.2Gb/s at the same pattern. Furthermore, since 
the receiver mixer with differential input signals only senses the incoming signal's 
amplitude, frequency and phase synchronization between the RF TX and RX is not required. 
This greatly simplifies the overall memory I/O interface design. For the same reason, 
the BER is also expected to be better than that of phase-sensitive modulation schemes. 
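
For readers unfamiliar with the PRBS 2^15-1 and 2^23-1 patterns, the sketch below generates such sequences with a linear feedback shift register. It assumes the commonly used generator polynomials x^15 + x^14 + 1 and x^23 + x^18 + 1; the text does not state which polynomials the test equipment used, and the function names are hypothetical.

```python
def lfsr_step(state, order, tap):
    """Advance a Fibonacci LFSR for x^order + x^tap + 1 by one bit."""
    feedback = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
    return ((state << 1) | feedback) & ((1 << order) - 1)


def prbs_bits(order, tap, count, seed=1):
    """Return `count` output bits, taking the oldest register bit each step."""
    state = seed
    bits = []
    for _ in range(count):
        bits.append((state >> (order - 1)) & 1)
        state = lfsr_step(state, order, tap)
    return bits


def lfsr_period(order, tap, seed=1):
    """Count the steps until the register returns to its seed state."""
    state = lfsr_step(seed, order, tap)
    steps = 1
    while state != seed:
        state = lfsr_step(state, order, tap)
        steps += 1
    return steps


print(lfsr_period(15, 14))          # pattern length of the assumed PRBS 2^15-1
print(prbs_bits(23, 18, 32))        # first bits of the assumed PRBS 2^23-1
```

A longer pattern such as 2^23-1 contains longer runs of identical bits than 2^15-1, which is one reason it is usually considered the more demanding test.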
 
Figure 6.16: Measured RF-band BER results at PRBS 2^23-1. 
Figure 6.17 shows the measured eye diagrams at an aggregate data rate of 13.4Gb/s (two 
4.6Gb/s PAM data streams plus one 4.2Gb/s RF-band stream) on the FR4 test board. The PAM 
band carries 9.2Gb/s with a maximum rms jitter of 6.08ps and a peak-to-peak jitter of 
37.09ps. The RF band carries 4.2Gb/s with a maximum rms jitter of 5.31ps and a 
peak-to-peak jitter of 33.52ps. 
 
 
[Eye-diagram panels: PAM Data 1 (4.6Gb/s), PAM Data 2 (4.6Gb/s), RF-band (4.2Gb/s)] 
 
Figure 6.17: Measured eye diagrams of aggregate data rate on FR4 PCB board. 
Conventional memory interfaces with baseband-only (BB-only) signaling operate at 
5Gb/s/pin [45], 6Gb/s/pin [46], and 2.15Gb/s/pin [48] with energy efficiencies of 
17.4pJ/b/pin, 15.8pJ/b/pin, and 6.6pJ/b/pin, respectively. Current mobile DDR memory 
I/O interfaces with dual-band signaling (BB+RF) achieve better energy efficiency, 
5pJ/b/pin at 4.2Gb/s/pin [22] and 4pJ/b/pin at 8Gb/s/pin [24], for simultaneous 
bidirectional communication because they transceive two data streams instead of the 
single stream of BB-only interfaces. Although dual-band interconnects (DBIs) improve 
power efficiency, they still have a limited data rate. Table 6.1 compares the MDBI 
performance with that of prior memory I/O interfaces. Although its power is higher than 
that of prior interfaces because of the PAM signaling and the PLL, the MDBI still 
exhibits higher aggregate data throughput (13.4Gb/s/pin) and better energy efficiency 
(~2.8pJ/b/pin) than prior art. 
 
 
Table 6.1: Performance comparison of the MDBI system. 
                     [45]            [46]            [48]            [22]            [24]            This Work
CMOS technology      180nm           130nm           40nm            65nm            65nm            65nm
Bands                BB              BB              BB              BB+RF           BB+RF           BB(PAM)+RF
Supply               1.8V            1.2V            1.1V            1.0V            1.2V            1.2V
Signaling type       Single-ended    Single-ended    Differential    Differential    Single-ended    Single-ended
T-line length        10cm            5cm             7cm             10cm            5cm             5cm
Aggregate data rate  5Gb/s/pin       6.0Gb/s/pin     2.15Gb/s/pin    4.2Gb/s/pin     8Gb/s/pin       13.4Gb/s/pin
Communication        Bidirectional   Bidirectional   Bidirectional   Simultaneous    Simultaneous    Simultaneous
                                                                     bidirectional   bidirectional   bidirectional
Energy/bit/pin       17.4pJ/bit/pin  15.8pJ/bit/pin  6.6pJ/bit/pin   5pJ/bit/pin     4pJ/bit/pin     2.8pJ/bit/pin
Total power          87mW            95mW            14.4mW          21mW            32mW            38mW
Chip area            0.52mm²         0.30mm²         0.9mm²          0.14mm²         0.12mm²         0.24mm²
Measured BER         10^-12          10^-12          N/A             10^-15          10^-12          10^-15
                     (PRBS 2^15-1)   (PRBS 2^15-1)                   (PRBS 2^23-1)   (PRBS 2^15-1)   (PRBS 2^23-1)
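
The energy/bit/pin row of Table 6.1 follows from dividing total power by aggregate data rate. The sketch below reproduces that arithmetic from the table entries; the dictionary and function names are hypothetical.

```python
# (total power in mW, aggregate data rate in Gb/s/pin) as listed in Table 6.1
TABLE_6_1 = {
    "[45]": (87.0, 5.0),
    "[46]": (95.0, 6.0),
    "[48]": (14.4, 2.15),
    "[22]": (21.0, 4.2),
    "[24]": (32.0, 8.0),
    "This work": (38.0, 13.4),
}


def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/b, i.e. picojoules per bit."""
    return power_mw / rate_gbps


for name, (power_mw, rate_gbps) in TABLE_6_1.items():
    print(f"{name:10s} {energy_per_bit_pj(power_mw, rate_gbps):6.2f} pJ/bit/pin")
```

Most columns reproduce the listed values directly. The [48] column evaluates to about 6.7pJ/bit/pin rather than the listed 6.6, which may simply reflect rounding in the quoted power or data rate.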
 
6.5 Discussions 
A fully integrated 13.4Gb/s/pin MDBI transceiver has been designed, fabricated in a 
65nm CMOS technology, and tested on a shared off-chip transmission line. The MDBI 
transceiver achieves 2.8pJ/bit/pin, which is 1.4 times lower than the most recent 
off-chip memory transmission-line interfaces. The energy efficiency can be improved 
further if the PLL is replaced by a VCO, since prior works employed a ring or LC VCO. 
Excluding the PLL power consumption, the MDBI transceiver obtains 1.5pJ/bit/pin, which 
is 2.6 times lower than the energy per bit of the transceiver reported in [24]. The 
multilevel signaling and ASK signaling used in the MDBI transceiver demonstrate methods 
of improving bandwidth and signal integrity with less signaling power dissipation. In 
contrast to previous off-chip memory interfaces [22] [24], the proposed MDBI system 
transceives three data streams simultaneously on a shared transmission line. This is 
possible because the baseband part of the MDBI system transceives two data streams 
instead of one, by replacing the conventional BB transceiver with a 4-level PAM 
transceiver.  
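
The comparison with [24] can be retraced with a few lines of arithmetic. The sketch below uses the total power and data rate from Table 6.1 together with the 17.2mW PLL power reported in the conclusions of this work; the variable and function names are hypothetical.

```python
TOTAL_POWER_MW = 38.0      # MDBI total power, Table 6.1
PLL_POWER_MW = 17.2        # PLL power reported in the conclusions of this work
RATE_GBPS = 13.4           # aggregate data rate per pin
REF24_PJ_PER_BIT = 4.0     # energy/bit/pin listed for [24]


def pj_per_bit(power_mw, rate_gbps):
    """Energy per bit in pJ (mW divided by Gb/s)."""
    return power_mw / rate_gbps


with_pll = pj_per_bit(TOTAL_POWER_MW, RATE_GBPS)
without_pll = pj_per_bit(TOTAL_POWER_MW - PLL_POWER_MW, RATE_GBPS)

print(f"with PLL:    {with_pll:.2f} pJ/bit/pin, {REF24_PJ_PER_BIT / with_pll:.2f}x lower than [24]")
print(f"without PLL: {without_pll:.2f} pJ/bit/pin, {REF24_PJ_PER_BIT / without_pll:.2f}x lower than [24]")
```

The results land close to the rounded 2.8pJ/bit/pin and 1.5pJ/bit/pin figures and to the 1.4 and 2.6 ratios quoted above.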
The MDBI transceiver demonstrated in this work consists of the PAM transceiver and the 
RF-band transceiver. The PAM transceiver uses a current-mode transmitter and a dual-
sampling receiver to transceive two data streams simultaneously through a shared 
single-ended channel. On the transmitter side of the PAM transceiver, a DAC-based driver 
generates the 4-level signal. In conventional PAM transmitters, the transistor switches 
used in the driver create a leakage current in the transmission line. To mitigate this 
leakage, a novel leakage-suppression control logic block is added on the PAM transmitter 
side to reduce the leakage in DRAM power-down/nap mode. A startup control signal block 
is also employed in the PAM transmitter to save power. The PAM startup control signal 
turns the PAM transceiver on one clock cycle before the data strobe (DQS) signal arrives 
from the microprocessor, which gives enough time for read/write. After the ACT signal 
de-asserts, the PAM control logic turns off the PAM interface circuit to save power. The 
PAM receiver side uses dual sampling to recover the two data streams. This method 
increases the maximum achievable data rate since the 
comparators utilize a parallel sampling method. The dual-sampling technique also 
reduces power consumption compared to prior works, since it samples the two incoming 
signals simultaneously and removes the need for two individual sampling circuits.  
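
The idea of carrying two data streams on one 4-level signal can be sketched behaviorally as follows. The model assumes ideal, equally spaced levels across the 1.2V supply, mid-point reference voltages, and an arbitrary assignment of the two streams to the upper and lower bit of each symbol; these choices and all names are illustrative and do not describe the fabricated circuit.

```python
VDD = 1.2                                          # supply rail quoted in this work (V)
LEVELS = [VDD * k / 3 for k in range(4)]           # assumed ideal levels: 0, 0.4, 0.8, 1.2 V
REFS = [VDD * (2 * k + 1) / 6 for k in range(3)]   # assumed mid-point references


def pam4_encode(bit_a, bit_b):
    """Map one bit from each stream to a thermometer code and an output level."""
    symbol = 2 * bit_a + bit_b
    thermo = [1 if symbol > k else 0 for k in range(3)]
    return thermo, LEVELS[sum(thermo)]


def pam4_decode(voltage):
    """Compare with three references, then convert thermometer code to two bits."""
    thermo = [1 if voltage > ref else 0 for ref in REFS]
    symbol = sum(thermo)
    return symbol >> 1, symbol & 1


for a in (0, 1):
    for b in (0, 1):
        thermo, level = pam4_encode(a, b)
        print(a, b, thermo, round(level, 2), pam4_decode(level))
```

In this idealized model each symbol survives up to just under 0.2V of additive error before a comparator decision flips, which shows why the reduced eye opening seen in measurement matters.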
The RF-band signaling used in the MDBI transceiver is designed for the lowest power 
consumption. The implementation of a low energy-per-bit RF-band transceiver depends 
strongly on the modulation and demodulation scheme of the RF-interconnect system. One 
of the common modulation techniques in RF-band transceivers is BPSK, which is a 
synchronous modulation scheme. It therefore requires both phase and frequency 
synchronization between the transmitter and receiver, so a Costas loop or re-modulation 
technique is necessary on the receiver side. Using additional circuits (i.e. a 
frequency-locked loop (FLL)) increases the system power budget of the RF bands. This 
problem will become critical in future multiband MDBI, since each RF-band receiver needs 
an individual FLL for its specific frequency band. The proposed RF-band transceiver uses 
a novel ASK modulator on the transmitter side. The receiver of the ASK (de)modulation 
scheme only senses the amplitude change of the RF carrier signal, so there is no need to 
detect and synchronize frequency or phase variation, which results in a much simpler 
receiver architecture and lower power consumption. The differential self-mutual mixer 
used on the RF-band receiver side is a non-coherent configuration, as it lacks 
power-hungry clocking circuits, which reduces the power consumed by the mixer. Moreover, 
the MDBI transmitter side utilizes a digital impedance control logic circuit to avoid 
impedance mismatch and reduce sensitivity to process, voltage and temperature (PVT) 
variations. 
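
The claim that amplitude-only detection needs no phase synchronization can be illustrated with a small numerical model. The sketch below treats the self-mutual mixer as an ideal squarer followed by per-bit averaging, which is a simplifying assumption; the bit rate, modulation depth, and sample count are also assumed values chosen for a tidy simulation, and the function names are hypothetical.

```python
import math

CARRIER_HZ = 20e9        # RF carrier quoted in this work
BIT_RATE = 4e9           # assumed: exactly 5 carrier cycles per bit
SAMPLES_PER_BIT = 100    # assumed simulation resolution


def ask_modulate(bits, phase=0.0, on=1.0, off=0.2):
    """Scale the carrier amplitude by the data bit (assumed on/off amplitudes)."""
    wave = []
    for i, bit in enumerate(bits):
        amp = on if bit else off
        for n in range(SAMPLES_PER_BIT):
            t = (i * SAMPLES_PER_BIT + n) / (SAMPLES_PER_BIT * BIT_RATE)
            wave.append(amp * math.cos(2 * math.pi * CARRIER_HZ * t + phase))
    return wave


def self_mix_demodulate(wave, on=1.0, off=0.2):
    """Square the signal (mix it with itself), average per bit, and slice."""
    threshold = (on ** 2 + off ** 2) / 4   # midway between mean squares on^2/2 and off^2/2
    bits = []
    for start in range(0, len(wave), SAMPLES_PER_BIT):
        chunk = wave[start:start + SAMPLES_PER_BIT]
        mean_square = sum(v * v for v in chunk) / len(chunk)
        bits.append(1 if mean_square > threshold else 0)
    return bits


data = [1, 0, 1, 1, 0, 0, 1, 0]
for carrier_phase in (0.0, 0.7, 2.0, 3.1):
    print(carrier_phase, self_mix_demodulate(ask_modulate(data, carrier_phase)) == data)
```

The data is recovered for every carrier phase tried, because squaring discards the phase and keeps only the envelope. A BPSK receiver, by contrast, would need a phase reference to tell the two symbols apart.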
In prior works [22] and [24], the carrier on the RF-band transmitter side is generated 
by a VCO, whereas in the MDBI system the carrier is produced by a PLL. The proposed 
fully integrated PLL uses a novel multiply-by-10 ILFM in the K-band frequency range. The 
proposed PLL effectively reduces the total PLL power because power-hungry dividers that 
operate at the highest frequency, such as standard CML dividers and injection-locked 
frequency dividers (ILFDs), are eliminated from the PLL feedback loop. This PLL can 
provide the required clocks for multi-chip-to-chip communications, while prior 
works need an individual VCO for each transceiver. Therefore, the power consumption can 
be reduced dramatically when the structure is expanded to multi-chip-to-chip 
communications.  
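
The frequency plan implied by the multiply-by-10 ILFM can be checked with simple arithmetic. The sketch below uses the 20GHz carrier and the 18.7-21.2GHz measured tuning range reported in the conclusions of this work; the function names are hypothetical.

```python
def vco_frequency_ghz(output_ghz, multiplier=10):
    """Frequency the VCO runs at when the ILFM multiplies it by `multiplier`."""
    return output_ghz / multiplier


def tuning_range_percent(f_low_ghz, f_high_ghz):
    """Tuning range as a percentage of the mid-band frequency."""
    return 100.0 * (f_high_ghz - f_low_ghz) / ((f_low_ghz + f_high_ghz) / 2.0)


print(vco_frequency_ghz(20.0))                              # VCO frequency for the 20GHz carrier
print(vco_frequency_ghz(18.7), vco_frequency_ghz(21.2))     # VCO span for the measured range
print(round(tuning_range_percent(18.7, 21.2), 1))           # relative tuning range in percent
```

The VCO and every block in the feedback path therefore operate near 2GHz rather than 20GHz, which is where the divider power saving described above comes from.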
Unlike conventional memory I/O interfaces that use differential off-chip transmission 
lines, the MDBI interface uses a shared single-ended transmission line. A single-ended 
design uses only one PCB track as the signal path, whereas a differential design uses a 
pair of PCB tracks. Used with a reference, the single-ended design is therefore more 
area-efficient than the differential design. The main advantage of the differential 
design is its noise immunity: common-mode noise on the receiver side is rejected by 
comparing the differential signals. This property is especially beneficial for low-swing 
signaling, where the signal swing is reduced below the supply voltage (VDD) and the 
receiver noise margin is significantly compromised. However, differential signaling is 
incompatible with existing standards, and it also occupies a large Si area due to the 
differential transmission lines. In the MDBI system, the differential self-mutual mixer 
removes the common-mode noise on the RF-band receiver side.  
An on-chip band-selective transformer is also used in the RF-band transceiver, since 
the RF band cannot be resistively terminated. The transformer acts as an impedance 
matching circuit that couples/decouples the RF bands to/from the transmission line. A 
digitally controlled on-die termination (ODT) is also used in the transformers on both 
the transmitter and receiver sides to set the common-mode voltage and remove the 
impedance mismatch for optimal signal integrity. The fully integrated MDBI system thus 
reduces the impedance discontinuities and the ISI on the channel, which increases the 
usable bandwidth of the 5-cm PCB channel to 13.4Gb/s/pin, 1.67 times that of the 
conventional memory I/O interface [24]. Moreover, the PAM signaling and RF signaling 
employed in the MDBI system are suitable for high-speed board-level chip-to-chip 
communications that require low latency, low power, and high signal integrity. 
One of the main issues in memory I/O interfaces is channel degradation, which limits 
the bandwidth. Increasing the operating frequency worsens the channel degradation, as 
shown in Figure 6.11. Therefore, the choice of clock 
frequency for the transmitter side of the RF band is important: a low operating 
frequency restricts the data rate and interferes with the BB signaling, while a high 
operating frequency increases the channel degradation and ISI problems. Besides the 
channel degradation that limits the data rate, there are some issues in designing the 
transceivers. In the PAM transceiver, the maximum data rate can be limited by the 
common-mode range, voltage resolution, and offset voltage of the comparators. Applying 
different common-mode ranges to the comparators makes them more complicated and unstable 
because of the noise at the different common modes. Although we use a dual-sampling 
method to increase the data rate, the noise generated by the comparators remains an 
obstacle to increasing the data rate. The noise problem can be solved by properly 
selecting the sampling range of each comparator on the PAM receiver side.  
Another obstacle that limits the data rate of the MDBI system is the use of ASK 
modulation in the RF-band transceiver. Since the receiver of the ASK (de)modulation 
scheme only senses the amplitude change of the RF carrier signal, it is sensitive to the 
amplitude degradation that comes with increasing channel length. To mitigate this 
problem, the ASK modulation could be replaced with phase modulation techniques such as 
PSK and BPSK. However, these techniques are coherent and need extra synchronization 
circuits, so they increase the power consumption and degrade the system energy 
efficiency, as mentioned previously.  
Finally, the data rate can be increased by transceiving more data streams. The number 
of data streams can be increased by adding RF-band transceivers and/or increasing the 
number of levels of the PAM transceiver (e.g. 10-level PAM [86]). To add an RF-band 
transceiver, a new band-selective transformer model is required to filter out each RF 
band. Moreover, the operating frequencies of the RF bands should be far enough apart to 
prevent inter-band interference. Higher frequencies increase the channel degradation, so 
a shorter channel (< 5cm) or a lower data rate may be an acceptable way to mitigate this 
problem. Another way to increase the data rate of a memory I/O interface is to increase 
the number of PAM levels. PAM can use various numbers of logical levels, but the 
required supply voltage also increases with 
the number of PAM levels. On the other hand, as shrinking CMOS processes use smaller 
supply rails, we have reached a bottleneck in how many data streams can fit in a 
band-limited channel using PAM signaling. For a top supply rail of 1.2V, the ideal eye 
opening of each level of 4-level PAM is 0.4V, which comes out to be almost 150mV in 
measurement. For example, the 10-level PAM reported in [86], which is used in a serial 
link transceiver, has a 3.3V supply voltage. A higher supply voltage means higher power 
consumption, which degrades the system energy efficiency.  
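
The trade-off between the number of PAM levels and the supply voltage can be made concrete with the ideal level spacing, Vdd / (N - 1), which assumes equally spaced levels spanning the full rail. The helper name is hypothetical.

```python
def ideal_eye_opening(vdd, levels):
    """Ideal spacing between adjacent PAM levels when they span 0..vdd."""
    return vdd / (levels - 1)


cases = [(1.2, 4), (3.3, 10), (1.2, 10)]   # (supply in V, number of PAM levels)
for vdd, levels in cases:
    opening_mv = 1000 * ideal_eye_opening(vdd, levels)
    print(f"{levels}-level PAM at {vdd}V: {opening_mv:.0f} mV per eye")
```

Under this idealization, 10-level PAM at 3.3V keeps an eye opening close to that of 4-level PAM at 1.2V, whereas 10-level PAM on a 1.2V rail would start from only about a third of it before any channel loss.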
 
 
 
 
 
  
Chapter 7: Conclusions and Future Work 
 
 
 
In this work, we investigated and presented a new energy-efficient multilevel dual-
band interconnect transceiver for high-speed communications and memory interfaces. 
Here we summarize the contributions of this work and discuss options for further 
improving energy efficiency and data rate in future implementations. 
 
7.1. Conclusions 
The proposed MDBI system consists of a 4-level PAM transceiver and an RF-band 
transceiver that transceive three data streams simultaneously and bi-directionally on a 
single-ended off-chip transmission line. We first presented an ultra-low-power 
phase-locked loop for K-band frequency applications, focusing on the essential building 
blocks of the PLL and the multiply-by-10 injection-locked frequency multiplier. The 
designed PLL generates the 20GHz carrier frequency for the ASK modulator used in the 
RF-band transceiver. The data streams injected into the ASK modulator are sent through a 
band-selective transformer and an off-chip transmission line. The signal received at the 
end of the transmission line is then injected into the self-mutual mixer and amplified by 
the differential amplifiers. A 4-level PAM transceiver is also employed in the MDBI 
system. It replaces conventional binary baseband signaling and can transceive two data 
streams. The data coming from the processor chip is first encoded with a thermometer 
code, and the current-mode output driver generates the 4-level signals. After the 
signals pass through the off-chip transmission line, they are compared with the 
reference voltages, and the recovered data from the comparators is 
decoded to 4-bit binary data. Based on the detailed discussion on the measurement and 
data analysis results, the main achievements of this work are: 
 
• A fully integrated phase-locked loop (PLL) using a sub-harmonic multiply-by-
10 injection-locked frequency multiplier for K-band frequency applications is 
designed and fabricated. The differential ring voltage-controlled oscillator 
(VCO) used in the PLL generates a signal in the 0.6~2.4GHz frequency range. 
The PLL output signal is then multiplied by a tenth-order injection-locked 
frequency multiplier (ILFM). Using the multiply-by-10 ILFM reduces the 
operating frequency of the VCO to only one-tenth of the desired frequency and 
removes power-hungry high-frequency dividers from the PLL feedback path to 
save power. We also analyzed the stability of the PLL. The PLL demonstrates a 
measured tuning range of 18.7-21.2GHz and a phase noise of -91.35dBc/Hz at 
1MHz offset at a center frequency of 20GHz. The power consumption of the PLL 
is 17.2mW, which is lower than that of prior art. 
• An RF-band transceiver is designed and implemented for use in the multilevel 
dual-band interconnect (MDBI) system. We proposed and employed a non-
coherent ASK modulator on the transmitter side of the RF interconnect to send 
data streams through the channel transmission line. We also analyzed and 
simulated essential blocks such as the ASK modulator, equalizer, self-mutual 
mixer, and amplifiers. Moreover, the band-selective transformer used in the 
MDBI system is analyzed and simulated. The transceiver is simulated in 
180nm CMOS technology at 1.5V. The results show a data bandwidth of 
3Gb/s/pin while the RF-band transceiver consumes 12mW at 1.5V. 
• A four-level pulse amplitude modulation (4-PAM) memory I/O interface is 
presented and implemented. The PAM transceiver uses a current-mode 
transmitter and a dual-sampling receiver to transmit and receive two data 
streams through a shared single-ended channel simultaneously. The proposed 
PAM transmitter reduces the leakage current in power-down mode by using a 
startup control logic circuit. In addition, the transmitter utilizes a digital 
impedance control logic circuit to avoid impedance mismatch and reduce 
sensitivity to process, voltage and temperature (PVT) variations. The 
receiver side uses differential amplifiers to decode three voltage levels by 
comparing the PAM signal with three reference voltages. The transceiver is 
simulated in 180nm CMOS technology at 1.5V. The results show that the data 
bandwidth increases to 6.4Gb/s. The energy efficiency of the mobile PAM 
memory I/O interface is 1.9pJ/bit. 
• Integrating the two transceivers, the 4-level PAM transceiver and the RF-band 
transceiver, along with the band-selective transformers enables simultaneous 
bi-directional chip-to-chip communication. The proposed MDBI system is able 
to transceive three data streams simultaneously on a shared single-ended off-
chip transmission line. Instead of limiting the baseband operation to its 
linear-power-consumption region versus bandwidth, we can now triple the 
interface bandwidth by using MDBI and still maintain linear power 
consumption versus bandwidth in each of the dual bands. The MDBI is 
implemented in a 65nm CMOS process technology. The measurement results 
show that the MDBI system achieves higher aggregate data throughput 
(13.4Gb/s/pin) and better energy efficiency (2.8pJ/b/pin) than prior art.  
• We are reaching a point where we are pushing the limits of traditional RF 
interconnect technology for DRAM. Multiband interconnect is a good 
candidate to replace RF interconnect for interfacing to DRAM. It supports 
more concurrent logical channels, can operate at a higher frequency with lower 
power, can adapt to improve bandwidth utilization, and is completely 
compatible with modern CMOS technology. In addition, the MDBI is able to 
reduce the pin count by a third while still maintaining the same throughput 
with minimal degradation in power efficiency.  
7.2. Future Work 
In this work, we focus on developing the use of low power multilevel dual-band 
interconnect which can communicate simultaneously and bi-directionally on a shared 
single-ended off-chip transmission line. Although a large amount of knowledge related to 
memory interfaces has been discussed, there are several areas in which further 
investigations can help to improve the performance of this work. These areas are 
discussed below. 
 
• This dual (BB+RF) band concept can be further extended to Base+Multiple-RF 
bands in the future, such that multiple data streams can be transmitted 
simultaneously through a shared memory I/O interface transmission line, as 
long as a multi-band coupling scheme can be devised.  
• To increase data bandwidth, the 4-level PAM transceiver can be replaced by 
higher order PAM modulators. For instance, a 6-level PAM transceiver plus an 
RF-band transceiver can transceive four data streams. 
• To improve the jitter performance of the MDBI system, some additional blocks 
can be added to the system. In the PAM transceiver, we can replace the 
reference circuits used on the receiver side with accurate reference sources. 
One of the main noise sources that degrades the system performance is supply 
noise. This issue can be mitigated if accurate supply voltages are used. 
Furthermore, a duty-cycle correction (DCC) circuit on the transmitter/receiver 
side of the MDBI system can improve jitter performance.   
  
Bibliography 
 
[1] K. Oh, et al., “A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk 
suppression scheme,” IEEE J. Solid-State Circuits, vol. 44, pp. 2222-2232, Aug. 
2009. 
[2] J. Kim, et al., “A 3.6 Gb/s/pin simultaneous bidirectional (SBD) I/O interface for 
high-speed DRAM,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. 
Papers, pp.414-415, Feb. 2004. 
[3] M.F. Chang, et al., “RF/Wireless interconnect for inter- and intra-chip 
communications,” Proceedings of the IEEE, vol. 89, no 4, pp.456 - 466, April 2001. 
[4] E. Socher, and M.-C.F. Chang, “Can RF help CMOS processors? [Topics in Circuits 
for Communications],” IEEE Communications Magazine, vol.45, no.8, pp. 104-111, 
August 2007. 
[5] M.-C. Frank Chang, E. Socher, S.-W. Tam, J. Cong, and G. Reinman, “RF 
interconnects for communication on-chip,” International Symposium on Physical 
Design, 2008. 
[6] M-C. F. Chang, et al., “Advanced RF/baseband interconnect schemes for inter- and 
intra-ULSI communications,” IEEE Trans. on Electron Devices, vol.52, no. 7, pp. 
1271-1285, July 2005. 
[7] B.A. Floyd, C.-M. Hung, K.K. O, “Intra-chip wireless interconnect for clock 
distribution implemented with integrated antennas, receivers, and transmitters,” IEEE 
J. Solid-State Circuits, vol.37, no.5, pp.543-552, May 2002. 
[8] Q. Gu, Z. Xu, J. Ko and F. Chang, “Two 10Gb/s/pin low-power interconnect methods 
for 3D ICs,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 
pp.448-614, Feb. 2007. 
[9] J. Ko, J. Kim, Z. Xu, Q. Gu, C. Chien and F. Chang, “An RF/baseband FDMA 
interconnect transceiver for reconfigurable multiple access chip-to-chip 
communication,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 
pp.338-602, Feb. 2005. 
[10] R.T. Chang, N. Talwalkar, C.P. Yue, S.S. Wong, “Near speed-of-light signaling 
over on-chip electrical interconnects,” IEEE J. Solid-State Circuits, vol.38, no.5, pp. 
834-838, May 2003. 
[11] J. Kim, J. Pak, J. Cho, E. Song, J. Cho, T. Song, H. Kim, J. Lee, K. Park, S. Yang, 
M. Suh, K. Byun, and J. Kim, “High-frequency scalable electrical model and analysis 
of a through silicon via (TSV),” IEEE Trans. On Components, Packaging and 
Manufacturing Technology, vol. 1, no. 2, pp. 181-195, Feb. 2011. 
[12] K. Takahashi and M. Sekiguchi, “Through silicon via and 3-D wafer/chip 
stacking technology,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp.114-117, 
June 2006. 
[13] U. Kang, et al., “8 Gb 3-D DDR3 DRAM using through-silicon-via technology,” 
in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 130-131, Feb. 
2009. 
[14] M. Jalalifar, and G.-S. Byun, “An energy-efficient mobile PAM memory interface 
for future 3D stacked mobile DRAMs,” in International Symp. on Quality Electronic 
Design (ISQED), pp.675-680, March 2014. 
[15] A. W. Topol, et al., “Three-dimensional integrated circuits,” IBM J. Res. Dev., vol. 
50, no. 4/5, pp. 491-506, Jul./Sep. 2006. 
[16] J. A. Burns, et al., “A wafer-scale 3-D circuit integration technology,” IEEE 
Trans. Electron Dev., vol. 53, no. 10, pp. 2507-2516, Oct. 2006. 
[17] S. M. Jung, “Highly cost effective and high performance 65 nm S3 (stacked 
single-crystal Si) SRAM technology with 25F2, 0.16 µm cell and doubly stacked 
SSTFT cell transistors for ultra-high density and high speed applications,” in IEEE 
Symp. VLSI Technology Dig. Tech. Papers, pp. 220-221, 2005. 
[18] K. T. Park et al., “A 45 nm 4 Gb 3-dimensional double-stacked multi-level 
NAND flash memory with shared bit line structure,” in IEEE Int. Solid-State Circuits 
Conf. (ISSCC) Dig. Tech. Papers, pp. 510-511, Feb. 2008. 
[19] S.-W. Tam, E. Socher, A. Wong, and M. C. F. Chang, “A simultaneous tri-band 
on-chip rf-interconnect for future network-on-chip,” in IEEE Symp. VLSI Circuits Dig. 
Tech. Papers, 2009, pp. 90-91, 2009. 
[20] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M. Kossel, T. Morf, 
J. Weiss, and M.L. Schmatz, “A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI 
technology,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 954-965, 2006. 
[21] B. Casper, J. Jaussi, F. O'Mahony, M. Mansuri, K. Canagasaby, J. Kennedy, E. 
Yeung, and R. Mooney, “A 20gb/s embedded clock transceiver in 90nm CMOS,” in 
IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 1334-1343, 2006. 
[22] G.-S. Byun, Y. Kim, J. Kim, S.-W. Tam, H.-H. Hsieh, P.-Y. Wu, C. Jou, J. Cong, 
G. Reinman, and M.-C.F. Chang, “An 8.4gb/s 2.5pj/b mobile memory I/O interface 
using simultaneous bidirectional dual (base+rf) band signaling,” in IEEE Int. Solid-
State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 488-490, 2011. 
[23] G.-S. Byun, Y. Kim, J. Kim, S.-W. Tam, and M. C F Chang, “An energy-efficient 
and high-speed mobile memory I/O interface using simultaneous bi-directional dual 
(base+rf)-band signaling,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 117-130, 
2012. 
[24] Y. Kim, G.-S. Byun, A. Tang, C.-P. Jou, H.-H. Hsieh, G. Reinman, J. Cong, and 
M.F. Chang, “An 8gb/s/pin 4pj/b/pin single-t-line dual (base-band+rf) band 
simultaneous bidirectional mobile memory i/o interface with inter-channel 
interference suppression,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. 
Papers, pp. 50-52, 2012. 
[25] Y. Ding and K. O. Kenneth, “A 21-GHz 8-modulus prescaler and a 20-GHz 
phase-locked loop fabricated in 130-nm CMOS,” IEEE J. Solid-State Circuits, vol. 
42, no. 6, pp. 1240–1249, Jun. 2007. 
[26] J. Kim, J. K. Kim, B. J. Lee, N. Kim, D. K. Jeong, and W. Kim, “A 20-GHz 
phase-locked loop for 40-Gb/s serializing transmitter in 0.13-um CMOS,” IEEE J. 
Solid-State Circuits, vol. 41, no. 4, pp. 899-908, Apr. 2006. 
[27] S. A. Osmany, F. Herzel, and J. C. Scheytt, “An integrated 0.6-4.6 GHz, 5-7 GHz, 
10-14 GHz, and 20-28 GHz frequency synthesizer for software-defined radio 
applications,” IEEE J. Solid-State Circuits, vol.45, no.9, pp.1657-1668, Sept. 2010. 
[28] A. Musa, R. Murakami, T. Sato, W. Chaivipas, K. Okada, and A. Matsuzawa, “A
low phase noise quadrature injection locked frequency synthesizer for mm-wave
applications,” IEEE J. Solid-State Circuits, vol. 46, no. 11, pp. 2635-2649, Nov. 2011.
[29] J. He, J. Li, D. Hou, Y.-Z. Xiong, D. L. Yan, M. A. Arasu, and M. Je, “A 20-GHz
VCO for PLL synthesizer in 0.13-μm BiCMOS,” in IEEE Radio-Frequency
Integration Technology (RFIT), pp. 231-233, Nov. 2012.
[30] U. Singh and M. M. Green, “High-frequency CML clock dividers in 0.13-µm 
CMOS operating up to 38 GHz,” IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 
1658-1661, Aug. 2005. 
[31] J. Lee and B. Razavi, “A 40-GHz frequency divider in 0.18-μm CMOS 
technology,” IEEE J. Solid-State Circuits, vol. 39, pp. 594-601, Apr. 2004. 
[32] Y.-T. Chen, M.-W. Li, H.-C. Kuo, T.-H. Huang, and H.-R. Chuang, “Low-
voltage K-band divide-by-3 injection-locked frequency divider with floating-source
differential injector,” IEEE Trans. Microw. Theory Tech., vol. 60, no. 1, pp. 60-67, Jan.
2012.
[33] T.-N. Luo, S.-Y. Bai, and Y.-J. E. Chen, “A 60-GHz 0.13-µm CMOS divide-by-
three frequency divider,” IEEE Trans. Microw. Theory Tech., vol. 56, no. 11, pp. 
2409-2415, Nov. 2008. 
[34] M. Tiebout, “A CMOS direct injection-locked oscillator topology as high-
frequency low-power frequency divider,” IEEE J. Solid-State Circuits, vol. 39, no. 7,
pp. 1170-1174, July 2004.
[35] S. K. Reynolds, B. A. Floyd, U. R. Pfeiffer, T. Beukema, J. Grzyb, C. Haymes, B. 
Gaucher, and M. Soyuer, “A silicon 60-GHz receiver and transmitter chipset for 
broadband communications,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2820-
2831, Dec. 2006. 
[36] B. A. Floyd, “A 16-18.8-GHz sub-integer-N frequency synthesizer for 60-GHz 
transceivers,” IEEE J. Solid-State Circuits, vol. 43, no. 5, pp. 1076-1086, May 2008. 
[37] C.-Y. Wu, M.-C. Chen, and Y.-K. Lo, “A phase-locked loop with injection-locked
frequency multiplier in 0.18-µm CMOS for V-band applications,” IEEE Trans.
Microw. Theory Tech., vol. 57, no. 7, pp. 1629-1636, July 2009.
[38] C.-C. Wang, Z. Chen, and P. Heydari, “W-band silicon-based frequency
synthesizers using injection-locked and harmonic triplers,” IEEE Trans. Microw.
Theory Tech., vol. 60, no. 5, pp. 1307-1320, May 2012.
[39] E. Temporiti, G. Albasini, I. Bietti, R. Castello, and M. Colombo, “A 700-kHz
bandwidth ΣΔ fractional synthesizer with spurs compensation and linearization
techniques for WCDMA applications,” IEEE J. Solid-State Circuits, vol. 39, no. 9, pp.
1446-1454, Sept. 2004.
[40] W. Rhee, “Design of high-performance CMOS charge pumps in phase-locked
loops,” in Proc. IEEE International Symposium on Circuits and Systems (ISCAS),
pp. 545-548, July 1999.
[41] M. Jalalifar and G.-S. Byun, “Near-threshold charge pump circuit using dual
feedback loop,” Electronics Letters, vol. 49, no. 23, pp. 1436-1438, Nov. 2013.
[42] W. Xu and E. G. Friedman, “Clock feedthrough in CMOS analog transmission
gate switches,” in Annu. IEEE Int. ASIC/SOC Conf., pp. 181-185, Sept. 2002.
[43] J. D. Van Der Tang, D. Kasperkovitz, and A. van Roermund, “A 9.8-11.5-GHz
quadrature ring oscillator for optical receivers,” IEEE J. Solid-State Circuits, vol. 37,
no. 3, pp. 438-442, Mar. 2002.
[44] A. Hajimiri, S. Limotyrakis, and T. Lee, “Jitter and phase noise in ring oscillators,” 
IEEE J. Solid-State Circuits, vol. 34, no. 6, pp. 790–804, Jun. 1999. 
[45] Y. A. Eken and J. Uyemura, “A 5.9-GHz voltage-controlled ring oscillator in
0.18-μm CMOS,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 230-233, Jan. 2004.
[46] S. Levantino, C. Samori, A. Bonfanti, S. L. J. Gierkink, A. L. Lacaita, and V.
Boccuzzi, “Frequency dependence on bias current in 5 GHz CMOS VCOs: impact on
tuning range and flicker noise upconversion,” IEEE J. Solid-State Circuits, vol. 37,
no. 8, pp. 1003-1011, Aug. 2002.
[47] W. O’Keese, “An analysis and performance evaluation of a passive filter design 
technique for charge pump PLL’s,” National Semiconductor Application Note 1001, 
2001. 
[48] K. Takano, M. Motoyoshi, and M. Fujishima, “4.8GHz CMOS frequency
multiplier with subharmonic pulse-injection locking,” in IEEE Asian Solid-State
Circuits Conference, pp. 336-339, Nov. 2007.
[49] L. Zhang, D. Karasiewicz, B. Ciftcioglu, and H. Wu, “A 1.6-to-3.2/4.8 GHz dual-
modulus injection-locked frequency multiplier in 0.18μm digital CMOS,” in IEEE
Radio Frequency Integrated Circuit (RFIC), pp. 427-430, 2008.
[50] W. L. Chan and J. R. Long, “A 56-65 GHz injection-locked frequency tripler with
quadrature outputs in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 43, no. 12,
pp. 2739-2746, Dec. 2008.
[51] P.-H. Feng and S.-I. Liu, “A current-reused injection-locked frequency
multiplication/division circuit in 40-nm CMOS,” IEEE Trans. Microw. Theory Tech.,
vol. 61, no. 4, pp. 1523-1532, April 2013.
[52] M. Babaie and R. B. Staszewski, “A class-F CMOS oscillator,” IEEE J. Solid-
State Circuits, vol. 48, no. 12, pp. 3120-3133, Dec. 2013.
[53] M.-C. Chen and C.-Y. Wu, “Design and analysis of CMOS subharmonic 
injection-locked frequency triplers,” IEEE Trans. Microw. Theory Tech., vol. 56, no. 
8, pp. 1869–1878, Aug. 2008. 
[54] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc.
IEEE, vol. 54, no. 2, pp. 329-330, Feb. 1966.
[55] J. Lee, Y. Chen, and Y. Huang, “A low-power low-cost fully-integrated 60-GHz
transceiver system with OOK modulation and on-board antenna assembly,” IEEE J.
Solid-State Circuits, vol. 45, no. 2, pp. 264-275, Feb. 2010.
[56] F. S. Lee and A. P. Chandrakasan, “A 2.5 nJ/bit 0.65 V pulsed UWB receiver in 90
nm CMOS,” IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2851-2859, Dec. 2007.
[57] B. Gilbert, “A precise four-quadrant multiplier with sub-nanosecond response,”
IEEE J. Solid-State Circuits, vol. 3, no. 4, pp. 365-373, 1968.
[58] C. T. Remund, Design of CMOS four-quadrant Gilbert cell multiplier
circuits in weak and moderate inversion, Ph.D. thesis, Brigham Young University,
2011.
[59] G. Han and E. Sanchez-Sinencio, “CMOS transconductance multipliers: a
tutorial,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal
Processing, vol. 45, no. 12, pp. 1550-1563, Dec. 1998.
[60] S.-C. Qin and R. L. Geiger, “A ±5-V CMOS analog multiplier,” IEEE J. Solid-
State Circuits, vol. 22, no. 6, pp. 1143-1146, Dec. 1987.
[61] K. Lee, S. Lee, and H. Yoo, “Low-power network-on-chip for high performance 
SoC design,” IEEE Trans. VLSI Syst., vol. 14, no. 2, pp. 148–160, Feb. 2006. 
[62] A. K. Mishra, R. Das, S. Eachempati, R. Iyer, N. Vijaykrishnan, and C. R. Das, 
“A case for dynamic frequency tuning in on-chip networks,” in Proc. 42nd Annu. 
IEEE/ACM Int. Symp. on Microarchitecture, 2009, pp. 292–303, ser. MICRO 42. 
[63] A. K. Mishra, A. Yanamandra, R. Das, S. Eachempati, R. Iyer, N. Vijaykrishnan,
and C. R. Das, “RAFT: A router architecture with frequency tuning for on-chip
networks,” J. Parallel Distrib. Comput., vol. 71, pp. 625–640, May 2011.
[64] Y. S.-C. Huang, K. C.-K. Chou, and C.-T. King, “Application-driven end-to-end
traffic predictions for low power NoC design,” IEEE Trans. VLSI Syst., vol. 21,
no. 2, pp. 229-238, Feb. 2013.
[65] S. Garg, D. Marculescu, R. Marculescu, and U. Ogras, “Technology-driven limits on
DVFS controllability of multiple voltage-frequency island designs: A system-level
perspective,” in Proc. 46th ACM/IEEE DAC, 2009.
[66] W. J. Dally, “Enabling technology for on-chip interconnection networks,” in 
Proc. IEEE/ACM Int. Symp. Netw.-on-Chip, 2007. 
[67] A. Hansson, K. Goossens, and A. Radulescu, “A unified approach to constrained
mapping and routing on network-on-chip architectures,” in Proc. 3rd
IEEE/ACM/IFIP Int. Conf. Hardware/Software Codesign and System Synthesis
(CODES+ISSS), pp. 75-80, 2005.
[68] E. Mensink, et al., “Power efficient gigabit communication over capacitively 
driven RC-limited on-chip interconnects,” IEEE J. Solid-State Circuits, vol. 45, no. 2, 
pp. 447-457, 2010. 
[69] J. Seo, D. Blaauw, and D. Sylvester, “Crosstalk-aware PWM-based on-chip links 
with self-calibration in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 46, pp. 
2041–2052, Sep. 2011. 
[70] D. Schinkel, E. Mensink, E. A. M. Klumperink, et al., “A 3-Gb/s/ch transceiver
for 10-mm uninterrupted RC-limited global on-chip interconnects,” IEEE J. Solid-
State Circuits, vol. 41, no. 1, pp. 297-306, Jan. 2006.
[71] J. Seo, et al., “High-bandwidth and low-energy on-chip signaling with adaptive 
pre-emphasis in 90nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. 
Tech. Papers, 2010, pp. 182–18. 
[72] R. Ho, K. W. Mai, and M. A. Horowitz, “Efficient on-chip global interconnects,” 
in IEEE Symp. VLSI Circuits Dig. Tech. Papers, 2003, pp. 271–274. 
[73] B. Leibowitz, et al., “A 4.3 GB/s mobile memory interface with power efficient
bandwidth scaling,” IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 889–898, Apr.
2010.
[74] T.-Y. Oh, et al., “A 7 Gb/s/pin GDDR5 SDRAM with 2.5 ns bank-to-bank active
time and no bank-group restriction,” in IEEE Int. Solid-State Circuits Conf. (ISSCC)
Dig. Tech. Papers, pp. 434–435, Feb. 2010.
[75] T. Granberg, Handbook of digital techniques for high-speed design. Englewood 
Cliffs, NJ: Prentice Hall PTR, 2004. 
[76] K. Fukuda, et al., “A 12.3mW 12.5Gb/s complete transceiver in 65nm CMOS,” in 
IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, pp. 368-369, Feb. 
2010. 
[77] K. Farzan and D. A. Johns, “A CMOS 10-Gb/s power-efficient 4-PAM 
transmitter,” IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 529–532, Mar. 2004. 
[78] J.-H. Kim, et al., “A 4-Gb/s/pin low-power memory I/O interface using 4-level 
simultaneous bi-directional signaling,” IEEE J. Solid-State Circuits, vol. 40, no. 1, 
pp. 89–101, Jan. 2005. 
[79] R. Ho, T. Ono, R. D. Hopkins, A. Chow, J. Schauer, F. Y. Liu, and R. Drost, 
“High speed and low energy capacitively driven on-chip wires,” IEEE J. Solid-State 
Circuits, vol. 43, no. 1, pp. 52–60, Jan. 2008. 
[80] D. Walter, S. Hoppner, H. Eisenreich, G. Ellguth, S. Henker, S. Hanzsche, R.
Schuffny, M. Winter, and G. Fettweis, “A source-synchronous 90Gb/s capacitively driven
serial on-chip link over 6mm in 65nm CMOS,” in IEEE Int. Solid-State Circuits
Conf. (ISSCC) Dig. Tech. Papers, 2012, pp. 180-182.
[81] V. Venkatraman and W. Burleson, “An energy-efficient multi-bit quaternary 
current-mode signaling for on-chip interconnects,” in Proc. Custom Integrated 
Circuits Conf. (CICC), 2007, pp. 301–304. 
[82] A. P. Jose and K. L. Shepard, “Distributed loss compensation for low latency on-
chip interconnects,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. 
Papers, Feb. 2006, pp. 516–517. 
[83] R. T. Chang, C. P. Yue, and S. S. Wong, “Near speed-of-light on-chip electrical
interconnect,” in Symp. VLSI Circuits, Dig. Tech. Papers, Jun. 2002, pp. 18–21.
[84] High Frequency Structure Simulator [Online].  
http://www.ansys.com/ 
[85] Cadence Virtuoso Spectre Circuit Simulator [Online].  
http://www.cadence.com/products/rf/spectre_circuit/pages/default.aspx. 
[86] B. Song, K. Kim, J. Lee, and J. Burm, “A 0.18 μm CMOS 10-Gb/s dual-mode
10-PAM serial link transceiver,” IEEE Trans. Circuits Syst. I: Regular Papers,
vol. 60, no. 2, pp. 457-468, Feb. 2013.
