Design Techniques for Energy Efficient Multi-GB/S Serial I/O Transceivers by Song, Younghoon
  
 
DESIGN TECHNIQUES FOR ENERGY EFFICIENT MULTI-GB/S SERIAL I/O 
TRANSCEIVERS 
 
 
A Dissertation 
by 
YOUNG HOON SONG 
 
Submitted to the Office of Graduate and Professional Studies of 
Texas A&M University 
in partial fulfillment of the requirements for the degree of 
 
DOCTOR OF PHILOSOPHY 
 
 
Chair of Committee,   Samuel Palermo 
Committee Members,    Edgar Sanchez-Sinencio 
                      Kai Chang 
                      Duncan M Walker 
Head of Department,   Chanan Singh 
 
May 2014 
 
 
Major Subject: Electrical Engineering 
 
Copyright 2014 Young Hoon Song 
 
ABSTRACT 
 
Total I/O bandwidth demand is growing in high-performance systems due to the 
emergence of many-core microprocessors and in mobile devices to support the next 
generation of multi-media features. High-speed serial I/O energy efficiency must 
improve in order to enable continued scaling of these parallel computing platforms in 
applications ranging from data centers to smart mobile devices.  
In the first work, a low-power forwarded-clock I/O transceiver architecture is presented 
that employs a high degree of output/input multiplexing, supply-voltage scaling with 
data rate, and low-voltage circuit techniques to enable low-power operation. The 
transmitter utilizes a 4:1 output-multiplexing voltage-mode driver along with 4-phase 
clocking that is efficiently generated from a passive poly-phase filter. The output driver 
voltage swing is accurately controlled from 100 to 200 mVppd using a low-voltage 
pseudo-differential regulator that employs a partial negative-resistance load for improved 
low-frequency gain. At the receiver, 1:8 input de-multiplexing is performed at the 
equalizer output with 8 parallel input samplers clocked from an 8-phase injection-locked 
oscillator that provides more than 1 UI of de-skew range.  
Low-power high-speed serial I/O transmitters that include equalization to 
compensate for frequency-dependent channel loss are required to meet the aggressive 
link energy efficiency targets of future systems. The second work presents a low-power 
serial-link transmitter design that utilizes an output stage combining a voltage-mode 
driver, which offers low static power dissipation, with current-mode equalization, 
which offers low complexity and low dynamic power dissipation. The use of current-mode 
equalization decouples the equalization settings from the termination impedance, 
allowing for a significant reduction in pre-driver complexity relative to segmented 
voltage-mode drivers. Proper transmitter series termination is set with an impedance 
control loop that adjusts the on-resistance of the output transistors in the 
voltage-mode portion of the driver. Further reductions in dynamic power dissipation are 
achieved by scaling the serializer and local clock distribution supply with data rate.  
Finally, a scalable quarter-rate transmitter is presented that employs an 
analog-controlled impedance-modulated 2-tap voltage-mode equalizer and achieves fast 
power-state transitioning with a replica-biased regulator and ILO clock generation. 
Capacitively-driven 2 mm global clock distribution and automatic phase calibration 
allow for aggressive supply scaling.  
  
 
DEDICATION 
 
To my parents, brother, sister and parents-in-law, and  
to my dearest wife, Hyeok Kim, and adorable daughters,  
Sumin Kelly Song and Shua Song 
 
 
I am grateful for the encouragement, understanding, and support from my 
parents, brother, sister, and parents-in-law. I am especially grateful to my lovely wife 
and two daughters, Hyeok Kim, Sumin Kelly Song, and Shua Song, for their love, 
encouragement, patience, and sacrifice. I could not have successfully finished this long 
journey without them. 
 
 
  
 
ACKNOWLEDGMENTS 
 
First of all, I would like to express my sincere gratitude to my advisor, Dr. Samuel 
Palermo, for his support and guidance throughout my graduate studies at Texas A&M 
University. I greatly benefited from his deep intuition and strong knowledge in analog 
and mixed-signal circuit and system design for serial-link I/O. I also want to thank my PhD 
committee members, Dr. Edgar Sanchez-Sinencio, Dr. Kai Chang, and Dr. Duncan M 
Walker, for agreeing to serve on my committee and for their time spent on my 
committee. Also, special thanks to Dr. Patrick Yin Chiang from Oregon State University, 
for his guidance and encouragement during SRC projects. 
I would like to thank the graduate students who worked with me on my research 
projects at Texas A&M University and Oregon State University; namely, Ehsan Zhian-
Tabasy, Noah Hae Yang, Byungho Min, Rui Bai, Hao Lee, and Kangmin Hu.  
I also want to express my appreciation to all my colleagues in the TAMU Analog and 
Mixed Signal Center (AMSC) for helpful conversations regarding research and course 
projects, especially Jusung Kim, Raghavendra Kulkarni, Hyung-Joon Jeon, Hajir 
Hedayati, and Youngtae Kim. Furthermore, special thanks go to the secretary of 
the AMSC group, Ella Gallagher, for her kind help. 
I would like to thank my internship mentors, Sungho Lee at Broadcom and 
Tod Dickson at IBM, who spent much time and effort discussing technical issues 
and solutions in I/O applications with me. 
 
 
 
NOMENCLATURE 
 
CMOS Complementary Metal Oxide Semiconductor 
I/O Input and Output 
FO4 Fanout-of-4 
ISI Inter-Symbol Interference  
MUX Multiplexing 
DMUX De-Multiplexing 
CML Current Mode Logic  
PPF Passive Poly-phase Filter   
DJ Deterministic Jitter 
CTLE Continuous Time Linear Equalization   
UI Unit Interval 
ILRO Injection-Locked Ring Oscillator  
PLL Phase-Locked Loop 
DLL Delay-Locked Loop 
BER Bit Error Rate 
VM Voltage-Mode 
CM Current-Mode 
TX Transmitter 
RX Receiver 
PRBS Pseudo-Random Binary Sequence 
FIR Finite Impulse Response 
PCB Printed Circuit Board 
GP General Purpose 
LDO Low Drop Out 
DAC Digital-to-Analog Converter 
 
  
 
TABLE OF CONTENTS 
              Page 
ABSTRACT ........ ii
DEDICATION ........ iv
ACKNOWLEDGMENTS ........ v
NOMENCLATURE ........ vi
TABLE OF CONTENTS ........ viii
LIST OF FIGURES ........ x
LIST OF TABLES ........ xvi
I. INTRODUCTION ........ 1
    I.1. Motivation ........ 1
    I.2. Dissertation Organization ........ 3
II. BACKGROUND ........ 6
    II.1. Energy Efficiency Transceiver Design Consideration ........ 6
        II.1.1. Channel ........ 8
        II.1.2. Data rate ........ 11
    II.2. Transmitter Design Consideration ........ 12
        II.2.1. Transmitter equalization techniques ........ 15
    II.3. Receiver Design Consideration ........ 21
        II.3.1. Receiver data path ........ 22
    II.4. Power Management ........ 27
        II.4.1. Power supply voltage scaling ........ 28
        II.4.2. Fast power switching bandwidth scaling ........ 29
III. ENERGY EFFICIENT TRANSCEIVER DESIGN ........ 31
    III.1. Introduction ........ 31
    III.2. Transceiver Architecture Considerations ........ 32
        III.2.1. Transmitter ........ 32
        III.2.2. Receiver ........ 36
        III.2.3. Proposed transceiver architecture ........ 40
    III.3. Transmitter ........ 41
        III.3.1. Local multi-phase clock generation ........ 42
        III.3.2. Level-shifting pre-driver ........ 44
        III.3.3. Output driver ........ 45
        III.3.4. Global impedance controller ........ 48
    III.4. Receiver ........ 50
        III.4.1. CTLE and quantizers ........ 50
        III.4.2. ILRO clocking ........ 51
    III.5. Experimental Results ........ 53
    III.6. Summary ........ 64
IV. HYBRID VOLTAGE-MODE TRANSMITTER WITH CURRENT MODE EQUALIZATION ........ 65
    IV.1. Introduction ........ 65
    IV.2. Proposed Transmitter Equalization Techniques ........ 67
    IV.3. Proposed Transmitter Architecture ........ 73
    IV.4. Experimental Results ........ 82
    IV.5. Summary ........ 92
V. IMPEDANCE-MODULATED VOLTAGE-MODE TRANSMITTER WITH FAST POWER STATE TRANSITIONING ........ 93
    V.1. Introduction ........ 93
    V.2. Low Power Transmitter Design Techniques ........ 95
        V.2.1. Global clock distribution ........ 96
        V.2.2. Voltage-mode transmitter equalization ........ 98
    V.3. Multi-Channel Transmitter Architecture ........ 100
    V.4. Transmitter Channel Design ........ 103
        V.4.1. Transmitter block diagram with digital phase calibration ........ 103
        V.4.2. Output driver ........ 105
        V.4.3. Global impedance control and modulation loop ........ 107
        V.4.4. Fast switching replica based voltage regulator ........ 109
    V.5. Experimental Results ........ 112
    V.6. 4:1 Output Multiplexing Transmitter ........ 120
    V.7. Summary ........ 123
VI. CONCLUSION AND FUTURE WORK ........ 125
    VI.1. Conclusion ........ 125
    VI.2. Recommendations For Future Work ........ 128
REFERENCES ........ 130
 
LIST OF FIGURES 
Page 
Fig. 1.1. Energy efficiency versus year of published serial I/O transceivers.............1 
Fig. 1.2. Energy efficiency versus data rate of published serial I/O transceivers......2 
Fig. 2.1. A multi-data-channel embedded-clock I/O architecture.............................6 
Fig. 2.2. A multi-data-channel forwarded-clock I/O architecture.............................7 
Fig. 2.3. Single board channel...................................................................................8 
Fig. 2.4. Backplane channel......................................................................................8 
Fig. 2.5. The channel (a) frequency response and (b) single pulse bit response.......9 
Fig. 2.6. Energy efficiency versus channel loss of serial I/O transceivers...............10 
Fig. 2.7. Inverter FO4 delay versus VDD in general 65nm CMOS technology......11 
Fig. 2.8. (a) Current mode driver versus (b) voltage mode driver with current 
consumption comparison............................................................................13 
Fig. 2.9. Voltage-mode driver with impedance control (a) by supply regulated    
pre-driver (b) by the selection of segmented pre-driver.............................14 
Fig. 2.10. 2-Tap de-emphasis waveform with equalization key specification...........16 
Fig. 2.11. (a) Implementation of 2-tap FIR equalization in low-swing voltage     
mode drivers with segmented resistive voltage divider (b) equivalent           
output driver circuitry.................................................................................17 
Fig. 2.12. (a) Implementation of 2-tap FIR equalization current-mode driver (b) 
equivalent output driver circuitry...............................................................19 
Fig. 2.13. Forwarded clock with (a) DLL/PLL and PI based (b) ILO based      
receiver architecture...................................................................................21 
Fig. 2.14. Schematic of RX CTLE with tuning circuitry...........................................23 
Fig. 2.15. Simulated AC response of CTLE by (a) capacitor tuning (b) resistor 
tuning.........................................................................................................24 
 
Fig. 2.16. (a) One-stage strongARM comparator (b) two-stage low-voltage 
comparator with integrating stage..............................................................25 
Fig. 2.17. One-stage strongARM comparator and two-stage low-voltage    
comparator with integrating stage comparison (a) clock to data delay 
versus power supply (b) power versus power supply.................................27 
Fig. 2.18. Adaptive power-supply regulator overview...............................................28 
Fig. 2.19. Interface bandwidth adapting to instantaneous bandwidth requirements..29 
Fig. 3.1. Output multiplexing approaches for voltage-mode drivers:                       
(a) producing an output data pulse with two-transistor output        
segments, (b) producing   an output data pulse with a pulse-clock           
and a single-transistor output segment.......................................................32 
Fig. 3.2. Transmitter architectures with different output multiplexing factors:  
(a)1:1, (b)4:1, (c)8:1...................................................................................34 
Fig. 3.3. Simulated 8Gb/s transmitter performance with varying output  
multiplexing factors: (a) deterministic jitter versus supply voltage,          
(b) dynamic power consumption................................................................35 
Fig. 3.4. A forwarded-clock 1:N receiver architecture............................................36 
Fig. 3.5. Key receiver circuitry simulated performance versus supply voltage:       
(a) ring oscillator phase variation, (b) quantizer delay..............................38 
Fig. 3.6. Receiver power consumption versus de-multiplexing factor....................39 
Fig. 3.7. The implemented single-data-channel low-power forwarded-clock 
transceiver block diagram..........................................................................41 
Fig. 3.8. 4:1 output multiplexing transmitter block diagram....................................42 
Fig. 3.9. Passive poly-phase filter I and Q phase spacing versus frequency............43 
Fig. 3.10. CML-to-CMOS converter with duty-cycle and phase spacing 
compensation..............................................................................................44 
Fig. 3.11. Level-shifting pre-driver............................................................................45 
Fig. 3.12. Level-shifting pre-driver simulated operation: (a) input pulse-clock and 
data signals, (b) output data pulse before and after level shifting..............45 
 
Fig. 3.13. Low-voltage regulator utilizing a pseudo-differential error amplifier     
with partial negative-resistance load..........................................................47 
Fig. 3.14. Low-voltage regulator simulated performance with various negative 
resistance settings: (a) error amplifier gain versus frequency, (b) supply 
step response from 0 to 0.65 V with VREF=120 mV................................47 
Fig. 3.15. Global output driver impedance controller................................................49 
Fig. 3.16. Simulated AC response of CTLE by resistor tuning.................................50 
Fig. 3.17. Two-stage comparator with current offset control....................................51 
Fig. 3.18. ILRO schematic.........................................................................................51 
Fig. 3.19. Simulated impact of clock injection approach on phase spacing 
uniformity..................................................................................................52 
Fig. 3.20. I/O transceiver chip micrograph................................................................53 
Fig. 3.21. (a) Measurement Setup. (b) Testing PCB board.......................................54 
Fig. 3.22. (a) 4.8Gb/s, (b) 6.4Gb/s, and (c) 8 Gb/s transmitter output eye 
diagrams.....................................................................................................55 
Fig. 3.23. Clock pattern (1010...) at 8 Gb/s Data rates (a) duty cycle (b) clock 
jitter............................................................................................................56 
Fig. 3.24. 4:1 output-multiplexing transmitter phase spacing maximum DNL     
versus supply voltage.................................................................................57 
Fig. 3.25. Transmitter output impedance versus VREF............................................58 
Fig. 3.26. Receiver de-skew range.............................................................................59 
Fig. 3.27. Frequency response of 3.5” FR4 trace and interconnect cables.................59 
Fig. 3.28. (a) Transceiver BER performance with optimal TX/RX supply voltages 
and CTLE settings, (b) transceiver BER with minimum CTLE peaking 
settings.......................................................................................................60 
Fig. 3.29. Transceiver energy efficiency versus data rate..........................................61 
Fig. 4.1. The proposed transmitter for clock forwarded link....................................66 
 
Fig. 4.2. (a) Implementation of 2-tap FIR equalization in low-swing voltage-    
mode driver with shunting resistor network (b) equivalent output        
driver circuitry............................................................................................68 
Fig. 4.3. (a) Implementation of 2-tap FIR equalization in proposed low-swing 
voltage-mode driver with current-mode equalization and (b) equivalent 
output driver circuitry.................................................................................69 
Fig. 4.4. Normalized transmitter output driver static power comparison.................72 
Fig. 4.5. Schematic simulation eye diagram of proposed 3-tap transmitter with 1 
main tap and two post cursor taps..............................................................73 
Fig. 4.6. TX block diagram......................................................................................74 
Fig. 4.7. Implementation of 4:2 MUX and differential 2:1 MUXs with 1 UI delay....75 
Fig. 4.8. Hybrid voltage-mode driver with current mode equalization....................76 
Fig. 4.9. Simulated return loss for transmitter and the CEI-SR return loss limit.....79 
Fig. 4.10. S21 response for Channel with -6.4 dB loss at 3 GHz..............................80 
Fig. 4.11. Transmitter schematic simulation result (a) eye diagram TX 50 ohms 
termination at 6 Gb/s (b) eye diagram TX 60 ohms Termination at             
6 Gb/s.........................................................................................................80 
Fig. 4.12. S21 response for channel with -10 dB loss at 3 GHz................................81 
Fig. 4.13. Transmitter schematic simulation result (a) eye diagram TX 50 ohms 
termination at 6 Gb/s (b) eye diagram TX 60 ohms Termination at 6 
Gb/s............................................................................................................81 
Fig. 4.14. Linear voltage regulator.............................................................................82 
Fig. 4.15. Measurement setup...................................................................................83 
Fig. 4.16. Die photograph..........................................................................................83 
Fig. 4.17. Low-frequency transmitter output waveform with 6 dB equalization…..84 
Fig. 4.18. Equalization peaking versus digital code for 400mVppd peak output  
swing and 120 uA IREF..............................................................................84 
 
 
Fig. 4.19. 6 Gb/s eye diagrams with a channel that has 4 dB loss at 3GHz,  
                        (a) without equalization, and (b) with equalization...................................85 
Fig. 4.20. Clock patterns (1010…) at 6 Gb/s data rates (a) without Equalization      
(b) with 6 dB Equalization.........................................................................86 
Fig. 4.21. 4.8 Gb/s eye diagrams with a channel that has 6 dB loss at 2.4 GHz,      
(a) without equalization, and (b) with equalization...................................87 
Fig. 4.22. Measured clock duty cycle versus data rate..............................................87 
Fig. 4.23. Measured clock patterns ( 1010…) (a) at 2.5 Gbps and (b) at 6 Gb/s......88 
Fig. 4.24. Measured transmitter output impedance versus VREF............................89 
Fig. 4.25. Energy efficiency versus data rate for channel output 50mV eye height  
and 0.6 UI eye width.................................................................................90 
Fig. 5.1. Multi-channel serial-link transmitter architecture…………….................95 
Fig. 5.2. Low swing global clock distribution techniques: (a) CML buffer      
driving resistively-terminated on-die transmission line, (b) CMOS     
buffer driving distribution wire through a series coupling capacitor….....96 
Fig. 5.3. Simulated comparison of CML and capacitively-driven clock    
distribution over a 2mm distance: (a) output swing versus frequency,      
(b) power versus frequency…………………………................................97 
Fig. 5.4. 2-tap FIR equalization in low-swing voltage-mode drivers......................98 
Fig. 5.5. Multi-channel transmitter architecture.....................................................100 
Fig. 5.6. Capacitively-driven global clock distribution and local quadrature-     
phase generation injection lock oscillator................................................102 
Fig. 5.7. Transmitter block diagram with clock phase calibration details……….104 
Fig. 5.8. Transmitter output driver circuitry..........................................................106 
Fig. 5.9. Global output driver control (a) output driver termination impedance 
control loop (b) output driver de-emphasis impedance modulation 
loop..........................................................................................................108 
Fig. 5.10. Fast power on-off dual supply replica based linear voltage 
regulator...................................................................................................111 
 
Fig. 5.11. Regulator power state transient simulation comparison with and      
without proposed fast power state transition............................................112 
Fig. 5.12. Micrograph of the 2-channel transmitter with on-chip 2mm clock 
distribution...............................................................................................112 
Fig. 5.13. Four eye diagrams without and with phase calibration (a) at 8Gb/s and    
(b) 16Gb/s after 2" FR4 trace...................................................................113 
Fig. 5.14. (a) Measured equalization impedance versus de-emphasis amount with      
a 300mVppd output swing, (b) Low-frequency transmitter output  
waveform with 3dB, 6dB, 9dB and 12dB equalization...........................114 
Fig. 5.15. (a) Measured frequency response of 5.8” FR4 trace and interconnect  
cables (b) Channel pulse response at 16Gb/s ( input normalized to 
1V )………………………………………………………………...…...115 
Fig. 5.16. Eye diagrams after 5.8'' FR4+0.6m SMA cable at 16Gb/s (a) without 
equalization and (b) with equalization.....................................................115 
Fig. 5.17. Eye diagrams after 5.8'' FR4+0.6m SMA cable (a) at 8Gb/s and (b) at 
12Gb/s.....................................................................................................116 
Fig. 5.18. Measured transmitter (a) energy efficiency versus data rate and                 
(b) power breakdown versus data rate....................................................117 
Fig. 5.19. Measured transient response of the transmitter output under (a) fast   
power-down and (b) start-up...................................................................117 
Fig. 5.20. Transmitter 4:1 output multiplexing block diagram with clock phase 
calibration details and output driver circuitry..........................................120 
Fig. 5.21. 4:1 output multiplexing transmitter layout..............................................121 
Fig. 5.22. Measured 4:1 output MUX and input MUX transmitter architecture 
performance comparisons (a) Digital power comparison versus         
DVDD  and (b) eye opening width versus DVDD at 12Gb/s..................122 
Fig. 5.23. Digital power comparison between 4:1 output MUX transmitter 
architecture and input MUX transmitter architecture versus eye        
opening width at 12Gb/s…………………………………………..........123 
Fig. 6.1. Energy efficiency versus data rate comparison with serial I/O 
transceiver................................................................................................125 
 
 
LIST OF TABLES 
Page 
 
 
Table. 3.1.  Transceiver power breakdown at 6.4Gb/s..................................................62 
Table. 3.2.  Low-power I/O transceiver comparisons...................................................63 
Table. 4.1.  Transmitter 2-Tap equalization comparisons (Vppd,max = 400mV,           
Vppd,min = 200mV, α = 0.25, and Zo = 50Ω )...............................................71 
 
Table. 4.2.  Transmitter performance summary............................................................90 
Table. 4.3.  Transmitter performance comparisons.......................................................91 
Table. 5.1.  Transmitter power breakdown at 16 Gb/s................................................118 
Table. 5.2.  Transmitter performance comparisons.....................................................119 
Table. 5.3.  Power state transient time comparisons...................................................119 
  
 
I. INTRODUCTION 
I.1. Motivation 
Advances in CMOS technology and the demand for multi-media features in 
computing systems have driven multi-core microprocessor I/O bandwidth to grow 
aggressively, at a rate of 2-3X every two years [1]. Based on current bandwidth scaling 
rates, high-end microprocessors are expected to require 1 Tb/s of I/O bandwidth in the 
following decade, which demands significantly improved serial I/O energy efficiency. 
However, I/O energy efficiency has improved by only 20% per year [2], and this slow 
improvement is the main obstacle to achieving 1 Tb/s operation because of thermal 
limits and unacceptable power consumption.  
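As a rough illustration of why this gap matters, the following Python sketch compounds the two trends quoted above. It assumes, as an interpretation that the text does not state explicitly, that a 20% yearly improvement means the energy per bit falls by 20% each year, and that I/O power is simply bandwidth multiplied by energy per bit. The function name and its parameters are hypothetical.

```python
# Illustrative only: compound I/O bandwidth growth against energy-efficiency gains.
# Assumption (not from the text): "20% per year" means energy per bit
# shrinks by 20% each year.

def relative_io_power(years, bw_growth_per_two_years=2.0,
                      efficiency_gain_per_year=0.20):
    """Return (bandwidth_factor, energy_per_bit_factor, power_factor) after `years`."""
    bandwidth_factor = bw_growth_per_two_years ** (years / 2.0)
    energy_per_bit_factor = (1.0 - efficiency_gain_per_year) ** years
    # I/O power = bandwidth [b/s] x energy per bit [J/b]
    power_factor = bandwidth_factor * energy_per_bit_factor
    return bandwidth_factor, energy_per_bit_factor, power_factor


if __name__ == "__main__":
    for growth in (2.0, 3.0):  # the 2-3X per two years range quoted in the text
        bw, e, p = relative_io_power(10, bw_growth_per_two_years=growth)
        print(f"{growth:.0f}X/2yr over 10 years: bandwidth x{bw:.1f}, "
              f"energy/bit x{e:.3f}, I/O power x{p:.1f}")
```

Under these assumptions, even the lower 2X-per-two-years bandwidth growth outpaces the efficiency gains, so total I/O power still grows by roughly 3.4X over a decade.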
[Figure 1.1 plot: energy efficiency [pJ/b] on a logarithmic axis (10^0 to 10^2) versus year (2006 to 2012); annotated trend: -35%/Year.]
 
Fig. 1.1.   Energy efficiency versus year of published serial I/O transceivers. 
 
In addition, mobile processing performance is expected to increase tenfold over 
the next five years to support various advanced multi-media features [3]. This 
requires I/O circuitry in mobile applications to dramatically improve its energy 
efficiency in order to extend battery life. Fig. 1.1 shows the energy efficiency of 
serial I/O transceivers reported at ISSCC and the VLSI Symposium since 2006, which 
has improved by about 35% per year [4]-[50]. However, this rate of improvement is 
still insufficient to meet future I/O power demands.  
[Figure 1.2 plot: energy efficiency [pJ/b] on a logarithmic axis (10^0 to 10^2) versus data rate [Gb/s] (0 to 30); legend: Fixed Rate Design, Scalable-Rate, Target Performance.]
 
Fig. 1.2.   Energy efficiency versus data rate of serial I/O transceivers. 
 
High-speed serial I/O energy efficiency must improve in order to enable continued 
scaling of these parallel computing platforms in applications ranging from data centers 
to smart mobile devices. 
The main purpose of this dissertation is to understand both the achievements and 
limitations of previous work and to develop new design techniques for low-power 
multi-Gb/s serial I/O transceivers that significantly improve energy efficiency. The 
target is scalable data-rate operation, from 4 to 8 Gb/s and from 8 to 16 Gb/s, with 
energy efficiency near 1 pJ/b, as shown in Fig. 1.2 [4]-[50].     
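To put the 1 pJ/b target in perspective, the short Python sketch below converts a data rate and an energy-per-bit figure into link power. It relies only on the identity power = data rate x energy per bit; the helper name is hypothetical, and the 1 Tb/s aggregate case is an illustrative extrapolation rather than a result from this work.

```python
# Illustrative only: link power implied by a data rate and an energy-per-bit figure.

def link_power_mw(data_rate_gbps, energy_pj_per_bit):
    """Power in mW: (Gb/s) x (pJ/b) = 1e9 b/s x 1e-12 J/b = 1e-3 W."""
    return data_rate_gbps * energy_pj_per_bit


if __name__ == "__main__":
    for rate in (4, 8, 16):  # data rates named in the text, in Gb/s
        print(f"{rate} Gb/s at 1 pJ/b -> {link_power_mw(rate, 1.0):.0f} mW")
    # Hypothetical aggregate: 1 Tb/s of total I/O at the same efficiency.
    print(f"1 Tb/s at 1 pJ/b -> {link_power_mw(1000, 1.0) / 1000:.1f} W")
```

At 1 pJ/b, a 16 Gb/s link dissipates 16 mW, and an aggregate 1 Tb/s interface would dissipate 1 W.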
I.2. Dissertation Organization 
Section II gives an overview of serial-link transceiver architectures in order to 
understand how serial I/O transceivers can be implemented, at both the system and 
circuit levels, to maximize energy efficiency.   
Section III discusses key circuit trade-offs associated with supply-scaling and 
multiplexing factor choices at both the transmitter and receiver. It details the proposed 
transmitter, which, to the authors’ knowledge, is the first to implement a level-shifting 
pulse-clock pre-driver to reduce the transistor size and stack count in a voltage-mode 
output-multiplexing driver. It also discusses the use of a passive poly-phase filter for 
transmitter quadrature clock generation, a technique shown in previous work [51] to be 
efficient for generating quadrature receiver-side clocks. In addition, this section 
presents the 1:8 input de-multiplexing receiver, which employs eight parallel input 
samplers clocked from an 8-phase injection-locked oscillator that provides more than 
1 UI of de-skew range and utilizes AC-coupled injection for improved phase uniformity 
relative to transconductance injection [52]. The single-data-channel transceiver 
experimental results are summarized, and a discussion on scaling this architecture to 
higher per-pin data rates is included.  
Section IV presents a hybrid voltage-mode transmitter with current-mode 
equalization, which enables independent control over termination impedance, 
equalization settings, and pre-driver supply, allowing for a significant reduction in 
pre-driver complexity and power. Transmitter equalization techniques are reviewed in 
this section, which compares the hybrid transmitter with voltage-mode and previous 
equalization implementations. In addition, this section details the transmitter 
architecture, which includes local clocking circuitry with duty-cycle correction, 
low-complexity scalable-supply serialization and pre-driver, hybrid driver, and global 
impedance control. Experimental results from an LP 90 nm CMOS prototype are also 
presented. 
Section V describes a scalable high-data-rate transmitter architecture that achieves 
low overall power consumption while allowing dynamic power management to optimize 
system performance for varying workload demands. This section also reviews key 
low-power design techniques employed in this design, including capacitively-driven 
wires for long-distance clock distribution and impedance-modulation equalization. An 
overview of the proposed multi-channel quarter-rate transmitter architecture, which is 
able to maintain low-swing clocking through the global distribution and local 
multi-phase generation, is given. Furthermore, the section discusses the power/data 
rate scalable transmitter channel design, which adopts an impedance-modulated 2-tap 
equalizer with analog tap control, employs automatic phase calibration for low-voltage 
operation, and utilizes a replica-biased voltage regulator to enable fast power-state 
transitioning. Experimental results from a GP 65 nm CMOS prototype are then presented. 
Finally, Section VI summarizes the contributions of this dissertation and offers 
suggestions for future work. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
II. BACKGROUND 
II.1. Energy Efficiency Transceiver Design Consideration 
Utilizing circuit parallelism in I/O transceivers allows for potential power savings, as 
the parallel transmitter and receiver segments operate at lower frequencies and 
potentially lower voltages [29], [53]. In addition, a parallel system has an opportunity 
to share common blocks such as analog control blocks and calibration circuitry. In many 
cases, those blocks have to operate from a high supply voltage because they do not scale 
well with advanced CMOS technology, so sharing them globally saves significant power. 
For multi-Gb/s parallel links, two primary I/O transceiver architectures can be 
considered, distinguished by their clock recovery scheme: the embedded-clock and 
forwarded-clock architectures.  
 
[Figure: block diagram with a TX PLL and TX clock driving N data transmitters, differential data channels, and N data receivers, each with its own CDR, together with an RX PLL.]
 
Fig. 2.1. A multi-data-channel embedded-clock I/O architecture.  
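 
[Figure: block diagram with a TX PLL and TX clock driving N data transmitters plus a forwarded-clock transmitter sending a 1100 pattern, differential data and clock channels, and N data receivers with a clock buffer and deskew block.]
 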
Fig. 2.2. A multi-data-channel forwarded-clock I/O architecture.  
In the embedded-clock architecture shown in Fig. 2.1, the clock is recovered directly 
from the incoming data, which requires frequency detection, frequency correction, and 
optimal sampling phase selection circuitry. The resulting clocking circuits are complex 
and add considerable circuit overhead and power consumption [10]. The forwarded-clock 
architecture shown in Fig. 2.2 requires an additional lane to deliver the clock to the 
receiver; however, the power and circuit overhead of this extra clock lane is amortized 
over all the data links, and the receiver then only requires phase deskew circuitry 
[8], [11], [29]. In addition, because the clock and data are generated by the same 
transmitter, their jitter is correlated [52], [54]. Therefore, the most energy-efficient 
I/O architecture, one that reduces clocking circuit complexity while also allowing 
wide-bandwidth jitter tracking, is a forwarded-clock system in which a clock signal is 
transmitted in parallel with multiple data channels. 
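The amortization argument can be illustrated with a small calculation. This is a minimal sketch with made-up power numbers (the 4 mW per data lane and 8 mW for the shared clock path are assumptions chosen for illustration, not values from this work); it only shows how the clocking overhead per lane shrinks as one forwarded clock is shared by more data lanes.

```python
def power_per_lane_mw(p_data_lane_mw, p_clock_path_mw, n_lanes):
    """Per-lane power when one shared clock path serves n_lanes data lanes.

    All inputs are hypothetical illustration values in mW.
    """
    if n_lanes < 1:
        raise ValueError("n_lanes must be at least 1")
    return p_data_lane_mw + p_clock_path_mw / n_lanes


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        # Assumed numbers: 4 mW per data lane, 8 mW for the shared clock path.
        print(n, power_per_lane_mw(4.0, 8.0, n))
```

With these assumed numbers, the per-lane power falls from 12 mW with a single data lane to 5 mW when eight lanes share the clock.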
II.1.1. Channel 
[Figure: two packaged SerDes ICs connected on a single board through back-drilled vias.]
 
Fig. 2.3. Single board channel. 
[Figure: two packaged SerDes ICs connected through connectors, backplane vias (reflections), and a backplane trace (dispersion).]
 
Fig. 2.4. Backplane channel. 
In an electrical link system, the chip-to-chip interconnect consists of copper traces on 
a printed circuit board that serve as the communication channel. Depending on the 
application, the channel can be implemented on a single board, as shown in Fig. 2.3, or 
across an FR4 backplane, as shown in Fig. 2.4, which has more connectors and longer 
traces. 
[Figure: (a) S21 [dB] (0 to -70 dB) versus frequency [GHz] (0 to 16 GHz) for three channels: 15" Via Stub, 30" Backdrilled, 15" Backdrilled. (b) Amplitude [V] versus time [ns] showing the channel pulse response and the MMSE EQ pulse response, with pre-cursor, main cursor, and post-cursors marked.]
 
Fig. 2.5. The channel (a) frequency response and (b) single pulse bit response. 
The bandwidth of this electrical channel is limited by skin effect and dielectric losses 
of the transmission lines. As shown in Fig. 2.5 (a), the channel frequency response has a 
low-pass characteristic whose attenuation increases with distance, and impedance 
discontinuities from via stubs produce nulls in the frequency response. Because of this 
low-pass nature, a transmitted data pulse disperses in time. This causes inter-symbol 
interference (ISI), which appears as the pre-cursors and post-cursors shown in 
Fig. 2.5 (b). Pre-cursors interfere with previously sent bits, while post-cursors 
interfere with the following bits, so ISI accumulated from multiple bits reduces the 
timing and voltage margins at the receiver.  
Both ISI and reflections degrade signal integrity at multi-Gb/s data rates. To 
compensate for them, the I/O circuitry must become more complex and include 
equalization. These extra blocks dramatically increase power consumption in multi-Gb/s 
serial I/O systems. 
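The loss of voltage margin from ISI can be estimated from the sampled pulse response with a standard peak-distortion bound. The sketch below is illustrative only: the cursor values are hypothetical numbers, not samples of the pulse response in Fig. 2.5 (b), and the bound assumes every interfering bit takes its worst-case polarity.

```python
def worst_case_eye_height(cursors, main_index):
    """Peak-distortion estimate of the vertical eye opening.

    cursors: UI-spaced samples of the single-bit pulse response [V].
    main_index: position of the main cursor in that list.
    The worst case assumes all pre- and post-cursors add destructively.
    """
    main = cursors[main_index]
    isi = sum(abs(c) for i, c in enumerate(cursors) if i != main_index)
    return main - isi


if __name__ == "__main__":
    # Hypothetical cursors: one pre-cursor, the main cursor, three post-cursors.
    raw = [0.05, 0.40, 0.15, 0.08, 0.04]
    print(worst_case_eye_height(raw, main_index=1))
```

A non-positive result means the worst-case eye is closed, which is the situation that equalization is meant to correct.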
 
[Figure: log-scale plot of energy efficiency [pJ/b] versus channel loss at symbol rate [dB] (0 to 40 dB), with a trend line annotated ~2x/10dB loss.]
 
Fig. 2.6.   Energy efficiency versus channel loss of serial I/O transceivers. 
Therefore, the energy efficiency of a high-speed link is strongly related to the channel 
response. Fig. 2.6 shows energy efficiency versus channel loss based on published 
papers [4]-[50]. Based on these references, the energy per bit roughly doubles for every 
10 dB increase in channel loss, and links achieve better than 2 pJ/b when the channel 
loss is less than 20 dB at the Nyquist frequency. For such low-loss channels, previous 
publications have reported I/O transceivers with good energy efficiency, since they can 
employ a simple equalization scheme, a low-swing transmitter design, and 
offset-corrected RX comparators. 
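The "~2x per 10 dB" trend annotated in Fig. 2.6 can be written as a simple scaling rule. The sketch below is a rough rule of thumb, not a fit to the data in [4]-[50]: the reference point of 2 pJ/b at 20 dB is an assumption taken from the approximate bound quoted in the text, and the function name is hypothetical.

```python
def energy_per_bit_pj(loss_db, ref_loss_db=20.0, ref_energy_pj=2.0):
    """Energy per bit that doubles for every 10 dB of additional channel loss.

    ref_loss_db and ref_energy_pj are assumed anchor values for illustration.
    """
    return ref_energy_pj * 2.0 ** ((loss_db - ref_loss_db) / 10.0)


if __name__ == "__main__":
    for loss in (10, 20, 30, 40):
        print(loss, "dB ->", energy_per_bit_pj(loss), "pJ/b")
```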
II.1.2. Data rate 
[Figure: inverter FO4 delay [ps] (10 to 70 ps) versus VDD [V] (0.6 to 1 V) for the corners FF 0deg, TT 27deg, and SS 75deg; the inset shows the 1X, 4X, 16X inverter chain that defines 1 FO4.]
 
Fig. 2.7. Inverter FO4 delay versus VDD in general 65 nm CMOS technology. 
To maximize the power efficiency of the transceiver, the operating data rate has to be 
chosen based on the target process technology. Normally, the bit period is chosen as 4 to 
6 fanout-of-4 (FO4) inverter delays in the target process to minimize power consumption 
in a half-rate architecture, because at this rate it is relatively easy to buffer the 
clock that drives the 2:1 multiplexing in the transmitter and the data sampling in the 
receiver [32], [55]. This data rate also allows extensive use of CMOS logic in the I/O 
system, which brings power-saving benefits such as power supply voltage scaling when 
operating over multiple data rates.  
Fig. 2.7 shows the inverter FO4 delay across process and temperature corners versus 
supply voltage in a general-purpose 65 nm CMOS technology. Although the FO4 delay 
increases steeply as the supply voltage is reduced, a low supply voltage significantly 
improves dynamic power efficiency for low-data-rate operation, as the following equation 
shows:    
$$P_{dyn} = C V^{2} f$$       (2-1) 
where C is capacitance, f is frequency, and V is power supply voltage.  
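The benefit of supply scaling in (2-1) can be checked numerically. This is a minimal sketch using the standard dynamic power relation P = C·V²·f with the symbols defined above; the 1 pF switched capacitance and 4 GHz frequency are arbitrary illustration values, not figures from this design.

```python
def dynamic_power_w(c_farad, v_supply, f_hz):
    """Dynamic switching power, P = C * V^2 * f."""
    return c_farad * v_supply ** 2 * f_hz


if __name__ == "__main__":
    c, f = 1e-12, 4e9  # assumed: 1 pF switched capacitance at 4 GHz
    p_nominal = dynamic_power_w(c, 1.0, f)
    p_scaled = dynamic_power_w(c, 0.6, f)
    # At a fixed frequency, the saving depends only on the voltage ratio squared.
    print(p_nominal, p_scaled, p_scaled / p_nominal)
```

At a fixed frequency, lowering the supply from 1.0 V to 0.6 V reduces dynamic power to about 36 % of its original value. In practice the achievable frequency also drops with the supply, as Fig. 2.7 indicates.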
To operate at a data rate higher than this metric allows, links must be implemented with 
extensive current-mode differential logic, which draws a large static current while 
generating a low output swing. This significantly reduces the energy efficiency of the 
serial I/O.  
Although the FO4 delay limitation of a given technology can be overcome by employing 
multi-phase clocking, this adds severe circuit complexity and power overhead unless an 
innovative solution for multi-phase clock generation is available. 
II.2. Transmitter Design Consideration 
The transmitter output driver usually consumes the majority of the static power because 
of the low channel characteristic impedance. This motivates implementing an 
energy-efficient high-speed transmitter with a voltage-mode driver, which is ideally 
4 times more efficient in current consumption than a conventional current-mode driver, 
as shown in Fig. 2.8 [4]. Moreover, a low-common-mode, low-swing output driver built 
only from NMOS pairs can potentially further improve power efficiency by employing a 
linear regulator that operates from a low supply voltage [55], [56]. 
 
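[Figure: (a) current-mode driver with 50 Ω pull-up terminations to AVDD, a 100 Ω differential receiver termination, and tail current 4I (3I in the transmitter termination, I in the receiver termination); (b) voltage-mode driver supplied by a voltage regulator (V-Reg, VREF) with ZUP = 50 Ω and ZDN = 50 Ω, a 100 Ω differential receiver termination, and supply current I.]
 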
Fig. 2.8. (a) Current mode driver versus (b) voltage mode driver with current 
consumption comparison. 
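The 4x ratio in Fig. 2.8 follows from a textbook termination analysis, sketched below. The analysis assumes ideal matched terminations (Zo per side at the transmitter, 2Zo differential at the receiver) and a 50 Ω default; the function names are hypothetical.

```python
def current_mode_supply_current(vppd, z0=50.0):
    """Tail current of a parallel-terminated current-mode driver.

    Only one quarter of the tail current reaches the 2*z0 receiver termination,
    so a peak-to-peak differential swing vppd needs I = vppd / z0.
    """
    return vppd / z0


def voltage_mode_supply_current(vppd, z0=50.0):
    """Supply current of a series-terminated voltage-mode driver.

    The current flows through z0 (pull-up), 2*z0 (receiver) and z0 (pull-down),
    so a peak-to-peak differential swing vppd needs I = vppd / (4 * z0).
    """
    return vppd / (4.0 * z0)


if __name__ == "__main__":
    swing = 0.2  # assumed 200 mVppd output swing
    i_cm = current_mode_supply_current(swing)
    i_vm = voltage_mode_supply_current(swing)
    print(i_cm, i_vm, i_cm / i_vm)
```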
However, in the voltage mode driver, the power saving benefit is degraded by the 
higher complexity of either the segmented [55] or supply regulated predriver for 
impedance control [56] as shown in Fig. 2.9.  
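 
[Figure: (a) voltage-mode driver whose output impedance is set by a regulated predriver supply (VZcont), with output supply VREF and 2Zo receiver termination; (b) voltage-mode driver with n output segments switched ON/OFF by segment selection logic operating from DVDD, with output supply VREF and 2Zo receiver termination.]
 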
Fig. 2.9. Voltage-mode driver with impedance control (a) by supply regulated predriver 
(b) by the selection of segmented predriver. 
Both the segment selection logic and the segmented output driver add significant 
capacitive loading to the high-speed data path through circuit loading and wiring 
parasitics. A supply-regulated predriver, in turn, operates from a supply voltage 
different from the rest of the data path, which can generate deterministic jitter and 
limits supply scaling. Also, when equalization is utilized, the potential output-stage 
power saving of a voltage-mode driver normally degrades because impedance matching and 
de-emphasis must be controlled simultaneously [57], [58].  
II.2.1. Transmitter equalization techniques 
Channel frequency-dependent loss, which causes inter-symbol interference (ISI), is 
often compensated by equalization implemented at the transmitter in the form of a finite 
impulse response (FIR) filter. Assuming a standard 2-tap high-pass FIR filter with a 
negative post-cursor tap, [1-α, -α], the equalization coefficient, α, is 
$$\alpha = \frac{1}{2}\left(1 - \frac{V_{ppd,min}}{V_{ppd,max}}\right)$$   (2-2) 
and the amount of equalization peaking is 
$$\text{Peaking [dB]} = 20\log_{10}\left(\frac{1}{1-2\alpha}\right)$$   (2-3) 
Fig. 2.10 shows the differential output waveform of a 2-tap equalizing transmitter, 
which has two different swing levels. When the main cursor, X[n], differs from the post 
cursor, X[n-1], the transmitter produces the maximum output swing for 1 UI. When the 
main cursor is identical to the post cursor, it produces the minimum output swing, which 
depends on the equalization coefficient, α, chosen according to the channel frequency 
characteristic.  
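 
[Figure: 2-tap de-emphasis waveform. The normalized differential levels are +1 and -1 (Vppd,max) for 1 UI when X[n] ≠ X[n-1], and 1-2α and -1+2α (Vppd,min) for N UI when X[n] = X[n-1].]
 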
Fig. 2.10. 2-Tap de-emphasis waveform with equalization key specification. 
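The two swing levels in Fig. 2.10 can be reproduced with a few lines of code. This is a behavioral sketch of a [1-α, -α] FIR filter acting on ±1 symbols; the choice α = 0.25, the test pattern, and the assumption that the line idles at the first symbol's value are all illustrative.

```python
import math


def two_tap_fir(symbols, alpha):
    """Apply taps [1 - alpha, -alpha] to a list of +1/-1 symbols.

    Returns the normalized differential output level for each UI. The symbol
    before the first one is assumed equal to the first symbol (idle line).
    """
    out = []
    prev = symbols[0]
    for x in symbols:
        out.append((1 - alpha) * x - alpha * prev)
        prev = x
    return out


def peaking_db(alpha):
    """Equalization peaking: ratio of maximum to minimum swing in dB."""
    return 20 * math.log10(1 / (1 - 2 * alpha))


if __name__ == "__main__":
    pattern = [1, 1, -1, -1, -1, 1]
    print(two_tap_fir(pattern, alpha=0.25))
    print(round(peaking_db(0.25), 2), "dB")
```

Transitions produce the full ±1 level, while repeated bits settle to ±(1-2α), which is ±0.5 for α = 0.25 and corresponds to about 6 dB of peaking.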
The first technique to implement FIR equalization in a low-swing voltage-mode driver 
uses a resistive voltage divider, shown in Fig. 2.11 (a). This technique utilizes 
segmentation of the output driver to implement the different output voltage levels for 
equalization. In the design of [55], a fraction 1-α of the output segments is controlled 
by the main-cursor tap and a fraction α is controlled by the post-cursor tap, with the 
output segments sized to ensure that all parallel combinations maintain proper source 
termination. 
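 
[Figure: (a) segmented voltage-mode driver with segment selection logic distributing X[n] and X[n-1] over n segments, output supply VREF, outputs TXP/TXN, channel impedance Zo per side, and 2Zo receiver termination; (b) equivalent circuits built from RP and RN with a 2Zo = 100 Ω termination across TXP-TXN, for the cases X[n] = X[n-1] and X[n] ≠ X[n-1].]
 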
Fig. 2.11.  (a) Implementation of 2-tap FIR equalization in low-swing voltage-mode 
drivers with segmented resistive voltage divider (b) equivalent output driver circuitry. 
The detailed operation of this configuration can be analyzed with the equivalent output 
driver circuitry shown in Fig. 2.11 (b). VREF controls the maximum transmitter output 
swing, such that VREF = Vppd,max. The equivalent resistor values RP and RN are  
$$R_P = \frac{Z_o}{1-\alpha}$$   (2-4) 
$$R_N = \frac{Z_o}{\alpha}$$   (2-5) 
where Zo is the channel characteristic impedance and α is the equalization coefficient, 
so that RP//RN is always equal to Zo. These resistor values are digitally controlled, 
which requires a large number of segments because of the non-linear mapping needed for 
fine resolution.  
When X[n] is not equal to X[n-1], the output stage current consumption is  
$$I_{vpp,max} = \frac{V_{ppd,max}}{4Z_o}$$   (2-6) 
where Ivpp,max is the current at the maximum differential output swing level.  
In this case, the total current is used to generate the maximum output swing. To 
generate the lower swing level, an extra shunt path is utilized; hence, the driver 
consumes more current at the lower output swing, which occurs when X[n] is equal to 
X[n-1]: 
$$I_{vpp,min} = \frac{V_{ppd,max}}{4Z_o}\left[1 + 4\alpha(1-\alpha)\right]$$   (2-7) 
where Ivpp,min is the current at the minimum differential output swing level. 
This is the main disadvantage of this architecture: the signaling power goes up as the 
equalization coefficient is increased. The other drawback of these voltage-mode driver 
designs is the overhead in the predriver logic required to distribute the tap weights 
among the segments, which grows with equalization resolution.  
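The current penalty of the segmented resistive divider can be checked with a small nodal analysis of the equivalent circuit in Fig. 2.11 (b). The sketch below rests on stated assumptions: ideal resistors, pull-up and pull-down conductances of (1-α)/Zo and α/Zo when X[n] = X[n-1], all segments pulling the same way when X[n] ≠ X[n-1], and a 2Zo receiver termination. Under this model the supply current is Vppd,max/(4Zo) on a transition and larger by a factor of 1 + 4α(1-α) otherwise.

```python
def segmented_driver(vref, z0, alpha, transition):
    """Solve the two-node equivalent circuit of a segmented voltage-mode driver.

    Returns (differential output voltage, supply current drawn from vref).
    transition=True models X[n] != X[n-1]; False models X[n] == X[n-1].
    """
    if transition:
        g_pu, g_pd = 1.0 / z0, 0.0            # every segment pulls TXP up
    else:
        g_pu, g_pd = (1 - alpha) / z0, alpha / z0
    g_nu, g_nd = g_pd, g_pu                   # TXN side is complementary
    g_t = 1.0 / (2 * z0)                      # receiver termination 2*Zo

    # KCL at TXP and TXN written as a 2x2 linear system, solved by Cramer's rule.
    a11, a12, b1 = g_pu + g_pd + g_t, -g_t, g_pu * vref
    a21, a22, b2 = -g_t, g_nu + g_nd + g_t, g_nu * vref
    det = a11 * a22 - a12 * a21
    v_p = (b1 * a22 - a12 * b2) / det
    v_n = (a11 * b2 - a21 * b1) / det

    i_supply = g_pu * (vref - v_p) + g_nu * (vref - v_n)
    return v_p - v_n, i_supply


if __name__ == "__main__":
    vref, z0, alpha = 0.2, 50.0, 0.25          # assumed illustration values
    for case in (True, False):
        v_diff, i_sup = segmented_driver(vref, z0, alpha, case)
        print(case, 2 * v_diff, i_sup)        # peak-to-peak swing, current
```

For these assumed values the peak-to-peak swing drops from 200 mV to 100 mV while the supply current rises from 1 mA to 1.75 mA, which is the inverse relation between swing and current described above.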
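 
[Figure: (a) current-mode driver with main-tap current source Io switched by X[n] and post-cursor tap current source I-1 switched by X[n-1], RTX terminations to AVDD, outputs TXP/TXN, channel impedance Zo per side, and 2Zo receiver termination; (b) equivalent circuits with RTX terminations to AVDD and a 2Zo = 100 Ω termination across TXP-TXN, for the cases X[n] ≠ X[n-1] and X[n] = X[n-1].]
 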
Fig. 2.12. (a) Implementation of 2-tap FIR equalization current-mode driver (b) 
equivalent output driver circuitry. 
 
As CMOS technology advances, data rates continue to increase and digital dynamic power 
rises along with them. Therefore, the power consumed by the complex predriver and 
segment selection logic necessary to support a voltage-mode driver with equalization 
can eliminate any benefit from the reduced transmitter output signaling power. 
In contrast, as shown in Fig. 2.12 (a), current-mode drivers offer the potential to 
implement high-resolution equalization without significant predriver complexity by 
setting the tap coefficients with tail current source DACs [29], [57]. If the output 
switches of the current-steering stages are sized to handle the maximum tap current, 
only a single predrive buffer is required per equalization tap. However, this reduction 
in predrive dynamic power is greatly overshadowed by the 4x increase in output stage 
static current due to the parallel termination scheme. The total current consumption is 
identical whether X[n] = X[n-1] or X[n] ≠ X[n-1]: the different transmitter output 
swings correspond to different amounts of current flowing through the receiver 
termination impedance, but the total current stays the same because the remainder flows 
through the transmitter termination impedance, as shown in Fig. 2.12 (b). The total 
current consumption is 
$$I_{vpp,max} = I_{vpp,min} = \frac{V_{ppd,max}}{Z_o}$$   (2-8) 
where Ivpp,max and Ivpp,min represent the total current consumption at the maximum and 
minimum differential output swing levels, respectively.  
 
II.3. Receiver Design Consideration  
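[Figure: two forwarded-clock receiver block diagrams, panels (a) and (b). Each has a forwarded-clock buffer (BUF) and clock distribution feeding N data lanes (DATA [0] to DATA [N]) with CTLE front ends; one uses a DLL/PLL with a phase interpolator (PI) per lane and the other uses an ILO per lane.]
 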
Fig. 2.13. Forwarded clock with (a) DLL/PLL and PI based architecture and (b) ILO 
based receiver architecture. 
Two distinct forwarded-clock receiver architectures have been utilized in previous 
works: a DLL- or PLL-based architecture with phase interpolators (PI) [8] and an 
ILO-based architecture [52], [59], [60], as shown in Fig. 2.13. In a forwarded-clock 
system, a low-swing differential clock is forwarded to the receiver. Because channel 
loss makes the received swing relatively low, a clock buffer is used to distribute the 
clock to all data lanes. To find the optimum sampling phase, which is normally the 
center of the incoming data eye, a phase interpolator is employed. However, the 
interpolator requires multi-phase clocks to adjust the clock position, which consumes 
significant power. Moreover, to reduce static phase offset and deterministic jitter, the 
multi-phase clock generator (DLL or PLL) is placed locally in each data lane, which 
increases receiver circuit complexity and power [54]. Therefore, as an alternative, 
injection-locked oscillators (ILOs) have recently been used to deskew clock signals in 
the receiver.  
Clock deskewing with an ILO has several advantages for an energy-efficient receiver. 
The main advantages are that the ILO generates multi-phase clocks, which allow a high 
de-multiplexing factor, and that it simultaneously deskews the clock to the optimal 
sampling position [52], [59]. 
Owing to this high de-multiplexing factor, both the comparators and the de-serializing 
blocks operate at a low data rate, so the supply voltage can be reduced further to 
significantly lower receiver power consumption. In addition, locking the ILO does not 
require a rail-to-rail CMOS clock signal, which reduces clock buffer and distribution 
power.  
On the downside, an ILO also has non-uniform multi-phase spacing, a narrow locking 
range, and non-linear phase deskew. This dissertation addresses these design issues and 
shows an ILO implementation at low supply voltage with improved phase spacing and a more 
linear deskew range. 
II.3.1. Receiver data path 
Low-supply operation of the receiver data path remains a challenge at high speed 
because it is hard to design both high-performance receiver equalization and high-speed 
comparators. A special concern is the comparator clock-to-data delay, since it can be a 
critical factor affecting receiver performance [61]. Therefore, it is important to 
review the trends in receiver equalization as they pertain to the comparator. 
There are mainly three different types of receiver equalization configurations, and 
continuous-time linear equalization (CTLE) and decision feedback equalization (DFE) are 
utilized together in most receiver architectures. However, DFE has a timing constraint 
in that the closed loop has to settle within 1 UI, and it can cancel only post-cursor 
ISI [62]. Also, to achieve power efficiency, the receiver equalization scheme has to be 
simple, which assumes that the interconnect channel has low loss and few impedance 
discontinuities. Therefore, continuous-time linear equalization is employed in 
energy-efficient receiver architectures to cancel both pre-cursor and long-tail ISI 
[63]. 
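[Figure: differential pair biased by IBias from AVDD with load resistors RL and load capacitors CL, source degeneration Rs and Cs with tuning controls (EQ Rs Ctrl, EQ Cs Ctrl), a 2Zo input termination, and outputs going to the comparators.]
 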
Fig. 2.14. Schematic of RX CTLE with tuning circuitry.  
 
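 
Active CTLE can be implemented through a differential pair with RC degeneration to 
provide gain at the Nyquist frequency, as shown in Fig. 2.14.  
The transfer function of the active CTLE is written as  
$$H(s) = \frac{g_m}{C_L}\cdot\frac{s + \dfrac{1}{R_S C_S}}{\left(s + \dfrac{1 + g_m R_S/2}{R_S C_S}\right)\left(s + \dfrac{1}{R_D C_L}\right)}$$   (2-9) 
where gm is the input transconductance and RD is the load resistance (RL in Fig. 2.14). 
The DC gain is expressed as  
$$A_{DC} = \frac{g_m R_D}{1 + g_m R_S/2}$$   (2-10) 
The ideal peak gain is equal to gm*RD. The ideal peaking can be expressed as 
$$\frac{\text{Ideal peak gain}}{\text{DC gain}} = 1 + \frac{g_m R_S}{2}$$   (2-11) 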
 
 
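[Figure: simulated CTLE magnitude responses; panel (a) shows the response as CS increases and panel (b) shows the response as RS increases.]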
Fig. 2.15. Simulated AC response of CTLE by (a) capacitor tuning (b) resistor tuning. 
 
 
At high frequency, the impedance of the degeneration capacitor is low compared to that 
of the degeneration resistor; therefore, the effective circuit Gm is high, which creates 
peaking. The peaking frequency can be controlled by the degeneration capacitor, and the 
DC gain can be tuned by adjusting the degeneration resistor, as shown in Fig. 2.15. 
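The tuning behavior described above can be explored with a small numerical model. The sketch assumes the commonly used one-zero, two-pole form for a source-degenerated differential pair, with the zero at 1/(RS·CS), the first pole at (1 + gm·RS/2)/(RS·CS), and the output pole at 1/(RD·CL). All component values are hypothetical and are not taken from this design.

```python
import math


def ctle_gain(f_hz, gm, r_d, c_l, r_s, c_s):
    """Magnitude of an assumed one-zero, two-pole CTLE response at f_hz."""
    s = 1j * 2 * math.pi * f_hz
    w_z = 1.0 / (r_s * c_s)
    w_p1 = (1.0 + gm * r_s / 2.0) / (r_s * c_s)
    w_p2 = 1.0 / (r_d * c_l)
    return abs((gm / c_l) * (s + w_z) / ((s + w_p1) * (s + w_p2)))


if __name__ == "__main__":
    # Hypothetical values: gm = 10 mS, RD = 200 ohm, CL = 50 fF, RS = 400 ohm, CS = 1 pF.
    params = dict(gm=10e-3, r_d=200.0, c_l=50e-15, r_s=400.0, c_s=1e-12)
    dc = ctle_gain(0.0, **params)
    hf = ctle_gain(4e9, **params)
    print("DC gain:", dc)
    print("Gain at 4 GHz:", hf)
    print("Peaking [dB]:", 20 * math.log10(hf / dc))
```

In this model, increasing r_s lowers the DC gain and increases the peaking, while changing c_s moves the zero and first pole together and so shifts the frequency at which peaking begins.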
 
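[Figure: (a) one-stage comparator with inputs DINP/DINN, outputs DOUTP/DOUTN, clock CLK, and supply VDD; (b) two-stage comparator with clocks CLK/CLKB, inputs DINP/DINN, integrating-stage outputs ION/IOP, and outputs DOUTP/DOUTN.]
 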
 
Fig. 2.16. (a) One-stage StrongARM comparator (b) two-stage low-voltage comparator 
with integrating stage.  
The most popular high-speed comparator architecture is the one-stage StrongARM latch 
shown in Fig. 2.16 (a) [62]. The StrongARM latch makes a decision based on the polarity 
of the differential inputs DINP and DINN. When the clock is high, the bottom nMOS 
transistor is enabled and the difference between the input voltages steers different 
currents into the regeneration stage, which builds a rail-to-rail output signal at the 
output nodes DOUTP and DOUTN. When the clock is low, the pMOS transistors reset the 
outputs high. Because of this reset operation, the comparator generates an output pulse 
shorter than 1 UI; therefore, it usually requires a keeper circuit such as an SR latch 
to hold the output while CLK is low. The StrongARM latch has the advantages of no static 
power dissipation and a rail-to-rail output signal. 
The major drawback of the StrongARM latch is that four transistors are stacked between 
the power supply and ground, which severely degrades the performance of the comparator 
at low supply voltage. Because power supply reduction is essential for high energy 
efficiency, a two-stage comparator that operates with three stacked transistors is used 
as an alternative, as shown in Fig. 2.16 (b) [64].  
In this two-stage comparator, the first stage acts as an integrator and the second 
stage is a regeneration stage. In the integrating stage, the difference between the 
input voltages produces different discharge times for the parasitic capacitances of the 
two output nodes when the clock is high. The resulting voltage difference is the input 
to the regeneration stage, where cross-coupled inverters amplify it to a rail-to-rail 
signal by positive feedback. The separation between sampling and regeneration in the two 
stages is what allows this comparator to be implemented with only three stacked 
transistors. 
 
[Figure: (a) CLK-to-Q delay [ps] (20 to 160 ps) versus VDD [V] (0.6 to 1 V) and (b) power [uW] (2 to 10 uW) versus VDD [V] (0.6 to 1 V), each comparing the one-stage DCVS comparator with the two-stage integrating plus regeneration comparator.]
 
Fig. 2.17. One-stage StrongARM comparator and two-stage low-voltage comparator 
with integrating stage comparison (a) clock to data delay versus power supply (b) power 
versus power supply. 
Fig. 2.17 (a) shows the simulated clock-to-data delay of both comparators versus power 
supply voltage, and Fig. 2.17 (b) shows their power consumption versus power supply 
voltage. Based on these simulation results, the two-stage comparator achieves a given 
clock-to-data delay at a lower supply voltage and therefore has better power 
efficiency. Consequently, high-speed operation can be achieved with only three devices 
stacked between the positive supply and ground, enabling low-voltage operation.  
II.4. Power Management 
A serial I/O link system has to be designed with enough bandwidth to support its 
maximum data rate; however, the system does not always operate at the maximum data rate. 
Therefore, when the required performance drops, significant power can be saved by 
reducing the supply voltage and by using fast power-state transitions to enable and 
disable lanes in multi-channel I/O architectures.  
II.4.1. Power supply voltage scaling 
To save dynamic power in a digital system, an adaptive power supply regulation 
technique can be utilized. The main goal is to reduce the supply voltage as far as 
possible without degrading the required performance when the system operates at a lower 
frequency and does not need to deliver peak performance [65]-[70]. 
 
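[Figure: feedback loop in which a controller compares the reference frequency Fref with the frequency f of a reference circuit and sets the duty cycle of a converter whose output, Vdd*Duty, supplies voltage V to both the reference circuit and the digital system.]
 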
Fig. 2.18. Adaptive power-supply regulator overview. 
 
As shown in Fig. 2.18, an adaptive power supply regulator is a feedback control loop 
that consists of three components: the reference circuit, the controller, and the buck 
converter. The desired supply voltage is produced by the buck converter, which has high 
power efficiency. The controller compares the frequency of the reference circuit with 
the desired operating frequency and adjusts the converter until the delay of the 
reference circuit matches the desired operating frequency [71], [72]. To generate a 
precise supply voltage, the delay of the reference circuit has to track the critical 
path delay of the system accurately. Hence, the reference circuit is generally designed 
as a delay line or a ring oscillator built from digital gates [71], [72]. For example, a 
VCO reference circuit can be used to generate the supply voltage that ensures a certain 
operating frequency and circuit delay [57]. 
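The behavior of such a loop can be sketched with a simple discrete-time model. Everything in this sketch is an assumption made for illustration: the ring-oscillator frequency follows a hypothetical alpha-power-law model, the buck converter is treated as ideal (output equals input voltage times duty cycle), and the controller is a plain integrator. It is not the regulator of [71], [72].

```python
def ring_osc_freq(vdd, k=1e10, vth=0.3, a=1.3):
    """Hypothetical alpha-power-law model of the reference oscillator [Hz]."""
    if vdd <= vth:
        return 0.0
    return k * (vdd - vth) ** a / vdd


def regulate(f_ref, vin=1.2, gain=0.1, steps=300):
    """Integrate the normalized frequency error into the buck duty cycle.

    Returns the settled supply voltage and the oscillator frequency at it.
    """
    duty = 1.0                                   # start at the full input voltage
    for _ in range(steps):
        vdd = vin * duty                         # ideal buck converter
        err = (f_ref - ring_osc_freq(vdd)) / f_ref
        duty = min(1.0, max(0.0, duty + gain * err))
    vdd = vin * duty
    return vdd, ring_osc_freq(vdd)


if __name__ == "__main__":
    vdd, freq = regulate(f_ref=4e9)
    print("settled VDD [V]:", round(vdd, 3), "frequency [GHz]:", round(freq / 1e9, 3))
```

Lowering f_ref makes the loop settle at a lower supply voltage, which is the mechanism that trades performance for dynamic power.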
II.4.2. Fast power switching bandwidth scaling 
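[Figure: normalized bandwidth (BW/Max.BW, 0.0 to 1.0) versus time, comparing the bandwidth demand, the adaptive interface bandwidth, and a fixed bandwidth; the gap to the fixed bandwidth is marked as energy savings and the lag behind the demand as transition latency.]
 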
Fig. 2.19. Interface bandwidth adapting to instantaneous bandwidth requirements. 
 
 
In multi-channel operation, supply voltage scaling adjusts the per-pin bandwidth to the 
bandwidth demand; however, bandwidth modulation can also be done by changing the number 
of active channels, enabling or disabling I/O lanes as needed [1]. This power switching 
method saves significant power compared with conventional fixed-bandwidth operation, 
but it requires the power-state transition latency to be minimized, as shown in 
Fig. 2.19 [1]. Recently, fast power-state transition techniques have been applied in 
many serial I/O links, especially CPU-to-CPU, CPU-to-memory, and mobile memory 
interfaces [29], [73]-[75].  
Mobile memory interfaces and forwarded-clock systems in particular require low-power 
states with fast transition times to support a wide range of bandwidths. These systems 
therefore use a global synchronous clock that can be paused and restarted to switch the 
digital circuitry off and on, which saves dynamic power when CMOS circuit topologies are 
used extensively. Also, to control the power state of the signaling circuitry, a linear 
voltage regulator settles quickly to a coarse value in open-loop mode and then maintains 
fine control with a closed-loop structure [73]. In addition, an injection-locked clock 
multiplier is employed to achieve both frequency agility and fast power-on in clock 
generation [74], [75]. 
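The saving from enabling only as many lanes as the instantaneous bandwidth demand requires (Fig. 2.19) can be estimated with a simple model. This sketch makes strong simplifying assumptions: lane power is proportional to the number of active lanes, transition latency and wake-up energy are ignored, and the demand trace is invented for illustration.

```python
import math


def active_lanes(demand_gbps, lane_rate_gbps, n_lanes):
    """Smallest number of lanes that covers the demand, capped at n_lanes."""
    needed = math.ceil(demand_gbps / lane_rate_gbps)
    return min(n_lanes, max(0, needed))


def relative_energy(demand_trace_gbps, lane_rate_gbps, n_lanes):
    """Energy of lane-count scaling relative to keeping every lane on.

    Assumes equal-length time steps and power proportional to active lanes.
    """
    used = sum(active_lanes(d, lane_rate_gbps, n_lanes) for d in demand_trace_gbps)
    return used / (n_lanes * len(demand_trace_gbps))


if __name__ == "__main__":
    trace = [64, 16, 8, 0, 32]     # hypothetical demand in Gb/s per time step
    print(relative_energy(trace, lane_rate_gbps=8, n_lanes=8))
```

A real link recovers less than this ideal figure, because lanes must be woken before the demand arrives and every transition costs time and energy.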
 
 
 
 
 
 
III. ENERGY EFFICIENT TRANSCEIVER DESIGN 
III.1. Introduction  
Significant I/O energy efficiency improvements necessitate advances in both 
electrical channel technologies and circuit techniques in order to reduce complexity and 
power consumption. Examples of advanced inter-chip physical interfaces include 
high-density interconnect and Flex cable bridges, which allow operation at data rates 
near 10 Gb/s while requiring only modest equalization [29]. 
Improvements in energy efficiency are also possible through reduction of the supply 
voltage VDD. This has previously enabled excellent energy per computation in digital 
systems [76] owing to the strong dependence of power on VDD. Leveraging supply scaling to 
improve energy efficiency motivates I/O architectures that employ a high level of 
output/input multiplexing, as this allows the parallel transmit and receive segments to 
operate at lower voltages [72]. However, challenges exist in the design of an efficient 
output-multiplexed voltage-mode driver due to the relatively large driver transistor 
sizes required for output impedance control, as well as the reduced supply headroom for 
the output stage regulator. Furthermore, widespread adoption of low-VDD transceivers 
has been limited by questions regarding robust operation and severe sensitivity to 
process variations. In particular, the generation of precise multi-phase clocks and the 
ability to compensate for circuit mismatch are issues at both the transmitter and the 
receiver. 
 
 
 
III.2. Transceiver Architecture Considerations 
Utilizing circuit parallelism in I/O transceivers allows for potential power savings, as 
the parallel transmit and receive segments operate at lower frequencies and potentially 
lower voltages [72]. Unfortunately, challenges exist in generating power-efficient 
multiple-phase clocks and maintaining critical transmitter/receiver circuit bandwidths 
while operating at low voltage. This section analyzes the trade-offs associated with 
supply-scaling and multiplexing factor choices at both the transmitter and receiver. 
III.2.1. Transmitter 
 
[Figure: schematics and timing diagrams of two output-multiplexing segments. (a) Two-transistor output segment on VREG in which the data DIN0 and the clocks CK0 and CK270 produce the output bit D0; (b) single-transistor output segment on VREG driven by DIN0 combined with the pulse-clock PCLK, which is generated from CK0 and CK270. The timing diagrams show the output sequence D0, D1, D2, D3.]
 
Fig. 3.1. Output multiplexing approaches for voltage-mode drivers: (a) producing an 
output data pulse with two-transistor output segments, (b) producing an output data 
pulse with a pulse-clock and a single-transistor output segment. 
Voltage-mode output stages are desired in low-power transmitter architectures due to 
the potential for significant current savings for a given output voltage swing. It is 
possible to implement output multiplexing in current-mode drivers through multiple 
two-transistor current-switch segments controlled by two overlapping clock signals and 
the data, thus avoiding any full data-rate signals until the final pad outputs 
(Fig. 3.1 (a)) [72], [77].   
Unfortunately, utilizing this approach in voltage-mode driver results in large output 
transistors in order to maintain proper channel impedance termination to minimize 
reflection-induced inter-symbol interference and allow predictable transmit output swing 
levels. Driving these large output transistors increases dynamic power consumption and 
the series transistor combination degrades the output signal edge rates. Another output 
multiplexing approach suitable for a voltage-mode driver involves combining a one unit 
interval (UI) pulse-clock with the data before the output switch transistor, allowing for 
only one single-transistor output segment to be activated at a time (Fig. 3.1(b)).  
Hence, impedance control is achieved using smaller output transistors, resulting in 
reduced pre-driver power consumption and improved output signal edge rates. This 
pulse-clock output multiplexing scheme is utilized in the voltage-mode driver presented 
in this work. 
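The pulse-clock multiplexing described above can be illustrated with a minimal behavioral sketch. It assumes that each 1 UI pulse is formed by ANDing one quarter-rate clock phase with the phase 270 degrees later, a pairing inferred from the CK0/CK270 labels of Fig. 3.1 rather than stated in the text. The function names are hypothetical.

```python
# Behavioral sketch of 4:1 pulse-clock output multiplexing.
# Assumption: each 1-UI pulse is the AND of a clock phase with the phase 270 degrees later.
UI_PER_PERIOD = 4  # one quarter-rate clock period spans four unit intervals


def clock_phase(offset_ui):
    """50 % duty-cycle clock sampled once per UI, high for 2 UI starting at offset_ui."""
    return [((slot - offset_ui) % UI_PER_PERIOD) < 2 for slot in range(UI_PER_PERIOD)]


# CK0, CK90, CK180, and CK270 are successive 1-UI shifts of the same clock.
CLOCKS = [clock_phase(k) for k in range(UI_PER_PERIOD)]


def pulse_clock(k):
    """AND phase k with the phase three slots later, leaving a 1-UI pulse in slot k."""
    late = CLOCKS[(k + 3) % UI_PER_PERIOD]
    return [a and b for a, b in zip(CLOCKS[k], late)]


def serialize(bits):
    """Each output segment drives the pad only while its own pulse-clock is high."""
    pulses = [pulse_clock(k) for k in range(UI_PER_PERIOD)]
    return [
        int(any(pulses[k][slot] and bits[k] for k in range(UI_PER_PERIOD)))
        for slot in range(UI_PER_PERIOD)
    ]


print(serialize([1, 0, 1, 1]))
```

Because exactly one pulse-clock is high in each UI, only one single-transistor segment is active at a time, which is the property the text relies on for impedance control with small output devices.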
The optimal output multiplexing ratio, with respect to power efficiency, is a function 
of both the minimum swing required to maintain the output eye margins and the 
complexity associated with the generation of precise multiple-phase clocks. Fig. 3.2 
compares three 8 Gb/s transmitters that utilize output-multiplexing factors of 1:1 
(multiplexing before the output driver), 4:1, and 8:1, respectively. The transmitters 
leverage supply scaling in the clock generation and serialization while the output stage is 
powered from a low-voltage regulator, discussed in Section III.3.3, which is capable of 
operating from a fixed 0.65 V supply.  
 
Fig. 3.2. Transmitter architectures with different output multiplexing factors: (a) 1:1,  
(b) 4:1, (c) 8:1. 
In order to avoid the challenges associated with global multiple-phase clock 
distribution in a multi-channel I/O system, all these topologies utilize a low-swing global 
differential clock distribution, with multiple-clock phases generated locally. The 1:1 
multiplexing transmitter is a half-rate architecture [56], [57], [78] that utilizes a 2:1 
CMOS mux before the output stage, switched by two phases of a 4 GHz clock 
generated by the local CML-to-CMOS clock buffer circuitry. For the 4:1 multiplexing 
transmitter, a 2 GHz low-swing global clock passes through a passive poly-phase filter 
to produce four clock phases, which are then converted to CMOS levels to actuate the 
pulse-clock pre-driver. The eight clock phases required for the 8:1 multiplexing 
transmitter are produced with a local injection-locked oscillator (ILO) locked to a 1 GHz 
low-swing global clock input. 
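As a quick check of the clock frequencies quoted for the three topologies, the global clock frequency is the data rate divided by the number of local clock phases. The sketch below assumes the half-rate 1:1 design uses two phases; the function name is illustrative.

```python
def global_clock_ghz(data_rate_gbps, num_phases):
    """Clock frequency when each of num_phases evenly spaced edges launches one bit."""
    return data_rate_gbps / num_phases


# Number of local clock phases per topology (two phases for the half-rate 1:1 design).
TOPOLOGIES = {"1:1": 2, "4:1": 4, "8:1": 8}

for name, phases in TOPOLOGIES.items():
    print(f"{name} transmitter at 8 Gb/s: {global_clock_ghz(8.0, phases):.0f} GHz clock")
```

This reproduces the 4 GHz, 2 GHz, and 1 GHz clocks of Fig. 3.2.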
 
 
 
Fig. 3.3. Simulated 8 Gb/s transmitter performance with varying output multiplexing 
factors: (a) deterministic jitter versus supply voltage, (b) dynamic power consumption. 
Schematic simulation results are presented in Fig. 3.3 (a), which compares the 8 Gb/s 
deterministic jitter (DJ) of the three transmitters driving an ideal channel as a function of 
the supply voltage. The 1:1 multiplexing transmitter’s DJ increases rapidly as the 
supply is reduced near 0.6 V due to degraded timing margin in the 2:1 CMOS 
multiplexer that switches at 4 GHz, while both the 4:1 and 8:1 output multiplexing 
designs display similar performance and operate with reasonable DJ at lower voltages.  
[Fig. 3.3 plot data: (a) 8 Gb/s deterministic jitter [%UI] versus DVDD [V] for the 1:1, 4:1, and 8:1 MUX transmitters; (b) normalized TX digital power [%], with DVDD = 0.61 V (1:1), 0.57 V (4:1), and 0.6 V (8:1).]

Fig. 3.3 (b) compares the dynamic power consumption of the three transmitters 
normalized to the highest-power 8:1 architecture. Here the transmitter supply is set based 
on two constraints of 5 % output DJ and acceptable output phase mismatch across Monte 
Carlo simulations.  While the 8:1 transmitter is capable of less than 5 % DJ at a supply 
lower than 0.6 V, the ILO displays excessive phase variation at these low voltages. 
Overall, the 4:1 output multiplexing architecture displays the lowest power consumption 
due to the superior timing margins relative to the 1:1 transmitter and reduced sensitivity 
to multi-phase clock generation enabled through the two-stage passive poly-phase filter. 
Hence, the 4:1 architecture is chosen and is discussed in detail. 
III.2.2. Receiver 
 
 
Fig. 3.4  A forwarded-clock 1:N receiver architecture. 
 
At the receiver, the optimal input de-multiplexing ratio, in terms of power efficiency, 
is a function of the minimum voltage required to produce precise multi-phase clocks 
while maintaining adequate circuit speed. An input continuous-time linear equalizer 
(CTLE), consisting of an RC-degenerated differential amplifier, is used to compensate for 
the channel loss. Fig. 3.4 shows a high-level diagram of the receiver architecture, in 
which the CTLE drives the N quantizers clocked by multi-phase clocks from an 
injection-locked ring oscillator (ILRO) locked to the forwarded clock. The ILRO also 
provides the ability to adjust for the skew between data and the sampling clock by 
adjusting its own free-running frequency, as demonstrated in [52].  
CTLE equalization is chosen versus transmit feed-forward equalization (FFE) in this 
transceiver architecture, as link modeling studies [79] have found that a design including a CTLE 
can achieve lower power than a design without TX equalization or designs which include 
2-tap TX equalization without a CTLE. This is because the CTLE allows for a peak gain 
above 0 dB near the Nyquist frequency, which improves the sensitivity of the RX and 
allows scaling down the transmit output swing significantly. TX FFE, on the other hand, 
reduces the effective transmitted signal swing, placing more stringent requirements on 
the RX and also increases the TX circuit complexity. This is especially true for voltage-
mode drivers, where significant output-stage segmentation and pre-drive logic is often 
necessary to achieve a given equalization range and resolution, both in designs which 
control the output impedance [58] and those that do not [80]. 
All of the receiver circuits share the same scalable power supply. A higher de-
multiplexing ratio relaxes the quantization delay requirement for each quantizer, 
allowing quantization speed to be traded off for lower supply voltage. For the chosen 
quantizer structure, which is similar to [64], near-quadratic power reduction is observed 
associated with supply voltage scaling. 
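The near-quadratic trend can be illustrated with the first-order CMOS dynamic power relation, P ~ C·V²·f. The sketch below is only an idealized estimate: it assumes the total switched capacitance and aggregate switching rate stay constant as the supply is scaled, which the simulated quantizer only approximates.

```python
def relative_dynamic_power(vdd, vdd_ref):
    """Dynamic power relative to the reference supply, assuming P ~ C * V^2 * f
    with the switched capacitance and aggregate switching rate held constant."""
    return (vdd / vdd_ref) ** 2


for vdd in (0.8, 0.6, 0.5):
    percent = 100 * relative_dynamic_power(vdd, 0.8)
    print(f"VDD = {vdd:.1f} V -> {percent:.0f} % of the 0.8 V dynamic power")
```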
 
 
 
Fig. 3.5.  Key receiver circuitry simulated performance versus supply voltage: (a) ring 
oscillator phase variation, (b) quantizer delay. 
 
 
It is important to note that while a highly parallel architecture sees improved power 
efficiency by operating at lower voltage, several limitations prevent carrying out this 
methodology indefinitely. The first limitation is that lower overdrive and headroom 
reduce the performance of analog components in the critical high-speed path. In the case 
of the CTLE, larger current is needed to maintain its bandwidth at a lower supply 
voltage, contradicting the effort to reduce power consumption. In turn, larger current and 
lower headroom also limit the size of the load resistor, making it difficult to achieve the 
required gain. The second limitation is that the use of more quantizers in parallel 
increases the loading of the CTLE, thus decreasing the bandwidth. This loading includes the 
input capacitance of the quantizers themselves, as well as the wiring parasitics, which become 
more significant as longer wires are needed for higher parallelism. The third limitation is 
that the variation of certain blocks is more sensitive to supply voltage than others.  
[Fig. 3.5 plot data: (a) phase spacing [UI] versus VDD [V]; (b) quantizer delay [ns] versus VDD [V], annotated with VDD = 0.8 V, delay = 132 ps; VDD = 0.6 V, delay = 298 ps; VDD = 0.5 V, delay = 624 ps.]
For example, Fig. 3.5(a) shows the simulated phase mismatch from 100 Monte-Carlo 
runs of an 8-phase ring oscillator across different supply voltages. Here the phase 
mismatch is normalized to the UI value corresponding to the frequency achievable at a 
given supply voltage. It can be observed that the phase-spacing standard deviation, σ, 
grows faster as the supply approaches the near-threshold region. In a receiver, large 
phase mismatch makes it difficult to align every clock edge for all the parallel 
quantizers to the proper position in the data eye 
simultaneously. As a result, the combined BER becomes worse as phase mismatch 
increases. While individual skew adjustment could be added to each clock phase, this 
comes at the expense of additional mismatch detection and correction circuitry.  
 
 
 
Fig. 3.6. Receiver power consumption versus de-multiplexing factor. 
 
[Fig. 3.6 bar chart: normalized RX power [%], split among the CTLE, quantizers, and ILRO, for 1:4 (VDD = 0.8 V), 1:8 (VDD = 0.6 V), and 1:16 (VDD = 0.5 V) de-multiplexing.]

To evaluate the effectiveness of different de-multiplexing ratio and supply voltage 
combinations in the presence of these limitations, three receivers with different de-
multiplexing ratios and supply voltages are simulated. The de-multiplexing ratios are 
chosen according to the different quantizer delays shown in Fig. 3.5 (b) to meet the same 
8 Gb/s throughput target, with constant CTLE output bandwidth maintained for all three 
designs. Fig. 3.6 summarizes the power consumption obtained from schematic 
simulations. Although the power consumption of quantizers and oscillator generally 
scales down with increased de-multiplexing factor and reduced supply voltage, the 
CTLE consumes the most power at 0.5 V for the reasons discussed above. This increase 
in CTLE power consumption nearly cancels all the power savings from scaling VDD 
from 0.6 V to 0.5 V. Moreover, comparator offset increases significantly at extremely 
low voltages [61], necessitating excessive offset cancellation circuitry range. 
Considering the limited total power savings, corresponding CTLE bandwidth 
degradation, and the increased susceptibility to variation, reducing the supply voltage 
below 0.6 V exhibits diminishing returns. 
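The pairing of de-multiplexing ratio and supply voltage can be sanity-checked against the quantizer delays annotated in Fig. 3.5(b). The sketch below assumes that each quantizer has one full sampling-clock period (N UI) available, and that the 0.8 V, 0.6 V, and 0.5 V supplies correspond to the 1:4, 1:8, and 1:16 receivers in the order shown in Fig. 3.6. Both are interpretations rather than statements from the text.

```python
DATA_RATE_GBPS = 8.0
UI_PS = 1e3 / DATA_RATE_GBPS  # 125 ps unit interval at 8 Gb/s

# (de-multiplexing ratio, VDD [V], simulated quantizer delay [ps] from Fig. 3.5(b))
DESIGNS = [(4, 0.8, 132.0), (8, 0.6, 298.0), (16, 0.5, 624.0)]


def budget_fraction(demux, delay_ps):
    """Fraction of the per-quantizer clock period (demux * UI) used by the delay."""
    return delay_ps / (demux * UI_PS)


for demux, vdd, delay_ps in DESIGNS:
    used = 100 * budget_fraction(demux, delay_ps)
    print(f"1:{demux} at {vdd} V: period {demux * UI_PS:.0f} ps, delay uses {used:.1f} %")
```

Under these assumptions the quantizer delay occupies a similar fraction of the available period in all three cases, consistent with the ratios being chosen to track the delay at each supply voltage.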
III.2.3. Proposed transceiver architecture 
Fig. 3.7 shows the block diagram of the entire implemented transceiver. In order to 
optimize power efficiency, the transceiver is implemented with a 4:1 output multiplexing 
transmitter and a 1:8 de-multiplexing receiver. Except for the transmitter output stage, 
which is powered by a fixed 0.65 V regulator, all circuitry utilizes a supply which is 
scaled to the minimum voltage that satisfies the target BER specification for a given data 
rate. 
 
 
 
 
Fig. 3.7. The implemented single-data-channel low-power forwarded-clock transceiver 
block diagram. 
 
 
III.3. Transmitter 
Fig. 3.8 shows the I/O transmitter block diagram configured for 8 Gb/s operation. 
Eight bits of parallel input data are serialized in two stages, an initial 8:4 multiplexer and 
a final 4:1 output multiplexing voltage-mode driver. The clocks which synchronize the 
serialization are generated by passing a differential quarter-rate clock through a poly-
phase filter to generate four quadrature-spaced phases. Two of these phases are divided 
by two to perform the initial 8:4 multiplexer operation, generating 4 parallel input data 
streams for the output multiplexing driver. A 4:1 output multiplexing voltage-mode 
driver is utilized in order to allow low-VDD operation of the serialization stages. 
 
 
 
 
 
 
Fig. 3.8. 4:1 output multiplexing transmitter block diagram.  
III.3.1. Local multi-phase clock generation 
A passive poly-phase filter is utilized to generate the four quadrature clock phases 
from a globally distributed low-swing quarter-rate clock. In order to enable operation 
over a wide range of data rates, a two-stage design with staggered time constants is 
implemented [81], [82]. As shown in Fig. 3.9, this two-stage design provides quadrature 
outputs over a range of 1 to 2 GHz with a phase error less than 6°, which is far superior 
to a single-stage design. In addition, this passive quadrature clock generation structure is 
well suited for scalable-supply designs, as the clock phase spacing is decoupled from the 
supply voltage. 
 
 
 
Fig. 3.9 Passive poly-phase filter I and Q phase spacing versus frequency. 
 
 
The quadrature poly-phase filter outputs are converted to CMOS levels by a CML-to-
CMOS converter, as shown in Fig. 3.10. AC-coupling from the poly-phase filter outputs 
directly to the input inverter with resistive feedback improves the level converter duty 
cycle performance [83]. A combination of programmable p-n ratio inverter buffers and 
two stages of capacitive DACs compensate for both errors in duty cycle and quadrature 
phase spacing. As shown in Fig. 3.8, the final pulse-clocks for the output-multiplexing 
driver are produced by passing the CMOS level quadrature clocks through a 
transmission-gate AND logic block. 
 
 
 
Fig. 3.10. CML-to-CMOS converter with duty-cycle and phase spacing compensation. 
III.3.2. Level-shifting pre-driver 
One of the challenges associated with scalable-supply designs with voltage-mode 
output drivers involves maintaining proper channel termination at low supply voltages 
without dramatic increases in the size of the output stage transistors.  
In order to alleviate this problem, a level-shifting pre-driver block (Fig. 3.11) is 
utilized to drive the final switch transistors of the voltage-mode output stage with a full 
DVDD swing above the nominal nMOS threshold voltage, Vthn. This level shifting stage, 
consisting of a feed-forward capacitor that biases the output switches near Vthn when off 
and pulses up to Vthn+DVDD when on, allows for a full-DVDD gate overdrive on the 
output switch transistors, as shown in the simulation results of Fig. 3.12. This minimizes 
the size of the output switch transistors required to match the channel impedance, 
allowing for low supply operation and reduced dynamic power consumption. 
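The benefit of the level shift can be estimated with a first-order triode model in which the switch on-resistance scales as 1/(W·Vov). The sketch below is a rough illustration only: the threshold voltage is a hypothetical value, and the small source voltage of the low-swing output switches is neglected.

```python
def relative_width(overdrive_v, reference_overdrive_v):
    """Width needed for equal on-resistance, relative to the reference case,
    assuming R_on ~ 1 / (k * W * V_ov)."""
    return reference_overdrive_v / overdrive_v


DVDD = 0.65  # scalable pre-driver supply [V]
VTHN = 0.35  # hypothetical nMOS threshold voltage [V], not taken from the text

ov_plain = DVDD - VTHN             # gate driven from 0 to DVDD
ov_shifted = (VTHN + DVDD) - VTHN  # gate pulsed from Vthn to Vthn + DVDD

ratio = relative_width(ov_plain, ov_shifted)
print(f"Without level shifting the switch would need about {ratio:.2f}x the width")
```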
 
 
 
 
Fig. 3.11. Level-shifting pre-driver. 
 
 
 
 
 
Fig. 3.12. Level-shifting pre-driver simulated operation: (a) input pulse-clock and data 
signals, (b) output data pulse before and after level shifting. 
III.3.3. Output driver 
The low-swing voltage-mode driver is comprised of nMOS transistors, with four 
parallel switch segments implementing the 4:1 output multiplexing. Driver output 
impedance is formed by the series combination of the switch transistors driven by the 
level-shifting pre-drivers and the impedance control transistors shared by the four output 
segments. A global impedance control loop produces VZUP and VZDN voltages to 
independently set the pull-up and pull-down impedance, respectively. A voltage 
regulator sets the power supply of the voltage-mode driver to a value VREF, which due to 
impedance control is equal to the peak-to-peak differential output swing, allowing for an 
adjustable output swing from 100-200 mVppd. The driver’s low common-mode output 
voltage allows for the regulator to have a source-follower output stage, which offers 
improved supply-noise rejection relative to common-source output stages.  
Utilizing a low supply voltage to power the output stage regulator dramatically 
improves the transmitter power efficiency. In a multi-channel I/O system, this common 
regulator supply could be generated by a global I/O regulator with high efficiency, such 
as a switching regulator topology, while the per-channel voltage regulators would allow 
for improved isolation and output swing optimization. For the per-channel voltage 
regulator, it is important to achieve a high gain-bandwidth within the error amplifier to 
minimize the output swing error and provide noise rejection. However, this can be 
difficult to achieve as the voltage headroom is reduced in low-voltage operation. 
In order to achieve a high gain-bandwidth error amplifier at a low 0.65 V supply 
voltage, a pseudo-differential topology with negative resistance gain boosting is utilized 
in this design, as shown in Fig. 3.13, rather than a conventional simple OTA stage [84].  
 
 
 
Fig. 3.13. Low-voltage regulator utilizing a pseudo-differential error amplifier with 
partial negative-resistance load. 
 
 
 
 
 
 
Fig. 3.14.  Low-voltage regulator simulated performance with various negative 
resistance settings: (a) error amplifier gain versus frequency, (b) supply step response 
from 0 to 0.65 V with VREF=120 mV. 
 
 
 
[Fig. 3.14 plot data: (a) amplifier gain [dB] versus frequency [Hz] and (b) VREG [V] versus time [ns], for negative-resistance codes -R = "110" (TT corner) and -R = "100" (FF, TT, and SS corners); panel (a) also includes the case with no negative resistance.]

Low-voltage operation is enabled by the transmit output impedance control, which 
allows for a tight range of VREF values for a given output swing, and by eliminating the 
typical tail current source while still maintaining a simulated 22 dB power-supply 
rejection ratio. A programmable negative resistance load increases the DC gain of the 
error amplifier to 
       
   
             
[Equation (3-1), the error amplifier DC gain expression, is missing from the extracted text.]  (3-1) 
Fig. 3.14 shows that this negative resistive load boosts the low frequency error 
amplifier gain by approximately 12 dB, while still maintaining adequate stability. The 
low frequency error amplifier gain can be further increased to near 30 dB by increasing 
the negative resistance strength; however, stability is compromised, as shown in the 
supply step response simulations. In order to guarantee regulator stability over process 
variations, a three-bit digital control is utilized to tune the negative impedance value. 
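The trade-off between gain boosting and stability can be illustrated with a generic single-node model, in which a positive load resistance R is placed in parallel with a negative resistance of magnitude Rneg, giving an effective load of R/(1 - R/Rneg). This is a textbook simplification with hypothetical values, not the amplifier expression of (3-1).

```python
import math


def gain_boost_db(r_load, r_neg_magnitude):
    """Gain increase from paralleling r_load with -r_neg_magnitude (generic model)."""
    ratio = r_load / r_neg_magnitude
    if ratio >= 1.0:
        # The net load conductance is zero or negative: the stage is no longer stable.
        raise ValueError("negative resistance cancels the load completely")
    return -20.0 * math.log10(1.0 - ratio)


# Hypothetical ratios: cancelling 75 % of the load conductance gives about 12 dB.
print(f"{gain_boost_db(3.0, 4.0):.1f} dB")
print(f"{gain_boost_db(9.0, 10.0):.1f} dB")
```

In this model the boost grows without bound as the cancellation approaches 100 %, so a small process-induced change in either resistance can shift the gain sharply or destabilize the stage, which is consistent with the need for a digitally tunable negative resistance.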
III.3.4. Global impedance controller 
Fig. 3.15 shows the global output driver impedance controller that produces the 
output voltages, VZUP and VZDN, which control multiple output drivers’ pull-up and 
pull-down impedance, respectively, allowing the impedance control loop power to be 
amortized across the transmitter channels [84]. A replica transmitter stage 
with a precision off-chip 100 Ω resistor is placed in two feedback loops, one which sets 
the top-most transistor gate voltage, VZUP, to force a value of (3/4)*VREF at the replica 
transmitter positive output, and the other which sets the bottom-most transistor gate 
voltage, VZDN, to force a value of (1/4)*VREF at the replica transmitter negative 
output. While other voltage-mode impedance control schemes primarily utilize the pre-
driver supply voltage [41], [56], utilizing dedicated transistors for impedance control 
allows the pre-drive swing value to be decoupled from the impedance control, providing 
a degree of freedom to allow for potential pre-drive voltage scaling for improved power 
efficiency [84].  
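The two loop targets follow from the resistive divider formed by the pull-up impedance, the 100 Ω termination, and the pull-down impedance. The sketch below, using illustrative names, shows that the (3/4)*VREF and (1/4)*VREF targets are met when both impedances equal 50 Ω, and that the resulting peak-to-peak differential swing equals VREF.

```python
R_TERM = 100.0  # differential termination resistor [ohm]


def replica_outputs(vref, z_up, z_dn, r_term=R_TERM):
    """Node voltages and static current of the series path VREF - z_up - r_term - z_dn - GND."""
    current = vref / (z_up + r_term + z_dn)
    v_pos = vref - current * z_up
    v_neg = current * z_dn
    return v_pos, v_neg, current


vref = 0.150  # example regulator output [V]
v_pos, v_neg, current = replica_outputs(vref, 50.0, 50.0)
vppd = 2.0 * (v_pos - v_neg)  # the differential output toggles between +/-(v_pos - v_neg)

print(f"V+ = {v_pos / vref:.2f}*VREF, V- = {v_neg / vref:.2f}*VREF")
print(f"swing = {1e3 * vppd:.0f} mVppd, static current = {1e3 * current:.2f} mA")
```

Any deviation of either impedance from 50 Ω moves the corresponding node away from its target, which is the error each feedback loop nulls.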
 
 
 
Fig. 3.15. Global output driver impedance controller. 
 
 
A replica bias circuit consisting of a diode-connected nMOS whose source is 
connected to the scalable DVDD biases the replica switch transistors to a voltage level, 
VLS = Vthn+DVDD, consistent with the level shifting pre-driver output. The driver 
output resistance is partitioned with nominally 30 Ω switch transistors and 20 Ω 
impedance control transistors in order to reduce the switch transistor size and obtain 
lower dynamic power consumption. 
 
III.4. Receiver 
III.4.1. CTLE and quantizers 
The receiver consists of an input CTLE that drives eight parallel data quantizers [61]. 
As shown in Fig. 3.16, the CTLE provides up to 8 dB of peaking by switching the value of 
the binary-weighted degeneration resistor, which also allows it to support low-loss 
channels. While a multi-stage CTLE could potentially provide higher gain and peaking, it 
would lower the bandwidth due to additional poles in the signal path. 
 
  
 
Fig. 3.16. Simulated AC response of CTLE by resistor tuning. 
The quantizers are each clocked by one of eight phases generated by an ILRO locked to an 
eighth-rate forwarded clock from the transmitter chip. In order to operate at a low supply 
voltage, a two-stage comparator consisting of an integrator stage and a regeneration 
stage is utilized, as shown in Fig. 3.17 [64]. A 6-bit binary-weighted current source can 
be enabled to cancel the quantizer offset by current unbalancing.  
 
 
 
Fig. 3.17. Two-stage comparator with current offset control. 
 
III.4.2. ILRO clocking 
Injection locking has been demonstrated as an energy-efficient scheme for both clock 
generation and de-skewing due to its reduced complexity relative to other approaches 
such as PLL- or DLL-based timing recovery [52], [59]. In addition, when ILRO-based 
de-skew is combined with aggressive supply voltage scaling, excellent receiver energy-
efficiency of <0.2 pJ/b at 8 Gb/s has been demonstrated in a previous work [61]. 
 
 
 
Fig. 3.18. ILRO schematic. 
  
 
Fig. 3.18 shows the ILRO used in this design, which consists of a 4-stage differential 
current-starved ring oscillator. The oscillation frequency is controlled by a tail current 
source that is split into two parts, one controlled by an external frequency-locked loop to 
nominally oscillate at the forwarded eighth-rate frequency, and the other portion 
controlled by a 6-bit binary code for de-skew. In order to enable ILRO operation over a 
wide frequency range, the relative strength between the frequency-tuning current source 
and de-skewing current sources is adjustable, effectively decoupling the frequency 
tuning range from the de-skew step resolution. The frequency locking process, which is 
performed at start-up or during periodic link re-training, ensures that the ring oscillator 
free-running frequency is at the desired forwarded eighth-rate clock frequency. This also 
ensures that the ring oscillator operates near the center of the locking range before 
injection, and has enough tuning range to provide either positive or negative skew. 
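De-skew by detuning the free-running frequency can be illustrated with the steady-state form of Adler's equation, sin(θ) = (f_free - f_inj)/f_lock, where f_lock is the one-sided lock range. This is a first-order model that ring oscillators follow only approximately, and the lock range used below is a hypothetical value, not a measured one.

```python
import math


def locked_phase_deg(f_free_hz, f_inj_hz, lock_range_hz):
    """Steady-state phase offset of an injection-locked oscillator (Adler model)."""
    detune = (f_free_hz - f_inj_hz) / lock_range_hz
    if abs(detune) > 1.0:
        raise ValueError("free-running frequency is outside the lock range")
    return math.degrees(math.asin(detune))


F_INJ = 1.0e9      # eighth-rate forwarded clock at 8 Gb/s [Hz]
LOCK_RANGE = 1e8   # hypothetical one-sided lock range [Hz]

for f_free in (0.95e9, 1.0e9, 1.05e9):
    phase = locked_phase_deg(f_free, F_INJ, LOCK_RANGE)
    print(f"free-running {f_free / 1e9:.2f} GHz -> phase offset {phase:+.1f} deg")
```

Centering the free-running frequency on the injected clock gives zero offset, leaving equal room for positive and negative skew, which is why frequency locking precedes de-skew.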
 
 
 
Fig. 3.19. Simulated impact of clock injection approach on phase spacing uniformity. 
[Fig. 3.19 plot data: phase spacing [UI] for each of the eight phases, with curves for 1X I, 2X I, and 4X I injection and for AC injection.]

The forwarded differential clock is first buffered and converted to full scale before 
being distributed to the ILRO. In order to support different data rates and channel 
conditions, 4-bit amplitude control is included in the clock input buffer. The buffered 
clocks are then injected into two complementary oscillator stages through coupling 
capacitors, with dummy capacitors placed at the other stages to equalize the load 
capacitances. Fixed injection strength is used for this design in order to avoid 
excessive phase spacing errors. As shown in the simulation results of Fig. 3.19, this 
fixed-strength AC-coupled injection approach results in a more uniform phase spacing 
compared to DC-coupled injection schemes that use V/I converters, such as the 
technique incorporated in [52], while exhibiting a similar locking range. Similar to the 
transmitter multi-phase clocking paths, capacitive DACs in the clock buffer stages 
following the ILRO compensate for phase spacing errors. 
III.5. Experimental Results 
 
 
 
Fig. 3.20. I/O transceiver chip micrograph.  
 
 
Fig. 3.21. (a) Measurement Setup. (b) Testing PCB board. 
 
 
The transceiver was fabricated in a 65 nm general-purpose CMOS process. As shown 
in the die micrograph of Fig. 3.20, the total active area for the transmitter is 214×104 
µm², the global impedance controller is 140×31 µm², and the receiver is 139×230 µm², 
for a total transceiver area of 0.057 mm² and a bandwidth density of 0.007 mm²/Gb/s. 
Conservatively considering a minimum of 4 wire-bond pads at a 100 µm pitch for the 
differential TX and RX data signals, the design has a circuit/pad area ratio of 2.9, and 
could be considered active-area limited. If the design were instead implemented with 
coarser-pitch C4 bumps [29], the circuit/bump area ratio would fall to 0.46 for 4 C4 bumps, 
and the design could be considered bump-limited. Given the slower pitch scaling of both 
bondpads and C4 bumps, this architecture is projected to be both pad and bump limited 
in a 22 nm CMOS node. 
 
 
 
Fig. 3.22. (a) 4.8 Gb/s, (b) 6.4 Gb/s, and (c) 8 Gb/s transmitter output eye diagrams.  
 
A chip-on-board test setup is utilized, with the die directly wirebonded to the FR4 
board as shown in Fig. 3.21. In order to demonstrate the transmitter functionality, the eye 
diagrams of Fig. 3.22 are produced with a short 1.5” channel. Both the transmitter 
scalable power supply and output swing are optimized at a given data rate to achieve a 
minimum 40 mVppd eye height and 0.6 UI eye width at the channel output, with a 0.65 V 
supply and VREF = 150 mV DC output swing at 6.4 Gb/s. 
A clock signal is generated with a fixed 1010... data pattern at 8 Gb/s, and both its duty 
cycle and jitter are shown in Fig. 3.23, which shows a 49.2 % duty cycle and 19 ps 
peak-to-peak jitter.  
 
 
 
Fig. 3.23. Clock pattern (1010...) at an 8 Gb/s data rate: (a) duty cycle, (b) clock jitter. 
 
 
Fig. 3.24 shows the phase-spacing mismatch of the 4:1 output-multiplexing transmitter 
versus the scalable power supply. Phase spacing mismatches increase 
with higher data rate, resulting in a minimum supply voltage for an acceptable phase 
DNL at a given data rate. Duty-cycle control circuitry and tunable-delay quadrature 
clock buffers allow for calibration that improves phase DNL. For example, calibration at 
6.4 Gb/s and 0.65 V improves the maximum phase DNL from 28 % UI to 15 % UI, with 
further improvement limited by an oversight in the chip layout that resulted in 
asymmetrical clock routing. 
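For reference, phase DNL can be computed from the spacing between consecutive output edges, expressed as a deviation from the ideal 1 UI spacing. The edge times in the sketch below are hypothetical values chosen for illustration, not measured data.

```python
def phase_dnl_percent_ui(edge_times_ps, ui_ps):
    """Deviation of each edge-to-edge spacing from the ideal UI, in % UI."""
    spacings = [later - earlier for earlier, later in zip(edge_times_ps, edge_times_ps[1:])]
    return [100.0 * (spacing - ui_ps) / ui_ps for spacing in spacings]


UI_PS = 1e3 / 6.4  # 156.25 ps unit interval at 6.4 Gb/s

# Hypothetical edge positions over one 4-UI clock period (625 ps) [ps].
edges_ps = [0.0, 170.0, 310.0, 480.0, 625.0]

dnl = phase_dnl_percent_ui(edges_ps, UI_PS)
max_dnl = max(abs(value) for value in dnl)
print([round(value, 1) for value in dnl], f"max |DNL| = {max_dnl:.1f} % UI")
```

Because the four spacings must sum to one clock period, the DNL values sum to zero, so calibration redistributes the spacing error among the phases rather than removing it from a single phase in isolation.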
 
  
 
Fig. 3.24. 4:1 output-multiplexing transmitter phase spacing maximum DNL versus 
supply voltage. 
  
 
Fig. 3.25 shows the effectiveness of the impedance loop, where both ZUP and ZDN are 
between 48 and 59 Ω as the output swing, VREF, varies from 100-200 mVppd. While tighter 
impedance control is not essential [80], this could be achieved by sizing the output 
drivers’ impedance control transistors to achieve a wider tuning range, at the cost of 
larger switch transistors and increased dynamic power. 
Fig. 3.26 shows the measured de-skew range of the receiver ILRO versus data rate. 
When normalized to the clock period, the achievable de-skew range is more than 120° 
across the entire operating range. Since in the 1:8 de-multiplexing receiver 1 UI is 45°, 
this translates into a de-skew range that exceeds 2 UI. 
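The conversion between a de-skew range in degrees of the eighth-rate clock, unit intervals, and picoseconds can be sketched as follows; the function name and return format are illustrative.

```python
def deskew_range(range_deg, data_rate_gbps, demux=8):
    """Convert a de-skew range in degrees of the sampling clock to (UI, ps)."""
    ui_deg = 360.0 / demux                         # one UI spans 360/N degrees of the 1/N-rate clock
    clock_period_ps = demux * 1e3 / data_rate_gbps
    return range_deg / ui_deg, range_deg / 360.0 * clock_period_ps


range_ui, range_ps = deskew_range(120.0, 8.0)
print(f"120 deg at 8 Gb/s = {range_ui:.2f} UI = {range_ps:.0f} ps")
```

A 120° range therefore corresponds to about 2.67 UI, or roughly 333 ps of the 1 GHz clock period at 8 Gb/s.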
 
 
 
Fig. 3.25.  Transmitter output impedance versus VREF.  
 
 
 
 
Fig. 3.26. Receiver de-skew range. 
 
 
 
 
 
 
Fig. 3.27. Frequency response of 3.5” FR4 trace and interconnect cables. 
 
 
Fig. 3.28. (a) Transceiver BER performance with optimal TX/RX supply voltages and 
CTLE settings, (b) transceiver BER with minimum CTLE peaking settings. 
 
 
[Plots for Fig. 3.28: bit error rate (10^0 to 10^-10) versus unit interval [UI] (-0.2 to 0.2). Legend (a): 4.8 Gb/s (TX swing = 100 mVppd, min. CTLE peaking), 6.4 Gb/s (TX swing = 150 mVppd), 8 Gb/s (TX swing = 200 mVppd). Legend (b): 4.8 Gb/s (TX swing = 100 mVppd), 6.4 Gb/s (TX swing = 200 mVppd), 8 Gb/s (TX swing = 200 mVppd).]

Transceiver performance is verified with BER measurements of 2^7-1 PRBS data over
the channel shown in Fig. 3.27, which consists of a 1.5 inch FR4 TX-side trace, a 0.5 m
SMA cable, and a 2 inch FR4 RX-side trace, and exhibits 8.4 dB of loss at 4 GHz. BER
results with optimized TX/RX supply voltages, TX output swing, and CTLE settings are
shown in Fig. 3.28 (a), and the CTLE performance impact is shown in Fig. 3.28 (b). A fixed
130 fF capacitor and a programmable 100-650 Ω resistor make up the CTLE
degeneration network. At 4.8 Gb/s, a 16 % UI timing margin is achieved with a 100
mVppd TX swing and the minimum 100 Ω CTLE degeneration resistor setting. While the
CTLE could perhaps be eliminated at 4.8 Gb/s, operation at 6.4 Gb/s requires 350 Ω
degeneration and 8 Gb/s requires the maximum 650 Ω setting. Due to the channel loss
and increased sensitivity to phase mismatches, the required transmit swing is increased
to 150 mVppd and 200 mVppd at 6.4 Gb/s and 8 Gb/s, respectively.
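
The trend in the degeneration settings can be illustrated with a first-order estimate of the CTLE zero frequency. This is only a sketch: it assumes a single-zero model with f_z = 1/(2πRC) and treats the quoted resistor and capacitor values as the effective R and C of that expression, which the text does not state. The function name `ctle_zero_hz` is hypothetical.

```python
import math


def ctle_zero_hz(r_deg_ohm: float, c_deg_farad: float) -> float:
    """First-order zero frequency of an RC degeneration network (assumed model)."""
    return 1.0 / (2.0 * math.pi * r_deg_ohm * c_deg_farad)


C_DEG = 130e-15  # fixed degeneration capacitor quoted in the text

# Resistor settings quoted for 4.8, 6.4, and 8 Gb/s operation.
for r_ohm in (100.0, 350.0, 650.0):
    f_z = ctle_zero_hz(r_ohm, C_DEG)
    print(f"R = {r_ohm:5.0f} ohm -> zero near {f_z / 1e9:.2f} GHz")
```

Under this assumed model, a larger degeneration resistor moves the zero to a lower frequency, which starts the high-frequency boost earlier and is consistent with the higher data rates requiring the larger resistor settings.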
 
 
 
Fig. 3.29. Transceiver energy efficiency versus data rate.  
[Plot: energy efficiency [pJ/b] (0.1-0.8) versus data rate [Gb/s] (4.8, 6.4, 8) for TX+RX, TX, and RX. Annotations: TX and RX (VDD = 0.6 V); TX and RX (VDD = 0.65 V); TX (VDD = 0.8 V) and RX (VDD = 0.75 V).]

Fig. 3.29 shows transceiver energy efficiency measurement results at various data
rates and supply voltages. The transmitter and receiver share an equal supply of 0.6 V
at 4.8 Gb/s and 0.65 V at 6.4 Gb/s. However, in order to achieve 8 Gb/s operation,
the transmitter requires a slightly higher 0.8 V supply to maintain sufficient margin in
the 4:1 output multiplexing phase spacing, which has a greater impact on the transmitter
output eye at high data rates due to the low-pass filtering of the high-speed off-chip
data. While the receiver CTLE and quantizers would work fine at this 0.8 V supply at 8
Gb/s, this voltage is somewhat high for the ILRO and pushes the injection
lock range above 1 GHz. Thus, 0.75 V is required at the receiver to allow the ILRO to
operate at the 1 GHz frequency required for 8 Gb/s operation. In the event the I/O
system demands that the transmitter and receiver operate with equal supply voltages, this
could be achieved by adding switchable capacitor loads to the ILRO. While the
transceiver operates at the lowest voltage at 4.8 Gb/s, optimal energy efficiency is
achieved at 6.4 Gb/s due to the amortization of the static power consumed in the final
output line driver. Table 3.1 shows the measured transceiver power breakdown at 6.4
Gb/s. The total transceiver energy efficiency is 0.47 pJ/b, with 0.3 pJ/b and 0.17 pJ/b
achieved in the transmitter and receiver, respectively.
 
 
Table 3.1: Transceiver power breakdown at 6.4 Gb/s

TX Power Breakdown (6.4 Gb/s at 0.65 V)
  LDO & Output Driver (150 mVppd)                     793 µW
  Serializer, Pre-drivers, Clocking                   933 µW
  Global Impedance Control (amortized across 9 TX)    193 µW
  TX Energy Efficiency                                0.3 pJ/b
RX Power Breakdown (6.4 Gb/s at 0.65 V)
  CTLE, Quantizers, ILRO                              1.07 mW
  Clock Distribution                                  38 µW
  RX Energy Efficiency                                0.17 pJ/b
Total Energy Efficiency                               0.47 pJ/b
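
The energy-efficiency entries in Table 3.1 follow from dividing the summed block power by the data rate. The short sketch below reproduces that arithmetic from the tabulated values; the helper name `energy_per_bit_pj` is illustrative only.

```python
def energy_per_bit_pj(power_w: float, data_rate_bps: float) -> float:
    """Energy per bit in pJ/b: power divided by data rate."""
    return power_w / data_rate_bps * 1e12


DATA_RATE = 6.4e9  # b/s, operating point of Table 3.1

# Block powers in watts, copied from Table 3.1.
tx_power = 793e-6 + 933e-6 + 193e-6  # driver + serializer/clocking + impedance control
rx_power = 1.07e-3 + 38e-6           # CTLE/quantizers/ILRO + clock distribution

tx_pj = energy_per_bit_pj(tx_power, DATA_RATE)
rx_pj = energy_per_bit_pj(rx_power, DATA_RATE)

print(f"TX:    {tx_pj:.2f} pJ/b")
print(f"RX:    {rx_pj:.2f} pJ/b")
print(f"Total: {tx_pj + rx_pj:.2f} pJ/b")
```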
 
 
 
 
 
 
 
 
Table 3.2: Low-power I/O transceiver comparisons

                         [29]                  [41]                  This Work
Technology               45 nm CMOS            90 nm CMOS            65 nm CMOS
Supply Voltage           0.8 V/1.5 V           1.2 V                 0.6-0.8 V
Data Rate                10 Gb/s               0.5-4 Gb/s            4.8-8 Gb/s
Clocking                 Source-Synchronous    Plesiochronous        Source-Synchronous
Energy Efficiency        1.4 pJ/b @ 10 Gb/s    1.9 pJ/b @ 3.2 Gb/s   0.47 pJ/b @ 6.4 Gb/s
Transmitter
  Driver                 CML, 2:1 Input MUX    VM, 2:1 Input MUX     VM, 4:1 Output MUX
  Swing                  150 mVppd             100 mVppd             100-200 mVppd
  Equalization           2-Tap FFE             None                  None
  Energy Efficiency      0.65 pJ/b             0.6 pJ/b              0.3 pJ/b
Channel
  Type                   2" HDI                Not Reported          3.5" FR4 + 0.5 m SMA
  Loss at Nyquist Freq.  8 dB                  -                     8.4 dB
Receiver
  Equalization           None                  CTLE                  CTLE
  Energy Efficiency      0.75 pJ/b             1.3 pJ/b              0.17 pJ/b
 
 
Table 3.2 compares this design with recent energy-efficient serial links that either
employ source-synchronous clocking [29] or utilize a voltage-mode driver [41]. On the
transmitter side, compared to the current-mode output driver in [29] and the conventional
2:1 input multiplexing voltage-mode output driver in [41], the 4:1 output multiplexing
voltage-mode driver in this design improves energy efficiency by more than 50%. On
the receiver side, supply scaling and the use of an ILRO have also resulted in significant
power efficiency improvements over similar designs with linear equalization to
compensate for moderate-loss channels.
III.6. Summary 
This chapter presented an energy-efficient transceiver architecture that operates at 
low supply voltages. In order to reduce the transmitter dynamic power consumption, a 
passive poly-phase filter is utilized to produce the multi-phase clocks that switch a 4:1 
output-multiplexing voltage-mode driver. A low power-supply linear regulator with 
negative-resistance gain-boosting allows further improvement in transmitter energy 
efficiency. In the forwarded-clock receiver, the use of injection-locked oscillator de-
skew and a high 1:8 de-multiplexing ratio receiver architecture allows operation at low 
supply voltages. Overall, this I/O architecture provides scalable voltage and data rate 
operation at energy-efficiency levels demanded by future systems. 
 
 
 
 
 
 
 
 
 
 
 
IV. HYBRID VOLTAGE-MODE TRANSMITTER WITH CURRENT MODE 
EQUALIZATION  
IV.1. Introduction 
A large percentage of serial link power is often consumed in the transmitter, which 
must provide adequate signal swing on the low impedance channel, maintain proper 
source termination, and include equalization to compensate for channel frequency-
dependent loss. In low-power designs, the output driver often consumes the majority of 
the static power due to the low impedance channel. This leads link architects to consider 
voltage-mode drivers to improve energy efficiency, as with differential receiver-side 
termination, these drivers have the potential to consume one-quarter of the output stage 
power relative to conventional current-mode drivers [57]. 
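
The one-quarter figure can be checked with idealized static-current expressions for the two driver types. This is a sketch under the assumption of matched termination (driver impedance equal to Zo) and ideal switches; the function names are hypothetical, and the 400 mVppd / 50 Ω operating point is the one used in Table 4.1 later in this chapter.

```python
def current_mode_static_current(vppd: float, zo: float) -> float:
    """Tail current of a doubly terminated current-mode driver.

    Each output leg sees Zo at the driver in parallel with Zo of the far-end
    termination, so the peak-to-peak differential swing is I * Zo.
    """
    return vppd / zo


def voltage_mode_static_current(vppd: float, zo: float) -> float:
    """Supply current of a voltage-mode driver with differential termination.

    The driver supply equals the peak-to-peak differential swing, and the
    current loop runs through Zo (pull-up), 2*Zo (termination), and Zo (pull-down).
    """
    return vppd / (4.0 * zo)


vppd, zo = 0.4, 50.0  # 400 mVppd into a 50-ohm environment
i_cm = current_mode_static_current(vppd, zo)
i_vm = voltage_mode_static_current(vppd, zo)
print(f"current-mode: {i_cm * 1e3:.1f} mA, voltage-mode: {i_vm * 1e3:.1f} mA")
print(f"ratio: {i_vm / i_cm:.2f}")
```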
While obtaining significant improvements in I/O energy efficiency will require
improvements in electrical channel loss characteristics [57], the ability to efficiently
include some transmit equalization allows for more loss compensation and increased
flexibility in equalization circuitry partitioning. However, the potential power savings of
voltage-mode drivers generally degrade with the introduction of transmit equalization
and the overheads associated with maintaining proper source termination. In order to
generate the different output voltage levels for transmit equalization, significant output
stage segmentation is required in voltage-mode drivers which implement resistor divider
[39], [55], [80] and channel shunting approaches [58]. This segmentation increases
pre-driver complexity, resulting in degraded dynamic power consumption. Additional output
stage segmentation is often implemented to digitally tune the driver termination to match
the channel [58], further degrading energy efficiency. While analog control loops which
scale the pre-driver supply can be utilized to set the driver output impedance [41], [55],
[56], this does not allow independent optimization of the pre-driver supply to minimize
dynamic power with data rate.
 
 
 
Fig. 4.1.  The proposed transmitter for clock forwarded link. 
 
 
This chapter presents a hybrid voltage-mode transmitter with current-mode
equalization, which enables independent control over termination impedance,
equalization settings, and pre-driver supply, allowing for a significant reduction in
pre-driver complexity and power in the clock-forwarded link shown in Fig. 4.1. Transmitter
equalization techniques are reviewed in the following section, along with a comparison of
the hybrid transmitter with voltage-mode and current-mode drivers. The transmitter
architecture is then detailed, including the local clocking circuitry with duty-cycle
correction, the low-complexity scalable-supply serializer and pre-driver, the hybrid
driver, and the global impedance control. Finally, experimental results from an LP 90 nm
CMOS prototype are presented.
 IV.2. Proposed Transmitter Equalization Techniques 
To eliminate the extra power consumed during de-emphasis in [55], the transmitter
output stage can include an additional shunting resistor network, as
shown in Fig. 4.2 (a) [58]. Fig. 4.2 (b) illustrates how the extra Rs resistors
maintain constant current consumption as the equalization coefficient varies. The
resistor values are determined by the following equations.
   
    
       
 (4-1) 
   
    
     
 (4-2) 
   
    
        
 (4-3) 
where Zo is the channel characteristic impedance, α is the equalization coefficient, and
Rp//Rn//RS is always equal to Zo. Although adding an additional resistor allows constant
current consumption, as shown in Eq. 4-4, it significantly increases predriver
complexity, since the three parallel resistors must together remain matched to the channel
characteristic impedance,
$$I_{Vpp,max} = I_{Vpp,min} = \frac{V_{ppd,max}}{4 Z_o}$$
(4-4)
 
where Ivpp,max and Ivpp,min are the currents drawn at the maximum and minimum
differential output swing levels, respectively.
 
The main drawback associated with these voltage-mode driver designs involves the 
overhead in the predrive logic required to distribute the tap weights among the segments, 
which grows with equalization resolution.  
 
 
(a) 
 
[Schematic labels for Fig. 4.2 (b): equivalent output networks for X[n] = X[n-1] and X[n] ≠ X[n-1], built from RP, RN, and Rs segments with VREF and VREF/2 nodes, driving a 2Zo = 100 Ω load across TXP-TXN.]
  
(b) 
 
Fig. 4.2.  (a) Implementation of 2-tap FIR equalization in low-swing voltage-mode 
driver with shunting resistor network  (b) equivalent output driver circuitry. 
 
As CMOS technology advances, data rates continue to increase, and digital dynamic
power rises along with them. Therefore, the power consumed by the complex
predriver and segment selection logic necessary to support these voltage-mode drivers
with equalization reduces any benefit from the reduced transmitter output signaling power.
 
 
 
(a) 
 
[Schematic labels for Fig. 4.3 (b): equivalent output networks for X[n] ≠ X[n-1] and X[n] = X[n-1], built from RTX resistors with VREF and AVDD supplies, driving a 2Zo = 100 Ω load across TXP-TXN.]
 
(b) 
 
Fig. 4.3. (a) Implementation of 2-Tap FIR equalization in proposed low-swing voltage 
mode driver with current-mode equalization and (b) equivalent output driver circuitry. 
 
 
Fig. 4.3(a) shows a simplified schematic of the hybrid driver proposed in this work,
which combines the low output current levels of a voltage-mode driver to implement the
main tap and a parallel current-mode driver to implement the post-cursor tap with
minimal predriver complexity. While parallel current drivers have previously been
implemented with voltage-mode drivers as swing enhancers [39], this implementation
improves driver energy efficiency by eliminating the voltage-mode driver segmentation,
as the equalization coefficient is set via the current-mode driver tail DAC setting.
In addition, it eliminates the current shunt path, reducing the current in de-emphasis
mode by 14.3 % compared to previous work [55] for the example in Table 4.1.
Furthermore, the transmitter termination remains at the channel characteristic
impedance Zo, owing to the high output impedance of the added differential pair.
For the hybrid 2-tap driver, the voltage-mode output stage reference voltage is
reduced to a value of Vppd,max*(1-α) and the maximum swing is
          
  
      
                         
(4-5) 
 
where RTX is transmitter impedance.  
Therefore, the total current drawn at Vppd,max, with RTX = Zo as in Table 4.1, is
$$I_{Vppd,max} = \frac{V_{ppd,max}}{4 Z_o}$$
(4-6)
 
The minimum differential voltage swing is 
          
  
      
                         
(4-7) 
Therefore, the total current drawn at Vppd,min, with RTX = Zo as in Table 4.1, is
$$I_{Vppd,min} = \frac{V_{ppd,max}}{4 Z_o}\,(1 + 2\alpha)$$
(4-8)
 
 
 
Table 4.1: Transmitter 2-tap equalization comparisons (Vppd,max = 400 mV, Vppd,min =
200 mV, α = 0.25, and Zo = 50 Ω)

                       [55]                                     [58]                      [57]                   Proposed TX
IVppd,max              Vppd,max/(4Zo) = 2 mA                    Vppd,max/(4Zo) = 2 mA     Vppd,max/Zo = 8 mA     Vppd,max/(4Zo) = 2 mA
IVppd,min              (Vppd,max/(4Zo))(1+4α(1-α)) = 3.5 mA     Vppd,max/(4Zo) = 2 mA     Vppd,max/Zo = 8 mA     (Vppd,max/(4Zo))(1+2α) = 3 mA
∆I                     (Vppd,max/(4Zo))(4α(1-α)) = 1.5 mA       0                         0                      (Vppd,max/(4Zo))(2α) = 1 mA
RTX                    Zo = 50 Ω                                Zo = 50 Ω                 Zo = 50 Ω              Zo = 50 Ω
VREF                   Vppd,max = 400 mV                        Vppd,max = 400 mV         -                      Vppd,max(1-α) = 300 mV
PreDriver Complexity   High                                     High                      Simple                 Simple

*Zo: channel characteristic impedance, α: equalization coefficient, Vppd,max: differential peak-to-peak maximum swing, Vppd,min:
differential peak-to-peak minimum swing, IVppd,max: current at the maximum differential output swing level, IVppd,min: current at the
minimum differential output swing level, ∆I = |IVppd,max - IVppd,min|, RTX: transmitter termination impedance, VREF: output driver
reference voltage
 
Table 4.1 summarizes the analysis of previous voltage-mode drivers with
equalization, a current-mode driver, and the proposed transmitter, with an example of
the current consumption, termination impedance, and predriver complexity of each.
Note that the current drawn from the output driver supply, VREF, varies with output
level, with all current flowing out into the channel during the maximum output swing
and a portion being sunk at the transmitter during the de-emphasized level. This current
variation can be a problem since it necessitates more stringent voltage regulation of the
VREF supply. While a constant current draw is achieved in [58] by switching a shunt
resistor network in addition to the main output transistors, this significantly increases
pre-driver complexity.
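
The proposed-transmitter column of Table 4.1 can be cross-checked with a small nodal model of the output network. This is a minimal sketch and not the author's derivation: it assumes ideal resistive switches with RTX = Zo, an ideal equalization current source, and that the equalization current is steered into the high output at maximum swing and into the low output during de-emphasis. The names `solve_outputs` and `hybrid_driver` are hypothetical.

```python
def solve_outputs(v_ref, r_tx, zo, i_p=0.0, i_n=0.0):
    """Node voltages (v_p, v_n) of the idealized output network.

    TXP connects to v_ref through r_tx, TXN connects to ground through r_tx,
    and the 2*zo differential termination sits between them. i_p and i_n are
    ideal equalization currents injected into TXP and TXN.
    """
    g = 1.0 / r_tx
    h = 1.0 / (2.0 * zo)
    det = (g + h) ** 2 - h ** 2
    v_p = ((g + h) * (i_p + g * v_ref) + h * i_n) / det
    v_n = (h * (i_p + g * v_ref) + (g + h) * i_n) / det
    return v_p, v_n


def hybrid_driver(vppd_max, alpha, zo):
    """Swing and total supply current at the maximum and de-emphasized levels."""
    r_tx = zo                                  # matched termination (assumption)
    v_ref = vppd_max * (1.0 - alpha)           # main-tap reference voltage
    # Tail current that contributes alpha * vppd_max of peak-to-peak swing.
    i_eq = alpha * vppd_max * (r_tx + zo) / (2.0 * r_tx * zo)
    result = {}
    for name, (i_p, i_n) in {"max": (i_eq, 0.0), "min": (0.0, i_eq)}.items():
        v_p, v_n = solve_outputs(v_ref, r_tx, zo, i_p, i_n)
        i_regulator = (v_ref - v_p) / r_tx     # current drawn from VREF
        result[name] = {"vppd": 2.0 * (v_p - v_n), "i_total": i_regulator + i_eq}
    return result


res = hybrid_driver(vppd_max=0.4, alpha=0.25, zo=50.0)
i_55_min = 0.4 / (4 * 50.0) * (1 + 4 * 0.25 * (1 - 0.25))  # [55] entry of Table 4.1
saving = 1.0 - res["min"]["i_total"] / i_55_min
for name, r in res.items():
    print(f"{name}: {r['vppd'] * 1e3:.0f} mVppd, {r['i_total'] * 1e3:.1f} mA")
print(f"de-emphasis current saving versus [55]: {saving * 100:.1f} %")
```

Under these assumptions the model returns the 2 mA and 3 mA entries of the table, and the regulator current falls to zero at the maximum swing for α = 0.25.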
Fig. 4.4 compares the transmitters’ output driver static power
versus normalized de-emphasis swing level. The proposed voltage-mode driver with
current-mode equalization reduces signaling power compared to a current-mode driver with
2-tap equalization [57] and a voltage-mode driver with resistor divider equalization [55],
although it uses more current than the voltage-mode driver with series R implementation
[58]. However, as mentioned earlier, the proposed architecture eliminates the high-speed
encoder in the data path, which significantly reduces digital dynamic power when fine
equalization resolution is required.
 
 
 
Fig. 4.4. Normalized transmitter output driver static power comparison. 
  
 
[Plot: normalized power (0-4) versus Vdpp,min/Vdpp,max (0-1); curves: CM EQ, VM EQ by Rdiv, VM EQ by Rs and Rdiv, VM EQ by I.]
 
Although 2-tap equalization is implemented in this prototype due to the intended
low/medium-loss channel application, the proposed equalization scheme can easily
extend to a multi-tap implementation, with additional current drivers placed in
parallel to implement additional taps. For example, the simulation results shown in Fig.
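4.5 demonstrate the operation of a 3-tap version with α1 = 0.1 and α2 = 0.1,
$$Y[n] = (1-\alpha_1-\alpha_2)\,X[n] - \alpha_1 X[n-1] - \alpha_2 X[n-2]$$
(4-9)
where Y[n] is the normalized transmitter output, X[n] is the current data bit, X[n-1] is
the bit delayed by 1 UI, and X[n-2] is the bit delayed by 2 UI.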
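
The output levels produced by such a 3-tap filter can be enumerated directly. The sketch below assumes the conventional normalized form, with a main tap of (1 - α1 - α2) and two negative post-cursor taps acting on bits mapped to ±1; this form and the name `output_levels` are assumptions made for illustration, not taken from the source.

```python
from itertools import product


def output_levels(alpha1: float, alpha2: float) -> list:
    """Distinct normalized output levels of a 1 main + 2 post-cursor tap FIR."""
    taps = (1.0 - alpha1 - alpha2, -alpha1, -alpha2)
    levels = set()
    # Each sequence is (X[n], X[n-1], X[n-2]) with bits mapped to -1/+1.
    for seq in product((-1, 1), repeat=3):
        level = sum(t * b for t, b in zip(taps, seq))
        levels.add(round(level, 6))
    return sorted(levels)


levels = output_levels(0.1, 0.1)
print(levels)
```

With α1 = α2 = 0.1, the eye therefore contains three positive and three negative levels, the innermost at 60 % of the full swing.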
 
 
 
 
Fig. 4.5. Schematic simulation eye diagram of proposed 3-tap transmitter with 1 main 
tap and two post cursor taps. 
 
 
 
IV.3. Proposed Transmitter Architecture 
Fig. 4.6 shows the block diagram of the serial link transmitter, which utilizes two
power supplies, a fixed 1.2 V AVDD and a scalable DVDD. The local clock distribution,
serialization MUXes, and pre-driver buffers are powered from DVDD, which is scaled
with data rate in order to improve the transmitter power efficiency. While an external
supply was used for DVDD in this design, an adaptive switching regulator [72] could
efficiently generate this scalable supply. The fixed 1.2 V AVDD supply provides
sufficient voltage headroom for the voltage-mode output stage regulator, current-mode
equalizer stage, and the global impedance controller.
 
 
 
Fig. 4.6.  TX block diagram. 
 
 
Two bits of parallel input data from on-die test circuitry capable of generating either a
2^15-1 PRBS or 16-bit fixed data pattern serve as the input to the half-rate output stage.
The output stage includes two sets of 2:1 muxes to implement a 2-tap FIR equalization
filter, with the top mux driving the main cursor voltage-mode driver and the bottom mux
driving the post-cursor current-mode equalizer stage. In order to reduce power
consumption when equalization is not necessary, the data in the equalizer
path is gated to disable the equalization serializer and any output equalization current.
Fig. 4.7 details the differential implementation of the 4:2 MUX and the 2:1 MUXes
with a 1 UI data delay cell for equalization, including the power-down capability.
All digital logic is implemented in CMOS logic rather than CML to save power.
 
 
 
Fig. 4.7.  Implementation of the 4:2 MUX and differential 2:1 MUXes with 1 UI delay.
 
In order to provide compatibility with low-swing global clock distribution present in 
low-power multi-channel link systems, an AC-coupled CML-to-CMOS local clock 
distribution stage generates the serializer clocks. The transmitter utilizes inverter-based 
clock buffers with 4-bit digitally-adjustable pMOS/nMOS size ratio in order to tune out 
errors in input duty cycle and clock distribution network mismatches, allowing the 
output duty cycle to be corrected to within 1 % over a data rate range of 2-6 Gb/s.  
After serialization with half-rate clocks, the main cursor data signals drive the 
switches (M2 and M3) of an nMOS low-swing voltage-mode driver, while the delayed 
data signals drive the switches of a pMOS differential current-mode driver to implement 
the post-cursor tap, which is shown in Fig. 4.8.  
 
 
 
Fig. 4.8.  Hybrid voltage-mode driver with current mode equalization. 
 
Here equalization adjustment is possible with minimal overhead, with the tail current
source of the current-mode stage having 4-bit binary control. A reference current
switchable between 60 and 120 µA allows for the addition of a maximum total equalization
current of 0.9 to 1.8 mA into the output stage at 4-bit resolution. The equalization
current is steered between the driver outputs by switching the pMOS output switches,
which are sized to handle the maximum equalization current setting. This allows the use
of a single non-segmented pre-driver to switch the pMOS output switches, greatly
simplifying the output driver pre-drive complexity relative to other voltage-mode drivers
which include equalization taps [39], [55], [58], [80]. Higher resolution is achievable
with ideally no power overhead simply by increasing the tail current DAC bits. While this
design is intended for low/medium loss channels, and thus only implements two taps, the
scheme is easily extendable to a higher number of taps with additional parallel current
drivers.
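
The relationship between the DAC code, the equalization current, and the resulting de-emphasis can be sketched with an idealized resistive model. The model is an assumption made here, not the author's analysis: the equalization current adds or subtracts 2·I_EQ·(RTX//Zo) of peak-to-peak differential swing, the peak swing is held at 400 mVppd, and RTX is taken as the 60 Ω target mentioned later in this section. The function names are hypothetical.

```python
import math


def eq_current(code: int, i_ref: float) -> float:
    """Tail current of the 4-bit binary DAC: code (0-15) times the reference."""
    return code * i_ref


def equalization_db(code: int, i_ref: float, vppd_max: float,
                    r_tx: float = 60.0, zo: float = 50.0) -> float:
    """De-emphasis in dB for a fixed peak swing (idealized resistive model)."""
    r_par = r_tx * zo / (r_tx + zo)
    delta = 2.0 * eq_current(code, i_ref) * r_par  # swing contributed by the tap
    vppd_min = vppd_max - 2.0 * delta              # de-emphasized swing
    return 20.0 * math.log10(vppd_max / vppd_min)


for i_ref in (60e-6, 120e-6):
    print(f"IREF = {i_ref * 1e6:.0f} uA: max current "
          f"{eq_current(15, i_ref) * 1e3:.1f} mA, "
          f"max equalization {equalization_db(15, i_ref, 0.4):.2f} dB")
```

Under these assumptions the full-scale setting with a 120 µA reference lands close to the 6 dB maximum equalization reported in the measurements.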
The driver pull-up impedance, ZUP, is set by the M2 top switches and an additional 
shared M1 transistor whose gate is controlled by VZUP, while the pull-down impedance, 
ZDN, is set by the M3 bottom switches and an additional shared M4 transistor whose gate 
is controlled by VZDN. A global impedance control loop allows for both the driver ZUP 
and ZDN impedance to be set near the channel impedance by utilizing a replica 
transmitter with dual feedback amplifiers that forces VZUP to a value consistent with a 
high output level of  
        
       
           
       
(4-10) 
 
and sets VZDN to a value consistent with a low output level of  
 
        
   
           
       
(4-11) 
          
In this design an external supply was used for the adjustable reference voltage, VREF, 
that sets the output driver swing and an on-chip resistive divider generates the 
impedance control loop UPVREF and DNVREF signals from VREF.  
While other voltage-mode impedance control schemes primarily utilize the pre-driver
supply voltage [41], [55], [56], the method implemented in this work allows the pre-drive
swing value (DVDD) to be decoupled from the impedance control, providing a degree of
freedom to allow for potential pre-drive voltage scaling for improved energy efficiency.
In order to reduce the gate capacitance of the switch transistors and save power, this
design intentionally targets a 60 Ω single-ended output impedance. While not an exact
channel match, this still provides a simulated low-frequency return loss of -22 dB, which
meets the industry-standard return loss specification shown in Fig. 4.9 [56].
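
As a rough sanity check on the return loss quoted above, the reflection from a purely resistive 60 Ω source into a 50 Ω line can be computed directly. This is a textbook estimate rather than the simulation used in the text: it ignores parasitics and frequency dependence, so it only indicates the neighborhood of the low-frequency value. The function name is illustrative.

```python
import math


def return_loss_db(z_source: float, z0: float) -> float:
    """Return loss in dB (as a negative number) for a resistive mismatch."""
    gamma = (z_source - z0) / (z_source + z0)  # reflection coefficient
    return 20.0 * math.log10(abs(gamma))


rl = return_loss_db(60.0, 50.0)
print(f"ideal resistive estimate: {rl:.1f} dB")
```

The ideal estimate of roughly -21 dB is close to the simulated -22 dB figure.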
 
 
 
Fig. 4.9.  Simulated return loss for transmitter and the CEI-SR return loss limit. 
 
 
Simulations with a measured backplane channel model that has 6.4 dB of loss at 3 GHz,
whose response is shown in Fig. 4.10, indicate that the eye height degradation is 0.4 %
with the implemented 60 Ω driver in Fig. 4.11 (b), relative to the 50 Ω design shown in
Fig. 4.11 (a). In addition, simulations with a measured backplane channel model that has
up to 10 dB of loss at 3 GHz, whose frequency response is shown in Fig. 4.12, indicate
that the eye height degradation is less than 3 % with the implemented 60 Ω driver in
Fig. 4.13 (b), relative to the 50 Ω design in Fig. 4.13 (a). While this would require
increasing the output swing in order to maintain the same eye height, overall the power
saved with the smaller pre-driver and clock buffers results in a more power-efficient
design.
 
 
 
 
Fig. 4.10. S21 response for Channel with -6.4 dB loss at 3 GHz. 
 
 
 
 
 
 
                                       (a)                                                             (b) 
 
Fig. 4.11.  Transmitter schematic simulation result (a) eye diagram TX 50 ohms 
termination at 6 Gb/s (b) eye diagram TX 60 ohms Termination at 6 Gb/s. 
 
 
 
 
Fig. 4.12.  S21 response for channel with -10 dB loss at 3 GHz. 
 
 
 
                                       
                                        (a)                                                             (b)   
 
Fig. 4.13.  Transmitter schematic simulation result (a) eye diagram TX 50 ohms 
termination at 6 Gb/s (b) eye diagram TX 60 ohms Termination at 6 Gb/s. 
 
 
An on-chip linear voltage regulator sets the power supply of the voltage-mode driver
to a value VREF, which is equal to the peak-to-peak differential output swing without
equalization, and allows for an adjustable output swing from 100-400 mVppd. As shown in
Fig. 4.14, the linear voltage regulator consists of two stages: the first stage is a level
shifter, and the second stage is a conventional amplifier with a current-mirror load. The
regulator bandwidth must be high in order to improve return loss performance at
high frequency. The driver’s low common-mode output voltage allows for the regulator
to have a source-follower output stage, which offers improved supply-noise rejection
relative to common-source output stages [56]. The low output impedance of the
source-follower allows for the use of a 40 pF de-coupling capacitor to improve the power
supply rejection ratio, while still maintaining stability.
 
 
 
 
Fig. 4.14. Linear voltage regulator. 
 
 
IV.4. Experimental Results
In order to demonstrate transmitter performance, a test board is set up with an Agilent
E8267D signal generator for clock generation and a DSA91304A high-performance real-time
oscilloscope for transmitter transient and eye diagram measurements, as shown in
Fig. 4.15. The transmitter was fabricated in an LP 90 nm CMOS process. As
shown in the die photograph of Fig. 4.16, the total transmitter active area is 250 µm x
140 µm.
 
 
 
 
Fig. 4.15.  Measurement setup. 
 
 
 
 
 
 
Fig. 4.16.  Die photograph. 
 
 
 
 
Fig. 4.17. Low-frequency transmitter output waveform with 6 dB equalization. 
 
 
 
 
Fig. 4.18. Equalization peaking versus digital code for 400 mVppd peak output swing 
and 120 uA IREF. 
 
 
[Plot for Fig. 4.18: equalization [dB] (0-7) versus digital code (0-15); curves: LSB = Iref (measurement) and LSB = Iref (ideal).]

Fig. 4.17 shows low-frequency output patterns with a peak output swing near 400
mVppd and a maximum equalization value of 6 dB. As shown in Fig. 4.18, the measured
equalization settings match well with the ideal linear-in-dB characteristic, with a slope
of 0.4 dB/code for a 400 mVppd maximum swing and a 120 µA reference current setting. In
the hybrid driver, if the current-mode equalization settings are increased beyond 6 dB,
the regulator is required to sink a portion of the equalization current. While this is
not possible with the present regulator implementation, for low-power serial link
transceivers which often also implement efficient receiver-side continuous-time linear
equalizers, this level of transmit equalization is generally suitable for channels with
15-20 dB of loss at the Nyquist frequency. To support equalization settings above 6 dB,
the regulator output stage can be modified to sink a portion of the equalization current.
 
 
 
 
(a)                                                       (b) 
 
Fig. 4.19.  6 Gb/s eye diagrams with a channel that has 4 dB loss at 3 GHz, (a) without 
equalization, and (b) with equalization. 
 
 
 
 
(a)                                                               (b) 
 
Fig. 4.20. Clock patterns (1010…) at 6 Gb/s data rates (a) without Equalization (b) 
with 6 dB Equalization. 
 
 
The transmitter transient performance at the maximum 6 Gb/s data rate is verified in
the 2^15-1 PRBS eye diagrams of Fig. 4.19, measured over a 3” FR4 channel with 4 dB loss
at 3 GHz. By enabling the current-mode equalization, improvement is achieved in both eye
height, from 127 mV to 163 mV, and eye width, from 106 ps to 115 ps. As shown in
Fig. 4.20, testing with 3 GHz fixed clock patterns shows no significant degradation in
output jitter with equalization enabled, with 3.43 ps rms jitter without equalization and
3.17 ps rms jitter with 6 dB equalization and the same 200 mVppd output swing. With the
addition of an SMA cable to the 3” FR4 channel, the total channel loss increases to 6 dB
at 2.4 GHz, and the performance with maximum equalization settings is verified in the
4.8 Gb/s eye diagram of Fig. 4.21. Again, improvement is achieved in both eye height,
from 87 mV to 146 mV, and eye width, from 123 ps to 150 ps.
 
 
 
(a)                                                       (b) 
 
Fig. 4.21. 4.8 Gb/s eye diagrams with a channel that has 6 dB loss at 2.4 GHz, (a)
without equalization, and (b) with equalization.
 
 
 
 
Fig. 4.22.  Measured clock duty cycle versus data rate. 
 
 
The transmitter utilizes clock buffers with digitally-adjustable capacitive loads to tune
out mismatches in input duty cycle and the clock distribution network. This allows the
transmitter output duty cycle to be corrected to within ±1 % over a data rate range of 2-6
Gb/s, as shown in Fig. 4.22. In addition, measured transmitter clock output waveforms are
shown at 2.5 Gb/s in Fig. 4.23 (a) and at 6 Gb/s in Fig. 4.23 (b).
 
 
 
 
(a)                                                       (b) 
 
Fig. 4.23.  Measured clock patterns (1010…) (a) at 2.5 Gb/s and (b) at 6 Gb/s.
 
 
 
Fig. 4.24 shows how ZUP and ZDN vary as the output swing without equalization, 
VREF, varies from 100 to 400 mVppd. Relative to the 60 Ω target output impedance, ZUP 
and ZDN vary by a maximum of 7 % and 10 %, respectively. This is due to the driver 
output impedance increasing because of the reduced amplifier gain at the higher VZUP 
and VZDN output voltages required as the output swing increases. 
 
 
 
Fig. 4.24.  Measured transmitter output impedance versus VREF. 
 
 
Fig. 4.25 illustrates the efficiency of the equalization technique implemented in the
hybrid driver. For the 3” FR4 and cable channel used in the eye diagrams, transmitter
energy efficiency versus data rate for a minimum channel output eye of 50 mV height and
0.6 UI width is shown with and without equalization. For data rates of 4 Gb/s and
lower, equalization is not required for the target eye opening and an optimal 1.11 pJ/b
energy efficiency is achieved at 4 Gb/s. Including equalization improves overall eye
margins, and is necessary above 4 Gb/s to achieve 0.6 UI eye width. Activating the
equalization circuitry to achieve the target eye margins increases the energy per bit by
less than 0.2 pJ/b up to 6 Gb/s.
 
 
 
Fig. 4.25.  Energy efficiency versus data rate for channel output 50 mV eye height and 
0.6 UI eye width. 
 
 
Table 4.2: Transmitter performance summary

                                                    6 Gb/s                     4 Gb/s       2 Gb/s
TX swing                                            300 mV with 3.72 dB EQ     300 mV       100 mV
Analog Power Supply                                 1.2 V                      1.2 V        1.2 V
LDO & Output Driver                                 3.22 mW                    2.84 mW      1.96 mW
Global Impedance Control (amortized across 8 TX)    219 µW                     236 µW       187 µW
DVDD                                                1.2 V                      1 V          0.8 V
Serializer, Pre-drivers, Clocking                   4.1 mW                     1.79 mW      0.56 mW
Energy Efficiency                                   1.26 pJ/b                  1.22 pJ/b    1.36 pJ/b
 
 
 
Table 4.2 shows a measured power breakdown at different data rates and equalization 
conditions. For the 6 Gb/s settings used in the eye diagram, 1.26 pJ/b energy efficiency 
is achieved, with the largest power consumption from the 1.2 V DVDD supply. As the 
data rate is dropped to 2 Gb/s, significant DVDD power savings are achieved by 
reducing the supply to 0.8 V. However, the total transmitter energy efficiency is 
dominated by the output stage power and 1.36 pJ/b is achieved with 100 mVppd output 
swing and no equalization.  
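
The entries of Table 4.2 can be used both to recompute the energy per bit and to see how closely the DVDD power follows first-order dynamic power scaling. The sketch below is illustrative: the C·V²·f model, the choice of the 6 Gb/s row as the reference point, and all names are assumptions, and the measured values also include effects that this model ignores.

```python
# Rows of Table 4.2: data rate (b/s) -> block powers (W) and DVDD (V).
TABLE_4_2 = {
    6e9: {"output_w": 3.22e-3, "zctl_w": 219e-6, "dvdd_v": 1.2, "digital_w": 4.10e-3},
    4e9: {"output_w": 2.84e-3, "zctl_w": 236e-6, "dvdd_v": 1.0, "digital_w": 1.79e-3},
    2e9: {"output_w": 1.96e-3, "zctl_w": 187e-6, "dvdd_v": 0.8, "digital_w": 0.56e-3},
}


def energy_per_bit_pj(row: dict, rate_bps: float) -> float:
    """Total transmitter energy per bit in pJ/b for one table row."""
    total_w = row["output_w"] + row["zctl_w"] + row["digital_w"]
    return total_w / rate_bps * 1e12


def scaled_dynamic_power(p_ref: float, v_ref: float, f_ref: float,
                         v: float, f: float) -> float:
    """First-order C*V^2*f scaling of dynamic power from a reference point."""
    return p_ref * (v / v_ref) ** 2 * (f / f_ref)


ref = TABLE_4_2[6e9]
for rate, row in TABLE_4_2.items():
    predicted = scaled_dynamic_power(ref["digital_w"], ref["dvdd_v"], 6e9,
                                     row["dvdd_v"], rate)
    print(f"{rate / 1e9:.0f} Gb/s: {energy_per_bit_pj(row, rate):.2f} pJ/b, "
          f"DVDD power measured {row['digital_w'] * 1e3:.2f} mW, "
          f"CV^2f estimate {predicted * 1e3:.2f} mW")
```

The estimate tracks the measured DVDD power to within roughly 10 %, which illustrates why supply scaling reduces the serializer and clocking power faster than the data rate alone.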
 
Table 4.3: Transmitter performance comparisons

                     [41]                    [55]                    This Work
Technology           90 nm CMOS              0.18 µm CMOS            90 nm CMOS
Supply Voltage       1.2 V                   1.8 V                   0.8~1.2 V
Data Rate            0.5~4 Gb/s              3.6 Gb/s                2~6 Gb/s
TX Swing             100 mVppd               250 mVpps               100 mVppd~400 mVppd
Equalization         None                    2-Tap FIR               2-Tap FIR
Energy Efficiency    0.6 pJ/b @ 3.2 Gb/s     2.68 pJ/b @ 3.6 Gb/s    1.26 pJ/b @ 6 Gb/s
 
 
Table 4.3 compares this design with other low-swing voltage-mode transmitters.
Relative to the design of [41], which was implemented in a similar process, the presented
design allows for higher data rate operation with the efficient inclusion of 2-tap FIR
equalization and four times the output swing. The efficiency of the equalization is
evident by comparing this work with [55], which implemented 2-tap output equalization
via a segmented resistor divider approach.
IV.5. Summary 
This chapter presented a hybrid voltage-mode transmitter with current-mode 
equalization, which enables independent control over termination impedance, 
equalization settings, and pre-driver supply. By controlling the equalization settings with 
a tail current source DAC in the parallel current-mode driver, segmentation is eliminated 
in the voltage-mode output stage, allowing for significant reduction in pre-driver 
complexity and power. Output impedance control is maintained in a manner compatible 
with supply scaling with additional series transistors in the voltage-mode output stage 
which are controlled by a global impedance control loop. These techniques allow for
efficient transmit equalization over a wide range of data rates, supply voltages, and
output swing levels.
 
 
 
 
 
 
 
 
 
 
 
V. IMPEDANCE-MODULATED VOLTAGE-MODE TRANSMITTER WITH 
FAST POWER STATE TRANSITIONING 
V.1. Introduction 
Supporting the dramatic growth in high-performance and mobile processors’ I/O 
bandwidth [1], [73] requires per-channel data rates to increase well beyond 10Gb/s due 
to packaging technology allowing only modest increases in I/O channel number. At 
these relatively high data rates, complying with thermal design power limits in high-
performance systems and battery lifetime requirements in mobile platforms necessitates 
improvements in I/O circuit energy efficiency [29], [53] and dynamic power 
management techniques [29], [73]. 
Serial-link transmitters consume both significant dynamic power due to the high-
speed serialization operation and static power due to driving the low-impedance channel. 
The inclusion of equalization at high data rates to compensate for frequency-dependent 
channel loss adds to the design complexity and power consumption. Circuit and parasitic 
mismatch also create challenges in long-distance clock distribution and maintaining 
proper phase spacing for the critical serialization clocks which determine the output eye 
quality. In order to improve I/O energy efficiency at high data rates, improvements in 
static and dynamic power consumption are required in a manner that allows for robust 
operation at both low-voltage and with the growing mismatch found in nanometer 
CMOS technologies. 
Significant static power savings are possible by utilizing low-swing voltage-mode 
drivers [53], [55], [84] as differential channel termination allows the same output voltage 
swing at one-quarter the current consumption of current-mode drivers. However, 
implementing transmit equalization with voltage-mode drivers is generally more 
difficult, with resistive divider [55], channel-shunting [58], [85], impedance-modulation 
[80], and hybrid current-mode [84] approaches being proposed. These topologies often 
set the equalizer taps’ weighting via output stage segmentation [55], [58], [80], [85], 
which adds complexity to the high-speed predriver circuitry and degrades the transmitter 
dynamic power efficiency. 
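
The one-quarter current figure can be checked with a short numeric sketch. It assumes an idealized, doubly-terminated differential link with Zo = 50 ohm per side; the function names and the 300 mVppd example swing are illustrative choices rather than values from the cited designs.

```python
def current_mode_current(vppd, z0=50.0):
    """Tail current of a differential current-mode driver.

    Assumes the tail current sees the TX termination in parallel with the
    terminated line (z0 || z0), so the differential peak-to-peak swing is I * z0.
    """
    return vppd / z0


def voltage_mode_current(vppd, z0=50.0):
    """Supply current of a differential voltage-mode driver.

    Assumes a series path of z0 (pull-up) + 2*z0 (differential termination)
    + z0 (pull-down), with the supply equal to the peak-to-peak swing.
    """
    return vppd / (4.0 * z0)


vppd = 0.3  # example swing in volts (peak-to-peak differential)
i_cm = current_mode_current(vppd)
i_vm = voltage_mode_current(vppd)
print(f"current-mode: {i_cm * 1e3:.2f} mA, voltage-mode: {i_vm * 1e3:.2f} mA")
```

Under these assumptions the ratio of the two currents is four for any swing and impedance.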
Scaling the power supply voltage with data rate is an effective technique to achieve 
non-linear dynamic power-scaling at reduced-speeds [57], [72]. While architectures 
which utilize a high multiplexing factor allow for reduced frequency operation of the 
transmit slices, and thus the potential for low supply voltages, they are more sensitive to 
timing offsets amongst the multiple clock phases [53], [72], [86]. Furthermore, efficient 
generation and distribution of these multi-phase clocks is challenging in large channel-
count transmitters. 
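
The non-linear scaling can be illustrated with the first-order CV²f model of dynamic power. This is a sketch only: it assumes a constant switched capacitance and uses the supply and data-rate pairs reported for this design later in the chapter (0.75 V at 8 Gb/s, 0.85 V at 12 Gb/s, 1 V at 16 Gb/s); the function name is hypothetical.

```python
def relative_dynamic_power(vdd, rate_gbps, vdd_ref=1.0, rate_ref=16.0):
    """Dynamic power relative to the reference point, assuming P ~ C * Vdd^2 * f."""
    return (vdd / vdd_ref) ** 2 * (rate_gbps / rate_ref)


operating_points = [(8.0, 0.75), (12.0, 0.85), (16.0, 1.0)]  # (Gb/s, V)
for rate, vdd in operating_points:
    scaled = relative_dynamic_power(vdd, rate)
    fixed = relative_dynamic_power(1.0, rate)  # same rate without supply scaling
    print(f"{rate:4.0f} Gb/s: scaled supply {scaled:.3f}, fixed 1 V supply {fixed:.3f}")
```

In this model, halving the data rate alone halves the dynamic power, while also lowering the supply reduces it further in proportion to the square of the voltage.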
Another effective approach to saving I/O power is to dynamically operate the 
required number of channels in a burst-mode manner based on the system bandwidth 
demand at a given time [73]. In order to effectively leverage this technique, transmitters 
with rapid turn-on/off capabilities are necessary. It is important to quickly disable both 
switching and static power, which can be particularly challenging with voltage-mode 
drivers due to output-stage regulator de-coupling capacitance. 
 
V.2. Low Power Transmitter Design Techniques 
A typical low-power multi-channel serial-link transmitter architecture is shown in 
Fig. 5.1. In order to amortize clocking power, the output of a global clock generation 
circuit, such as a phase-locked loop (PLL), is distributed to all of the transmit channels. 
Here efficient global clock distribution techniques, such as low-swing CML signaling 
[56], [57], are often employed in high channel count systems which span several mm. 
Each transmit channel performs parallel data serialization, implements equalization to 
compensate for frequency-dependent channel loss, and allows for dynamic power 
management (DPM) with rapid turn-on/off capabilities. This section reviews key low-
power design techniques employed in this design, including capacitively-driven wires 
for long-distance clock distribution [87] and impedance-modulation equalization [80]. 
 
[Figure 5.1 block diagram labels: TX PLL; buffered clock distribution; CML-to-CMOS 
converter; N-bit serializer; pre-driver; TX output driver with FIR equalization; dynamic 
power management (ON/OFF); receiver termination RRX.]
 
Fig. 5.1. Multi-channel serial-link transmitter architecture. 
V.2.1. Global clock distribution 
 
[Figure 5.2 schematic labels, panels (a) and (b): global PLL, clock wire CLKTX, and 
per-channel CML-to-CMOS converters for TX1 to TXN; 50Ω termination resistors in the 
resistively-terminated case.]
 
Fig. 5.2. Low swing global clock distribution techniques: (a) CML buffer driving 
resistively-terminated on-die transmission line, (b) CMOS buffer driving distribution 
wire through a series coupling capacitor. 
 
 
Distributing high-frequency clock signals over on-chip wires with multi-millimeter 
lengths is challenging due to wire RC parasitics that limit bandwidth, resulting in 
amplified input jitter and excessive power dissipation with repeated full-swing CMOS 
signaling [54]. As shown in Fig. 5.2(a), in order to reduce clocking power and avoid 
excessive jitter accumulation, low-swing non-repeated global clock distribution with an 
open-drain CML buffer driving on-die resistively-terminated transmission lines has been 
previously implemented [57]. However, maintaining a minimum clock swing at high 
frequencies can still result in significant static power dissipation due to the transmission 
lines’ loss and relatively low impedance. While reduction of this static power is possible 
with inductive termination of the distribution wire [56], this creates a narrow-band 
resonant structure that prohibits scaling the per-channel data rates over a wide range. 
Another non-repeated technique to drive long wires involves AC-coupling a full-swing 
CMOS driver to the distribution wire through a series capacitor, as shown in Fig. 5.2(b). 
Relative to simple DC-coupling, this technique allows for smaller drivers due to the 
reduced effective load capacitance, savings in signaling power due to the reduced 
voltage swing on the long-wire, and bandwidth extension due to the inherent pre-
emphasis caused by the wire resistance [87].  
 
[Figure 5.3 plots, each comparing capacitively-driven (Cap Driven) and CML distribution 
over 0 to 15 GHz: (a) output swing [Vpps] versus frequency [GHz]; (b) power [mW] versus 
frequency [GHz].]
 
Fig. 5.3. Simulated comparison of CML and capacitively-driven clock distribution over 
a 2mm distance: (a) output swing versus frequency, (b) power versus frequency. 
 
 
The 65nm CMOS simulation results of Fig. 5.3 show that, relative to CML clock 
distribution, this capacitively-driven approach offers a 1.6X extension of the -1dB 
bandwidth and 78.7% power savings when distributing a differential 4 GHz clock over a 
2 mm distance. Also, the power of the capacitively-driven approach reduces significantly 
at lower clock frequencies. This provides the potential for further power savings at a 
given data rate with an increased multiplexing-factor transmitter, i.e. quarter-rate, 
provided that there is efficient multi-phase clock generation and low-to-high-swing 
conversion at the local transmit channels. 
V.2.2. Voltage-mode transmitter equalization 
 
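[Figure 5.4 schematic and waveform labels: segmented voltage-mode output stage with n 
segments, segment selection logic, data inputs X[n] and X[n-1], supplies DVDD and VREF, 
and 2Zo termination. Normalized output levels are 1 and -1 when X[n] ≠ X[n-1] (Vppd,max) 
and 1-2α and -1+2α when X[n] = X[n-1] (Vppd,min), shown over run lengths of 1UI to NUI.]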
 
Fig. 5.4. 2-tap FIR equalization in low-swing voltage-mode drivers. 
 
 
While it is relatively easy to implement FIR equalizer structures at the transmitter by 
summing the outputs of parallel current-mode stages weighted by the filter tap 
coefficients onto the channel and a parallel termination resistor [57], voltage-mode 
implementations are more difficult due to the series termination control. As shown in Fig. 
5.4, these voltage-mode topologies often set the equalizer taps’ weighting via output 
stage segmentation [55], [58], [80], [85]. One approach is to distribute the output 
segments among the main and post-cursor taps to form a voltage divider that produces 
the four signal levels necessary for a 2-tap FIR filter [55]. 
Here all segments are in parallel during a transition (X[n] ≠ X[n-1]) to yield the 
maximum signal level and the post-cursor segments shunt to the supplies to produce the 
de-emphasis level for run lengths greater than one (X[n]=X[n-1]). As ideally all the 
segments have equal conductance, a constant channel match is achieved independent of 
the equalizer setting. However, shunting the post-cursor segments to the supplies results 
in dynamic current being drawn from the regulator powering the output stage and a 
significant increase in current consumption with higher levels of de-emphasis [85]. To 
address this, adding a shunt path in parallel with the channel can either eliminate 
dynamic current variations [58] or allow for a decrease in current consumption with 
higher levels of de-emphasis [85]. Further power reduction is possible if a constant 
channel match is sacrificed by implementing the different output levels via impedance 
modulation, allowing for minimum output stage current [80]. Here all segments are on 
during a transition to yield the maximum signal level, while for run lengths greater than 
one the post-cursor segments are tri-stated to generate a higher output resistance and 
produce the de-emphasis level. 
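
The four signal levels of a 2-tap FIR filter can be enumerated with a small sketch. It assumes bits mapped to +1/-1 and normalized tap weights of (1 - α) for the main cursor and -α for the post cursor, which reproduces the levels 1, 1-2α, -1+2α, and -1 labeled in Fig. 5.4; the function name and the example α are illustrative.

```python
def fir_level(x_n, x_prev, alpha):
    """Normalized 2-tap FIR output for symbols in {+1, -1}.

    Assumed tap weights: main cursor (1 - alpha), post cursor -alpha.
    """
    return (1.0 - alpha) * x_n - alpha * x_prev


alpha = 0.125  # example equalization coefficient
levels = {
    (x_n, x_prev): fir_level(x_n, x_prev, alpha)
    for x_n in (1, -1)
    for x_prev in (1, -1)
}
for (x_n, x_prev), level in levels.items():
    kind = "transition" if x_n != x_prev else "repeated bit"
    print(f"X[n]={x_n:+d}, X[n-1]={x_prev:+d} ({kind}): level = {level:+.3f}")
```

Transitions produce the full-swing levels, and repeated bits produce the de-emphasized levels, independent of how a particular driver topology realizes them.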
While impedance-modulated equalization may yield the best signaling current 
consumption, the output stage segmentation associated with this and other approaches 
can result in significant complexity and power consumption in the predriver logic. 
Overall, this predriver dynamic power, which increases with data rate and equalizer 
resolution, should be addressed in order to not diminish the benefits offered by a 
voltage-mode driver. 
 
V.3. Multi-Channel Transmitter Architecture  
Fig. 5.5 shows a conceptual diagram of the proposed multi-channel transmitter 
architecture, with 10 quarter-rate transmitter channels spanning across a 2mm distance. 
All transmitters share both a global regulator to set the nominal output swing, and two 
analog loops to set the driver output impedance during the maximum and de-emphasized 
levels of the implemented 2-tap FIR equalizer. Utilizing a single global voltage regulator 
to provide a stable bias signal that is distributed to all the channels provides for 
independent fast power-state transitioning of each output driver. The sharing of these 
global analog blocks allows for their power to be amortized by the channel number and 
improves the overall I/O energy efficiency. 
 
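[Figure 5.5 block diagram labels: global clock (GCLK) distributed over 2 mm through series 
capacitor Cs onto wire capacitance Cw; TX bundles [1] to [5], each with an ILO, phase 
calibration (PC), and CLK and TXData inputs; global impedance control loop and 
de-emphasis impedance modulation loop; global voltage regulator.]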
 
Fig. 5.5. Multi-channel transmitter architecture. 
 
In order to reduce dynamic power, low-swing clocks are maintained throughout the 
global distribution and local generation of the clocks used by the quarter-rate 
transmitters. Rather than distributing four quarter-rate clocks globally, which offers 
challenges in maintaining low static phase errors and power consumption, a differential 
quarter-rate clock is distributed globally in a repeater-less manner via 
capacitively-driven low-swing wires [87]. A voltage swing of 

$$V_{swing} = \frac{C_s}{C_s + C_w} V_{dd} \tag{5-1}$$

is present on the long global distribution wires from the voltage divider formed by the 
series coupling capacitor, Cs, and the clock wire capacitance, Cw. The Cs value is set for 
a swing of Vdd/4, which is 250mV for the 4GHz clocks used in 16Gb/s operation with a 
1V supply. These low-swing distributed clocks are then buffered on a local basis by AC-
coupled inverters with resistive feedback for injection into a two-stage injection-locked 
oscillator (ILO) which produces four full-swing quadrature clocks that are shared by a 
two-channel bundle. As quarter-rate transmit architectures are sensitive to timing offsets 
amongst the four clock phases, particularly with the aggressive supply scaling employed 
in this low-power design, digitally-calibrated buffers controlled by an automatic phase 
calibration (PC) loop produce the final clocks that control the data serialization. 
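
The capacitive divider sizing can be sketched numerically. The divider relation follows the description of Cs and Cw in the text; the 300 fF wire capacitance is a hypothetical value chosen only for illustration, as are the function names.

```python
def divider_swing(vdd, cs, cw):
    """Swing on the distribution wire from the Cs / (Cs + Cw) capacitive divider."""
    return vdd * cs / (cs + cw)


def coupling_cap_for_ratio(ratio, cw):
    """Series capacitance Cs that yields swing / Vdd = ratio for a wire capacitance cw."""
    return ratio * cw / (1.0 - ratio)


cw = 300e-15  # hypothetical 2 mm wire capacitance in farads
cs = coupling_cap_for_ratio(0.25, cw)  # target swing of Vdd / 4
swing = divider_swing(1.0, cs, cw)
print(f"Cs = {cs * 1e15:.0f} fF, swing = {swing * 1e3:.0f} mV at Vdd = 1 V")
```

For a target swing of Vdd/4 the required coupling capacitor is one third of the wire capacitance, whatever the actual wire capacitance turns out to be.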
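[Figure 5.6 schematic labels: GCLK driving a 2 mm wire (Cw) through Cs; injection buffers 
with inputs IN/INB and outputs OUT/OUTB, plus a dummy buffer; two-stage injection-locked 
oscillator with outputs I, IB, Q, QB; 1 V supply; control signals VCTL, EN_VCTL, ENCLK, 
ENBCLK.]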
 
Fig. 5.6. Capacitively-driven global clock distribution and local quadrature-phase 
generation injection-locked oscillator. 
 
 
 
 
Fig. 5.6 shows the two-stage ILO schematic, where quadrature output phase spacing 
is improved by AC-coupling the injection clocks, adding dummy injection buffers, and 
optimizing the locking range via digital control of the injection buffers' drive strength. 
The ILO employs cross-coupled inverter delay cells which, relative to current-starved 
delay cells [53], generate a rail-to-rail output swing with better phase spacing over a 
wide frequency range. Coarse frequency control is achieved via a dedicated power 
supply, and fine control is set using the analog voltage, EN_VCTL, that sets the pull-down 
strength. This analog control voltage can also be rapidly switched between GND and its 
nominal value, enabling fast power-up/shut-down of the clock signals with two-channel 
resolution. 
V.4. Transmitter Channel Design 
V.4.1. Transmitter block diagram with digital phase calibration 
Fig. 5.7 shows the transmitter block diagram with the proposed phase calibration 
module, configured for 16Gb/s operation. Eight bits of parallel 2^7-1 pseudorandom 
binary sequence (PRBS) input data are serialized in two stages, an initial 8:4 multiplexer 
and a final 4:1 multiplexer. The output stage includes two sets of 4:1 MUXes to implement 
a two-tap FIR equalization filter, with the top MUX driving the main-cursor voltage-mode 
driver and the bottom MUX driving the post cursor. The serialized data passes through a 
level-shifting pre-driver [53] that boosts the voltage swing to the fully scalable supply 
value, DVDD, plus the nominal nMOS threshold voltage, enabling reduced transistor sizing 
for a given impedance value. In addition, the post-cursor pre-drivers can be disabled to 
save power when equalization is not applied. The clocks which synchronize the 
serialization are buffered by a local clock distribution block. Two of these phases are 
divided by two to perform the 8:4 multiplexer operation, and four phases are used to 
generate a 4-phase pulse clock, which produces both the main data and the 1 UI-delayed 
data without additional digital circuitry in the data path for multiplexing timing margin. 
[Figure 5.7 block diagram labels: 2^7-1 PRBS and fixed pattern generator; 8:4 MUX with 
DIV/2; two 4:1 MUXes clocked by PI/PQ/PIB/PQB (X[n]) and PQ/PIB/PQB/PI (X[n-1]); buffer 
and level shifter (BUF&LS) with EQ_ENB; differential voltage-mode output driver with EQ 
(VREG0), shown for TX0 and TX1; ILO clock with 5-bit CAP and 5-bit duty-cycle corrector 
controls; comparator sampled by an external asynchronous clock; counter and FSM. 
Calibration flow: sample and count '1's for pattern A (T1) and pattern B (T2), compare T1 
and T2, adjust the control code on fail, and move to the next step on pass. Quadrature 
correction uses pattern A "1010" and pattern B "0101"; duty-cycle correction uses 
pattern A "1100" and pattern B "0011".]
 
Fig. 5.7. Transmitter block diagram with clock phase calibration details. 
As the data rate increases to 16 Gb/s, the output data eye becomes highly sensitive to 
deterministic jitter caused by static phase error and duty-cycle distortion of the 
quadrature clocks. To solve this problem, both a delay tuning unit and a duty-cycle tuning 
unit are implemented in the proposed transmitter to compensate for mismatch in the clock 
path. An offline phase calibration module is also implemented to realize closed-loop 
calibration during the initialization of the transmitter. During the calibration process, 
the transmitter continuously generates a fixed data pattern which contains the phase error 
information. The output data sequence for the fixed pattern “1100” is equivalent to a 
4 GHz clock which contains the duty-cycle distortion information of a quadrature clock. 
Similarly, the pattern “1010” is equivalent to an 8 GHz clock whose duty cycle is 
determined by the phase difference between the quadrature clocks. This fixed output data 
is sampled by a comparator clocked by an external 100 MHz asynchronous clock. After 
counting and comparing the number of ‘1’s at the comparator output for two complementary 
patterns, the phase error information can be extracted for closed-loop calibration. 
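
The counting principle can be sketched in a few lines. This is a behavioral model only, not the on-chip FSM: the four slot widths (in femtoseconds) are illustrative values for the I, Q, IB, and QB bit slots, and the 10.0037 ns sampling period is a hypothetical choice that approximates an asynchronous 100 MHz sampler. All names are assumptions.

```python
# Illustrative widths of the four bit slots (I, Q, IB, QB) in femtoseconds;
# they sum to 4 UI at 16 Gb/s (250 ps).
WIDTHS_FS = [58_700, 64_100, 60_300, 66_900]


def count_ones(pattern, widths_fs, n_samples=20_000, sample_period_fs=10_003_700):
    """Count '1' samples of a repeating 4-bit pattern seen by an asynchronous sampler."""
    period_fs = sum(widths_fs)
    ones = 0
    for k in range(n_samples):
        phase = (k * sample_period_fs) % period_fs  # sampling instant within the pattern
        edge = 0
        for bit, width in zip(pattern, widths_fs):
            edge += width
            if phase < edge:  # the sample falls inside this bit slot
                ones += bit == "1"
                break
    return ones


def count_difference(pattern_a, pattern_b, widths_fs):
    """T1 - T2 for two complementary patterns; zero when the slots are balanced."""
    return count_ones(pattern_a, widths_fs) - count_ones(pattern_b, widths_fs)


duty_error = count_difference("1100", "0011", WIDTHS_FS)
quad_error = count_difference("1010", "0101", WIDTHS_FS)
print("duty-cycle count difference:", duty_error)
print("quadrature count difference:", quad_error)
```

The sign of each count difference indicates which direction the corresponding control code would need to move; with equal slot widths both differences are zero.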
V.4.2. Output driver 
The low-swing, low common-mode voltage-mode driver is comprised only of 
nMOS transistors in order to improve data rate and power efficiency [56]. To achieve 
2-tap impedance-modulation equalization and transmitter impedance control, extra nMOS 
transistors are stacked with the main data transistors. As shown in Fig. 5.8, these 
stacked transistors decouple the high-resolution control of the equalization coefficient, 
set by the gate voltages VzmeqUP and VzmeqDN, and of the termination impedance, set by the 
gate voltages VzceqUP and VzceqDN, from the high-speed data path. 
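[Figure 5.8 schematic labels: TX0 voltage-mode output driver supplied from VREG0 with 
transistors M1 to M5; data inputs X[n] and X[n-1]; impedance control voltages VzcUP/VzcDN 
(no-EQ mode) and VzceqUP/VzceqDN (EQ mode); modulation voltages VzMeqUP/VzMeqDN; receiver 
termination Rrx; operating cases X[n] = X[n-1] and X[n] ≠ X[n-1].]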
  
 
Fig. 5.8. Transmitter output driver circuitry. 
 
 
 
 
Furthermore, in order to reduce power consumption for low data rate operation or 
low-loss channels, when equalization is unnecessary, the extra stacked transistors for 
equalization are disabled and only the stacked transistors for impedance control and the 
main data transistors operate. The impedance of these transistors is controlled by a 
global impedance control loop, which provides the analog voltages VzcUP and VzcDN to match 
the channel characteristic impedance. When the transmitter operates in equalization mode 
and the main cursor is not equal to the post cursor, the transmitter impedance at both 
pull-up and pull-down remains at Zo, that is, 

$$R_{M3} \parallel (R_{M4} + R_{M5}) = Z_o \tag{5-2}$$

The resistance RM4 is set by the impedance control loop, and the effect of RM3 can 
normally be ignored because RM3 >> (RM4 + RM5). The total current is 

$$I = \frac{V_{REF}}{4 Z_o} \tag{5-3}$$

De-emphasis is produced by adjusting the resistance RM3 when the main cursor and post 
cursor are identical. The total transmitter impedance increases with the de-emphasis 
coefficient, as shown in the following equation 

$$Z_{eq} = \frac{1 + 2\alpha}{1 - 2\alpha} Z_o \tag{5-4}$$

This reduces both the current consumption and the output voltage swing to 

$$I_{eq} = \frac{(1 - 2\alpha) V_{REF}}{4 Z_o} \tag{5-5}$$

$$V_{ppd} = (1 - 2\alpha) V_{REF} \tag{5-6}$$

where α is the equalization coefficient and Zo is the channel characteristic impedance. 
Therefore, impedance-modulated equalization gives the best output-stage power 
efficiency compared to previously reported works. 
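
A numeric sketch of impedance-modulated de-emphasis follows. It assumes an idealized model that the text does not spell out: each driver leg presents a series resistance Zeq, the receiver termination is 2Zo differential, the driver supply equals VREF, and the de-emphasized level is (1 - 2α) times the maximum level, as labeled in Fig. 5.4. The function names and the 300 mV example supply are illustrative.

```python
def alpha_from_db(deemphasis_db):
    """Equalization coefficient, assuming de-emphasized/max level ratio = 1 - 2*alpha."""
    return (1.0 - 10.0 ** (-deemphasis_db / 20.0)) / 2.0


def z_eq(alpha, z0=50.0):
    """Per-leg driver resistance giving a (1 - 2*alpha) level into a 2*z0 termination."""
    return z0 * (1.0 + 2.0 * alpha) / (1.0 - 2.0 * alpha)


def driver_current(vref, alpha, z0=50.0):
    """Series-path current: vref across pull-up, termination (2*z0), and pull-down."""
    return vref / (2.0 * z_eq(alpha, z0) + 2.0 * z0)


for db in (3, 6, 9, 12):
    a = alpha_from_db(db)
    print(f"{db:2d} dB: alpha = {a:.3f}, Zeq = {z_eq(a):6.1f} ohm, "
          f"I = {driver_current(0.3, a) * 1e3:.2f} mA")
```

Under these assumptions the per-leg impedance rises from roughly 91 ohm at 3 dB to roughly 348 ohm at 12 dB, which lies within the range plotted in Fig. 5.14(a), and the signaling current falls by the same (1 - 2α) factor as the swing.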
V.4.3. Global impedance control and modulation loop 
In order to control the transmitter termination impedance and set the equalization 
impedance, both a global impedance control loop and a global impedance modulation loop 
are utilized in the proposed transmitter, as shown in Fig. 5.9. The global impedance 
control loop provides the analog voltages VzcUP and VzcDN to multiple output drivers, 
controlling the transmitter pull-up and pull-down termination impedance independently 
when equalization is disabled. In equalization mode, it instead generates the equivalent 
voltages VzceqUP and VzceqDN, which control the maximum differential output swing when the 
main cursor is not equal to the post cursor, as shown in Fig. 5.9(a). In this design an 
external supply was used for the adjustable reference voltage, VREF, which sets the output 
driver maximum differential swing, and an on-chip resistive divider generates the 
3/4 VREF and 1/4 VREF signals for the impedance control loop. The level-shifted predriver 
supply voltage, VLS = DVDD+Vthn, is generated by a replica bias circuit [53] and is used in 
the loops to imitate the data path voltage. 
 
 
[Figure 5.9 schematic labels: (a) global impedance control loop with 100Ω load, supply 
VREF, references 3/4VREF and 1/4VREF, gate supply VLS, enables EQEN/EQENB, and outputs 
VzcUP/VzcDN (no-EQ mode) and VzceqUP/VzceqDN (EQ mode); (b) global de-emphasis impedance 
modulation loop with 100Ω load, references 3/4VREF - 1/2αVREF and 1/4VREF + 1/2αVREF, 
outputs VzMeqUP/VzMeqDN (on in EQ mode, off in no-EQ mode), and a replica bias circuit 
generating VLS from DVDD (Ileakage).]
Fig. 5.9. Global output driver control  (a) output driver termination impedance control 
loop (b) output driver de-emphasis impedance modulation loop. 
 
 
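To set the output driver equivalent resistance for the required de-emphasis level, the 
impedance modulation loop requires two reference voltages, which are given by the 
following equations 

$$V_{REFH} = \frac{3}{4} V_{REF} - \frac{1}{2} \alpha V_{REF} \tag{5-7}$$

$$V_{REFL} = \frac{1}{4} V_{REF} + \frac{1}{2} \alpha V_{REF} \tag{5-8}$$
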
These reference voltages, which are set by a global DAC, represent the high and low 
voltage levels during de-emphasis, and the dual loop produces two DC voltages, VzmeqUP 
and VzmeqDN, that control the pull-up and pull-down impedance of the transmitter output 
drivers, as shown in Fig. 5.9(b). This configuration achieves a fine equalization 
resolution that depends only on the performance of the low-frequency global DAC; 
therefore, the complexity of the pre-driver, which operates at the highest speed, is 
significantly reduced compared to segmented equalization [80]. Furthermore, the power and 
circuit overhead of both the global impedance control loop and the modulation loop are 
amortized across the transmitter channels. 
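
The two de-emphasis reference levels can be computed directly from the expressions labeled in Fig. 5.9(b), 3/4 VREF - 1/2 αVREF and 1/4 VREF + 1/2 αVREF. The sketch below is illustrative: the function name, the 300 mV reference, and the α value are assumptions, and no DAC quantization is modeled.

```python
def deemphasis_references(vref, alpha):
    """High and low output-node targets during de-emphasis (labels of Fig. 5.9(b))."""
    v_high = 0.75 * vref - 0.5 * alpha * vref
    v_low = 0.25 * vref + 0.5 * alpha * vref
    return v_high, v_low


vref = 0.3    # example reference in volts
alpha = 0.25  # example equalization coefficient
v_high, v_low = deemphasis_references(vref, alpha)
print(f"high = {v_high * 1e3:.1f} mV, low = {v_low * 1e3:.1f} mV, "
      f"difference = {(v_high - v_low) * 1e3:.1f} mV")
```

The difference between the two references is (1 - 2α)VREF/2, so with α = 0 the targets reduce to the 3/4 VREF and 1/4 VREF levels used by the impedance control loop.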
V.4.4. Fast switching replica based voltage regulator 
To control the transmitter output swing and improve supply-noise rejection, a 
source-follower regulator output stage is used together with the nMOS-only output driver. 
In low-swing, low common-mode operation, this output stage suffers fewer headroom issues 
than a current-mode driver, except in the error amplifier. Previous work showed that this 
output-stage configuration, which includes a pseudo-differential error amplifier with a 
negative-resistance gain-boosting topology, significantly reduced output-stage signaling 
power when operating from a 0.65 V supply [53]. However, error-amplifier headroom under 
the 0.65 V supply limited its output swing tuning range to 100 to 200 mVppd. Therefore, 
the proposed transmitter employs a dual-supply replica-based linear regulator, shown in 
Fig. 5.10, to which fast power-state transitioning capability is added in order to reduce 
the average power of the multi-channel link. The dual supply does increase circuit 
complexity because of the extra switching buck converter; however, in a multi-channel 
system this overhead is amortized. In addition, the power and area savings are further 
enhanced by the replica-based architecture, which shares the error amplifier and replica 
output stage.  
In order to achieve a higher gain-bandwidth and a wider output swing tuning range of 
100 mVppd to 300 mVppd, the error amplifier operates from the nominal 1 V supply of the 
65 nm CMOS technology, while the source-follower output stage in the replica output 
driver operates from a 0.5 V supply. This regulator is shared by two transmitters. 
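[Figure 5.10 schematic labels: error amplifier on a 1 V supply with reference VREF; 
replica TX output driver and two TX output drivers (TX0, TX1) on a 0.5 V supply; 
decoupling capacitor Cdec; enable signals ENTX, ENOD, and ENDCAP.]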
 
Fig. 5.10. Fast power on-off dual supply replica based linear voltage regulator. 
Fast power-state transitioning with minimum latency is another essential feature 
for a multi-channel system to achieve energy efficiency by adapting the number of active 
data channels. The main limitation on fast power switching of voltage-mode drivers with 
voltage regulators is their slow settling time due to the decoupling capacitor. To 
overcome this limitation, a replica-based open-loop output stage is utilized in the 
proposed transmitter, with a 550 ps offset between the switching times of the output 
stage’s transistor and the decoupling capacitor. Fig. 5.11 shows that the output driver 
supply settles much faster with the proposed scheme than with a conventional voltage 
regulator configuration during both power-down and power-up.  
 
Fig. 5.11. Regulator power state transient simulation comparison with and without 
proposed fast power state transition. 
   
 
V.5. Experimental Results 
 
 
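[Figure 5.12 die micrograph labels: 1 mm x 1 mm die; 2 mm GCLK distribution; phase 
calibration FSM and scan chains; TX0 and TX1 with comparators (Com); ILO; voltage 
regulator (VR); global impedance control (GIC) and global impedance modulation (GIM); 
bias. TX0 detail: PRBS and fixed pattern generator with 8:4 MUX and DIV/2, local clock 
buffer, pulse clock, 4:1 MUX and pre-driver, level shifter, voltage regulator, 
comparator, and output driver.]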
 
 
Fig. 5.12. Micrograph of the 2-channel transmitter with on-chip 2mm clock distribution. 
[Plot belonging to Fig. 5.11: VREG [V] versus time [ns] over 0 to 120 ns for the proposed 
regulator at the FF, TT, and SS corners and for a conventional regulator (VREG Conv).]
 
[Figure 5.13 eye diagrams, with the four eye widths annotated in each panel: 
(a) 8Gb/s, DVDD = 0.75V. Without phase calibration: 144ps, 103ps, 133ps, 120ps. 
With phase calibration: 121ps, 126ps, 126ps, 127ps. 
(b) 16Gb/s, DVDD = 1V. Without phase calibration: 58.7ps, 64.1ps, 60.3ps, 66.9ps. 
With phase calibration: 61.2ps, 61.2ps, 63ps, 64.6ps.]
 
Fig. 5.13. Four eye diagrams without and with phase calibration (a) at 8Gb/s and (b) 
16Gb/s after 2" FR4 trace. 
 
 
The transmitter was fabricated in a 65nm CMOS general purpose process. As shown 
in the die micrograph of Fig. 5.12, the complete test chip was implemented in a 
1 x 1 mm2 area, which included a phase calibration finite state machine with scan chains, 
a 2 mm clock distribution wire, one injection-locked ring oscillator, two transmitters 
with comparators for phase calibration, global impedance control loops, and voltage 
regulators. While chip area constraints prevented a full 10-channel prototype, the concept 
was accurately emulated by placing a two-transmitter bundle at the end of a snaked 
on-chip 2mm clock distribution. The total active area of the two transmitters is 
0.012 mm2, while the combined area of the injection-locked oscillator, global impedance 
control and modulation loops, bias circuitry, and voltage regulator is 0.014 mm2. A 
chip-on-board test setup was utilized, with the die directly wire-bonded to the FR4 
board. Fig. 5.13 shows how the proposed digital phase calibration improves the eye width 
variation from an uncorrected 28.5% to 4.7% at 8Gb/s operation and from 13.1% to 5.4% at 
16Gb/s operation.  
 
 
[Figure 5.14 plots: (a) impedance [ohms] versus de-emphasis [dB] at 3, 6, 9, and 12dB, 
comparing Zeq (ideal) with measured ZeqUP and ZeqDN; (b) transmitter output waveform, 
TXVmax = 300mVppd with 3, 6, 9, and 12dB EQ.]
 
Fig. 5.14. (a) Measured equalization impedance versus de-emphasis amount with a 
300mVppd output swing, (b) Low-frequency transmitter output waveform with 3dB, 6dB, 
9dB and 12dB equalization. 
 
 
Fig. 5.14(a) shows that the global impedance modulation loop precisely controls the 
required impedance for a given equalization coefficient, with less than 7% variation, and 
Fig. 5.14(b) shows low-frequency output patterns with a peak output swing of 300 
mVppd and 3, 6, 9, and 12 dB equalization. 
[Figure 5.15 plots: (a) S21 [dB] versus frequency [GHz], 0 to 12 GHz, for the 5.8 inch 
FR4 + SMA channel; (b) pulse response amplitude [V] versus time [ns].]
 
Fig. 5.15. (a) Measured frequency response of 5.8” FR4 trace and interconnect cables 
(b) Channel pulse response at 16 Gb/s (input normalized to 1V). 
 
 
 
 
 
     
[Figure 5.16 eye diagrams at 16Gb/s: (a) with no EQ; (b) with EQ. Annotations: 50mV and 
13ps (scale), 55mV and 33.4ps (eye opening).]
 
Fig. 5.16. Eye diagrams after 5.8'' FR4 + 0.6 m SMA cable at 16 Gb/s (a) without 
equalization and (b) with equalization. 
 
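[Figure 5.17 eye diagrams: (a) 8Gb/s with no EQ, annotations 40mV and 25ps (scale), 53mV 
and 66ps (eye opening); (b) 12Gb/s with EQ, annotations 40mV and 16ps (scale), 54mV and 
45ps (eye opening).]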
 
Fig. 5.17. Eye diagrams after 5.8'' FR4+0.6 m SMA cable (a) at 8 Gb/s and (b) at 12 
Gb/s 
 
 
The frequency response of the channel, which consists of a 5.8 inch FR4 trace and a 
0.6 m SMA cable, is shown in Fig. 5.15(a) and displays 15.5 dB attenuation at 8 GHz. The 
simulated pulse response of Fig. 5.15(b) shows that post-cursor ISI dominates. The 
transmitter transient performance at 16 Gb/s is verified with this channel in the 2^7-1 
PRBS eye diagrams of Fig. 5.16. Fig. 5.16(a) shows a nearly closed eye diagram without 
transmitter equalization, and Fig. 5.16(b) shows a 55 mVppd and 0.53 UI eye opening when 
the impedance-modulation equalization is enabled. In addition, Fig. 5.17(a) shows an eye 
diagram with a 53 mVppd and 0.53 UI eye opening at 8 Gb/s without equalization, and 
Fig. 5.17(b) shows an eye diagram with a 54 mVppd and 0.54 UI eye opening at 12 Gb/s with 
equalization. As shown in Fig. 5.18(a), the transmitter achieves 8-16 Gb/s operation at 
0.54-0.92pJ/b energy efficiency by optimizing the transmitter's scalable supply and 
output swing for a minimum 50 mVppd eye height and 0.5 UI eye width at the channel 
output, and Fig. 5.18(b) shows the power breakdown at 8, 12, and 16 Gb/s. Both the global 
clocking power and the transmitter dynamic power increase significantly as the data rate 
increases. 
 
[Figure 5.18 plots: (a) energy efficiency [pJ/b] versus data rate [Gb/s], with points at 
8Gb/s (DVDD = 0.75V, no EQ), 12Gb/s (DVDD = 0.85V, with EQ), and 16Gb/s (DVDD = 1V, with 
EQ); (b) power [mW] versus data rate at 8, 12, and 16Gb/s, split into TX dynamic power, 
GCLK+ILO power, and TX static power, with percentage labels 66.6%, 18.7%, 14.7%; 71.4%, 
12.7%, 15.9%; and 73.5%, 18.4%, 8.1%.]
 
Fig. 5.18. Measured transmitter (a) energy efficiency versus data rate and (b) power 
breakdown versus data rate. 
 
 
[Figure 5.19 waveforms of TXOut with the fixed pattern “11110000”: (a) ILO and voltage 
regulator switched off, TX power-off time = 0.5ns; (b) ILO and voltage regulator switched 
on, TX power-on time = 2.9ns.]
 
Fig. 5.19. Measured transient response of the transmitter output under (a) fast power-
down and (b) start-up. 
 
The transmitter power-state transition control signal, which controls the 
injection-locked oscillator and the output-stage voltage regulator, is buffered off chip 
and measured through a delay-matched cable; its response is shown together with the 
transmitter output signal in Fig. 5.19. The measurement results demonstrate that the 
proposed techniques allow the transmitter to power down within 0.5 ns and start up 
within 2.9 ns. 
 
Table 5.1: Transmitter power breakdown at 16 Gb/s 

Block                                                              Power 
LDO (amortized across 2 TX) & Output Driver (300 mVppd with EQ)   985 uW 
Serializer, Predrivers, Clocking                                   10.8 mW 
Global Impedance Control & Modulation Loop, Bias Circuit 
  (amortized across 10 TX)                                         220 uW 
Global Clocking (amortized across 10 TX)                           300 uW 
ILO (amortized across 2 TX)                                        2.4 mW 
Total Energy Efficiency                                            0.92 pJ/b 
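
The entries of Table 5.1 can be cross-checked against the reported total with a short calculation. The sketch below uses only the tabulated values; the dictionary keys are shortened labels chosen for illustration.

```python
# Per-channel power contributions from Table 5.1, in milliwatts.
breakdown_mw = {
    "LDO and output driver": 0.985,
    "serializer, predrivers, clocking": 10.8,
    "global impedance control/modulation loops, bias": 0.220,
    "global clocking": 0.300,
    "ILO": 2.4,
}
data_rate_gbps = 16.0

total_mw = sum(breakdown_mw.values())
# mW divided by Gb/s gives pJ/b (1e-3 W / 1e9 b/s = 1e-12 J/b).
energy_pj_per_bit = total_mw / data_rate_gbps

print(f"total power = {total_mw:.3f} mW")
print(f"energy efficiency = {energy_pj_per_bit:.2f} pJ/b")
```

The sum agrees with the 14.7 mW listed for this work in Table 5.2 and with the 0.92 pJ/b total.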
 
 
Table 5.1 shows the measured transmitter power breakdown at 16 Gb/s. The total 
transmitter energy efficiency is 0.92 pJ/b, and the breakdown shows that dynamic power 
dominates at 16 Gb/s operation in the 65 nm GP technology. Hence, implementing the 
proposed transmitter design in a more advanced CMOS technology, such as 22 nm, is 
expected to significantly improve energy efficiency. Table 5.2 compares this work with 
recent voltage-mode drivers with 2-tap equalization [58], [80], [85], and demonstrates 
that the proposed transmitter architecture achieves the best energy efficiency even 
though it includes 2mm global clock distribution and operates at 16 Gb/s. Furthermore, 
Table 5.3 compares power-state transition times with previous work [26], [29], [88], and 
shows that this work achieves the fastest power-state transitioning. 
  
 
Table 5.2: Transmitter performance comparisons 

                          [58]            [80]           [85]            This Work 
Technology                45 nm           90 nm          65 nm           65 nm 
Supply Voltage            1.08V & 0.93V   1.15V          1.2V            1V & 0.5V 
Data Rate                 7.4 Gb/s        4 Gb/s         10 Gb/s         16 Gb/s 
TX Swing                  800 mVppd       0-1 Vppd       160-500 mVppd   100-300 mVppd 
Channel Loss at Nyquist   Not Reported    -8 ~ -10 dB    -13 dB          -15.5 dB 
Equalization              2-Tap FIR       2-Tap FIR      2-Tap FIR       2-Tap FIR 
Power                     32 mW           8 mW           10 mW           14.7 mW 
Energy Efficiency         4.32 pJ/b       2 pJ/b         1 pJ/b          0.92 pJ/b 
                     
 
 
Table 5.3: Power state transient time comparisons 

                             [26]       [88]       [29]      This Work 
Technology                   40 nm      40 nm      45 nm     65 nm 
Data Rate                    4.3 Gb/s   5.6 Gb/s   10 Gb/s   16 Gb/s 
Power State Transient Time   <5 ns      8 ns       <5 ns     0.5 ns (Off), 2.9 ns (On) 
 
 
V.6. 4:1 Output Multiplexing Transmitter 
Figure 5.20 shows the transmitter block diagram configured for 4:1 output
multiplexing with 2-tap equalization, which adds equalization capability to the
previous work [53]. Eight bits of parallel 2^7 - 1 pseudorandom binary sequence (PRBS)
input data were serialized in two stages: an initial 8:4 stage and a final 4:1 output
multiplexing voltage-mode driver. After the initial 8:4 serialization, the pulsed data
was generated and distributed by four segmented predrivers for output multiplexing,
each consisting of an AND gate, a buffer, and a level shifter. In addition, to generate
the post-cursor data for two-tap equalization, four extra segmented predrivers were
added, and the 1 UI delayed data was produced by using pulse clocks shifted by 90˚
relative to those of the main cursor. The output driver also employs four segments for
4:1 output multiplexing.
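
The data path described above can be sketched behaviorally in Python. This is only an
illustration under stated assumptions, not the implemented circuit: the PRBS polynomial
x^7 + x^6 + 1 is a common choice for a 2^7 - 1 sequence but is not specified in this
section, the bit ordering through the 8:4 and 4:1 stages is assumed, and all function
names are hypothetical. The post-cursor stream is modeled as the main stream delayed by
one UI, which for a quarter-rate clock corresponds to a 90˚ phase shift.

```python
from typing import List


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """Generate n_bits of a 2^7 - 1 PRBS (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def stage_8_to_4(word: List[int]) -> List[List[int]]:
    """8:4 stage: one 8-bit word becomes two consecutive 4-bit groups (assumed order)."""
    return [word[0:4], word[4:8]]


def stage_4_to_1(group: List[int]) -> List[int]:
    """4:1 output multiplexing: each quarter-rate phase (0, 90, 180, 270 deg) drives one UI."""
    return [group[phase] for phase in range(4)]


def serialize(words: List[List[int]]) -> List[int]:
    """Two-stage serialization of 8-bit parallel words into a serial bit stream."""
    stream: List[int] = []
    for word in words:
        for group in stage_8_to_4(word):
            stream.extend(stage_4_to_1(group))
    return stream


def post_cursor(stream: List[int]) -> List[int]:
    """Main-cursor stream delayed by 1 UI; the first value is an assumed idle 0."""
    return [0] + stream[:-1]


pattern = prbs7(2 * 127)                                 # two PRBS periods
words = [pattern[i:i + 8] for i in range(0, 248, 8)]     # 31 full 8-bit words
main = serialize(words)
post = post_cursor(main)
```

In this model the serialized main-cursor stream reproduces the original PRBS bit order,
and the post-cursor stream is the same data one UI later.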
 
 
 
Fig. 5.20. Transmitter 4:1 output multiplexing block diagram with clock phase 
calibration details and output driver circuitry. 
 
Because it drives fully differential 4-segment output drivers for output multiplexing
with 2-tap equalization, the transmitter requires 16 pre-driver segments with level
shifters, which significantly increases the transmitter’s active area, as shown in
Fig. 5.21. Compared with input multiplexing, the output multiplexing transmitter
implementation increases the active area by 1.5 times. The larger area in turn
dramatically increases the wiring parasitic capacitance, which causes extra dynamic
power consumption. For instance, the transmitter output parasitic capacitance is 5
times higher than in the input multiplexing implementation.
 
 
 
 
Fig. 5.21. 4:1 output multiplexing transmitter layout. 
 
 
Measured results show that the output multiplexing transmitter implementation
consumes approximately 20% more digital power than the input multiplexing
architecture at the same supply voltage, as shown in Fig. 5.22(a). However, the eye
opening width of the output multiplexing transmitter is 12.7% larger at the same
supply voltage, as shown in Fig. 5.22(b), because it has less data-dependent delay. In
other words, the on-chip bandwidth limitation increases the deterministic jitter in the
input multiplexing transmitter, which reduces the eye opening width. When the supply
voltage of the output multiplexing transmitter is reduced further, both the timing
margin and the voltage margin of the eye degrade dramatically because of setup time
violations during multiplexing.
 
[Fig. 5.22 plots: (a) Digital Power [mW] versus DVDD [V] and (b) Eye Width [ps] versus
DVDD [V], each with DVDD from 0.8 V to 1 V and curves for the Output MUX TX and the
Input MUX TX.]
 
Fig. 5.22. Measured 4:1 output MUX and input MUX transmitter architecture 
performance comparisons (a) Digital power comparison versus DVDD and (b) eye 
opening width versus DVDD at 12 Gb/s. 
 
 
 
 
Fig. 5.23. Digital power [mW] comparison between the 4:1 output MUX transmitter
architecture (Output MUX TX) and the input MUX transmitter architecture (Input MUX TX)
versus eye opening width [UI] at 12 Gb/s.
 
 
 
 
Fig. 5.23 shows digital power versus eye opening width for both the output and
input multiplexing transmitters at 12 Gb/s. At 0.5 UI eye opening width, input
multiplexing has better power efficiency; however, if the receiver requires 0.6 UI eye
opening width, the transmitter has to employ an output multiplexing scheme. Relaxing
the receiver timing margin by 0.1 UI therefore saves 24% of the digital power
consumption, because the input multiplexing architecture can then be used.
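
Because Fig. 5.22(b) reports eye width in picoseconds while Fig. 5.23 uses unit
intervals, it can help to convert between the two at the 12 Gb/s operating point. The
short Python sketch below shows that conversion; the function name is an illustrative
assumption.

```python
def ui_to_ps(width_ui: float, data_rate_gbps: float) -> float:
    """Convert an eye width in UI to picoseconds.

    One UI is one bit period: 1 / (data rate) = 1000 / (data rate in Gb/s) ps.
    """
    return width_ui * 1000.0 / data_rate_gbps


DATA_RATE_GBPS = 12.0  # operating point of Fig. 5.22 and Fig. 5.23

for width_ui in (0.5, 0.6):
    width_ps = ui_to_ps(width_ui, DATA_RATE_GBPS)
    print(f"{width_ui:.1f} UI = {width_ps:.1f} ps")
```

At 12 Gb/s one UI is about 83.3 ps, so the 0.5 UI and 0.6 UI requirements discussed
above correspond to roughly 41.7 ps and 50 ps of eye width.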
V.7. Summary 
This section presented the utilization of a forwarded-clock I/O architecture in
transmitter design, sharing the clocking and common analog control blocks to achieve
maximum power efficiency with fast power-state transitions in a parallel interface. It
was demonstrated that each transmitter can power on in 2.9 ns and power off in 0.5 ns
by using an ILO and a replica-based dual-supply voltage regulator, which allows
aggressive power-efficient bandwidth modulation. Also, the clocking power was
considerably reduced with capacitively driven low-swing quarter-rate global clock
distribution and multi-phase generation by the ILO, which were shared by two
transmitters. This combination reduced the number of buffer stages in the local clock
distribution, which included a low-to-high clock swing conversion block, even at a low
supply voltage, owing to off-chip phase calibration. Furthermore, setting the
transmitter equalization with a global impedance modulation loop eliminated
segmentation in the voltage-mode output stage, which allowed a significant reduction in
pre-driver complexity and dynamic power. It was also shown that impedance-modulated
equalization with a dual-supply voltage regulator offers the minimum static average
current in the output driver. These techniques allow efficient transmitter equalization
in high data rate operation at the energy-efficiency levels demanded by future
computing systems.
 
 
 
 
 
 
 
VI. CONCLUSION AND FUTURE WORK 
VI.1.  Conclusion 
Without a significant improvement in serial-link energy efficiency, thermal power
limits make it impossible to continue aggressively improving microprocessor I/O
bandwidth, and battery operation limits the mobile processing performance available to
support various advanced multi-media features. This dissertation has developed
techniques for improving the energy efficiency of multi-Gb/s serial I/O transceivers to
support these serial-link trends.
 
[Fig. 6.1 plot: Energy Efficiency [pJ/b] on a logarithmic axis from 10^-1 to 10^2
versus Data Rate [Gb/s] from 2 to 12, with the result labeled "This Work".]
 
Fig. 6.1. Energy efficiency versus data rate comparison with serial I/O transceivers.
 
 
In the first work, the proposed low-power forwarded-clock I/O transceiver architecture
achieved 4.8-8 Gb/s at 0.47-0.66 pJ/b energy efficiency for VDD = 0.6-0.8 V in a
general-purpose 65 nm CMOS process. To improve the transmitter dynamic energy
efficiency, a passive poly-phase filter was utilized to produce the 4-phase clocks that
enable a 4:1 output-multiplexing voltage-mode driver, which employs pulse-clock
operation to reduce transistor stacking and eliminate clock buffering in the output
driver. In addition, to both reduce signaling power and accurately control the
transmitter output swing over 100-200 mVppd, a low-voltage pseudo-differential
regulator was implemented, which employs a partial negative-resistance load for
improved low-frequency gain. In the forwarded-clock receiver, it was demonstrated that
an architecture with injection-locked oscillator de-skew and a 1:8 de-multiplexing
ratio can operate at low supply voltages to achieve high receiver energy efficiency. As
shown in Fig. 6.1, the proposed architecture has the best energy efficiency compared to
previously developed serial I/O transceivers. In other words, it demonstrates that the
proposed low-power design techniques are a viable solution for future I/O systems.
Low-power high-speed serial I/O transmitters that include equalization to compensate
for frequency-dependent channel loss are required to meet the aggressive link energy
efficiency targets of future systems. A low-power serial link transmitter with
equalization was designed, and it shows that utilizing current-mode equalization
decouples the equalization settings from the termination impedance, allowing a
significant reduction in pre-driver complexity relative to segmented voltage-mode
drivers. In addition, further reductions in dynamic power dissipation are achieved by
scaling the serializer and local clock distribution supply with data rate. These
transmitter equalization design techniques allow the transmitter to achieve 6 Gb/s
operation at 1.26 pJ/b energy efficiency with 300 mVppd output swing and 3.72 dB
equalization in a 90 nm CMOS process. This measurement result demonstrates that the
hybrid equalization technique of this research provides efficient transmit equalization
over a wide range of data rates, supply voltages, and output swing levels.
The final work describes a low-power serial link transmitter design operating at up to
16 Gb/s with a 2-tap non-segmented equalization method based on global impedance
modulation, which offers low complexity and dynamic power dissipation in the predriver
as well as low static power dissipation in the output driver. In addition, the
transmitter utilizes a voltage-mode output stage with a replica-based dual-supply
linear regulator that supports fast power-state transitions, reducing the average
transmitter power consumption in multi-data-rate operation. Also, to significantly
reduce clock distribution and buffering power, both quarter-rate capacitively driven
global clock distribution and local 4-phase clock generation by an ILO, shared by two
transmitters with digital phase calibration, were employed. These proposed ideas were
successfully demonstrated by measured results, wherein the transmitter achieved 8-16
Gb/s operation at 0.54-0.92 pJ/b energy efficiency with 5.8-inch FR4 and a 0.6 m SMA
cable, and 0.5 ns power-down and 2.9 ns start-up times, in a 65 nm CMOS process.
Accordingly, the proposed architecture-level and circuit-level design approaches are
applicable to energy-efficient serial I/O links operating over a wide range of data
rates, and provide a potential solution for future microprocessor and mobile
application demands.
 
VI.2.  Recommendations For Future Work 
As future systems will demand per-pin data rates in excess of 16 Gb/s, it will be
interesting to consider how the proposed architectures can be modified to support
these higher data rates while still achieving better energy efficiency.
Due to on-chip interconnect bandwidth limitations, it is hard to distribute
high-frequency clock signals to multiple lanes over long wires with low power
consumption. The first solution lies in the clock distribution, which can distribute a
sub-rate global clock with a capacitively driven method, and the second lies in a
sub-rate injection-locked oscillator that generates the multi-phase clocks locally.
However, in order to obtain sufficient locking range, a pulse clock, generated with an
extra delay cell and XOR gate, had to be utilized for sub-harmonic injection [59].
Especially in a CMOS logic oscillator implementation, this increases dynamic power
consumption due to the low-to-high swing converter and those extra circuits. Therefore,
it is worthwhile to eliminate this injection circuitry by applying a frequency doubler
method or low-swing pulse clock generation.
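
The pulse generation mentioned above, a delay cell followed by an XOR gate, can be
illustrated with a sampled-time behavioral sketch in Python. This is only an
illustration of the general delay-and-XOR principle, not the circuit of [59]; the
sample counts, the start-up behavior of the delay, and all function names are
assumptions made for the example.

```python
from typing import List


def square_clock(n_samples: int, period: int) -> List[int]:
    """Ideal 50% duty-cycle clock, sampled `period` times per cycle."""
    half = period // 2
    return [1 if (i % period) < half else 0 for i in range(n_samples)]


def delay(signal: List[int], n: int) -> List[int]:
    """Delay a signal by n samples, holding its first value during start-up."""
    return [signal[0]] * n + signal[:len(signal) - n]


def xor_pulses(clk: List[int], delay_samples: int) -> List[int]:
    """XOR the clock with its delayed copy: one pulse per clock edge."""
    delayed = delay(clk, delay_samples)
    return [a ^ b for a, b in zip(clk, delayed)]


def count_rising_edges(signal: List[int]) -> int:
    """Count 0 -> 1 transitions in a sampled signal."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev == 0 and cur == 1)


PERIOD = 16     # samples per clock cycle (assumed)
DELAY = 2       # delay-cell delay in samples (assumed); sets the pulse width
N_PERIODS = 8   # number of simulated clock cycles

clk = square_clock(N_PERIODS * PERIOD, PERIOD)
pulses = xor_pulses(clk, DELAY)

clock_edges = sum(1 for a, b in zip(clk, clk[1:]) if a != b)
print("clock edges:", clock_edges)
print("pulses     :", count_rising_edges(pulses))
```

Each rising or falling clock edge produces one pulse whose width equals the delay, so
the pulse train contains energy at twice the clock frequency. The extra delay cell and
XOR gate are the additional circuits whose dynamic power the paragraph above proposes
to eliminate.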
The frequency-dependent channel loss will increase at data rates beyond 16 Gb/s;
therefore, the architecture would need to be modified to include extra equalization. At
the transmit side, this can be accomplished efficiently by modifying the existing
voltage-mode driver with current-mode equalization if the channel generates many ISI
cursors and reflections from impedance discontinuities. To achieve better signaling
power efficiency in this architecture, the differential pair with current-mirror
implementation, which requires a high supply voltage due to headroom issues, has to be
modified. For instance, a pseudo-differential architecture with regulation of the
pre-driver supply that drives its input [14] can be implemented to operate at a low
supply voltage, because the current-mode equalization of differential pairs does not
require a low turn-on impedance. Also, common-source output stages with a low-voltage
linear regulator can be employed to support a higher de-emphasis level. Including a
multi-stage CTLE topology [48] at the receiver would also allow support of higher-loss
channels with relatively low overhead. The inclusion of these two efficient
equalization blocks would allow operation with an additional 10-15 dB of loss.
In addition, in order to further reduce the power-on time of the proposed voltage
regulator, a decoupling capacitor has to be connected to the VREF voltage during its
power-off period. This will eliminate the slow settling time issue and allow the
transmitter to power on faster.
 
 
 
 
 
 
 
 
 
 
 
REFERENCES 
[1] F. O’Mahony, et al., “The Future of Electrical I/O for Microprocessors,” in
 Proc. 2009 Int. Symp. VLSI Design, Automation and Test, pp. 31–34.
 
[2]  J. Poulton, et al., “A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS,” IEEE J. 
 Solid-State Circuits, vol. 42, no. 12, pp. 2745-2757, Dec. 2007. 
 
[3] G. Delagi, “Harnessing Technology to Advance the Next-Generation Mobile
User-Experience,” in IEEE ISSCC Dig. Tech. Papers, pp. 18-24, Feb. 2010.

[4] M. Meghelli, et al., “A 10Gb/s 5-Tap-DFE/4-Tap FFE Transceiver in 90nm
CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 213-222, Feb. 2006.

[5] K. Wong, et al., “A Serial-Link Transceiver with Transition Equalization,” in
IEEE ISSCC Dig. Tech. Papers, pp. 223-232, Feb. 2006.

[6] Y. Moon, et al., “A Quad 6Gb/s Multi-rate CMOS Transceiver with TX
Rise/Fall-Time Control,” in IEEE ISSCC Dig. Tech. Papers, pp. 233-242, Feb.
2006.

[7] E. Prete, et al., “A 100mW 9.6Gb/s Transceiver in 90nm CMOS for Next-
Generation Memory Interfaces,” in IEEE ISSCC Dig. Tech. Papers, pp. 253-262,
Feb. 2006.

[8] B. Casper, et al., “A 20Gb/s Forwarded Clock Transceiver in 90nm CMOS,” in
IEEE ISSCC Dig. Tech. Papers, pp. 263-272, Feb. 2006.

[9] J. Kenney, et al., “A 9.95 to 11.1Gb/s XFP Transceiver in 0.13µm CMOS,” in
IEEE ISSCC Dig. Tech. Papers, pp. 864-873, Feb. 2006.

[10] B. Casper, et al., “A 20Gb/s Embedded Clock Transceiver in 90nm CMOS,” in
IEEE ISSCC Dig. Tech. Papers, pp. 1334-1343, Feb. 2006.
 
[11] G. Balamurugan, et al., “A Scalable 5-15Gbps, 14-75mW Low Power I/O
Transceiver in 65nm CMOS,” IEEE Symp. on VLSI Circuits, pp. 270-271, June
2007.
 
[12] M. Harwood, et al., “A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate 
ADC with Digital Receiver Equalization and Clock Recovery,” in IEEE ISSCC 
Dig. Tech. Papers, pp. 436-591, Feb. 2007. 
 
[13] T. Masuda, et al., “A 250mW Full-Rate 10Gb/s Transceiver Core in 90nm 
CMOS Using  a Tri-State Binary PD with  100ps Gated Digital Output,” in IEEE 
ISSCC Dig. Tech. Papers, pp. 438-614, Feb. 2007. 
 
[14] R. Palmer, et al., “A 14mW 6.25Gb/s Transceiver in 90nm CMOS for Serial
Chip-to-Chip Communications,” in IEEE ISSCC Dig. Tech. Papers, pp. 440-614,
Feb. 2007.
 
[15] Y. Hidaka, et al., “A 4-Channel 3.1/10.3Gb/s Transceiver Macro with a Pattern-
Tolerant Adaptive Equalizer,” in IEEE ISSCC Dig. Tech. Papers, pp. 442-443, 
Feb. 2007. 
 
[16] A. Hayashi, et al., “A 21-Channel 8Gb/s Transceiver Macro with 3.6ns Latency in
90nm CMOS for 80cm Backplane Communication,” IEEE Symp. on VLSI
Circuits, pp. 202-203, June 2008.

[17] J. Nasrullah, et al., “A TeraBit/s-Throughput, SerDes-Based Interface for a
Third-Generation 16 Core 32 Thread Chip-Multithreading SPARC Processor,”
IEEE Symp. on VLSI Circuits, pp. 200-201, June 2008.

[18] J.-K. Kim, et al., “A 40-Gb/s Transceiver in 0.13-µm CMOS Technology,” IEEE
Symp. on VLSI Circuits, pp. 196-197, June 2008.

[19] N. Nguyen, et al., “A 16-Gb/s Differential I/O Cell with 380fs RJ in an Emulated
40nm DRAM Process,” IEEE Symp. on VLSI Circuits, pp. 128-129, June 2008.

[20] K. Chang, et al., “A 16Gb/s/link, 64Gb/s Bidirectional Asymmetric Memory
Interface Cell,” IEEE Symp. on VLSI Circuits, pp. 126-127, June 2008.
 
[21] K. Fukuda, et al., “An 8Gb/s Transceiver with 3x-Oversampling 2-Threshold 
Eye-Tracking CDR Circuit for -36.8dB-loss Backplane,” in IEEE ISSCC Dig. 
Tech. Papers,  pp. 98-598, Feb. 2008. 
 
[22] J. Lee, et al., “A 20Gb/s Duobinary Transceiver in 90nm CMOS,” in IEEE 
ISSCC Dig. Tech. Papers, pp. 102-599, Feb. 2008. 
 
[23] D. Pfaff, et al., “A 1.8W 115Gb/s Serial Link for Fully Buffered DIMM with 
2.1ns Pass-through Latency in 90nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, 
pp. 462-628, Feb. 2008. 
 
[24] H. Wang, et al., “A 21-Gb/s 87-mW Transceiver with FFE/DFE/Linear
Equalizer in 65-nm CMOS Technology,” IEEE Symp. on VLSI Circuits, pp. 50-
51, June 2009.
 
[25] S. Joshi, et al., “A 12-Gb/s Transceiver in 32-nm Bulk CMOS,” IEEE Symp. on
VLSI Circuits, pp. 52-53, June 2009.

[26] R. Palmer, et al., “A 4.3GB/s Mobile Memory Interface With Power-Efficient
Bandwidth Scaling,” IEEE Symp. on VLSI Circuits, pp. 136-137, June 2009.
 
[27] Y. Hidaka, et al., “A 4-Channel 10.3Gb/s Backplane Transceiver Macro with 
35dB  Equalizer and Sign-Based Zero-Forcing Adaptive Control,” in IEEE 
ISSCC Dig. Tech. Papers, pp. 188-189a, Feb. 2009. 
 
[28] Y.-S. Kim, et al., “A 40nm 7Gb/s/pin Single-ended Transceiver with Jitter and
ISI Reduction Techniques for High-Speed DRAM Interface,” IEEE Symp. on
VLSI Circuits, pp. 193-194, June 2010.
 
[29] F. O’Mahony, et al., “A 47x10Gb/s 1.4mW/(Gb/s) Parallel Interface in 45nm 
CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 156-157, Feb. 2010. 
 
[30] K. Maruko, et al., “A 1.296-to-5.184Gb/s Transceiver with 2.4mW/(Gb/s) Burst-
mode CDR using Dual-Edge Injection-Locked Oscillator,” in IEEE ISSCC Dig. 
Tech. Papers,  pp. 364-365, Feb. 2010. 
 
[31] F. Spagna, et al., “A 78mW 11.8Gb/s Serial Link Transceiver with Adaptive RX 
Equalization and Baud-Rate CDR in 32nm CMOS,” in IEEE ISSCC Dig. Tech. 
Papers, pp. 366-367, Feb. 2010. 
 
[32] K. Fukuda, et al., “A 12.3mW 12.5Gb/s Complete Transceiver in 65nm CMOS,” 
in IEEE ISSCC Dig. Tech. Papers, pp. 368-369, Feb. 2010. 
 
[33] G. Balamurugan, et al., “A 5-to-25Gb/s 1.6-to-3.8mW/(Gb/s) Reconfigurable 
Transceiver in 45nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 372-373, 
Feb. 2010. 
 
[34] T. O. Dickson, et al., “An 8x10-Gb/s Source-Synchronous I/O System Based on
High-Density Silicon Carrier Interconnects,” IEEE Symp. on VLSI Circuits, pp.
80-81, June 2011.

[35] J. Zerbe, et al., “A 5.6Gb/s 2.4mW/Gb/s Bidirectional Link With 8ns Power-
On,” IEEE Symp. on VLSI Circuits, pp. 82-83, June 2011.
 
[36] G.-S. Byun, et al., “An 8.4Gb/s 2.5pJ/b Mobile Memory I/O Interface Using 
Simultaneous Bidirectional Dual (Base+RF) Band Signaling,” in IEEE ISSCC 
Dig. Tech. Papers, pp. 488-489, Feb. 2011. 
 
[37] M. Ramezani, et al., “An 8.4mW/Gb/s 4-Lane 48Gb/s Multi-Standard-Compliant 
Transceiver in 40nm Digital CMOS Technology,” in IEEE ISSCC Dig. Tech. 
Papers, pp. 352-353, Feb. 2011. 
 
[38] A. Joy, et al., “Analog-DFE-Based 16Gb/s SerDes in 40nm CMOS That 
 Operates Across 34dB Loss Channels at Nyquist with a Baud Rate CDR and 
 1.2Vpp Voltage-Mode  Driver,” in IEEE ISSCC Dig. Tech. Papers, pp. 350-351, 
 Feb. 2011. 
 
[39] S. Quan, et al., “A 1.0625-to-14.025Gb/s Multimedia Transceiver with Full-rate 
 Source-Series-Terminated Transmit Driver and Floating-Tap Decision-Feedback 
 Equalizer in 40nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 348-349, Feb. 
 2011. 
 
[40] Y. Hidaka, et al., “A 4-Channel 10.3Gb/s Transceiver with Adaptive Phase 
 Equalizer for 4-to-41dB Loss PCB Channel,” in IEEE ISSCC Dig. Tech. Papers, 
 pp. 346-347, Feb. 2011. 
 
[41] R. Inti, et al., “A Highly Digital 0.5-4Gb/s 1.9mW/Gb/s Serial-Link Transceiver 
Using Current-Recycling in 90nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, 
pp. 152-153, Feb. 2011. 
 
[42] G. Ono, et al., “10:4 MUX and 4:10 DEMUX Gearbox LSI for 100-Gigabit 
 Ethernet Link,” in IEEE ISSCC Dig. Tech. Papers, pp. 148-149, Feb. 2011. 
 
[43] M.-S. Chen, et al., “A 40Gb/s TX and RX Chip Set in 65nm CMOS,” in IEEE 
 ISSCC  Dig. Tech. Papers, pp. 146-147, Feb. 2011. 
 
[44] N. Kocaman, et al., “11.3Gb/s CMOS SONET-Compliant Transceiver for Both 
 RZ and NRZ Applications,” in IEEE ISSCC Dig. Tech. Papers, pp. 142-143, 
 Feb. 2011. 
 
[45] A. Agrawal, et al., “A 19Gb/s Serial Link Receiver with Both 4-Tap FFE and 5-
 Tap DFE Functions in 45nm SOI CMOS,” in IEEE ISSCC Dig. Tech. Papers, 
 pp. 134-135, Feb. 2012. 
 
[46] Y.-S. Kim, et al., “An 8Gb/s Quad-Skew-Cancelling Parallel Transceiver in 
 90nm CMOS  for High-Speed DRAM Interface,” in IEEE ISSCC Dig. Tech. 
 Papers, pp. 136-137, Feb. 2012. 
 
[47] A. Amirkhany, et al., “A 4.1pJ/b 16Gb/s Coded Differential Bidirectional 
 Parallel Electrical Link,” in IEEE ISSCC Dig. Tech. Papers, pp. 138-139, Feb. 
 2012. 
 
[48] J. Bulzacchelli, et al., “A 28Gb/s 4-Tap FFE/15-Tap DFE Serial Link 
 Transceiver in 32nm SOI CMOS Technology,” in IEEE ISSCC Dig. Tech. 
 Papers, pp. 324-325, Feb. 2012. 
 
[49] J. Savoj, et al., “A Wide Common-Mode Fully-Adaptive Multi-Standard
 12.5Gb/s Backplane Transceiver in 28nm CMOS,” IEEE Symp. on VLSI
 Circuits, pp. 104-105, June 2012.

[50] T. Ali, et al., “A 100+ meter 12Gb/s/Lane Copper Cable Link Based on Clock-
 Forwarding,” IEEE Symp. on VLSI Circuits, pp. 108-109, June 2012.
 
[51] R. Reutemann, et al., “A 4.5mW/Gb/s 6.4Gb/s 22+1-lane Source Synchronous 
 Receiver Core with Optional Cleanup PLL in 65nm CMOS,” IEEE J. Solid-State 
 Circuits, vol. 45, no. 12, pp. 2850-2860, Dec. 2010. 
 
[52]  K. Hu, et al., “A 0.6 mW/Gb/s, 6.4-7.2 Gb/s Serial Link Receiver Using Local 
 Injection-Locked Ring Oscillators in 90 nm CMOS,” IEEE J. Solid-State 
 Circuits, vol. 45, no. 4, pp. 899-908, Apr. 2010. 
 
[53] Y.-H. Song, et al., “A 0.47-0.66 pJ/bit, 4.8-8 Gb/s I/O Transceiver in 65nm
 CMOS,” IEEE J. Solid-State Circuits, vol. 48, no. 5, pp. 1276-1289, May 2013.
 
[54]  B. Casper, et al., “Clocking Analysis, Implementation and Measurement 
Techniques for High-Speed Data Links – a Tutorial,” IEEE Trans. Circuits Syst. 
I, Reg. Papers, vol. 56, no. 1, pp. 17-39, Jan. 2009. 
 
[55] K.-L. Wong, et al., “A 27-mW 3.6-Gb/s I/O Transceiver,” IEEE J. Solid-State
 Circuits, vol. 39, no. 4, pp. 602–612, Apr. 2004.
 
[56]  J. Poulton, et al., “A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS,” IEEE J. 
 Solid-State Circuits, vol. 42, no. 12, pp. 2745-2757, Dec. 2007. 
 
[57]  G. Balamurugan, et al., “A Scalable 5-15Gbps, 14-75mW  Low Power  I/O 
 Transceiver in 65nm CMOS,” IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 
 1010-1019, Apr. 2008. 
 
[58]  W. D. Dettloff, et al., “A 32 mW 7.4 Gb/s Protocol-Agile Source-Series 
Terminated Transmitter in 45 nm CMOS SOI,” in IEEE ISSCC Dig. Tech. 
Papers, pp. 370-371, Feb. 2010.  
 
[59] M. Hossain, et al., “7.4 Gb/s 6.8 mW Source Synchronous Receiver in 65 nm
 CMOS,” IEEE J. Solid-State Circuits, vol. 46, no. 6, pp. 1337-1348,
 Jun. 2011.
 
[60] F. O’Mahony, et al., “A 27 Gb/s Forwarded-Clock I/O Receiver Using an
 Injection-Locked LC-DCO in 45 nm CMOS,” in IEEE ISSCC Dig. Tech. Papers,
 pp. 452–453, Feb. 2010.

[61] K. Hu, et al., “0.16-0.25 pJ/bit, 8Gb/s Near-Threshold Serial Link Receiver with
Super-Harmonic Injection Locking,” IEEE J. Solid-State Circuits, vol.
47, no. 8, pp. 1842-1853, Aug. 2012.
 
[62]  W. Dally, et al., Digital Systems Engineering, Cambridge, U.K.: Cambridge 
 University Press, 1998. 
 
[63] S. Gondi, et al., “Equalization and Clock and Data Recovery Techniques for 10-
 Gb/s CMOS Serial Links,” IEEE J. Solid-State Circuits, vol. 42, pp.
 1999-2011, Sept. 2007.

[64] D. Schinkel, et al., “A Double-Tail Latch-Type Voltage Sense Amplifier with
 18ps Setup+Hold Time,” in IEEE ISSCC Dig. Tech. Papers, pp. 314-605, Feb.
 2007.

[65] L. Nielsen, et al., “Low-Power Operation Using Self-Timed Circuits and
Adaptive Scaling of Supply Voltage,” IEEE Trans. VLSI Syst., vol. 2, pp. 391–
397, Dec. 1994.
 
[66]  A. Dancy, et al., “Techniques for Aggressive Supply Voltage Scaling and 
Efficient Regulation,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 
1997, pp. 579–586. 
 
[67]  V. Gutnik, et al., “Embedded Power Supply for Low Power DSP,” IEEE Trans. 
 VLSI Syst., vol. 5, pp. 425–435, Dec. 1997. 
 
[68] W. Namgoong, et al., “A High-Efficiency Variable-Voltage CMOS Dynamic
Dc-Dc Switching Regulator,” in IEEE Int. Solid-State Circuits Conf. (ISSCC)
Dig. Tech. Papers, Feb. 1997, pp. 380–381.
 
[69]  T. K. Kuroda, et al., “Variable Supply-Voltage Scheme for Low-Power High-
Speed  CMOS Digital Design,” IEEE J. Solid-State Circuits, vol. 33, pp. 454–
462, Mar. 1998. 
 
[70]  T. D. Burd, et al., “A Dynamic Voltage Scaled Microprocessor System,” IEEE J. 
 Solid-State Circuits, vol. 35, pp. 1571–1580, Nov. 2000. 
 
[71]  G.-Y. Wei, et al., “A Variable-Frequency Parallel I/O Interface with Adaptive 
 Power- Supply Regulation,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 
 1715-1722, Nov. 2000. 
 
[72]  J. Kim, et al., “Adaptive Supply Serial Links with Sub-1V Operation and Per-Pin 
 Clock Recovery,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1403-1413, 
 Nov. 2002. 
 
[73] B. Leibowitz, et al., “A 4.3 GB/s Mobile Memory Interface with Power-
 Efficient Bandwidth Scaling,” IEEE J. Solid-State Circuits, vol. 45, no. 4, pp.
 889-898, Apr. 2010.

[74] J. Zerbe, et al., “A 5.6 Gb/s 2.4 mW/Gb/s Bidirectional Link with 8ns Power-
 On,” IEEE Symp. on VLSI Circuits, pp. 82-83, June 2011.
 
[75]  D. Dunwell, et al., “A 2.3-4GHz Injection-Locked Clock Multiplier with 55.7% 
Lock Range and 10-ns Power-On,” in Proc. IEEE Custom Integrated Circuits 
Conf.  (CICC), pp. 1–4, 2012. 
 
[76]  A. P. Chandrakasan, et al., “Technologies for Ultra Dynamic Voltage Scaling,” 
 Proc. IEEE, vol. 98, no. 2, pp. 191-214, Feb. 2010. 
 
[77]  C.-K. Yang, et al., “A 0.8um CMOS 2.5 Gb/s Oversampling Receiver and 
 Transmitter for Serial Links,” IEEE J. Solid-State Circuits, vol. 31, no. 12, pp. 
 2015–2023, Dec. 1996. 
 
[78]  H. Lee, et al., “A 16 Gb/s/Link, 64 Gb/s Bidirectional Asymmetric Memory 
 Interface,” IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1235–1247, Apr. 
 2009. 
 
[79] A. Palaniappan, et al., “A Design Methodology for Power Efficiency
Optimization of High-Speed Equalized-Electrical I/O Architectures,” IEEE
Trans. VLSI Syst., vol. PP, no. 99, 2012.
 
[80]  R. Sredojevic, et al., “Fully Digital Transmit Equalizer with Dynamic Impedance 
 Modulation,” IEEE J. Solid-State Circuits, vol. 46, no. 8, pp. 1857-1869, Aug. 
 2011. 
 
[81] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, 2nd ed.
 Cambridge, U.K.: Cambridge Univ. Press, 2004.
 
[82]  J. Kaukovuori, et al., “Analysis and Design of Passive Polyphase Filters,” IEEE 
 Trans.  Circuits Syst. I, Reg. Papers, vol. 55, no. 10, pp. 3023-3037, Nov. 2008. 
 
[83] C. Menolfi, et al., “A 16Gb/s Source-Series Terminated Transmitter in 65nm
 CMOS SOI,” in IEEE ISSCC Dig. Tech. Papers, pp. 446-447, Feb. 2007.
 
[84]  Y.-H. Song, et al., “A 6-Gbit/s Hybrid Voltage-Mode Transmitter with Current-
 Mode Equalization in 90-nm CMOS,” IEEE Transactions on Circuits and 
 Systems-II, vol. 59, no. 8, pp. 491-495, Aug. 2012. 
 
[85] Y. Lu, et al., “Design and Analysis of Energy-Efficient Reconfigurable Pre-
 Emphasis Voltage-Mode Transmitter,” IEEE J. Solid-State Circuits, vol. 48,
 no. 8, pp. 1898-1909, Aug. 2013.
 
[86]  A. Amin, et al., “A 32-to-48Gb/s Serializing Transmitter Using Multiphase 
Sampling in 65nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 38-39, Feb. 
2013. 
 
[87] R. Ho, et al., “High Speed and Low Energy Capacitively Driven On-Chip Wires,”
 IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 52-60, Jan. 2008.

[88] J. Zerbe, et al., “A 5.6 Gb/s 2.4 mW/Gb/s Bidirectional Link With 8ns Power-
 On,” IEEE Symp. on VLSI Circuits, pp. 82-83, June 2011.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
