Three-Dimensional (3D) Memory I/O Interface Design Using Quad-Band Interconnect (QBI) And Eight-Level Pulse Amplitude Modulation (8-PAM) by Wang, Xiaoyan
Southern Methodist University 
SMU Scholar 
Electrical Engineering Theses and Dissertations Electrical Engineering 
Spring 2021 
Recommended Citation 
Wang, Xiaoyan, "Three-Dimensional (3D) Memory I/O Interface Design Using Quad-Band Interconnect 
(QBI) And Eight-Level Pulse Amplitude Modulation (8-PAM)" (2021). Electrical Engineering Theses and 
Dissertations. 43. 
https://scholar.smu.edu/engineering_electrical_etds/43 
THREE-DIMENSIONAL (3D) MEMORY I/O INTERFACE DESIGN USING QUAD-BAND
INTERCONNECT (QBI) AND EIGHT-LEVEL PULSE AMPLITUDE MODULATION (8-PAM)
Approved by:  
 
____________________________________________________ 
Prof. Duncan L. MacFarlane      
Professor of Electrical and Computer Engineering   
 
________________________________________________ 
Prof. Joseph Camp       
Associate Professor of Electrical and Computer Engineering 
 
________________________________________________ 
Prof. Jennifer Dworak       
Associate Professor of Electrical and Computer Engineering 
 
________________________________________________ 
Prof. Mohammad Khodayar      
Associate Professor of Electrical and Computer Engineering 
 
________________________________________________ 
Prof. Jingbo Ye       
Professor of Experimental Physics     
A Dissertation Presented to the Graduate Faculty of the 
 
Bobby B. Lyle School of Engineering 
 




Partial Fulfillment of the Requirements 
 
for the degree of  
 
Doctor of Philosophy 
 
with a  
 






M.S., Electrical Engineering, Beijing University of Posts and Telecomm., China, 2012 
B.S., Information Science and Engineering, Shandong University, China, 2009 
 













I would like to thank my former advisor, Dr. Gyung-Su Byun, for the opportunity to
perform this research, and my current advisor, Dr. Duncan L. MacFarlane, who gave me all the
freedom, guidance, and support I needed.
I would like to thank my committee for their support and time. I would also like to thank all
the professors whose inspiring courses I took; the knowledge and ideas I gained from those
courses greatly helped my research and work.
I would especially like to thank my husband, who firmly holds my hand and guides me
through the darkest moments in my life.
Finally, I would like to thank my parents and my friends. I appreciate their help and






Wang, Xiaoyan            M.S., EE, Beijing University of Posts and Telecomm., China, 2012 
       B.S., Information Science and Engineering, Shandong University, China, 2009 
Three-Dimensional (3D) Memory I/O Interface Design Using Quad-Band Interconnect (QBI) 
And Eight-Level Pulse Amplitude Modulation (8-PAM) 
 Advisor: Professor Duncan L. MacFarlane 
Doctor of Philosophy conferred May 15, 2021 
Dissertation completed April 5, 2021 
 
The demand for high-bandwidth, low-latency, reconfigurable, and small-form-factor memory I/O
interfaces has increased significantly with the development of data-intensive applications,
such as artificial neural networks and high-performance computing. However, current memory I/O
interface technologies have critical limitations, such as limited bandwidth, limited pin count,
long latency, non-reconfigurable data access, and large form factor. Therefore, a novel memory
I/O interface design is needed to overcome these limitations.
This dissertation presents two novel memory I/O interface designs that use 3D integration,
QBI, and 8-PAM technologies to improve bandwidth, latency, reconfigurability, and form factor
for future mobile devices. The proposed 3D reconfigurable QBI memory I/O interface, which
uses baseband (BB) signaling, three RF-band signals, and a short vertical 3D µbump channel,
supports reconfigurable data access with low latency and a small form factor. Moreover, the pin
count is reduced by a factor of four because four data streams share a single 3D µbump channel.
The proposed 3D 8-PAM memory I/O interface, which uses 8-level signaling and a short vertical 3D
µbump channel, enables three data streams to be communicated concurrently between the CPU and
memory. As a result, a three times higher data rate on the channel, three times fewer pins, and
reconfigurable data access with low latency and a small form factor are achieved




operation and performance are analyzed, and their circuit implementations are discussed in
detail. A chip prototype of the 3D QBI transceiver was implemented in a 180 nm CMOS process,
and a two-tier QBI die stack was assembled to verify the design, using a face-to-face
configuration with µbump interconnects to reduce cost. The measured results show that
reconfigurable data access with low latency and a small form factor is achieved by combining
3D integration and QBI. The proposed 3D 8-PAM transceiver was analyzed, designed, and
simulated in a 65 nm CMOS technology with a 1.2 V supply. The simulation results show that,
by combining 3D integration and 8-PAM, the transceiver exhibits higher aggregate data




TABLE OF CONTENTS 
 
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 Introduction
1.1 Introduction
1.2 Proposed Memory I/Os
1.3 Dissertation Organization
CHAPTER 2 3D Baseband Transceiver using NRZ
2.1 Introduction
2.2 Baseband Transceiver
2.3 3D µbump Channel
2.4 Simulation Results
CHAPTER 3 3D RF-band Transceiver
3.1 Introduction
3.2 RF-band Transmitter
3.3 RF-band Receiver
3.4 The 3D die-to-die Gap Variation Impact
3.5 RF-band Transceiver




4.1 Introduction
4.2 3D Quad-Band Interconnect (QBI) Transceiver Architecture
4.3 3D Quad-Band Interconnect (QBI) Transceiver Design
4.4 3D Quad-Band Interconnect (QBI) Memory Channel Loss and Latency
4.5 Chip Measurement
4.6 Conclusion
CHAPTER 5 3D 8-Pulse-Amplitude Modulation (PAM) Memory I/O Interface
5.1 Introduction
5.2 3D 8-Pulse-Amplitude Modulation (PAM) Transceiver Architecture
5.3 3D 8-Pulse-Amplitude Modulation (PAM) Transceiver Design
5.4 3D 8-Pulse-Amplitude Modulation (PAM) Memory Channel Modeling
5.5 3D 8-Pulse-Amplitude Modulation (PAM) Memory I/O Interface Layout and Assembly
5.6 Simulated Results
5.7 Conclusions
CHAPTER 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work




LIST OF FIGURES 
 
Figure 1-1  Typical memory I/O system
Figure 1-2  Trend of DRAM performance
Figure 1-3  Parallel interface
Figure 1-4  Memory I/O system with PCB traces
Figure 1-5  The HFSS model and simulated S21 of a 10cm 2D FR-4 channel.
Figure 1-6  Memory I/O interface channel configurations.
Figure 1-7  Working principle of the pre-emphasis and equalization technique.
Figure 1-8  A programmable 3-tap pre-emphasis and a continuous-time equalizer are used to compensate for channel loss.
Figure 1-9  Diagram of (a) RF interconnect and (b) multi-band RF interconnect
Figure 1-10  A high speed, energy-efficient, and reconfigurable memory I/O interface using dual-band interconnect.
Figure 1-11  Signal space diagram for digital PAM signals
Figure 1-12  A 64 Gb/s transceiver for short-reach 4-PAM electrical links
Figure 1-13  Memory I/O interface structure trends.
Figure 1-14  A wide I/O DRAM using TSV based stacking.
Figure 1-15  Proposed 3D reconfigurable QBI memory I/O interface.
Figure 1-16  Proposed 3D 8-PAM memory I/O interface.
Figure 2-1  Baseband transceiver schematic.
Figure 2-2  Simulated baseband signal. (a) input data, (b) baseband transmitter output.
Figure 2-3  Simulated baseband signal (a) baseband receiver input (b) the recovered signal.
Figure 2-4  Face-to-face µbump technology used in 3D BB transceiver.
Figure 2-5  (a) HFSS model of the 3D µbump channel and a conventional 10cm 2D FR4 channel




Figure 2-6  Layout for baseband transmitter (a) and receiver (b).
Figure 2-7  Simulated baseband input and output data streams at 3Gb/s.
Figure 2-8  Simulated baseband eye diagram of the recovered data at 3Gb/s.
Figure 3-1  RF-band transmitter diagram.
Figure 3-2  Schematic of the LC VCO used in the RF-band transmitter.
Figure 3-3  The symmetric inductor used in the LC VCO: layout; simulated inductance, quality factor, and series resistance of (a) 20GHz, (b) 14GHz, and (c) 7GHz.
Figure 3-4  Simulated LC VCO output waveform at (a) 20GHz, (b) 14GHz, and (c) 7GHz.
Figure 3-5  Schematic of the ASK modulator used in the RF-band transmitter.
Figure 3-6  Simulated ASK modulator input and output waveforms of (a) 20GHz, (b) 14GHz, and (c) 7GHz. (top: the input data; bottom: the ASK modulated signal).
Figure 3-7  RF-band receiver diagram.
Figure 3-8  Schematic of the differential self-mixer used in the RF-band receiver.
Figure 3-9  Simulated input and output signals of the differential self-mixer used in the RF-band receiver, (a) 20GHz carrier frequency, (b) 14GHz carrier frequency, and (c) 7GHz carrier frequency (top: the input signal, bottom: the mixer output signal).
Figure 3-10  Schematic of the differential amplifier used in the RF-band receiver.
Figure 3-11  Simulated output signal of the differential amplifiers (a) 20GHz band, (b) 14GHz band, and (c) 7GHz band.
Figure 3-12  Schematic of the buffer converter used in the RF-band receiver.
Figure 3-13  Simulated output signal of the buffer converter (a) 20GHz band, (b) 14GHz band, and (c) 7GHz band.
Figure 3-14  The die-to-die gap variation’s impact on the coupling efficiency between the inductors. (a) inductors’ HFSS model, (b) simulated coupling factor with different die-to-die gaps.
Figure 3-15  RF-band transceiver architecture.
Figure 3-16  Layout of the RF-band transmitter (top) and receiver (bottom).
Figure 3-17  Simulated RF-band input and output data streams of (a) 20GHz band, (b) 14GHz




Figure 3-18  Simulated RF-band eye diagrams of (a) 20GHz band, (b) 14GHz band, and (c) 7GHz band at 3Gb/s.
Figure 4-1  Diagram of (a) conventional 2D memory I/O, (b) 3D memory I/O.
Figure 4-2  Proposed 3D reconfigurable QBI memory I/O architecture. Three RF-bands and one BB are combined through a quad-band transformer for reconfigurable four-band data communication and pin reduction. The 3D technique is utilized to reduce system latency, loss, and size. A 3D QBI die-stack is implemented using face-to-face assembly and µbump.
Figure 4-3  Reconfigurable 3D QBI transmitter. It is composed of one BBTX, three RFTXs, and a quad-band transformer.
Figure 4-4  Reconfigurable 3D QBI receiver. It consists of one BBRX, three RFRXs, and a quad-band transformer.
Figure 4-5  Quad-band transformer design. (a) QBI transformer layout. (b) QBI transformer 3D model. (c) The working mechanism of the transformer for BB (CM) and RF-band (DM).
Figure 4-6  3D QBI data communication configurations: BB.
Figure 4-7  3D QBI data communication configurations: RF1.
Figure 4-8  3D QBI data communication configurations: RF2.
Figure 4-9  3D QBI data communication configurations: RF3.
Figure 4-10  Diagram of (a) the proposed two-tier 3D QBI memory I/O interface and (b) the conventional point-to-point 2D memory I/O interface.
Figure 4-11  3D QBI µbump model and signal loss simulation using HFSS. (a) 3D QBI face-to-face configuration. (b) 3D QBI µbump HFSS model. (c) The simulated signal loss of the 3D µbump interconnect (blue) and a 10 cm FR-4 T-Line (black).
Figure 4-12  Simulated latency of the QBI transceiver.
Figure 4-13  The Monte Carlo simulation results. (a) latency; (b) power efficiency; (c) output data eye width.
Figure 4-14  Die photos of the 3D QBI transceiver. (a) QBI RX and (b) QBI TX. The TXs and RXs for different bands are marked by black, green, blue, and red solid-line boxes, respectively. The 3D µbump pad arrays are indicated within the white dotted-line box.
Figure 4-15  3D QBI assembly diagram.
Figure 4-16  (a) 3D QBI die-stack side view (µbumps are marked by a white dotted-line box). (b) QBI die-stack top view. (c) 3D QBI wire bonding.




Figure 4-18  (a) 3D QBI test setup. (b) 3D QBI reconfigurable data communication test demonstration. The BERT scope is used to generate the input data stream and measure the BER and eye diagram. DC generators are used to provide the supply.
Figure 4-19  Measured eye diagrams of 2 Gb/s (BB), 2.3 Gb/s (RF1, 7 GHz), 2.5 Gb/s (RF2, 14 GHz) and 3 Gb/s (RF3, 20 GHz), respectively. The measured BER of the QBI is less than 10^-15 for each band.
Figure 4-20  The measured eye diagrams and BER contours of BB at 1.9Gb/s and 2Gb/s data rate. The measured BER is less than 10^-15.
Figure 4-21  The measured eye diagrams and BER contours of RF1 at 2Gb/s and 2.3Gb/s data rate. The measured BER is less than 10^-15.
Figure 4-22  The measured eye diagrams and BER contours of RF2 at 2Gb/s and 2.5Gb/s data rate. The measured BER is less than 10^-15.
Figure 4-23  The measured eye diagrams and BER contours of RF3 at 2.5Gb/s and 3Gb/s data rate. The measured BER is less than 10^-15.
Figure 4-24  Normalized power consumption with different supply voltages of D3 (RF3) at 2Gb/s data rate.
Figure 4-25  The minimum supply voltage needed for different data rates of D3 (RF3).
Figure 4-26  Measured eye diagrams of RF3 at 1Gb/s – 3.2Gb/s.
Figure 4-27  Measured D3 (RF3) Peak-to-Peak jitter at different data rates.
Figure 4-28  Measured eye diagrams and P-P jitters of 2 Gb/s (BB), 2.3 Gb/s (RF1), 2.5 Gb/s (RF2), and 3 Gb/s (RF3), respectively, with repeating 1010 serial pattern.
Figure 5-1  Diagram of (a) The traditional 2D memory I/O interface; (b) 3D memory I/O interface.
Figure 5-2  (a) The architecture of the proposed 3D 8-PAM memory I/O interface; (b) The 8-PAM signaling enables high data throughput with compact architecture; (c) The 3D technique is utilized to reduce system latency, loss, and size. A 3D 8-PAM die-stack can be implemented using face-to-face assembly and µbump.
Figure 5-3  (a) The architecture of the proposed 3D 8-PAM transmitter. It is composed of an encoder, eight DFFs, and an 8-PAM driver; (b) Encoder truth table; Schematic of (c) Encoder; (d) DFF; (e) 8-PAM driver.
Figure 5-4  The architecture of the proposed 3D 8-PAM receiver. It is composed of a comparator, a circuit block for differential to single-ended conversion (i.e., D-to-S converter), a




Figure 5-5  Schematic of the proposed 3D 8-PAM receiver blocks: (a) DFF; (b) Decoder.
Figure 5-6  Schematic of the proposed 3D 8-PAM receiver blocks: (a) NMOS differential pair; (b) PMOS differential pair; (c) Complementary pseudo-differential pair; (d) Differential to single-ended converter (D-to-S converter).
Figure 5-7  3D 8-PAM µbump model and signal loss simulation using HFSS. The on-chip 3D pad is also modeled for high accuracy.
Figure 5-8  (a) 8-PAM RX layout; (b) 8-PAM TX layout; (c) 3D 8-PAM assembly diagram.
Figure 5-9  (a) Gray-code mapping with symbol errors, (b) Binary-code mapping with symbol errors, and (c) Bit error simulation diagram.
Figure 5-10  Simulation results of the bit errors with gray-code mapping.
Figure 5-11  Simulation results of the bit errors with binary-code mapping.
Figure 5-12  The simulated waveforms of the proposed 3D 8-PAM memory I/O interface at the worst case (SS corner). (a) the recovered data (4.5 Gb/s per data); (b) the 8-level PAM signal and its eye-diagram (13.5Gb/s/pin); (c) the original data (4.5 Gb/s per data).
Figure 5-13  The simulated latency of the proposed 3D 8-PAM transceiver at SS corner: latency breakdown of the transceiver and I/O channel.
Figure 5-14  Simulated latency mismatch by using Monte Carlo simulations of the proposed 3D 8-PAM transceiver. (a) the FF corner; (b) the TT corner; (c) the SS corner.
Figure 5-15  The simulated eye diagrams of the recovered three data at 13.5 Gb/s/pin (4.5 Gb/s per data) at the worst corner (SS corner).
Figure 6-1  Applications of the proposed memory I/Os.







LIST OF TABLES 
 
Table 4.1  Simulated latency comparison with DBI [22].
Table 4.2  Comparison with state-of-the-art.
Table 4.3  P-P jitter comparison between the random input sequence of PRBS 2^7-1 and the repeating 1010 serial pattern.
Table 5.1  Simulated latency comparison to [85].












1.1 Introduction  
With the rapid growth of data, data-intensive applications such as artificial neural
networks (ANN) [1], high-performance computing (HPC) [2], virtual reality (VR), and the
internet of things (IoT) are expected to be widely used in future mobile devices. To handle the
large amount of data movement in these applications, the memory I/O interface system shown in
Figure 1-1 must provide higher bandwidth, lower latency, lower power consumption, and
reconfigurability.
 
Figure 1-1  Typical memory I/O system 
The main memory interfaces for high-speed DDR systems have evolved from DDR1, DDR2, DDR3, and
DDR4 [3], [4] to DDR5 [5], [6] to support high-speed and low-power applications, as shown in
Figure 1-2. The supply voltage has decreased from 2.5 V to 1.1 V, which





Figure 1-2  Trend of DRAM performance  
Memory I/O interfaces that use parallel I/O interconnects, as shown in Figure 1-3, can
achieve the target bandwidth by adding more pins, each carrying a point-to-point link. However,
these parallel signals suffer from crosstalk between interconnects, and the total power
consumption increases with the number of parallel interconnects. Furthermore, the total pin
count is limited by the available chip area and the PCB routing complexity [7].
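The pin-count pressure can be made concrete with simple arithmetic. The minimal Python sketch below counts the pins needed for a given number of independent data streams when each pin carries one stream (conventional NRZ), four streams (the baseband plus three RF bands of the QBI scheme summarized in the abstract), or three streams (the 8-PAM scheme). The figure of 12 data streams and the helper name `pins_required` are hypothetical, chosen only for illustration.

```python
import math

def pins_required(data_streams: int, streams_per_pin: int) -> int:
    """Pins needed when each pin (channel) carries `streams_per_pin` data streams."""
    return math.ceil(data_streams / streams_per_pin)

# Hypothetical example: 12 independent data streams between CPU and memory.
SCHEMES = (
    ("NRZ, one stream per pin", 1),
    ("QBI, baseband + three RF bands per pin", 4),
    ("8-PAM, three streams per pin", 3),
)
for name, per_pin in SCHEMES:
    print(f"{name}: {pins_required(12, per_pin)} pins")
```

The ceiling reflects that a partially used pin still has to be routed, which is why the pin savings are largest when the stream count is a multiple of the streams carried per pin.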
 
Figure 1-3  Parallel interface 
Another way to increase the memory I/O bandwidth is to increase the bandwidth of each I/O
interconnect. However, the long interconnect between the logic and memory unit limits




system, as shown in Figure 1-4. The signal suffers from frequency-dependent attenuation caused
by channel impedance discontinuities, the skin effect in the wires, and dielectric loss. As shown
in Figure 1-5, a 10 cm 2D FR-4 channel was modeled and simulated with the 3D EM solver HFSS; the
simulated S21 is -10.3 dB at 5 GHz. Moreover, a long channel causes long latency and a large
form factor.
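To put the simulated S21 of -10.3 dB at 5 GHz in perspective, the decibel value can be converted to linear ratios. The following minimal Python sketch applies the standard definitions (20·log10 for a voltage ratio, 10·log10 for a power ratio); the function name is illustrative.

```python
from typing import Tuple

def s21_db_to_linear(s21_db: float) -> Tuple[float, float]:
    """Convert an S21 magnitude in dB to (amplitude ratio, power ratio)."""
    amplitude_ratio = 10 ** (s21_db / 20)  # |S21| as a voltage ratio
    power_ratio = 10 ** (s21_db / 10)      # |S21|^2 as a power ratio
    return amplitude_ratio, power_ratio

# Simulated S21 of the 10 cm FR-4 channel at 5 GHz (Figure 1-5)
amp, pwr = s21_db_to_linear(-10.3)
print(f"amplitude ratio = {amp:.3f}, power ratio = {pwr:.3f}")
```

Roughly 0.31 of the transmitted voltage swing (about 9% of the power) survives the channel at 5 GHz, which is the kind of loss that pre-emphasis and equalization (Section 1.1.1) aim to compensate.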
 
Figure 1-4  Memory I/O system with PCB traces 
 
Figure 1-5  The HFSS model and simulated S21 of a 10cm 2D FR-4 channel. 
Furthermore, conventional memory I/O interfaces support only a fixed channel configuration
between the CPU and memory, as shown in Figure 1-6 (a), so the channel cannot be reused while
the memory is in power-down mode. In memory I/O interfaces with reconfigurable channel
configurations, as shown in Figure 1-6 (b), the channel is shared by multiple CPU cores and
memory ranks. Reconfigurable (i.e., CPU cores and memory ranks




and memory ranks communicate at the same time.) data communication between CPU cores and
memory ranks over a shared channel is supported. Therefore, a memory I/O interface with
reconfigurable channel configurations can help reduce the I/O count and increase the data rate
per channel for future mobile devices.
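One way to picture reconfigurable channel sharing is as a small allocation problem: each active CPU-core/memory-rank pair is assigned one of the frequency bands of a shared channel, and a rank in power-down mode does not hold a band, leaving it free for other pairs. The Python sketch below is a hypothetical first-come, first-served policy written only to illustrate the idea; the band names (BB, RF1 to RF3), the function `allocate_bands`, and the policy itself are assumptions, not the control logic used in this work.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical: one baseband and three RF bands on a single shared channel.
BANDS = ("BB", "RF1", "RF2", "RF3")

Link = Tuple[str, str]  # (cpu_core, memory_rank)

def allocate_bands(requests: List[Link], powered_down: Set[str]) -> Dict[Link, Optional[str]]:
    """Assign each (core, rank) request a free band, first come, first served.

    Requests to powered-down ranks get no band, so that band stays free for
    other pairs; requests beyond the number of bands are also left as None.
    """
    free = list(BANDS)
    plan: Dict[Link, Optional[str]] = {}
    for core, rank in requests:
        if rank in powered_down or not free:
            plan[(core, rank)] = None
        else:
            plan[(core, rank)] = free.pop(0)
    return plan

plan = allocate_bands(
    [("core0", "rank0"), ("core1", "rank1"), ("core2", "rank2")],
    powered_down={"rank1"},
)
print(plan)
```

Because rank1 is powered down, core2 receives RF1 instead of RF2; in a fixed configuration the channel tied to rank1 would simply sit idle.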
 
Figure 1-6  Memory I/O interface channel configurations. 
Researchers have studied multiple approaches to overcome the conventional memory I/O
interface limitations mentioned above, including pre-emphasis and equalization, multi-band
interconnect, PAM, and 3D integration. The following subsections discuss these approaches in
detail.
1.1.1 Pre-emphasis and equalization technique 
To compensate for the frequency-dependent loss of band-limited channels, pre-emphasis
[8], [9], [10], [11], [12] or equalization [13], [14], [15], [16], [17] techniques are widely
employed in serial links [18], [19]. As shown in Figure 1-7, a flat frequency response can be obtained





Figure 1-7  Working principle of the Preemphasis and equalization technique. 
In [9], a 20 Gb/s transceiver in a 40 nm CMOS process was demonstrated over a 30 cm FR4
channel whose insertion loss is up to 12 dB at the baud frequency. To compensate for the channel
loss, a programmable 3-tap pre-emphasis at the transmitter and a continuous-time linear equalizer
(CTLE) at the receiver are used, as shown in Figure 1-8. Together, the pre-emphasis and
equalization techniques extend the transceiver bandwidth.
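Transmit-side pre-emphasis of the kind used in [9] is commonly modeled as a short FIR filter on the symbol stream. The minimal Python sketch below implements a generic 3-tap FIR (one pre-cursor, one main, and one post-cursor tap) on NRZ symbols of +1/-1. The tap weights and the edge handling are hypothetical choices for illustration and are not taken from [9].

```python
from typing import List, Sequence, Tuple

def fir_preemphasis(symbols: Sequence[float], taps: Tuple[float, float, float]) -> List[float]:
    """3-tap transmit FIR: y[n] = c_pre*x[n+1] + c_main*x[n] + c_post*x[n-1].

    Symbols are NRZ levels (+1/-1). At the edges, the missing neighbor is
    replaced by the edge symbol itself (a simplifying assumption).
    """
    c_pre, c_main, c_post = taps
    n = len(symbols)
    out = []
    for i in range(n):
        prev_sym = symbols[i - 1] if i > 0 else symbols[0]
        next_sym = symbols[i + 1] if i < n - 1 else symbols[-1]
        out.append(c_pre * next_sym + c_main * symbols[i] + c_post * prev_sym)
    return out

# Hypothetical tap weights; |c_pre| + |c_main| + |c_post| = 1 keeps the peak swing at 1.
TAPS = (-0.1, 0.7, -0.2)
tx = fir_preemphasis([-1, -1, 1, 1, 1, -1], TAPS)
print([round(v, 2) for v in tx])
```

With these weights, the first symbol after each transition is driven to ±0.8, while a symbol that matches both neighbors settles to ±0.4. Transitions (high-frequency content) are therefore emphasized relative to long runs (low-frequency content), consistent with the principle illustrated in Figure 1-7.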





Figure 1-8  A programmable 3-tap pre-emphasis and a continuous-time equalizer are used to 
compensate for channel loss. 
1.1.2 Multi-band interconnect (MBI) technique 
Another approach to increasing the total data rate is to use a multi-band interconnect, which
transmits and receives signals in multiple frequency bands over a shared I/O interconnect.
Figure 1-9 shows the concepts of RF interconnect and multi-band RF interconnect.
In a single-carrier RF interconnect, as shown in Figure 1-9 (a), the input baseband data is
up-converted to a high-frequency modulated signal by the transmitter mixer using amplitude
(e.g., ASK) or phase (e.g., BPSK [20]) modulation, and the RF modulated signal is then
down-converted to the original data by the receiver mixer. One advantage of RF interconnect over
conventional baseband signaling is that low power and low latency are achieved, since the RF
interconnect carries electromagnetic (EM) waves that propagate at the speed of light in the
medium, rather than relying on the comparatively slow voltage signaling used in baseband.
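A behavioral sketch can make the amplitude-modulated path concrete. In the Python model below, on-off keying (the simplest form of ASK) multiplies the carrier by each bit, and the receiver squares the received signal (self-mixing) and averages it over each bit period as a stand-in for low-pass filtering. The 20 GHz carrier and 3 Gb/s rate mirror the values listed for the RF3 band in the figure captions; the sampling density, the 0.25 decision threshold, and the function names are assumptions, and the model ignores channel loss, noise, and circuit nonidealities.

```python
import math
from typing import List

def ask_modulate(bits: List[int], f_carrier: float, bit_rate: float,
                 samples_per_bit: int = 64) -> List[float]:
    """On-off keying, the simplest ASK: carrier present for 1, absent for 0."""
    fs = bit_rate * samples_per_bit  # simulation sample rate (Hz)
    signal = []
    for k, bit in enumerate(bits):
        for s in range(samples_per_bit):
            t = (k * samples_per_bit + s) / fs
            signal.append(bit * math.cos(2 * math.pi * f_carrier * t))
    return signal

def self_mix_demodulate(signal: List[float], samples_per_bit: int = 64,
                        threshold: float = 0.25) -> List[int]:
    """Self-mixing: multiply the signal by itself, then average over each bit.

    cos^2(wt) = (1 + cos(2wt)) / 2, so the average is about 0.5 for a 1 and
    0 for a 0; the per-bit average stands in for the low-pass filter.
    """
    bits = []
    for k in range(len(signal) // samples_per_bit):
        chunk = signal[k * samples_per_bit:(k + 1) * samples_per_bit]
        energy = sum(x * x for x in chunk) / samples_per_bit
        bits.append(1 if energy > threshold else 0)
    return bits

data = [1, 0, 1, 1, 0, 0, 1, 0]
rx = self_mix_demodulate(ask_modulate(data, f_carrier=20e9, bit_rate=3e9))
print(rx == data)
```

Squaring removes the need for a receiver-side local oscillator, which is one reason self-mixing is attractive for ASK receivers; the cost is that the decision relies on signal energy, so it does not work for phase modulation such as BPSK.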
As shown in Figure 1-9 (b), the single-carrier RF interconnect idea is expanded in the




different high-frequency signals and transmitted over a shared channel together with the baseband
interconnect in a frequency-division multiple access (FDMA) manner [21], [22], [23], [24], [25],
[26]. As a result, the multi-band RF interconnect can achieve a high data rate without increasing
the pin count.
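The FDMA idea can be illustrated with an idealized, normalized model: one baseband bit stream (0/1 levels) and three on-off-keyed carriers are added onto a single wire, and each stream is recovered by correlating with its own carrier. The carrier frequencies of 4, 8, and 12 cycles per bit are hypothetical, chosen so that the carriers are orthogonal over one bit period; all bands share one bit rate here for simplicity. The coherent correlator is an idealization for illustration only; the RF-band receivers in this dissertation use a self-mixer instead (Figure 3-8).

```python
import math
from typing import List, Tuple

SAMPLES_PER_BIT = 64         # simulation samples per bit period (hypothetical)
CARRIER_CYCLES = (4, 8, 12)  # carrier cycles per bit for RF1..RF3 (hypothetical)

def carrier(cycles: int, s: int) -> float:
    """Carrier value at sample s of a bit period."""
    return math.cos(2 * math.pi * cycles * s / SAMPLES_PER_BIT)

def fdma_transmit(bb_bits: List[int], rf_bits: List[List[int]]) -> List[float]:
    """Superimpose one baseband stream and on-off-keyed carriers on one wire."""
    wire = []
    for k, b0 in enumerate(bb_bits):
        for s in range(SAMPLES_PER_BIT):
            v = float(b0)  # baseband level, 0 or 1
            for cycles, band in zip(CARRIER_CYCLES, rf_bits):
                v += band[k] * carrier(cycles, s)
            wire.append(v)
    return wire

def fdma_receive(wire: List[float]) -> Tuple[List[int], List[List[int]]]:
    """Idealized coherent detection: correlate each bit slot with each carrier."""
    bb: List[int] = []
    rf: List[List[int]] = [[] for _ in CARRIER_CYCLES]
    for k in range(len(wire) // SAMPLES_PER_BIT):
        slot = wire[k * SAMPLES_PER_BIT:(k + 1) * SAMPLES_PER_BIT]
        # The carriers complete whole cycles per bit, so they average to zero.
        bb.append(1 if sum(slot) / SAMPLES_PER_BIT > 0.5 else 0)
        for i, cycles in enumerate(CARRIER_CYCLES):
            # Orthogonal carriers: the correlation recovers only band i's bit.
            corr = 2 * sum(x * carrier(cycles, s) for s, x in enumerate(slot)) / SAMPLES_PER_BIT
            rf[i].append(1 if corr > 0.5 else 0)
    return bb, rf

bb_in = [1, 0, 1, 0]
rf_in = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
bb_out, rf_out = fdma_receive(fdma_transmit(bb_in, rf_in))
print(bb_out == bb_in and rf_out == rf_in)
```

Four independent bit streams cross one wire and are separated again, which is the pin-count benefit of multi-band signaling in its simplest form.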
 
Figure 1-9  Diagram of (a) RF interconnect and (b) multi-band RF interconnect 
In 2012, Byun et al. [21] implemented a dual-band memory I/O interface, as shown
in Figure 1-10. One baseband signal and one RF-band signal (at 23 GHz) are transmitted and
received over a shared 10 cm 2D transmission line. An energy-efficient (2.5 pJ/b from a 1 V
supply), high-speed (8.4 Gb/s in 65 nm CMOS), and reconfigurable communication [24] between multiple CPU




bandwidth by incorporating more RF-band interconnects (i.e., a quad-band interconnect combining
one baseband and three RF bands).
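Two quick calculations relate the figures above. Link power follows from energy efficiency times data rate (pJ/b × Gb/s = mW), and the aggregate rate of a multi-band channel is the sum of its per-band rates. The sketch below assumes that the 2.5 pJ/b figure of [21] applies to the full 8.4 Gb/s, and it takes the per-band rates from the caption of Figure 4-19; the helper name is illustrative.

```python
def link_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Link power in mW: (pJ/b) x (Gb/s) = 1e-12 J x 1e9 /s = 1e-3 W."""
    return energy_pj_per_bit * data_rate_gbps

# Dual-band interface of [21]: 2.5 pJ/b at 8.4 Gb/s
print(f"{link_power_mw(2.5, 8.4):.1f} mW")

# Quad-band: per-band rates listed for the 3D QBI measurements (Figure 4-19)
qbi_rates_gbps = {"BB": 2.0, "RF1": 2.3, "RF2": 2.5, "RF3": 3.0}
print(f"aggregate: {sum(qbi_rates_gbps.values()):.1f} Gb/s over one shared channel")
```

Under these assumptions the dual-band link dissipates about 21 mW, and the four QBI bands together carry about 9.8 Gb/s over a single channel.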
 
Figure 1-10  A high speed, energy-efficient, and reconfigurable memory I/O interface using 
dual-band interconnect. 
1.1.3 Pulse amplitude modulation (PAM) technique 
The PAM technique [27], [28], [29], [30], [31], [32] is a widely used approach to increasing
data rates. In PAM, multi-level signaling is used and information is encoded in the amplitude of
the signal, as shown in Figure 1-11 (b). A conventional baseband signal typically uses binary
non-return-to-zero (NRZ) signaling [33], [34], also known as 2-PAM (Figure 1-11 (a)). Compared
with NRZ, each 4-PAM symbol carries twice as much information, which halves the spectral
occupation for a given data rate. Therefore, twice the data rate can be obtained on the same
channel by using 4-PAM. However, the 4-PAM signal's eye amplitude is




more demanding in 4-PAM due to three times larger ISI impact compared to NRZ [29].
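The following minimal Python sketch makes the NRZ/4-PAM/8-PAM trade-off concrete. It assumes M equally spaced levels within a fixed peak-to-peak swing. For each M it computes the bits per symbol, the symbol rate needed for a given bit rate, the height of each eye, and the resulting amplitude penalty relative to NRZ. This is an idealized model with no ISI, noise, or level mismatch. The function name `pam_tradeoffs` and the 12 Gb/s example rate are illustrative assumptions, not values from the works cited above.

```python
import math

def pam_tradeoffs(levels: int, bit_rate_gbps: float, swing_v: float = 1.0):
    """Idealized M-PAM figures for a fixed peak-to-peak swing (no ISI, noise, or level mismatch)."""
    bits_per_symbol = math.log2(levels)
    symbol_rate_gbaud = bit_rate_gbps / bits_per_symbol
    eye_height_v = swing_v / (levels - 1)          # M levels stack M - 1 equal eyes
    penalty_db = 20 * math.log10(levels - 1)       # eye amplitude loss relative to NRZ (2-PAM)
    return bits_per_symbol, symbol_rate_gbaud, eye_height_v, penalty_db

for m in (2, 4, 8):
    bps, baud, eye, pen = pam_tradeoffs(m, bit_rate_gbps=12.0)  # 12 Gb/s is an arbitrary example
    print(f"{m}-PAM: {bps:.0f} b/symbol, {baud:.1f} GBd, eye {eye:.3f} V, penalty {pen:.1f} dB")
```

Under these assumptions, 4-PAM halves the symbol rate but shrinks each eye to one third of the NRZ eye, a penalty of about 9.5 dB. 8-PAM shrinks it to one seventh, about 16.9 dB. This is why higher-order PAM demands higher receiver sensitivity and SNR.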
 
Figure 1-11  Signal Space diagram for digital PAM signals 
 
Figure 1-12  A 64 Gb/s transceiver for short-reach 4-PAM electrical links 
In [29], a 4-level PAM transceiver operating at up to 64 Gb/s across a 16.8 dB loss channel 
was demonstrated in 28 nm fully depleted silicon-on-insulator (FDSOI) CMOS, as 
shown in Figure 1-12. A flexible continuous-time linear equalizer (CTLE) is used in the receiver, 




channel insertion loss. However, equalization adds design complexity and consumes more power and 
area, as discussed in Section 1.1.1. Moreover, higher receiver sensitivity and higher SNR are needed 
if a higher-order modulation, such as 8-PAM, is used to increase the data rate further.  
1.1.4 3D integration 
As IC packaging technology has advanced, 3D integration has been adopted in point-to-point 
and multi-drop memory I/O interfaces [35], [36], [37], [38], [39], [40], [41], as shown in Figure 
1-13. In 3D integration technology [42], [43], [44], [45], chips are stacked on top of each other and connected through a 
very short vertical 3D channel (a µbump or a through-silicon via (TSV)). As a result, 3D integration 
offers multiple system performance improvements, including low interconnect latency, low 
signal loss, high I/O pin density, and compact size [46], [47], [48]. Therefore, 3D integration is a 
promising technique for high-bandwidth, low-latency, small-form-factor memory I/O 
interfaces.  
 




In [37], a Wide I/O DRAM using TSV-based stacking is developed in a 50 nm 
technology. A 2-die stack with 7.5 µm diameter and 40 µm pitch TSVs is implemented and 
tested. Compared to a conventional DRAM with 16 I/Os, it has 512 I/Os and achieves 12.8 GB/s 
bandwidth. However, it operates at only 0.2 Gb/s/pin and is not reconfigurable (i.e., it 
supports only a fixed channel configuration between the memory controller and the shared DRAMs). 
There is still room to increase the per-pin data rate further, and reconfigurable data access is 
possible by employing the PAM or MBI technique. 
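The bandwidth figure above follows directly from the I/O count and the per-pin rate. The sketch below reproduces it. Purely as a hypothetical, it also shows how the aggregate would scale if the same 512 pins carried more bits per symbol at the same symbol rate (e.g., with PAM). The helper name is an illustrative assumption.

```python
def aggregate_bandwidth_gbytes(num_io: int, rate_gbps_per_pin: float) -> float:
    """Aggregate bandwidth in GB/s: I/O count x per-pin rate (Gb/s), divided by 8 bits/byte."""
    return num_io * rate_gbps_per_pin / 8

# Figures reported for the Wide I/O DRAM in [37]: 512 I/Os at 0.2 Gb/s/pin (1 bit/symbol).
# Hypothetical: same pins and symbol rate, but 2 or 3 bits per symbol (4-PAM / 8-PAM).
for bits_per_symbol in (1, 2, 3):
    bw = aggregate_bandwidth_gbytes(512, 0.2 * bits_per_symbol)
    print(f"{bits_per_symbol} b/symbol: {bw:.1f} GB/s")
```

The first line reproduces the reported 12.8 GB/s. The others only indicate the headroom that multi-level signaling could offer, ignoring its SNR cost.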
 
Figure 1-14  A wide I/O DRAM using TSV based stacking. 
1.2 Proposed Memory I/Os 
The four techniques described in Sections 1.1.1 to 1.1.4 each have benefits and drawbacks. To 
design a memory I/O interface with a high data rate, low latency, reconfigurability, and a small form 
factor, we thoroughly investigated these techniques, especially quad-band interconnect (QBI), 8-PAM, 
and 3D integration, and proposed two novel memory I/O interfaces. This dissertation 
covers the design and implementation of a 3D reconfigurable QBI memory I/O interface and a 3D 
8-PAM memory I/O interface.  
1.2.1 3D reconfigurable QBI memory I/O interface 
A novel 3D reconfigurable memory I/O transceiver using a quad-band interconnect 




RF-I technique, provides I/O data reconfigurability, decreases latency, and reduces pin count for 
future compact mobile memory interfaces. The 3D integrated circuit (3D-IC) technique reduces 
signal latency and form factor and improves signal integrity. A novel quad-band transformer is 
proposed to achieve reconfigurable four-band data communication and reduce the I/O pin count 
by a factor of four. A two-tier QBI die stack is implemented to verify the QBI design. A face-to-face 
configuration with µbump interconnects is used to save cost. The QBI chips are designed and 
fabricated in a 180 nm CMOS process. With this design, a total of 4 CPU cores and 4 memory 
ranks can be connected in a time-division multiplexing (TDM) manner. Compared with 3D 
Wide I/O2 [49], the proposed 3D QBI significantly increases link reconfigurability by 
extending BB-only I/O to BB + three RF-band I/O. This QBI technology can also be employed 
in next-generation artificial intelligence computing systems that need massively parallel or 
reconfigurable I/O links. 
 
Figure 1-15  Proposed 3D reconfigurable QBI memory I/O interface. 
1.2.2 3D 8-PAM memory I/O interface 
A 13.5 Gb/s/pin memory I/O interface using the 3D integrated circuit (3D-IC) and 




The proposed 8-PAM I/O interface provides high bandwidth, low latency, and a small form 
factor for future compact mobile memory applications. The proposed 8-PAM I/O transceiver can 
support three concurrent data streams and reduce the I/O pin count by a factor of three. 
The 3D-IC technique is also used to reduce latency and improve signal integrity. The proposed 
3D 8-PAM memory I/O interface is designed in a 65 nm CMOS technology with a 1.2 V supply. 
The simulation results show that the proposed 3D 8-PAM I/O interface can support a data rate of up to 13.5 
Gb/s/pin with 724 ps latency and 3.4 pJ/b/pin energy efficiency. Compared to 2D 
package solutions, the 3D 8-PAM memory interface can triple the data rate with lower power 
consumption and lower latency. 
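The headline numbers above imply a few derived quantities. This sketch assumes that the 13.5 Gb/s/pin figure is the aggregate bit rate carried by one 8-PAM pin (3 bits per symbol). Under that assumption, it computes the symbol rate, the unit interval, and the power per pin implied by the reported energy efficiency. The helper name is an illustrative assumption.

```python
import math

def pam_link_figures(data_rate_gbps: float, levels: int, energy_pj_per_bit: float):
    """Symbol rate, unit interval, and power for one PAM pin.

    Assumes data_rate_gbps is the aggregate bit rate carried by the pin.
    """
    bits_per_symbol = math.log2(levels)              # 3 bits for 8-PAM
    symbol_rate_gbaud = data_rate_gbps / bits_per_symbol
    unit_interval_ps = 1e3 / symbol_rate_gbaud       # 1 / (GBd) -> ps
    power_mw = energy_pj_per_bit * data_rate_gbps    # pJ/b x Gb/s = mW
    return symbol_rate_gbaud, unit_interval_ps, power_mw

# Simulated figures reported above: 13.5 Gb/s/pin at 3.4 pJ/b/pin with 8-PAM.
baud, ui, power = pam_link_figures(13.5, 8, 3.4)
print(f"{baud:.1f} GBd, UI = {ui:.1f} ps, {power:.1f} mW per pin")
```

Under this interpretation, each 8-PAM symbol lasts about 222 ps at 4.5 GBd. That is consistent with combining three 4.5 Gb/s streams onto one pin, matching the threefold pin-count reduction. The roughly 46 mW per pin is simply the product of the reported energy efficiency and data rate.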
 
Figure 1-16  Proposed 3D 8-PAM memory I/O interface. 
1.3 Dissertation Organization 
In this dissertation, two 3D memory I/O transceivers with high speed, low latency, 
reconfigurability, and a small form factor are developed for future compact mobile memory 
interfaces. This dissertation focuses on the key techniques of memory I/O transceiver design, 
including 3D integration, quad-band interconnect, and 8-PAM signaling.  
In Chapters 2 and 3, a 3D baseband memory I/O interface using the NRZ scheme and 




Chapter 4 will focus on the proposed 3D memory I/O transceiver design using quad-band 
interconnect. The 3D QBI memory I/O architecture and circuits are described in detail. The 3D 
QBI assembly and test procedures are provided. The simulation and measurement results for 
the 180 nm CMOS implementation are presented and analyzed.  
Chapter 5 will present the proposed 3D memory I/O transceiver using 8-PAM signaling. 
We will first describe the 3D 8-PAM memory I/O architecture and circuits in detail. After that, 
we will present the simulation results of the proposed 3D 8-PAM transceiver. 






3D Baseband Transceiver using NRZ 
 
In this chapter, a 3D baseband transceiver using NRZ is presented. The transceiver 
circuits, including pre-driver, transmitter output driver, differential amplifier, and receiver output 
driver, are described. The 3D µbump channel used in the transceiver is characterized and 
described. The layout and simulation results of the circuits using a 180 nm CMOS technology 
are also provided. 
2.1 Introduction 
The demand for high bandwidth, low power consumption, low latency, and compact 
packaging in memory I/O interfaces has increased significantly as more and more media-intensive 
applications, such as video and mobile games, are added to mobile devices. 
However, the conventional 2D memory I/O interface using NRZ has critical limitations in 
bandwidth, power consumption, and latency due to the long 2D channel.  
Recent work on baseband memory I/O interfaces has studied several techniques to 
overcome the channel limitations. In [50], equalization is used to compensate for 
the frequency-dependent channel attenuation, at the cost of extra design complexity and power 
consumption. In [51], 4-PAM is used to increase spectral efficiency, doubling the data 
rate for a given channel bandwidth. However, a higher-order modulation 




I/O interface. 3D integration [52] is a promising solution to overcome the limitations by 
replacing a long 2D channel with a short vertical 3D channel. 
Moreover, compared to 2D memory I/O interfaces, 3D integration allows a significantly larger 
pin count (e.g., 288 pins in a DDR4 module [3] versus 1024 pins in HBM 
[40]). Thus, in this chapter, a simple baseband memory I/O interface using 3D integration is 
presented. It is extended to the 3D reconfigurable QBI memory I/O interface described in 
Chapter 4 and the 3D 8-PAM memory I/O interface described in Chapter 5. 
2.2 Baseband Transceiver  
 
Figure 2-1  Baseband transceiver schematic.  
Figure 2-1 shows the schematic of the baseband transceiver. It consists of an inverter-
based transmitter and a receiver using differential amplifiers. Single-ended signaling is used in 




As indicated in [53], of the two typical drivers, the inverter-based driver and the current-mode logic (CML) 
driver, the inverter-based driver has the higher power efficiency. Therefore, an inverter-based 
pull-up/pull-down output driver is used in the transmitter to save power, as shown in Figure 2-1. 
Moreover, a termination resistor is added to the pull-up/pull-down output driver to match the 
channel impedance and improve signal integrity. The total resistance of each pull-up or pull-down branch 
includes the transistor on-resistance and the added termination resistance. Because the 
transistor on-resistance is nonlinear, a large transistor is used to obtain a relatively small 
on-resistance so that the linear termination resistor dominates. However, driving an output 
stage with such large transistors requires a multi-stage pre-driver. An inverter-based 
pre-driver is used in the transmitter for the same power-saving reason. As studied in [54], the 
signal amplitude in an inverter chain decreases as the data rate increases. To maintain a 
reasonable signal amplitude at higher data rates, a fan-out of two (FO2) is used in the pre-driver inverter chain 
design. The simulated FO2 delay in the 180 nm CMOS process used here is 49.1 ps. At a data rate of about 
3 Gb/s, the signal amplitude is reduced by 2.5% in this process. 
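The sizing logic above can be sketched as simple arithmetic. The FO2 delay of 49.1 ps comes from the text. The remaining values are hypothetical, chosen only to illustrate the trade-offs: a 64x load-to-input capacitance ratio, a 5 Ω transistor on-resistance, and a 45 Ω termination resistor (summing to an assumed 50 Ω channel). The helper names are illustrative.

```python
FO2_DELAY_PS = 49.1  # simulated FO2 inverter delay in the 180 nm process (from the text)

def ui_in_fo2_delays(data_rate_gbps: float) -> float:
    """Number of FO2 gate delays that fit in one unit interval (UI)."""
    ui_ps = 1e3 / data_rate_gbps
    return ui_ps / FO2_DELAY_PS

def predriver_stages(load_to_input_cap_ratio: float, fanout: float = 2.0) -> int:
    """Stages of a tapered inverter chain in which each stage drives `fanout` times its own input cap."""
    stages, drive = 0, 1.0
    while drive < load_to_input_cap_ratio:
        drive *= fanout
        stages += 1
    return stages

def branch_resistance(r_on_ohm: float, r_term_ohm: float):
    """Total pull-up (or pull-down) resistance and the fraction set by the linear termination resistor."""
    total = r_on_ohm + r_term_ohm
    return total, r_term_ohm / total

# Hypothetical example values: 64x load ratio, 5-ohm on-resistance, 45-ohm resistor.
print(f"UI at 3 Gb/s = {ui_in_fo2_delays(3.0):.1f} FO2 delays")
print(f"FO2 stages for a 64x load: {predriver_stages(64)}")
total, share = branch_resistance(5.0, 45.0)
print(f"Branch resistance = {total:.0f} ohm, {share:.0%} from the resistor")
```

With these assumptions, one 3 Gb/s UI spans only about 6.8 FO2 delays, so the pre-driver must keep its edges fast. The small on-resistance lets the linear resistor set about 90% of the branch resistance, keeping the termination close to the channel impedance despite transistor nonlinearity.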






Figure 2-2  Simulated baseband signal. (a) input data, (b) baseband transmitter output. 
On the baseband receiver side, a two-stage differential amplifier is used. The first 
differential amplifier acts as a comparator for better noise immunity and sensitivity to small 
input voltages. The second differential amplifier further amplifies the signal before the 
receiver output driver. Figure 2-3 (a) and (b) show the signal received by the baseband receiver and the 
recovered signal. 
 
Figure 2-3  Simulated baseband signal (a) baseband receiver input (b) the recovered signal. 
2.3 3D µbump Channel 
In [48], different 3D interconnect approaches are described and compared, including 
wire-bonded, µbump, TSV, and contactless interconnects. The wire-bonded approach has limited interconnect 
density. TSV interconnection [55], [56] has the highest cost due to its specialized processing 
steps, and its assembly occurs at the wafer level. The contactless approach uses 
capacitive [57], [58] or inductive coupling [59], [60] to communicate between dies and requires solder 
bumps to provide DC power to all the chips. µbump technology uses solder or gold bumps to 




processing cost than the TSV approach, and, unlike the contactless approach, it requires no capacitive 
or inductive coupling. Furthermore, in our 3D memory I/O interface design, only two tiers are 
needed. Therefore, a face-to-face µbump technology [61] is used in our design, as shown in 
Figure 2-4.  
 
Figure 2-4  Face-to-face µbump technology used in 3D BB transceiver. 
To obtain an accurate model of the 3D µbump channel, a 3D EM solver (HFSS [62]) is used, 
as shown in Figure 2-5 (a). To compare the signal loss of 2D and 3D interconnects, both the 
3D µbump channel and a 10 cm FR-4 transmission line (T-line) are modeled. The 2D FR-4 interconnect model 
includes wire bonds and parasitic capacitance for accurate signal loss simulation. For easier 
fabrication and lower cost, a large µbump with 100 µm diameter and 100 µm pitch is 
used. Figure 2-5 (b) shows the simulated signal loss. As the S21 plot shows, the 
µbump channel loss is very small. At 5 GHz, the µbump loss is 0.5 dB, while the FR-4 T-line loss 






Figure 2-5  (a) HFSS model of the 3D µbump channel and a conventional 10cm 2D FR4 channel 
(b) simulation results 
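To put the µbump loss in perspective, S21 in dB can be converted into the fraction of the signal that reaches the receiver. The short sketch below does this for the 0.5 dB µbump loss at 5 GHz quoted above. The FR-4 value is not reproduced here, so it is not included. The helper name is illustrative.

```python
def s21_db_to_ratios(s21_db: float):
    """Convert an S21 magnitude in dB into voltage and power transmission ratios."""
    voltage_ratio = 10 ** (s21_db / 20)
    power_ratio = 10 ** (s21_db / 10)
    return voltage_ratio, power_ratio

# 0.5 dB of loss at 5 GHz corresponds to S21 = -0.5 dB (µbump value from Figure 2-5 (b)).
v, p = s21_db_to_ratios(-0.5)
print(f"voltage: {v:.3f}, power: {p:.3f}")
```

In other words, about 94% of the signal amplitude (about 89% of the power) survives the µbump channel at 5 GHz.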
2.4 Simulation Results 
The 3D baseband transceiver is designed and simulated in a 180 nm CMOS technology. 
The layouts of the baseband transmitter (BBTX) and receiver (BBRX) are shown in Figure 2-6 (a) and (b), respectively. The areas 
of the BBTX and BBRX are 100 µm × 38.5 µm and 97.8 µm × 60.5 µm, respectively. Figure 2-7 
and Figure 2-8 show the simulated baseband input/output data streams and the eye diagram of the 
recovered data at 3 Gb/s. 
 





Figure 2-7  Simulated baseband input and output data streams at 3Gb/s. 
 





3D RF-band Transceiver 
 
In this chapter, a 3D RF-band transceiver is presented. The RF-band transmitter and 
receiver architectures are described. The implementation of the RF-band transceiver circuits, 
including the LC VCO, ASK modulator, differential self-mixer, differential amplifier, buffer 
converter, and output driver, is discussed. The impact of die-to-die gap variation on the 
3D RF-band inductors is evaluated. The layout and simulation results of the circuits in a 180 
nm CMOS technology are also provided. 
3.1 Introduction 
 The 3D baseband memory I/O interface overcomes the limitations induced by the long 
2D channel. The bandwidth, latency, and form factor can be significantly improved. However, 
the CPU and memory communication in a 3D baseband memory I/O cannot be reconfigured. 
Integrating an RF-band transceiver with a baseband transceiver is a promising solution to this 
problem. The use of RF-band interconnects increases the number of input data streams and 
achieves reconfigurable data access. Moreover, compared to the conventional RF interconnect 
using a long 2D channel, more RF bands can be integrated by using the short 3D channel due to 
the significantly reduced channel attenuation.   
 Binary phase-shift keying (BPSK) modulation [26], [63] and frequency shift keying 




frequency synchronization are critical in signal modulation and demodulation, requiring a power-hungry 
and complicated synchronization scheme. Therefore, in this chapter, a 3D RF-band 
transceiver using noncoherent ASK modulation, which needs only a simple demodulation 
scheme, is described. Three RF bands, at 20 GHz, 14 GHz, and 7 GHz, are presented. 
3.2 RF-band Transmitter 
An RF-band transmitter modulates the carrier signal with the input baseband data and 
sends it to the receiver. Differential signaling is used to reduce common-mode noise. Figure 3-1 
shows the diagram of the RF-band transmitter used in this design. It consists of an LC VCO, an 
input buffer, and an ASK modulator. An LC cross-coupled VCO generates the carrier signal. 
Noncoherent ASK modulation is used to up-convert the input baseband data: at the receiver, a self-mixer 
followed by a filter can detect the envelope of the modulated signal, so no clock signal is needed. 
The ASK modulator has two inputs: the carrier signal from the LC VCO and the input data from 
the input buffer. The input data control the switches in the modulator to pass or block the carrier 
signal. The final modulated output signal is sent to the channel. 
 




Figure 3-2 shows the LC VCO schematic used in the RF-band transmitter. An LC-tuned 
cross-coupled pair is used. To establish oscillation, the LC cross-coupled pair needs to satisfy the two 
Barkhausen criteria: the total phase shift around the loop is 360°, and the loop gain satisfies

$$(g_m R_p)^2 \ge 1$$

where $R_p$ is the equivalent parallel resistance of the LC tank, which for a high-Q inductor is

$$R_p \approx Q^2 R_s$$

where $R_s$ is the equivalent series resistance of the inductor and $Q$ is the inductor quality factor. 
At resonance, the phase condition is met, and $g_m R_p$ is chosen to be more than 4 to ensure that 
oscillation is reliably established.
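The code below is a numerical sketch of these startup conditions. It converts an assumed inductor Q and series resistance into Rp and finds the transconductance that meets the gmRp > 4 target. The Q = 10 and Rs = 3 Ω values are hypothetical, not the inductors of Figure 3-3. Rp ≈ Q²Rs is the usual high-Q approximation of Rs(1 + Q²).

```python
def lc_startup_check(q: float, r_series_ohm: float, gm_siemens: float):
    """Return (Rp, gm*Rp, loop gain) for a cross-coupled LC oscillator.

    Rp ~= Q^2 * Rs is the high-Q series-to-parallel conversion; the loop contains
    two gain stages, so the small-signal loop gain is (gm * Rp)^2.
    """
    r_parallel = q ** 2 * r_series_ohm
    gm_rp = gm_siemens * r_parallel
    loop_gain = gm_rp ** 2
    return r_parallel, gm_rp, loop_gain

# Hypothetical tank: Q = 10 and Rs = 3 ohm give Rp = 300 ohm.
r_p, _, _ = lc_startup_check(10.0, 3.0, 0.0)
gm_needed = 4.0 / r_p  # gm that reaches gm*Rp = 4, the lower bound of the design target
r_p, gm_rp, loop_gain = lc_startup_check(10.0, 3.0, gm_needed)
print(f"Rp = {r_p:.0f} ohm, gm = {gm_needed * 1e3:.1f} mS, gm*Rp = {gm_rp:.1f}, loop gain = {loop_gain:.1f}")
```

The example shows why a high-Q inductor matters: for a given gmRp target, a larger Rp lowers the transconductance, and hence the bias current, needed for startup.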
The inductor and capacitor values determine the LC oscillator frequency. A symmetric 
inductor structure is used with its center tap tied to VDD. The output common-mode level of the 
LC VCO is VDD. To vary the LC cross-coupled oscillator's frequency, two MOS varactors, Mv1 
and Mv2, are used to vary the capacitance of the tank. As shown in Figure 3-2, the gates of the 
MOS varactors are connected to the oscillator nodes, and the source and drain are connected to a 
control signal provided off-chip. The capacitance of the MOS varactor changes as we change the 
control signal's voltage (i.e., when the control signal goes from 0 to VDD, the capacitance 
decreases, and oscillation frequency increases).  
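The tank resonates at f0 = 1/(2π√(LC)). Lowering the varactor capacitance therefore raises the frequency. For a fixed capacitance, a lower carrier frequency needs a larger inductance (f0 ∝ 1/√L). The sketch below illustrates both effects. The 0.5 nH inductance, 200 fF fixed capacitance, and 40-80 fF varactor range are hypothetical values, not the design values of this work.

```python
import math

def lc_frequency_hz(l_henry: float, c_farad: float) -> float:
    """Resonant frequency of an ideal LC tank, f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(l_henry * c_farad))

def tuning_range_hz(l_henry: float, c_fixed: float, c_var_min: float, c_var_max: float):
    """Lowest and highest frequency as the varactor sweeps from c_var_max down to c_var_min."""
    f_low = lc_frequency_hz(l_henry, c_fixed + c_var_max)
    f_high = lc_frequency_hz(l_henry, c_fixed + c_var_min)
    return f_low, f_high

# Hypothetical tank: 0.5 nH, 200 fF fixed capacitance, 40-80 fF varactor range.
f_low, f_high = tuning_range_hz(0.5e-9, 200e-15, 40e-15, 80e-15)
print(f"tuning range: {f_low / 1e9:.2f} to {f_high / 1e9:.2f} GHz")

# For a fixed capacitance, halving the frequency needs four times the inductance.
ratio = lc_frequency_hz(0.5e-9, 240e-15) / lc_frequency_hz(2.0e-9, 240e-15)
print(f"f(0.5 nH) / f(2.0 nH) = {ratio:.2f}")
```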
One key drawback of an LC VCO is the large area required by the inductor. The required 
inductor size grows as the oscillation frequency decreases. Therefore, a higher carrier 






Figure 3-2  Schematic of the LC VCO used in the RF-band transmitter. 
A large output voltage swing with low power consumption is needed in the LC VCO 
design. The LC VCO output voltage swing is determined by the bias current, $I_{SS}$, and the 
equivalent parallel resistance $R_p$, while the LC VCO power is determined by the bias current $I_{SS}$. To 
meet both the output swing and power consumption requirements, a larger $R_p$ and a smaller $I_{SS}$ are 
required. Therefore, the spiral symmetric inductors used in the LC VCO need to have a high 
quality factor.  
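A commonly used current-limited approximation makes this trade-off explicit. When the tail current switches fully between the two sides, each drain current is a square wave whose fundamental is (2/π)·ISS. Taking Rp as the per-side tank resistance, the differential peak amplitude is then about (4/π)·ISS·Rp, while the core power is VDD·ISS. The 1.8 V supply, bias currents, and Rp values below are hypothetical. The estimate ignores voltage-limited operation and transistor losses.

```python
import math

def vco_swing_and_power(i_ss_amp: float, r_p_ohm: float, vdd_v: float):
    """Current-limited estimate for a tail-biased cross-coupled LC VCO.

    Differential peak amplitude ~= (4/pi) * Iss * Rp, with Rp the per-side tank
    resistance; core DC power = VDD * Iss.
    """
    amplitude_v = 4.0 / math.pi * i_ss_amp * r_p_ohm
    power_w = vdd_v * i_ss_amp  # core DC power drawn through the tail source
    return amplitude_v, power_w

# Hypothetical design points with a 1.8 V supply (values are illustrative only).
for i_ss, r_p in ((2e-3, 300.0), (1e-3, 600.0)):
    amp, pwr = vco_swing_and_power(i_ss, r_p, 1.8)
    print(f"Iss = {i_ss * 1e3:.0f} mA, Rp = {r_p:.0f} ohm -> {amp:.2f} V peak, {pwr * 1e3:.1f} mW")
```

Doubling Rp while halving ISS keeps the swing unchanged and halves the power, which is the motivation for a high-Q inductor stated above.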
Figure 3-3 shows the inductors used for the different oscillation frequencies (i.e., 7 GHz, 
14 GHz, and 20 GHz) and their simulated inductance, quality factor (Q), and equivalent series 
resistance Rs. The symmetric inductors used in the RF-band transmitter are 2-turn coils with 5 
µm spacing, 10 µm width, and 150 µm, 170 µm, and 300 µm diameters for 20 GHz, 14 GHz, and 
7 GHz, respectively. To maximize the quality factor (Q), the top two metal layers, which have the lowest resistance 
per unit area in the process, are used. For all the oscillation frequencies, the quality factors are 





Figure 3-3  The symmetric inductor used in the LC VCO: layout; simulated inductance, quality 
factor, and series resistance of (a) 20GHz, (b) 14GHz, and (c) 7GHz.  
Figure 3-4 shows the LC VCO output waveforms at 20 GHz, 14 GHz, and 7 GHz.  
 





Figure 3-5  Schematic of the ASK modulator used in the RF-band transmitter. 
Figure 3-5 shows the schematic of the ASK modulator used in the RF-band transmitter, 
with signal waveforms. Its implementation is straightforward. The carrier signal, $\cos(\omega t)$, from 
the LC VCO and the input data, a random sequence $m(t)$, from the input buffer are directly 
multiplied to upconvert the data to the RF band, and the modulated signal, $X_{ASK}(t)$, is transmitted to 
the channel.  
$$X_{ASK}(t) = m(t)\cos(\omega t)$$
The input data take a value of either 1 or 0. When the data is 1, M3 and M4 are 
on, and the carrier signal passes to the output. When the input data is 0, M3 and M4 are off, and 
the carrier signal is blocked. The inductor used in the ASK modulator should resonate at the 
carrier frequency to pass the carrier signal. Its center tap is tied to the supply voltage to provide 




Figure 3-6 shows the simulated ASK modulator input and output waveforms in a 180 nm 
CMOS process for 20 GHz, 14 GHz, and 7 GHz carrier signals. The input data is 
shown on the top, and the bottom waveform exhibits the ASK modulated output signal. 
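Functionally, the switching behavior of the modulator multiplies the carrier by a 0/1 data sequence. The following Python sketch generates an idealized X_ASK(t) sample by sample. The sampling density, the 14 GHz/3 Gb/s example, and the function name are illustrative assumptions. Transistor nonidealities such as finite switch resistance and carrier feedthrough are ignored.

```python
import math

def ask_modulate(bits, carrier_hz, bit_rate_bps, samples_per_bit=64):
    """Noncoherent ASK (on-off keying): X_ASK(t) = m(t) * cos(w t) with m(t) in {0, 1}.

    Returns a list of samples; each bit occupies samples_per_bit consecutive samples.
    """
    omega = 2 * math.pi * carrier_hz
    dt = 1.0 / (bit_rate_bps * samples_per_bit)  # sample spacing in seconds
    samples = []
    for n in range(len(bits) * samples_per_bit):
        m = bits[n // samples_per_bit]           # current data bit acts as the switch
        samples.append(m * math.cos(omega * n * dt))
    return samples

# Illustrative case: a 14 GHz carrier keyed by 3 Gb/s data.
x_ask = ask_modulate([1, 0, 1, 1, 0], carrier_hz=14e9, bit_rate_bps=3e9)
print(len(x_ask), max(abs(v) for v in x_ask[64:128]))
```

During a 0 bit the output is identically zero, and during a 1 bit the full carrier passes, mirroring the top and bottom traces of Figure 3-6.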
 
Figure 3-6  Simulated ASK modulator input and output waveforms of (a) 20GHz, (b) 14GHz, 
and (c) 7GHz. (top: the input data; bottom: the ASK modulated signal). 
3.3 RF-band Receiver 
 
Figure 3-7  RF-band receiver diagram. 
 An RF-band receiver is used to demodulate the incoming signal from the channel and 




Figure 3-7 shows the RF-band receiver diagram. It comprises a differential self-mixer, a 
differential amplifier, a buffer converter, and an output driver. The differential self-mixer detects 
the incoming ASK modulated signal's envelope and down-converts it to the original baseband 
data. The following differential amplifier and buffer converter filter out the unwanted high-frequency 
signal and amplify the baseband signal. The buffer converter outputs settle to a 
single common-mode voltage. 
A mixer is a three-port device that can modulate or demodulate signals. A typical mixer 
has a local oscillator (LO) input port, a radio frequency (RF) input/output port, and an 
intermediate frequency (IF) input/output port. The mixer can perform either up-conversion or 
down-conversion. A down-conversion mixer, where the LO and RF are input ports and the IF is 
the output port, can demodulate the ASK modulated signal in our case. However, a local 
oscillator signal is needed in the typical mixer, which increases the design complexity, area, and 
power consumption. In [66], a self-mixing technique is used to demodulate the ASK signal 
without a local oscillator signal. In our design, we use a similar self-mixer to save power and 
area. Figure 3-8 shows the schematic of the differential self-mixer used in the RF-band receiver. 
The incoming ASK modulated signals are fed directly to the differential gates of M1-M4 and to the 
sources of M1-M4 through high-pass coupling capacitors (i.e., the RF and LO inputs are at the same 
carrier frequency). Transistors M1-M4 act as voltage-controlled resistors. The biasing for 





Figure 3-8  Schematic of the differential self-mixer used in the RF-band receiver. 
The incoming ASK modulated signal multiplies itself in the self-mixer, generating two 
components at the self-mixer's output, as shown in the equation below.  

$$Y(t) = \left[A_c X_{ASK}(t)\right]^2 = A_c^2\, m^2(t)\cos^2(\omega t) = \frac{A_c^2\, m^2(t)}{2}\left(1 + \cos(2\omega t)\right)$$

where $A_c$ is the attenuation factor of the transformer and the channel. The first component is proportional to $m^2(t)$. 
Because $m(t)$ takes only the values 0 and 1, $m^2(t) = m(t)$, so this component preserves the logic levels and is the 
baseband signal we want. The second component is a high-frequency harmonic at twice the carrier frequency, which must be filtered out.  
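The demodulation path can be checked with an idealized model. The sketch below squares the received ASK waveform (self-mixing), low-pass filters the result, and slices it. A moving average over half a bit stands in for the limited bandwidth of the following amplifier stages. The attenuation A_c = 0.5, the sampling density, and the mid-level threshold A_c²/4 are hypothetical choices. Noise and circuit nonidealities are ignored.

```python
import math

def self_mix_demodulate(bits, carrier_hz, bit_rate_bps, a_c=0.5, samples_per_bit=64):
    """Recover ASK data by self-mixing (squaring), low-pass filtering, and slicing."""
    omega = 2 * math.pi * carrier_hz
    dt = 1.0 / (bit_rate_bps * samples_per_bit)
    n_total = len(bits) * samples_per_bit
    # Received signal: the ASK waveform attenuated by A_c (transformer + channel).
    rx = [a_c * bits[n // samples_per_bit] * math.cos(omega * n * dt) for n in range(n_total)]
    # Self-mixing: Y(t) = (A_c m(t))^2 * (1 + cos(2 w t)) / 2.
    y = [v * v for v in rx]
    window = samples_per_bit // 2        # moving-average length (half a bit)
    threshold = a_c ** 2 / 4             # midway between 0 and the filtered '1' level A_c^2 / 2
    recovered = []
    for k in range(len(bits)):
        center = k * samples_per_bit + window
        # Average the window that ends at the bit center; it lies entirely inside bit k.
        level = sum(y[center - window + 1:center + 1]) / window
        recovered.append(1 if level > threshold else 0)
    return recovered

data = [1, 0, 1, 1, 0, 0, 1, 0]
for f_c in (7e9, 14e9, 20e9):
    print(f"{f_c / 1e9:.0f} GHz:", self_mix_demodulate(data, f_c, 3e9) == data)
```

The averaging removes most of the 2ω term, leaving roughly A_c²/2 for a 1 and 0 for a 0. No local oscillator is needed, which is the point of the self-mixing approach.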
Figure 3-9 shows the differential self-mixer's simulated input and output signals for 
input carrier frequencies of 20 GHz, 14 GHz, and 7 GHz. The self-mixer's 
complementary output signals have two different common-mode levels due to the down-
conversion method used by the self-mixer. The mixer output signal swing is small, and it 





Figure 3-9  Simulated input and output signals of the differential self-mixer used in the RF-band 
receiver, (a) 20GHz carrier frequency, (b) 14GHz carrier frequency, and (c) 7GHz carrier 
frequency (top: the input signal, bottom: the mixer output signal). 
The differential amplifier that follows the self-mixer filters out the unwanted high-frequency 
harmonic signal, since its limited gain-bandwidth acts as a low-pass filter, and amplifies the 
desired baseband signal. Figure 3-10 shows the schematic of the differential amplifier. The 
output signals of the differential amplifiers are shown in Figure 3-11. 
 





Figure 3-11  Simulated output signal of the differential amplifiers (a) 20GHz band, (b) 14GHz 
band, and (c) 7GHz band. 
Figure 3-12 shows the schematic of the buffer converter used in the RF-band receiver. 
The buffer converter amplifies the small demodulated signal and filters out any unwanted high-frequency 
harmonic signals. The RC feedback is needed to remove the unbalanced common-mode levels 
and establish a new, balanced common mode. The RC feedback circuit needs to be optimized to 
obtain fast transient settling. Figure 3-13 shows the simulated output signals of the 
buffer converter. The subsequent output driver has a 50 Ω resistive load to match the impedance of the test 
board's transmission line for better signal integrity. 
 





Figure 3-13  Simulated output signal of the buffer converter (a) 20GHz band, (b) 14GHz band, 
and (c) 7GHz band.  
3.4 The 3D die-to-die Gap Variation Impact 
The 3D µbump channel used for the RF bands is the same as the 3D baseband channel 
introduced in Section 2.3. One specific concern in the RF bands is the variation in the die-to-die 
gap, which may directly affect the coupling efficiency between the inductors. To evaluate 
the impact of die-to-die gap variation, we built 3D signal couplers (i.e., pairs of 14 GHz inductors) with different gaps 
using the 3D EM simulator HFSS, as shown in Figure 3-14 (a), 
and verified how the coupling efficiency between the inductors varies. As shown in Figure 
3-14 (b), with ±10% die-to-die gap variation (i.e., 90 µm, 100 µm, and 110 µm), the simulated 
coupling factor varies by 19% at 14 GHz. This variation in coupling efficiency could affect the quad-band 
interconnect (QBI) performance (e.g., the incoming RF carrier signal at the receiver side could be 
degraded by a factor of four). However, an active mixer with a variable gain amplifier could overcome the 





Figure 3-14  The die-to-die gap variation’s impact on the coupling efficiency between the 
inductors. (a) inductors’ HFSS model, (b) simulated coupling factor with different die-to-die 
gaps. 
3.5 RF-band Transceiver 
 Figure 3-15 shows a single RF-band transceiver architecture. It consists of an LC VCO, 
an ASK modulator, a 3D µbump channel, a differential self-mixer, an amplifier, and an output 
driver. 
 




Figure 3-16 shows the layout of the RF-band transmitter and receiver in a 180 nm CMOS 
technology. Excluding the inductors/transformers, they occupy 152 µm × 103.6 µm and 
273.6 µm × 95.6 µm, respectively. 
 
Figure 3-16  Layout of the RF-band transmitter (top) and receiver (bottom). 
 Figure 3-17 and Figure 3-18 show the simulated RF-band input/output data streams and 
eye diagrams of (a) 20GHz band, (b) 14GHz band, and (c) 7GHz band. All of them are at the 






Figure 3-17  Simulated RF-band input and output data streams of (a) 20GHz band, (b) 14GHz 
band, and (c) 7GHz band at 3Gb/s. 
 
Figure 3-18  Simulated RF-band eye diagrams of (a) 20GHz band, (b) 14GHz band, and (c) 





3D Reconfigurable Quad-Band Interconnect (QBI) Memory I/O Interface  
 
This chapter presents a novel 3D reconfigurable memory I/O transceiver using a quad-
band interconnect (QBI). The 3D QBI architecture and transceiver design are described in detail. 
A quad-band transformer and the 3D µbump channel used in the 3D QBI transceiver are 
characterized and described. The 3D QBI test procedure, including die preparation, assembly, 
and test setup, is presented. The measurement results are also provided and analyzed. This 
chapter is based on my accepted paper, “3D reconfigurable quad-band interconnect (QBI) 
memory I/O interface” [67]. 
4.1 Introduction 
Data-intensive applications such as high-performance computing (HPC) [2] and 
artificial neural networks [1] demand high-performance processors and high-bandwidth memory. 
Therefore, the demand for reconfigurable and low-latency I/O 
interfaces has significantly increased. However, the conventional 2D I/O interface, shown in 
Figure 4-1 (a), has critical limitations such as high signal loss and high latency due to the long 
interconnect between the logic and memory units [68]. Furthermore, it is difficult to use a large 
number of I/Os to increase bandwidth due to the limited chip area and the PCB routing 




Figure 4-1  Diagram of (a) conventional 2D memory I/O, (b) 3D memory I/O. 
Currently, many efforts have been dedicated to overcoming such limitations. Kim et al. 
[69] and Tomita et al. [70] compensated for the interconnect effects using predistortion and equalization 
techniques to achieve high interconnect bandwidth, but at the cost of increased power 
consumption and circuit design complexity. In [21], [71], [23], RF interconnect (RF-I) was used 
to support multiband and reconfigurable data communication, which leads to fewer pins and higher 
spectral efficiency [24]. In [21], Byun et al. proposed a dual-band interconnect (DBI) to double 
the interface bandwidth and implement reconfigurable data communication between two CPU 
cores and two memory ranks. This approach achieves data reconfigurability and reduces the I/O 
pin count while maintaining low-power circuit operation. However, there is still plenty of room 
to further increase the reconfigurability of the I/O links, decrease the I/O latency and enhance 
energy efficiency. 
The 3D-IC technique has been used to reduce the interconnect length [48], [35], [72], 
which can help overcome the 2D interconnect performance limitations in previous works. By 
stacking multiple die layers with a short vertical interconnect, as shown in Figure 4-1 (b), a 3D-
IC offers multiple system performance improvements, including low interconnect latency, low 




memory and wide I/O are designed using the 3D-IC technique. High data bandwidth is achieved 
by using a large number of I/Os. However, the I/O links are not reconfigurable. 
To overcome the 2D I/O interconnect limitations and achieve a higher level of 
reconfigurability in a 3D-IC, we propose a 3D reconfigurable quad-band interconnect (QBI), 
which combines the 3D-IC and the RF-I technique, to reduce interconnect latency and signal 
loss, reduce system size, and implement reconfigurable data communication that reduces the I/O 
count. The proposed 3D QBI memory I/O interface is shown in Figure 4-2. It consists of one 
BB transceiver, three RF-band transceivers, and a 3D short vertical differential interconnect. 
With this design, a total of 4 CPU cores and 4 memory ranks can be connected in a time-division 
multiplexing (TDM) manner. Compared with the 3D HBM [40], the proposed 3D QBI can 
significantly increase the link reconfigurability by extending BB only I/O to BB + three RF-
bands I/O. 
The rest of this chapter is organized as follows. Section 4.2 provides an overview of the 
3D QBI transceiver architecture and the 3D-IC technique and RF-I technique used. Section 4.3 
describes the design details of the QBI transceiver, quad-band transformer, and 3D µbump 




4.2 3D Quad-Band Interconnect (QBI) Transceiver Architecture 
 
Figure 4-2  Proposed 3D reconfigurable QBI memory I/O architecture. Three RF-bands and one 
BB are combined through a quad-band transformer for reconfigurable four-band data 
communication and pin reduction. The 3D technique is utilized to reduce system latency, loss, 
and size. A 3D QBI die-stack is implemented using face-to-face assembly and µbump. 
The proposed 3D reconfigurable QBI memory I/O interface architecture is shown in 
Figure 4-2. It incorporates three sets of RF-band transceivers, one BB transceiver, and a shared 
short 3D interconnect. It can support reconfigurable data communications between four CPU 
cores (core A-D) and four memory ranks (rank A-D). 
The RF-I technique and 3D-IC technique are used to implement reconfigurable data 
communication, reduce pin count, and decrease system latency. A quad-band transformer is 




corner of Figure 4-2, which increases spectral efficiency. Differential signaling is used for RF-
bands to reject common-mode noise and maintain the signal-to-noise ratio (SNR). Single-ended 
signaling is used for BB, which is compatible with the standard memory I/O interface. A two-tier 
3D-IC is implemented as shown in the bottom right corner of Figure 4-2. Two dies are stacked 
face-to-face using a short µbump interconnect. 
4.3 3D Quad-Band Interconnect (QBI) Transceiver Design 
4.3.1 BB and RF-band Transmitter (BBTX, RFTX) 
Figure 4-3 shows the proposed QBI transmitter, which consists of one BB transmitter 
(BBTX), three RF-band transmitters (RFTX), and a quad-band transformer. 
Each RFTX contains a VCO and an ASK modulator. The VCO generates RF carrier 
signals at 7 GHz (RF1), 14 GHz (RF2), and 20 GHz (RF3). The noncoherent ASK modulator 
upconverts the input data streams [68]. The modulated signals are injected into the primary coil 
of the quad-band transformer and then transferred to the 3D µbump interconnect. The center tap 
of the primary coil of the transformer provides the supply voltage for the modulator. The 
simulated waveforms of D2 (RF2) and its ASK modulated output signals are shown in Figure 4-3 
(a) and (b). 
In the BBTX, the data stream D4 (BB) is sent to the center tap of the quad-band transformer 
secondary coil using a push-pull driver. The BBTX input and output simulated waveforms are 





Figure 4-3  Reconfigurable 3D QBI transmitters. It is composed of one BBTX, three RFTXs, and 
a quad-band transformer. 
4.3.2 BB and RF-Band Receiver (BBRX, RFRX) 
Figure 4-4 shows the QBI receiver, which consists of a BB receiver (BBRX), three RF-





Figure 4-4  Reconfigurable 3D QBI receiver. It consists of one BBRX, three RFRXs, and a quad-
band transformer. 
Each RFRX consists of a self-mixer and a buffer converter. The incoming modulated RF 
signal is band-selected by the proposed quad-band transformer. The common-mode voltage is set 
by the termination voltage at the center tap of the transformer primary coil, which minimizes the 
DC offset. The differential self-mixer [66] directly downconverts the ASK modulated signal into 
the BB data. The subsequent buffer converter amplifies the small demodulated signal and filters 
out any unwanted high-frequency harmonic signals. The simulated incoming ASK modulated 
signal and recovered BB data for RF2 are shown in Figure 4-4 (a) and (b). 
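The RF-band signal path (ASK upconversion in the RFTX, self-mixing and filtering in the RFRX) can be illustrated with a minimal, idealized sketch. It assumes that the noncoherent ASK behaves as on-off keying, models the self-mixer as an ideal squarer, and stands in for the buffer converter's filtering with a boxcar average over one carrier period. The carrier (14 GHz) and data rate (2.5 Gb/s) follow RF2; the 280 GS/s simulation sample rate, the threshold, and all function names are hypothetical, and no channel loss or noise is modeled.

```python
import math

# Hypothetical simulation parameters (not from the thesis test setup)
F_CARRIER = 14e9                      # RF2 carrier frequency, Hz
BIT_RATE = 2.5e9                      # RF2 data rate, b/s
FS = 280e9                            # assumed sample rate: 20 samples per carrier period
SPB = int(round(FS / BIT_RATE))       # samples per bit (112)
SPC = int(round(FS / F_CARRIER))      # samples per carrier period (20)


def ask_modulate(bits):
    """On-off keying: carrier present for a '1', absent for a '0'."""
    out = []
    for i, b in enumerate(bits):
        for k in range(SPB):
            t = (i * SPB + k) / FS
            out.append(b * math.cos(2 * math.pi * F_CARRIER * t))
    return out


def self_mix(x):
    """Multiply the signal by itself: cos^2(wt) = (1 + cos(2wt)) / 2."""
    return [v * v for v in x]


def moving_average(x, n):
    """Causal n-sample boxcar, a stand-in for the buffer's low-pass filtering."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= n:
            acc -= x[i - n]
        out.append(acc / min(i + 1, n))
    return out


def demodulate(rx, n_bits, threshold=0.25):
    """Self-mix, remove the 2*f_carrier term, and slice each bit at mid-bit."""
    baseband = moving_average(self_mix(rx), SPC)
    return [1 if baseband[i * SPB + SPB // 2] > threshold else 0
            for i in range(n_bits)]


data = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = demodulate(ask_modulate(data), len(data))
print(recovered)  # [1, 0, 1, 1, 0, 0, 1, 0]
```

Averaging over exactly one carrier period cancels the double-frequency term, so a '1' settles at 0.5 and a '0' at 0, which is why a mid-level threshold recovers the data in this noise-free case.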
In BBRX, the incoming data stream is amplified by the differential amplifier. The 




4.3.3 Quad-band Transformer Design and the Four Communication Configurations 
 
Figure 4-5  Quad-band transformer design. (a) QBI transformer layout. (b) QBI transformer 3D 
model. (c) The working mechanism of the transformer for BB (CM) and RF-band (DM). 
The quad-band transformer is designed to implement reconfigurable data communication 
and reduce pin count. Its key design goals are small size, low signal loss, and data 
reconfigurability. A small transformer requires high carrier frequencies for the RF bands. The 
maximum RF-band carrier frequency can be determined by considering the minimum on-chip 
transformer size and the transit frequency (fT), i.e., the unity-current-gain frequency, which 
indicates the maximum available speed of an intrinsic CMOS transistor. Low signal loss requires 
sufficiently wide spectral spacing between the RF bands, as explained in [73]. Therefore, we 
trade off transformer size against carrier frequency to implement data communication 
reconfigurability in the proposed 3D QBI.
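The size/frequency trade-off can be illustrated to first order with LC resonance: if each band is selected by a coil resonating with some capacitance C, the required inductance is L = 1/((2πf)²C), so doubling the carrier frequency cuts L by four. For a fixed number of turns, a planar spiral's inductance grows roughly with its diameter, which is why higher carriers permit smaller coils. This is a sketch only; the 100 fF capacitance is a hypothetical value, not a parameter of the transformer in Figure 4-5.

```python
import math

C_ASSUMED = 100e-15  # hypothetical total capacitance resonating with each coil, F


def resonant_inductance(f_carrier, c=C_ASSUMED):
    """Inductance that resonates with capacitance c at f_carrier: L = 1 / ((2*pi*f)^2 * c)."""
    return 1.0 / ((2 * math.pi * f_carrier) ** 2 * c)


for name, f in [("RF1", 7e9), ("RF2", 14e9), ("RF3", 20e9)]:
    print(f"{name}: {f / 1e9:.0f} GHz -> L = {resonant_inductance(f) * 1e9:.2f} nH")
```

Under this assumption the three bands need roughly 5.2 nH, 1.3 nH, and 0.63 nH, respectively.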
The proposed quad-band transformer is shown in Figure 4-5 (a) and (b). The three 




metal resistivity in the utilized CMOS process are employed to implement the proposed 
transformer to provide a high quality factor (Q) and reduce RF-band signal loss. Each 
transformer is designed using both the High-Frequency Structure Simulator (HFSS) and Momentum. 
The size of each transformer is optimized to extract each desired RF-band signal and reject the 
unwanted out-of-band signals. The carrier frequencies of the three RF-bands are 7 GHz (RF1), 14 
GHz (RF2), and 20 GHz (RF3). 
Figure 4-5 (c) shows the working mechanism of the transformer for the BB and RF-
bands. Common-mode (CM) signaling and differential-mode (DM) signaling are used for BB 
and RF-band signals, respectively. At low frequencies (i.e., BB), the transformer behaves as a short 
circuit. The single-ended BB signal is connected to the center tap of the transformer secondary 
coil to minimize the signal skew between the two signal paths of the transformer. Each 
differential RF-band signal is injected into its own primary coil of the transformer and inductively 
coupled to the secondary coil. 
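The CM/DM sharing described above can be summarized in an idealized sketch: the BB signal appears equally on both legs of the pair (common mode), while an RF-band signal appears with opposite polarity on the two legs (differential mode), so the two are recovered independently by averaging and subtracting the legs. The function names are hypothetical, and transformer loss, coupling coefficient, and skew are ignored.

```python
def superimpose(v_bb, v_rf_diff):
    """Put BB on both legs as common mode and the RF band as a differential signal."""
    v_p = v_bb + v_rf_diff / 2
    v_n = v_bb - v_rf_diff / 2
    return v_p, v_n


def separate(v_p, v_n):
    """Recover BB (common mode) and RF (differential mode) from the two legs."""
    v_cm = (v_p + v_n) / 2
    v_dm = v_p - v_n
    return v_cm, v_dm


v_p, v_n = superimpose(v_bb=0.6, v_rf_diff=0.2)
print(separate(v_p, v_n))  # approximately (0.6, 0.2)
```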
The four configurations of the data communication through the QBI memory I/O 
interface are shown in Figure 4-6, Figure 4-7, Figure 4-8, and Figure 4-9. Each data point of the 





Figure 4-6  3D QBI data communication configurations: BB. 
 
Figure 4-7  3D QBI data communication configurations: RF1. 
 





Figure 4-9  3D QBI data communication configurations: RF3.  
4.4 3D Quad-Band Interconnect (QBI) Memory Channel Loss and Latency 
Figure 4-10 shows the concept of the proposed 3D QBI memory I/O interface and a 
conventional point-to-point 2D memory I/O interface. In the proposed 3D QBI memory I/O 
interface, the QBI transceiver is connected face-to-face using a short 3D µbump interconnect. In 
the traditional 2D memory I/O interface, a 10 cm to 20 cm FR-4 transmission line (T-Line) is 
used as the interconnect. The signal loss, latency, and system size are significantly reduced by 
using the short 3D µbump interconnect in the proposed 3D QBI interface. 
 
Figure 4-10  Diagram of (a) the proposed two-tier 3D QBI memory I/O interface and (b) the 




4.4.1 Channel Loss 
Figure 4-11 (a) shows the face-to-face 3D µbump technology used in the proposed 3D 
QBI. The 3D EM solver HFSS is used to model the 3D µbump interconnects of the 3D QBI and 
accurately evaluate their signal loss. The HFSS model of the 3D µbump interconnect is 
shown in Figure 4-11 (b). For reliable, low-cost 3D stacking, a 3D µbump with 
100 µm diameter and 200 µm pitch is used. Figure 4-11 (c) shows the simulated signal loss of 
the 3D µbump interconnect and of a 10 cm FR-4 T-Line (including the pads and bonding 
wires). The S21 plot shows that the signal loss of the 3D µbump channel is very small (about 10 
dB less than that of the FR-4 T-Line at 5 GHz). 
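To put the 10 dB S21 difference in linear terms, the standard conversions are 10^(dB/20) for a voltage (amplitude) ratio and 10^(dB/10) for a power ratio. The helper names below are illustrative.

```python
def db_to_amplitude_ratio(db):
    """Voltage (amplitude) ratio corresponding to a dB difference."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Power ratio corresponding to a dB difference."""
    return 10 ** (db / 10)


delta_db = 10.0  # S21 advantage of the 3D µbump over the 10 cm FR-4 T-Line at 5 GHz
print(f"amplitude ratio: {db_to_amplitude_ratio(delta_db):.2f}")  # about 3.16
print(f"power ratio: {db_to_power_ratio(delta_db):.1f}")          # 10.0
```

In other words, at 5 GHz the µbump channel delivers roughly three times the received signal amplitude of the FR-4 T-Line.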
 
Figure 4-11  3D QBI µbump model and signal loss simulation using HFSS. (a) 3D QBI face-to-face 
configuration. (b) 3D QBI µbump HFSS model. (c) The simulated signal loss of the 3D 





Figure 4-12 (a) shows the simulated latency of the key components in the proposed 3D 
QBI memory I/O interface. Table 4.1 compares the simulated data propagation latency of the 
proposed QBI with that of the prior 2D DBI work [21]. The I/O data latency is significantly 
reduced (about 11 times lower than in [21]) owing to the inherent advantage 
of 3D chip stacking. 
 
Figure 4-12  Simulated latency of the QBI transceiver. 
Table 4.1  Simulated latency comparison with DBI [21]. 
 DBI [21] This work 
Technology 65 nm CMOS 180 nm CMOS 
Interconnect type FR-4 T-Line 3D µbump 
Interconnect length 10 cm 100 µm 
Interconnect latency 610 ps 10 ps 
Transformer latency 2 ps 45 ps 




a: b Total latency=interconnect latency + transformer latency 
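The "about 11 times" figure can be checked from the rows of Table 4.1, assuming, as the footnote states, that total latency is the sum of the interconnect and transformer latencies.

```python
# Latency values from Table 4.1, in ps
dbi = {"interconnect": 610, "transformer": 2}   # prior 2D DBI [21]
qbi = {"interconnect": 10, "transformer": 45}   # this work (3D µbump)

total_dbi = sum(dbi.values())   # 612 ps
total_qbi = sum(qbi.values())   # 55 ps
print(total_dbi, total_qbi, round(total_dbi / total_qbi, 1))  # 612 55 11.1
```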
 
4.4.3 Monte Carlo Simulation Results of Key Performance Metrics 
Monte Carlo simulation is performed to evaluate the impact of high-volume manufacturing (HVM) 
variation on the 3D QBI, considering both process variation and mismatch. The key 
performance metrics, namely latency, power efficiency, and output data eye width, are evaluated. Figure 
4-13 shows the Monte Carlo simulation results. The impact of process variation and mismatch on 
latency (56.4 ps @ 3σ), power efficiency (1.2 pJ/b @ 3σ), and 
output data eye width (0.09 UI @ 3σ) appears to be negligible. 
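A circuit-level Monte Carlo run varies device parameters inside the circuit simulator; the way a 3σ figure is obtained can still be sketched at the component level. The snippet below draws the interconnect and transformer latencies of Table 4.1 as independent Gaussian variables with hypothetical standard deviations (these sigmas are not from the thesis) and reports the mean and 3σ spread of their sum.

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# (nominal ps from Table 4.1, hypothetical 1-sigma ps); the sigmas are NOT from the thesis
COMPONENTS = {
    "interconnect": (10.0, 0.2),
    "transformer": (45.0, 0.4),
}


def sample_total_latency():
    """One Monte Carlo draw: component latencies vary independently (Gaussian)."""
    return sum(random.gauss(mu, sigma) for mu, sigma in COMPONENTS.values())


samples = [sample_total_latency() for _ in range(10_000)]
mean = statistics.fmean(samples)
sigma = statistics.stdev(samples)
print(f"mean = {mean:.2f} ps, 3-sigma = {3 * sigma:.2f} ps, "
      f"mean + 3-sigma = {mean + 3 * sigma:.2f} ps")
```

For independent Gaussians the variances add, so the expected 3σ here is 3·√(0.2² + 0.4²) ≈ 1.34 ps; the sampled estimate converges to that value as the number of draws grows.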
 
Figure 4-13  The Monte Carlo simulation results. (a) latency; (b) power efficiency; (c) output 
data eye width. 
4.5 Chip Measurement 
In this section, the 3D QBI implementation, including die preparation, die-level face-to-
face stacking, and chip-on-board assembly, is described. The experimental setup and test results, 
including the data rate, BER, and output jitter, are presented and discussed. 
4.5.1 Die Preparation 
The 3D QBI prototype is designed and fabricated in a 180 nm CMOS process. Figure 




(including the inductors and transformers and excluding pads; two designs are on the same dies) 
are 1.77 mm² and 1.4 mm², respectively. The 3D µbump pads (in the white dotted-line box) are 
placed around the transceiver with high-precision 3D alignment. Most of them are placed at the 
periphery to evenly distribute any potential mechanical stress during die stacking. Specifically, the 3D 
µbump pads for the supply voltage and digital control signals are placed at the periphery. The 3D 
µbump pads for critical I/Os, such as the QBI modulated signals, are placed close to the 
corresponding circuit layout to reduce parasitic effects in the routing. To improve signal 
integrity, the high-speed signal and DC signal µbump pads are properly placed. An outer ring of 
bonding pads is also placed with an optimal wire-bonding pitch. 
 
Figure 4-14  Die photos of the 3D QBI transceiver. (a) QBI RX and (b) QBI TX. The TXs and 
RXs for different bands are marked by black, green, blue, and red solid-line boxes, respectively. 




4.5.2 3D QBI Assembly 
To verify the 3D QBI performance, a dedicated four-layer test board with 3D die-
stacking and chip-on-board assembly is implemented, as shown in Figure 4-15. The 3D QBI 
assembly can be described as follows. First, the bottom die is placed on the test board with a 
typical chip-on-board assembly. Gold bonding wires are used to connect all the signal pads of the 
3D QBI bottom die to the test board, as shown in Figure 4-16 (c). Second, the top die is aligned 
and stacked face-to-face on the bottom die using high-precision 3D µbump connections. Figure 
4-16 (a) and (b) show the side and top views of the 3D QBI die-stacking. Finally, the 3D QBI dies 
and bond wires are encapsulated to fully protect the system from mechanical damage and 
contaminants, as shown in Figure 4-17. Furthermore, to improve power integrity and signal 
integrity, decoupling capacitors are added, and differential T-Lines with a ground shield are used 
for the data streams in the test board design. 
 





Figure 4-16  (a) 3D QBI die-stack side view (µbumps are marked by a white dotted-line box). (b) 
QBI die-stack top view. (c) 3D QBI wire bonding. 
 
 
Figure 4-17  3D QBI test board. 
4.5.3 3D QBI Test 
4.5.3.1 Test setup 
Figure 4-18 demonstrates the test setup and reconfigurable data communication test for 




[74]) is used to provide input data streams and measure the eye diagrams and BERs to verify the 
signal integrity of the proposed 3D QBI. 
 
Figure 4-18  (a) 3D QBI test setup. (b) 3D QBI reconfigurable data communication test 
demonstration. The BERT scope is used to generate the input data stream and measure the BER 
and eye diagram. DC generators are used to provide the supply. 
4.5.3.2 Test results 
The BERT scope provides pseudorandom binary sequences (PRBSs) to measure the 
proposed 3D QBI performance.  
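The PRBS pattern used later in Table 4.3 is PRBS 2⁷-1. As a sketch of how such a pattern is produced, the common PRBS7 polynomial x⁷ + x⁶ + 1 can be generated with a 7-bit linear-feedback shift register; the function name and default seed are illustrative, and the BERT's internal implementation is not described in the text.

```python
def prbs7(n, seed=0x7F):
    """PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR; the seed must be nonzero."""
    state = seed & 0x7F
    bits = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


seq = prbs7(254)
print(seq[:127] == seq[127:], sum(seq[:127]))  # True 64
```

A maximal-length 7-bit sequence repeats every 2⁷ - 1 = 127 bits and contains 64 ones and 63 zeros, so it exercises every nonzero 7-bit run; this is why it exposes ISI that a repeating 1010 pattern does not.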
The measured eye diagrams are shown in Figure 4-19. The measured data rates are 2 
Gb/s (BB), 2.3 Gb/s (RF1, 7 GHz), 2.5 Gb/s (RF2, 14 GHz), and 3 Gb/s (RF3, 20 GHz), 
respectively. The measured BER of the QBI is less than 10⁻¹⁵ for each band. The measured 
energy efficiencies of the QBI are 5.9 pJ/b (BB, 2 Gb/s), 6.2 pJ/b (RF1, 2.3 Gb/s), 7.4 pJ/b (RF2, 





Figure 4-19  Measured eye diagrams of 2 Gb/s (BB), 2.3 Gb/s (RF1, 7 GHz), 2.5 Gb/s (RF2, 14 
GHz) and 3 Gb/s (RF3, 20 GHz), respectively. The measured BER of the QBI is less than 10⁻¹⁵ 
for each band. 
Table 4.2 shows the proposed 3D QBI memory I/O interface performance compared to 
prior works using 2D [75], [76], [29], 2.5D [77], and 3D [41], [40], [49], [78] package solutions. 
These works use state-of-the-art I/O technologies but implement only BB signaling. The key 
benefit of the proposed 3D QBI is that it uses both BB and multiple RF-band signaling. For the 
BB I/O interface, those state-of-the-art solutions can be reused. On top of the BB I/O interface, 
multiple RF-band I/O interfaces are added for flexible and reconfigurable communications 
between memory controller units and memories. This QBI technology can also be employed in 
next-generation artificial intelligence computing systems which need massively parallel or 


































28 65 28a 7b 25c 20c 10/22FFL 10c 180 
Supply (V) 0.9 1.2 1 0.8 1.1/1.8 1.2 - 1.1 1.8 
Data rate (Gb/s) 20 40 64 8 1.066 2.4 0.5 5 2/2.3/2.5/3d 






No No No No No No No No Yes 
Dimension 2D 2D 2D 2.5D 3D 3D 3D 3D 3D 
Latency 
(ns) 











0.54 9.23 4.9 0.56 3.5 - 0.2 - 5.9/6.2/7.4/8f 
area (mm²) 12 2.52 0.44 54.56 176.58 192 174 220 3.2 
a: FDSOI, b: FinFET, c: DRAM technology 
d: 2 Gb/s @ BB, 2.3 Gb/s @ 7 GHz, 2.5 Gb/s @ 14 GHz, 3 Gb/s @ 20 GHz 
e: 0.63 Gb/s/mm² @ BB, 0.72 Gb/s/mm² @ 7 GHz, 0.78 Gb/s/mm² @ 14 GHz, 0.9 Gb/s/mm² @ 20 GHz 
f: 5.9 pJ/b @ BB, 6.2 pJ/b @ 7 GHz, 7.4 pJ/b @ 14 GHz, 8 pJ/b @ 20 GHz 
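Energy efficiency and data rate together give link power through P = E_b × R (pJ/b × Gb/s = mW). The sketch below applies this to footnotes d and f of Table 4.2 and, as an assumption for illustration only, also aggregates the four bands as if they ran simultaneously at their measured maximum rates.

```python
# Measured data rates (Gb/s) and energy efficiencies (pJ/b), Table 4.2 footnotes d and f
bands = {
    "BB": (2.0, 5.9),
    "RF1": (2.3, 6.2),
    "RF2": (2.5, 7.4),
    "RF3": (3.0, 8.0),
}

# P = E_b * R ; pJ/b * Gb/s = mW
power_mw = {name: rate * eb for name, (rate, eb) in bands.items()}
total_rate = sum(rate for rate, _ in bands.values())
total_power = sum(power_mw.values())

for name, p in power_mw.items():
    print(f"{name}: {p:.2f} mW")
print(f"aggregate (hypothetical concurrent use): {total_rate:.1f} Gb/s, "
      f"{total_power:.2f} mW, {total_power / total_rate:.2f} pJ/b")
```

This gives about 11.8, 14.3, 18.5, and 24 mW per band and, under the concurrency assumption, 9.8 Gb/s in total at roughly 68.6 mW.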
 
 Figure 4-20, Figure 4-21, Figure 4-22, and Figure 4-23 show the measured eye diagrams 
and the corresponding BER contours for BB, RF1, RF2, and RF3. The results of two different data 





Figure 4-20  The measured eye diagrams and BER contours of BB at 1.9 Gb/s and 2 Gb/s data 
rates. The measured BER is less than 10⁻¹⁵. 
 
Figure 4-21  The measured eye diagrams and BER contours of RF1 at 2Gb/s and 2.3Gb/s data 





Figure 4-22  The measured eye diagrams and BER contours of RF2 at 2 Gb/s and 2.5 Gb/s data 
rates. The measured BER is less than 10⁻¹⁵. 
 
Figure 4-23  The measured eye diagrams and BER contours of RF3 at 2.5Gb/s and 3Gb/s data 




Figure 4-24 shows an analysis of power consumption versus supply voltage for RF3 at 
a 2 Gb/s data rate. As expected, the power consumption decreases as the supply voltage is reduced from 
1.8 V to 1.3 V. This suggests that an ultra-low-power 3D QBI link could be designed in the future 
using the near-threshold voltage (NTV) technique to improve the I/O power efficiency. 
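A first-order expectation for this trend comes from CMOS dynamic power, P ≈ αCV²f: at a fixed data rate and activity, lowering the supply from 1.8 V to 1.3 V would cut dynamic power to about (1.3/1.8)² ≈ 0.52 of its original value. This is an estimate under the assumption that dynamic switching power dominates; leakage and bias currents in the analog blocks do not follow V², so it is not a substitute for the measured curve in Figure 4-24.

```python
def dynamic_power_ratio(v_new, v_old):
    """First-order CMOS dynamic power ratio (P ~ C * V^2 * f) at fixed C, f, and activity."""
    return (v_new / v_old) ** 2


print(f"{dynamic_power_ratio(1.3, 1.8):.3f}")  # 0.522
```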
 
Figure 4-24  Normalized power consumption with different supply voltages of D3 (RF3) at 2Gb/s 
data rate. 
Figure 4-25 shows the minimum supply voltage needed for different data rates in RF3. 
The minimum required supply voltage decreases as the data rate is reduced. Thus, operating at the 
lowest data rate that still meets the bandwidth requirement, with the supply lowered accordingly, can save power. 
 




4.5.3.3 Output Jitter analysis 
To analyze the output jitter of the proposed 3D QBI memory I/O interface, two tests are 
performed. First, the RF3 eye diagrams at data rates from 1 Gb/s to 3.2 Gb/s are measured, as shown 
in Figure 4-26. As expected, the measured P-P jitter rises gradually above 1 Gb/s, 
as shown in Figure 4-27. The other three bands show similar trends. At a data rate of 3 Gb/s, 
the measured P-P jitter is 0.4 UI, and the BER is still less than 10⁻¹⁵. 
 





Figure 4-27  Measured D3 (RF3) Peak-to-Peak jitter at different data rates.  
 
Figure 4-28  Measured eye diagrams and P-P jitters of 2 Gb/s (BB), 2.3 Gb/s (RF1), 2.5 Gb/s 





Table 4.3  P-P jitter comparison between the random input sequence of PRBS 2⁷-1 and the 
repeating 1010 serial pattern. 
 P-P jitter (UI) PRBS 2⁷-1 (w/ ISI) 1010 (w/o ISI) 
BB @ 2 Gb/s 0.17 0.11 
RF1 @ 2.3 Gb/s 0.5 0.05 
RF2 @ 2.5 Gb/s 0.58 0.05 
RF3 @ 3 Gb/s 0.4 0.14 
 
Second, two different data streams, 1010 and PRBS, are applied to the QBI transmitter 
separately using the BERT scope. Figure 4-28 shows the measured eye diagrams of the four 
bands in the 3D QBI with the repeating 1010 input data stream. Table 4.3 shows the measured 
output P-P jitters. With the 1010 input data stream, the QBI shows no ISI and minimal DCD. The 
D3 (RF3) jitter is slightly higher because the modulated RF3 signal suffers from higher signal loss 
in the signal path. With the PRBS input data stream, the P-P jitter of each band is higher (by 0.06, 
0.45, 0.53, and 0.26 UI, i.e., 6%, 45%, 53%, and 26% of a UI, in BB, RF1, RF2, and RF3, 
respectively), as expected because of the increased ISI, as shown in Figure 4-19 and Figure 4-26. 
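Because a unit interval is UI = 1/(data rate), the jitter values in Table 4.3 can be converted to picoseconds, and the ISI contribution read as the difference between the PRBS and 1010 columns. The sketch below does only that arithmetic on the tabulated values.

```python
# Table 4.3: band -> (data rate in Gb/s, PRBS P-P jitter in UI, 1010 P-P jitter in UI)
table = {
    "BB": (2.0, 0.17, 0.11),
    "RF1": (2.3, 0.50, 0.05),
    "RF2": (2.5, 0.58, 0.05),
    "RF3": (3.0, 0.40, 0.14),
}

for band, (rate_gbps, prbs_ui, pattern_ui) in table.items():
    ui_ps = 1e3 / rate_gbps             # one unit interval in ps
    extra_ui = prbs_ui - pattern_ui     # jitter attributable to ISI, in UI
    print(f"{band}: UI = {ui_ps:.1f} ps, PRBS jitter = {prbs_ui * ui_ps:.1f} ps, "
          f"ISI adds {extra_ui:.2f} UI ({extra_ui * ui_ps:.1f} ps)")
```

For example, 0.4 UI at 3 Gb/s corresponds to about 133 ps of P-P jitter on a 333 ps bit period.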
4.6 Conclusion 
This chapter presents a 3D quad-band interconnect (QBI) to enhance data 
reconfigurability, decrease latency, and reduce the pin count of the memory I/O interface. A 
novel quad-band transformer with high out-of-band suppression and superior band selectivity is 
proposed to achieve reconfigurability. The short 3D µbump interconnect is used to achieve 
enhanced signal integrity and reliable data communication. The proposed 3D QBI could be a 
promising solution for future mobile devices with applications that need dedicated parallel data 







3D 8-Pulse-Amplitude Modulation (PAM) Memory I/O Interface  
 
In this chapter, we present a novel 3D memory I/O transceiver using 8-PAM signaling. 
The 3D 8-PAM architecture and transceiver design are described in detail. The 3D µbump 
channel used in the 3D 8-PAM transceiver is characterized and described. The layout and 
simulation results are also provided.  
5.1 Introduction 
Data-intensive applications demand memory I/O interfaces with high data rates, low latency, and a 
small form factor to handle large amounts of data movement. However, the traditional 
memory I/O interface using a long channel, as shown in Figure 5-1 (a), suffers from limited 
bandwidth, long latency, restricted pin count, and large form factor [68]. 
To meet these demands, researchers have studied multiple approaches. In [21], multi-band 
signaling is studied: both radio-frequency bands and baseband are incorporated to implement a 
high-bandwidth, reconfigurable memory I/O interface. However, the multi-band transformer in 
this approach occupies a large chip area and increases the total latency. In [79], PAM, which 
requires less bandwidth than binary signaling (2-PAM) for the same data rate, was used to 
achieve high data throughput with a compact architecture and small area. However, the 
multi-level signal still suffers from channel loss and 




loss from the off-chip memory channel. In [29], both PAM and equalization 
are used to overcome the signal loss and increase the data rate. However, these 
approaches may significantly increase the design complexity and total power consumption. 
Most prior memory I/O interfaces use a traditional 2D on/off-chip 
memory channel. Recently, 2.5D/3D-IC designs, as shown in Figure 5-1 (b), have been used to 
implement vertical chip stacks that reduce channel loss, latency, and form factor. In [49], 
[41], and [40], wide I/O and high-bandwidth memory (HBM) are implemented using 3D-IC 
technology. However, the promising 3D HBM relies only on binary signaling with limited 
bandwidth and cannot fully utilize the key advantage of its increased TSV pin count (e.g., 1024 
pins). In [80], a 3D-IC-based I/O with 4-PAM signaling supports a data rate of over 8 Gb/s/pin. 
 
Figure 5-1  Diagram of (a) The traditional 2D memory I/O interface; (b) 3D memory I/O 
interface 
In this chapter, we propose a novel 3D 8-PAM I/O interface to further increase the 
bandwidth. Three data streams can be transmitted simultaneously through a shared single-ended 
3D memory channel in the proposed 3D I/O interface. The 8-PAM signaling enables high data 
throughput with compact architecture; the short 3D channel reduces the I/O latency and 




interface can support up to 13.5 Gb/s/pin data rate with 724 ps latency and 3.4 pJ/b/pin power 
efficiency.   
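The bandwidth advantage of 8-PAM follows from its bits per symbol: M-level PAM carries log₂M bits per symbol, so the symbol rate is R_b/log₂M and the Nyquist frequency is half of that. The sketch below applies this to the 13.5 Gb/s/pin figure and converts the 3.4 pJ/b efficiency into per-pin power; the function name is illustrative.

```python
import math


def symbol_rate_and_nyquist(bit_rate_gbps, levels):
    """Symbol rate (GBd) and Nyquist frequency (GHz) of an M-level PAM lane."""
    bits_per_symbol = math.log2(levels)
    symbol_rate = bit_rate_gbps / bits_per_symbol
    return symbol_rate, symbol_rate / 2


print(symbol_rate_and_nyquist(13.5, 8))  # 8-PAM: (4.5, 2.25)
print(symbol_rate_and_nyquist(13.5, 2))  # binary NRZ at the same bit rate: (13.5, 6.75)

power_mw = 3.4 * 13.5  # pJ/b * Gb/s = mW, about 45.9 mW per pin
```

Carrying three bits per symbol lets the channel run at 4.5 GBd, one third of the symbol rate binary signaling would need, which matches the 4.5 Gb/s per recovered data stream reported later.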
5.2 3D 8-Pulse-Amplitude Modulation (PAM) Transceiver Architecture 
 
Figure 5-2  (a) The architecture of the proposed 3D 8-PAM memory I/O interface; (b) The 8-
PAM signaling enables high data throughput with compact architecture; (c) The 3D technique is 
utilized to reduce system latency, loss, and size. A 3D 8-PAM die-stack can be implemented 
using face-to-face assembly and µbump. 
Figure 5-2 shows the 3D 8-PAM I/O interface architecture. It consists of one 8-PAM 
transmitter (TX), one 8-PAM receiver (RX), and a 3D µbump channel, as shown in Figure 5-2 
(a). The 8-PAM transmitter encodes three binary input data streams, generates an 8-level signal, and 
sends it to the 8-PAM receiver through a short single-ended 3D µbump memory channel. The 
8-PAM receiver then receives the 8-level PAM signal and recovers the three original data streams. A reference 
clock signal is simply used to re-time each data at the transmitter and receiver for secured 




Two techniques, 8-PAM signaling as shown in Figure 5-2 (b) and 3D-IC as shown in 
Figure 5-2 (c), are used in the proposed 3D 8-PAM I/O interface to integrate their key 
advantages and meet the high data rate, low latency, and small form factor memory I/O interface 
demand. The key challenges in designing an 8-PAM transceiver over a 2D channel are 
the reduced voltage spacing between adjacent symbol levels and the severe signal-integrity 
degradation through the channel. The small 8-PAM signal amplitude demands a receiver with 
relatively high input sensitivity. 
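The amplitude penalty can be quantified: with the same total swing, M-PAM divides it into M - 1 steps, so the vertical eye opening shrinks by a factor of M - 1 relative to binary signaling, i.e., 20·log₁₀(7) ≈ 16.9 dB for 8-PAM. The 1.2 V full swing below is a hypothetical value chosen to match the supply mentioned for this design, not a measured TX output swing.

```python
import math


def pam_level_spacing(swing_v, levels):
    """Voltage step between adjacent levels when M levels share one full swing."""
    return swing_v / (levels - 1)


def snr_penalty_db(levels):
    """Eye-height reduction of M-PAM relative to binary signaling at the same swing."""
    return 20 * math.log10(levels - 1)


SWING = 1.2  # hypothetical full TX output swing, V
print(f"8-PAM step: {pam_level_spacing(SWING, 8) * 1e3:.1f} mV")  # 171.4 mV
print(f"penalty vs 2-PAM: {snr_penalty_db(8):.1f} dB")            # 16.9 dB
```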
To overcome these challenges, we use three approaches. First, a voltage-mode driver is 
utilized in the transmitter to obtain the maximum transmitter output swing. Second, a short 3D 
channel is used to minimize the channel attenuation and achieve a faster transition of 8-PAM 
signaling. Finally, differential amplifiers are used as the receiver's comparators to provide 
better immunity to external noise and higher input voltage sensitivity. As a result, the original 




5.3 3D 8-Pulse-Amplitude Modulation (PAM) Transceiver Design 
5.3.1 3D 8-Pulse-Amplitude Modulation (PAM) Transmitter Design 
 
Figure 5-3  (a) The architecture of the proposed 3D 8-PAM transmitter. It is composed of an 
encoder, eight DFFs, and an 8-PAM driver; (b) Encoder truth table; Schematic of (c) Encoder; 
(d) DFF; (e) 8-PAM driver. 
Figure 5-3 shows the 3D 8-PAM transmitter. The 8-PAM TX consists of an encoder, 
eight D flip-flops (DFFs), and an 8-PAM driver. Gray-code mapping is used in the transmitter. In 
Gray coding, adjacent symbol levels differ in only one bit, which minimizes bit errors. The three 
input data streams, B1 in – B3 in, are encoded by the encoder, as shown in Figure 5-3 (c), into eight 
control signals, C1-C8, according to the truth table shown in Figure 5-3 (b). An 8-bit re-timer circuit (i.e., a 
DFF, shown in Figure 5-3 (d)) is used to synchronize these eight control signals before the 
8-PAM driver, which reduces the latency skew. An 8-level push-pull voltage-mode (VM) driver 




voltage supplies are used to simplify the 8-PAM transmitter design. Finally, the 
8-level PAM signal is sent to the 3D µbump channel. 
5.3.2 3D 8-Pulse-Amplitude Modulation (PAM) Receiver Design 
 
Figure 5-4  The architecture of the proposed 3D 8-PAM receiver. It is composed of a 
comparator, a circuit block for differential to single-ended conversion (i.e., D-to-S converter), a 
DFF for re-timing operation, a decoder, and an output driver. 
Figure 5-4 shows the proposed 3D 8-PAM receiver schematic diagram. The 8-PAM RX 
comprises a comparator, a circuit block for differential to single-ended conversion (i.e., D-to-S 
converter), a DFF for re-timing operation, a decoder, and an output driver. The received 8-level 
PAM signal is compared to the reference voltages in the comparator, and a 7-bit thermometer 
code is generated. Then the thermometer code is synchronized by the re-timing DFF block. A 
high-speed and low-power DFF [82] is used for faster rising and falling voltage switching, as 
shown in Figure 5-5 (a). Finally, as shown in Figure 5-5 (b), a decoder is utilized to recover the 





Figure 5-5  Schematic of the proposed 3D 8-PAM receiver blocks: (a) DFF; (b) Decoder. 
 
Figure 5-6  Schematic of the proposed 3D 8-PAM receiver blocks: (a) NMOS differential pair; 
(b) PMOS differential pair; (c) Complementary pseudo-differential pair; (d) Differential to 
single-ended converter (D-to-S converter). 
Figure 5-6 shows the comparators used in the proposed 3D 8-PAM receiver. A 
differential amplifier is used as the comparator for its better noise immunity and higher input 
voltage sensitivity. Seven stable reference voltages are provided from a reference voltage 




input common-mode range is needed for the differential amplifiers. In the proposed 3D 8-PAM 
receiver, we utilize a combination of differential pair configurations, including NMOS 
differential pair shown in Figure 5-6 (a), PMOS differential pair shown in Figure 5-6 (b), and 
complementary input pseudo-differential pair shown in Figure 5-6 (c) [83]. Each differential 
amplifier is designed and optimized to meet the comparator performance requirement. A 
differential-to-single-ended converter is used after the comparator, as shown in Figure 5-6 (d), to 
obtain CMOS logic levels. 
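The receive path described above (seven references, a 7-bit thermometer code, then decoding) can be sketched end to end. This is an illustration under stated assumptions, not the circuit of Figures 5-3 to 5-6: levels are evenly spaced over a hypothetical 1.2 V swing, the references sit at the midpoints between neighboring levels, the mapping is the standard reflected binary Gray code (the actual truth table in Figure 5-3 (b) is not reproduced in the text), and the thermometer-to-level step simply counts the tripped comparators. All function names are hypothetical.

```python
VDD = 1.2                      # hypothetical full swing of the 8 levels, V
LEVELS = 8
STEP = VDD / (LEVELS - 1)      # spacing between adjacent symbol levels

# Seven comparator references at the midpoints between neighboring levels
VREFS = [(k + 0.5) * STEP for k in range(LEVELS - 1)]


def gray(n):
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def gray_inverse(g):
    """Inverse of gray(): the level index whose Gray code is g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def transmit(word):
    """Hypothetical TX mapping: 3-bit word -> voltage of its Gray-coded level."""
    return gray_inverse(word) * STEP


def receive(v_in):
    """Comparator bank -> 7-bit thermometer code -> level -> 3-bit word."""
    thermometer = [1 if v_in > vref else 0 for vref in VREFS]
    level = sum(thermometer)   # number of comparators whose reference is exceeded
    return gray(level), thermometer


word, thermo = receive(transmit(0b110) + 0.02)  # small additive offset
print(bin(word), thermo)  # 0b110 [1, 1, 1, 1, 0, 0, 0]
```

Counting ones rather than locating the single 1-to-0 transition is one simple way to tolerate an isolated sparkle, but the actual decoder of Figure 5-5 (b) may work differently.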
5.4 3D 8-Pulse-Amplitude Modulation (PAM) Memory Channel Modeling 
 
Figure 5-7  3D 8-PAM µbump model and signal loss simulation using HFSS. The on-chip 3D 
pad is also modeled for high accuracy. 
A 3D electromagnetic (EM) simulator (i.e., HFSS) is used to extract accurate S-parameters of 
the 3D µbump memory channel, as shown in Figure 5-7. For high accuracy, the on-chip pad is 
also modeled and included in the 3D memory channel model. The µbump diameter is 
50 µm, the pitch is 104 µm, and the height is 35 µm. The simulated signal loss (S21) of the 3D 
µbump memory channel is -0.3 dB at 15 GHz. Compared to a 2D package with a long FR4 memory channel, 
the signal loss of our 3D µbump memory channel is much smaller (i.e., a 5 cm 2D FR4 channel has 




5.5 3D 8-Pulse-Amplitude Modulation (PAM) Memory I/O Interface Layout and Assembly 
 
Figure 5-8  (a) 8-PAM RX layout; (b) 8-PAM TX layout; (c) 3D 8-PAM assembly diagram. 
The proposed 3D 8-PAM prototype is designed and simulated in a 65 nm CMOS 
technology. The 8-PAM TX and RX layouts are shown in Figure 5-8 (a) and (b). The chip areas of 
the TX and RX are 0.3 mm² and 0.78 mm², respectively. The 3D µbump pads are shown in the 
white dashed boxes. Each pad measures 39 µm × 52 µm. The pitch is 104 µm to maintain the keep-out zone 
for better signal integrity. The 3D µbump pads are placed with accurate alignment, and the signal 
pads are isolated by the DC signal pads for better signal integrity. The 2D pads are placed along 
three edges of the RX layout with an optimal wire-bonding pitch. Sufficient space is left between the 3D pads 
and the 2D pads for wire bonding. The 3D 8-PAM transceiver is designed for a 2-layer face-to-face 
die stack, as shown in Figure 5-8 (c). The bottom die can be assembled on the test board 




die can be flipped, aligned, and stacked on top of the bottom die using the 3D µbump 
connections.   
5.6 Simulated Results 
The advantage of Gray-code mapping in the 8-PAM transmitter for minimizing bit errors can be 
explained as follows. In the 8-PAM receiver, a comparator whose reference voltage lies midway 
between neighboring symbol levels is used to recover the symbol. Then, a decoder converts the 
thermometer code to the original data. Thus, a symbol error most likely occurs when a symbol 
level is confused with a neighboring level. Comparator metastability errors and sparkles in the 
thermometer code are two possible causes of symbol errors. Gray coding can suppress 
metastability and sparkles, which reduces symbol errors, as explained in [85]. Moreover, when a 
symbol error occurs, it can cause one or more bit errors, depending on how the bits are mapped 
to the symbols. In Gray coding, only one bit changes between adjacent symbols. In other words, 
one symbol error causes only one bit error if Gray coding is used to map the bits to the symbols. 
Therefore, Gray-code mapping minimizes bit errors. 
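The one-bit-per-symbol-error property can be checked exhaustively for eight levels. The sketch below assumes the standard reflected binary Gray code and takes B1 as the least-significant bit (both assumptions, since the truth table is given only in Figure 5-3 (b)). Under these assumptions a level 5/level 6 confusion flips only B2 with Gray mapping but flips B1 and B2 with binary mapping, which is consistent with the simulated outcome described next.

```python
def gray(n):
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def binary(n):
    """Plain binary mapping (identity)."""
    return n


def flipped_bits(sent_level, received_level, mapping):
    """Bits that differ when sent_level is decoded as received_level (B1 = LSB, assumed)."""
    diff = mapping(sent_level) ^ mapping(received_level)
    return [f"B{i + 1}" for i in range(3) if (diff >> i) & 1]


print(flipped_bits(5, 6, gray))    # ['B2']
print(flipped_bits(5, 6, binary))  # ['B1', 'B2']

# Worst case over all confusions between adjacent levels
print(max(len(flipped_bits(k, k + 1, gray)) for k in range(7)))    # 1
print(max(len(flipped_bits(k, k + 1, binary)) for k in range(7)))  # 3
```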
To verify the decrease of bit error by using gray-code mapping, simulations are done to 
compare the bit errors with gray-code mapping and binary-code mapping when a symbol error 
exists, as shown in Figure 5-9 (a) and (b). The reference voltage vref6 is intentionally set to a 
value away from the middle level of symbol levels 5 and 6 to induce a symbol error for the 
simulation. When the vref6 value is away from the middle level of the neighboring symbol 
levels, only one bit, B2, will have errors in gray-code mapping, and two bits, B1 and B2, will have 




compensate the system delays for the random input data, and XOR gates are used to compare the 
delayed input data and the recovered data, as shown in Figure 5-9 (c).  
Figure 5-10 and Figure 5-11 show the simulation results of gray-code mapping and 
binary-code mapping, including the random input data, D1 in - D3 in, the recovered data, B1 out – 
B3 out, and the bit errors, B1 error – B3 error. The simulation results show that only B2 
has errors with Gray-code mapping, whereas both B1 and B2 have errors with binary-code mapping.
 
 
Figure 5-9  (a) Gray-code mapping with symbol errors, (b) Binary-code mapping with symbol 





Figure 5-10  Simulation results of the bit errors with gray-code mapping. 
 





Figure 5-12  The simulated waveforms of the proposed 3D 8-PAM memory I/O interface at the 
worst case (SS corner). (a) the recovered data (4.5 Gb/s per data); (b) the 8-level PAM signal and 
its eye-diagram (13.5Gb/s/pin); (c) the original data (4.5 Gb/s per data). 
Figure 5-12 shows the simulated waveforms of the proposed 3D 8-PAM memory I/O 
interface at the SS corner. Each of the original data and recovered data is at 4.5 Gb/s, and the 8-





Figure 5-13  The simulated latency of the proposed 3D 8-PAM transceiver at SS corner: latency 
breakdown of the transceiver and I/O channel. 
Table 5.1  Simulated latency comparison to [85]. 
 This Work [86] 
Technology 65nm CMOS 130nm CMOS 
Interconnect  3D µbump On-chip channel 
Interconnect length  35 µm 5 mm 
Total delay 724ps 1.5ns 
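The relative figures quoted for this design follow directly from the reported numbers: the 724 ps total delay against the 1.5 ns of the prior work in Table 5.1, the 0.3 ps µbump delay given for Figure 5-13, and the 5.2 ps @ 3σ SS-corner mismatch from the Monte Carlo results of Figure 5-14. The sketch below only performs that arithmetic.

```python
total_ps = 724.0       # this work, total delay (Table 5.1, SS corner)
prior_ps = 1500.0      # prior work in Table 5.1, 5 mm on-chip channel
ubump_ps = 0.3         # 3D µbump delay quoted for Figure 5-13
mc_3sigma_ss_ps = 5.2  # Monte Carlo 3-sigma latency mismatch, SS corner

print(f"latency ratio vs prior work: {prior_ps / total_ps:.2f}x")            # 2.07x
print(f"ubump share of total: {100 * ubump_ps / total_ps:.3f} %")            # 0.041 %
print(f"3-sigma / total (SS): {100 * mc_3sigma_ss_ps / total_ps:.2f} %")     # 0.72 %
```

These reproduce the 0.04% µbump share and the 0.72% maximum mismatch variation stated in the text.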
 
Figure 5-13 shows the simulated latency of the proposed 3D 8-PAM transceiver at the SS 
corner. The latency of each component is listed and illustrated. The total latency is 724 ps. The 
3D µbump latency is 0.3 ps, which accounts for only 0.04% of the total latency. As a result, the total 





Figure 5-14  Simulated latency mismatch by using Monte Carlo simulations of the proposed 3D 
8-PAM transceiver. (a) the FF corner; (b) the TT corner; (c) the SS corner.  
Figure 5-14 shows the Monte Carlo simulation results of the proposed 3D 8-PAM 
transceiver latency mismatch at the FF, TT, and SS corners. The maximum mismatch-induced variation across 
corners is only 0.72%. The impact of mismatch on the proposed 8-PAM I/O interface latency is 
negligible (i.e., 3.8 fs @ 3σ for the FF corner, 2.1 ps @ 3σ for the TT corner, and 5.2 ps @ 3σ for the SS 
corner). 
 
Figure 5-15  The simulated eye diagrams of the recovered three data at 13.5 Gb/s/pin (4.5 Gb/s 
per data) at the worst corner (SS corner). 
Figure 5-15 shows the simulated eye diagrams of the three recovered 2-PAM data streams. The 
simulated data rate is 4.5 Gb/s per data stream. The energy efficiency is 3.4 pJ/b with a 1.2 V supply 




output driver). Table 5.2 shows the proposed 3D 8-PAM memory I/O interface performance 
compared to the prior works using 3D [49], [40], 2.5D [2], and 2D [76], [28], [87], [53] package 
solutions. The 3D wide-IO2 [49], HBM2 [40] and 2.5D [2] utilize only 2-PAM signaling. The 
proposed 3D 8-PAM uses 8-level signaling, achieves higher data throughput, and enables 
multiple data communication concurrently. Compared to the 2D package solutions, the 3D 8-
PAM memory interface can triple the data rate with less power consumption. 



































Dimension  3D 3D 3D 2.5D 2D 2D 2D 2D 
Technology 65nm 25nma 20nma 7nmb 65nm 7nm 0.5µmc 180nm 
Supply (V) 1.2 1.1/1.8 1.2/2.5 0.8 1.2 1.2 3.3 2/3.5 
Latency (ps) 615 - - - - - - - 
Chip area 
(mm2) 




3.4 3.5 - 0.56 9.23 6.9 400 42.5 
a: DRAM, b: FinFET, c: digital CMOS, d: total area of two dies           
 
5.7 Conclusions 
This chapter presents a 3D 8-PAM memory I/O interface to increase the data rate, 
decrease latency, and provide a small form factor. The 8-PAM signaling is used to achieve high 




latency and enhance signal integrity. The proposed 3D 8-PAM could be a promising solution for 





Conclusions and Future Work  
 
6.1 Conclusions 
 This dissertation presents two novel memory I/O interfaces using 3D integration, QBI, 
and 8-PAM to increase throughput, reduce latency, enhance data reconfigurability, and reduce 
form factor. A novel quad-band transformer with high out-of-band suppression and superior 
band selectivity is proposed to achieve reconfigurability in the 3D QBI memory I/O interface. 
The 8-PAM signaling is used to increase the throughput in the 3D 8-PAM memory I/O interface. 
The short 3D μbump interconnect is used to achieve enhanced signal integrity, low latency, and 
small form factor.  
 




The proposed 3D QBI and 3D 8-PAM could be promising solutions for future mobile 
devices with data-intensive applications (e.g., artificial intelligence, virtual reality, and 
autonomous vehicle systems), as shown in Figure 6-1. 
6.2 Future Work 
Future work will design a 3D multilevel quad-band interconnect (MQBI) memory I/O interface, 
shown in Figure 6-2, consisting of three RF-band transceivers using ASK modulation and one 
baseband transceiver using 8-PAM, to achieve higher performance. A novel quad-band 
transformer with higher out-of-band suppression and a more selective matching network will be 
designed to enhance signal integrity. An ultra-low-power 3D QMI link using near-threshold 
voltage (NTV) design techniques will also be designed in 28 nm CMOS technology to 
improve I/O power efficiency. Furthermore, an improved 3D µbump geometry with less 












[1] S. Yin et al., “An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-
Weight Neural Networks with Flexible Data Bit Width,” IEEE J. Solid-State Circuits, vol. 
54, no. 4, pp. 1120–1136, 2019, doi: 10.1109/JSSC.2018.2881913. 
[2] M.-S. Lin et al., “A 7-nm 4-GHz Arm-Core-Based CoWoS Chiplet Design for High-
Performance Computing,” IEEE J. Solid-State Circuits, vol. 55, no. 4, pp. 956–966, 2020, 
doi: 10.1109/JSSC.2019.2960207. 
[3] K. Sohn, T. Na, I. Song, Y. Shim, W. Bae, S. Kang, D. Lee, H. Jung, S. Hyun, H. Jeoung, 
and K. W. Lee, “A 1.2 V 30 nm 3.2 Gb/s/pin 4 Gb DDR4 SDRAM With Dual-Error 
Detection and PVT-Tolerant Data-Fetch Scheme,” IEEE J. Solid-State Circuits, vol. 48, 
no. 1, pp. 168–177, 2013. 
[4] K. Koo et al., “A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and x4 
half-page architecture,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf., vol. 55, pp. 
40–41, 2012, doi: 10.1109/ISSCC.2012.6176869. 
[5] D. Kim et al., “A 1.1-V 10-nm class 6.4-Gb/s/pin 16-Gb DDR5 SDRAM With a Phase 
Rotator-ILO DLL, High-Speed SerDes, and DFE/FFE Equalization Scheme for Rx/Tx,” 
IEEE J. Solid-State Circuits, vol. 55, no. 1, pp. 167–177, 2020. 
[6] D. Kim et al., “A 1.1V 1ynm 6.4Gb/s/pin 16Gb DDR5 SDRAM with a Phase-Rotator-
Based DLL, High-Speed SerDes and RX/TX equalization scheme,” 2019 IEEE Int. Solid- 




[7] K. I. K. Oh et al., “A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk 
suppression scheme,” IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222–2232, 2009, 
doi: 10.1109/JSSC.2009.2022303. 
[8] P. W. Chiu, S. Kundu, Q. Tang, and C. H. Kim, “A 65-nm 10-Gb/s 10-mm On-Chip Serial 
Link Featuring a Digital-Intensive Time-Based Decision Feedback Equalizer,” IEEE J. 
Solid-State Circuits, vol. 53, no. 4, pp. 1203–1213, 2018, doi: 
10.1109/JSSC.2017.2774276. 
[9] S. Jeon et al., “A 20Gb/s transceiver with framed-pulsewidth modulation in 40nm 
CMOS,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf., vol. 61, pp. 270–272, 
2018, doi: 10.1109/ISSCC.2018.8310288. 
[10] S. Saxena et al., “A 2.8 mW/Gb/s, 14 Gb/s Serial Link Transceiver,” IEEE J. Solid-State 
Circuits, vol. 52, no. 5, pp. 1399–1411, 2017, doi: 10.1109/JSSC.2016.2645738. 
[11] J. Kim, H. Hatamkhani, and C. K. K. Yang, “A large-swing transformer-boosted serial 
link transmitter with >VDD swing,” IEEE J. Solid-State Circuits, vol. 42, no. 5, pp. 
1131–1142, 2007, doi: 10.1109/JSSC.2007.894821. 
[12] J. F. Buckwalter, M. Meghelli, D. J. Friedman, and A. Hajimiri, “Phase and amplitude 
pre-emphasis techniques for low-power serial links,” IEEE J. Solid-State Circuits, vol. 41, 
no. 6, pp. 1391–1398, 2006, doi: 10.1109/JSSC.2006.874270. 
[13] R. Navid et al., “A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology,” IEEE 





[14] A. Shafik, E. Z. Tabasy, S. Cai, K. Lee, S. Hoyos, and S. Palermo, “A 10Gb/s hybrid 
ADC-based receiver with embedded 3-tap analog FFE and dynamically-enabled digital 
equalization in 65nm CMOS,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf., vol. 
58, pp. 62–63, 2015, doi: 10.1109/ISSCC.2015.7062926. 
[15] N. Kalantari and J. F. Buckwalter, “A multichannel serial link receiver with dual-loop 
clock-and-data recovery and channel equalization,” IEEE Trans. Circuits Syst. I Regul. 
Pap., vol. 60, no. 11, pp. 2920–2931, 2013, doi: 10.1109/TCSI.2013.2256172. 
[16] E. H. Chen, R. Yousry, and C. K. K. Yang, “Power optimized ADC-based serial link 
receiver,” IEEE J. Solid-State Circuits, vol. 47, no. 4, pp. 938–951, 2012, doi: 
10.1109/JSSC.2012.2185356. 
[17] K. L. J. Wong, E. H. Chen, and C. K. K. Yang, “Edge and data adaptive equalization of 
serial-link transceivers,” IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 2157–2169, 2008, 
doi: 10.1109/JSSC.2008.2001876. 
[18] B. Zhang et al., “A 28 Gb/s Multistandard Serial Link Transceiver for Backplane 
Applications in 28 nm CMOS,” IEEE J. Solid-State Circuits, vol. 50, no. 12, pp. 3089–
3100, 2015, doi: 10.1109/JSSC.2015.2475180. 
[19] F. Spagna et al., “A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization 
and baud-rate CDR in 32nm CMOS,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits 
Conf., vol. 53, pp. 366–367, 2010, doi: 10.1109/ISSCC.2010.5433823. 
[20] H. Shin, Z. Xu, and M. F. Chang, “RF-interconnect for multi-Gb/s digital interface based 
on 10-GHz RF-modulation in 0.18μm CMOS,” IEEE MTT-S Int. Microw. Symp. Dig., vol. 




[21] G. S. Byun, Y. Kim, J. Kim, S. W. Tam, and M. C. F. Chang, “An energy-efficient and 
high-speed mobile memory I/O interface using simultaneous bi-directional dual (base+ 
RF)-band signaling,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 117–130, 2012, doi: 
10.1109/JSSC.2011.2164709. 
[22] Y. Kim et al., “An 8Gb/s/pin 4pJ/b/pin Single-T-Line Dual (Base+RF) Band 
Simultaneous Bidirectional Mobile Memory I/O Interface with Inter-Channel Interference 
Suppression,” ISSCC, pp. 488–490, 2012. 
[23] S. W. Tam, E. Socher, A. Wong, and M. C. F. Chang, “A simultaneous tri-band on-chip 
RF-interconnect for future network-on-chip,” 2009 Symp. VLSI Circuits, pp. 90–91, 2009. 
[24] S. W. Tam, M. F. Chang, and J. Kim, “Wireline and wireless RF-Interconnect for next 
generation SoC systems,” IEEE 54th Int. Midwest Symp. Circuits Syst., pp. 1–3, 2011. 
[25] W. H. Cho et al., “A 5.4-mW 4-Gb/s 5-band QPSK transceiver for frequency-division 
multiplexing memory interface,” Proc. Cust. Integr. Circuits Conf., vol. 2015-Novem, pp. 
5–8, 2015, doi: 10.1109/CICC.2015.7338373. 
[26] J. Ko and F. Chang, “An RF/baseband FDMA interconnect transceiver for reconfigurable 
multiple access chip to chip communication,” ISSCC, pp. 21–23, 2005. 
[27] L. Wang, Y. Fu, M. A. LaCroix, E. Chong, and A. Chan Carusone, “A 64-Gb/s 4-PAM 
transceiver utilizing an adaptive threshold ADC in 16-nm FinFET,” IEEE J. Solid-State 
Circuits, vol. 54, no. 2, pp. 452–462, 2019, doi: 10.1109/JSSC.2018.2877172. 
[28] M. Lacroix et al., “A 60Gb/s PAM-4 ADC-DSP Transceiver in 7nm CMOS with SNR-




Circuits Conf., pp. 114–116, 2019, doi: 10.1109/ISSCC.2019.8662322. 
[29] E. Depaoli et al., “A 64 Gb/s low-power transceiver for short-reach PAM-4 electrical links 
in 28-nm FDSOI CMOS,” IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 6–17, 2019, 
doi: 10.1109/JSSC.2018.2873602. 
[30] K. Zheng et al., “An Inverter-Based Analog Front-End for a 56-Gb/s PAM-4 Wireline 
Transceiver in 16-nm CMOS,” IEEE Solid-State Circuits Lett., vol. 1, no. 12, pp. 249–
252, 2018, doi: 10.1109/LSSC.2019.2894933. 
[31] S. Moazeni et al., “A 40Gb/s PAM-4 Transmitter Based on a Ring-Resonator Optical 
DAC in 45nm SOI CMOS,” IEEE Int. Solid-State Circuits Conf., pp. 486–488, 2017. 
[32] Y. Frans et al., “A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-
Interleaved SAR ADC in 16-nm FinFET,” IEEE J. Solid-State Circuits, vol. 52, no. 4, pp. 
1101–1110, 2017, doi: 10.1109/JSSC.2016.2632300. 
[33] T. Shibasaki et al., “A 56Gb/s NRZ-electrical 247mW/lane serial-link transceiver in 28nm 
CMOS,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf., vol. 59, pp. 64–65, 2016, 
doi: 10.1109/ISSCC.2016.7417908. 
[34] M. Erett et al., “A 126mW 56Gb/s NRZ wireline transceiver for synchronous short-reach 
applications in 16nm FinFET,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf., vol. 
61, pp. 274–276, 2018, doi: 10.1109/ISSCC.2018.8310290. 
[35] L. Y. Chen, S. Tao, and N. Verma, “A 3-D IC for Mitigating Energy of Memory 
Accessing and Data Movement in Accelerator-Based Streaming Architectures,” IEEE J. 





[36] D. H. Kim et al., “Design and analysis of 3D-MAPS (3D Massively parallel processor 
with stacked memory),” IEEE Trans. Comput., vol. 64, no. 1, pp. 112–125, 2015, doi: 
10.1109/TC.2013.192. 
[37] J. S. Kim, C. S. Oh, H. Lee, D. Lee, H. R. Hwang, S. Hwang, B. Na, J. Moon, J. G. Kim, 
H. Park, and J. W. Ryu, “A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With 
4×128 I/Os Using TSV Based Stacking,” IEEE J. Solid-State Circuits, vol. 47, 
pp. 107–116, 2011. 
[38] S. Lee et al., “A 0.83-pJ/bit 6.4-Gb/s HBM Base Die Receiver using a 45° Strobe Phase 
for Energy-Efficient Skew Compensation,” IEEE Trans. Circuits Syst. II Express Briefs, 
vol. 7747, no. c, pp. 1–1, 2019, doi: 10.1109/tcsii.2019.2947296. 
[39] D. U. Lee et al., “A 1.2 V 8 Gb 8-channel 128 GB/s high-bandwidth memory (HBM) 
stacked DRAM with effective I/O test circuits,” IEEE J. Solid-State Circuits, vol. 50, no. 
1, pp. 191–203, 2015, doi: 10.1109/JSSC.2014.2360379. 
[40] K. Sohn et al., “A 1.2 V 20 nm 307 GB/s HBM DRAM with at-speed wafer-level IO test 
scheme and adaptive refresh considering temperature distribution,” IEEE J. Solid-State 
Circuits, vol. 52, no. 1, pp. 250–260, 2017, doi: 10.1109/JSSC.2016.2602221. 
[41] C. S. Oh et al., “A 1.1V 16GB 640GB/s HBM2E DRAM with a Data-Bus Window-
Extension Technique and a Synergetic On-Die ECC Scheme,” IEEE Int. Solid-State 
Circuits Conf., pp. 330–332, 2020, doi: 10.1109/ISSCC19947.2020.9063110. 




ICs: Design Methods and Tools,” IEEE Trans. Comput. Des. Integr. Circuits Syst., vol. 
36, no. 10, pp. 1593–1619, 2017, doi: 10.1109/TCAD.2017.2666604. 
[43] V. Kumar and A. Naeemi, “An overview of 3D integrated circuits,” 2017 IEEE MTT-S 
Int. Conf. Numer. Electromagn. Multiphysics Model. Optim. RF, Microwave, Terahertz 
Appl. NEMO 2017, pp. 311–313, 2017, doi: 10.1109/NEMO.2017.7964270. 
[44] E. Beyne and B. Swinnen, “3D system integration technologies,” 2006 Int. Symp. VLSI 
Technol. Syst. Appl., pp. 180–182, 2006, doi: 10.1109/ICICDT.2007.4299568. 
[45] Z. Xu and J. Q. Lu, “Through-silicon-via fabrication technologies, passives extraction, 
and electrical modeling for 3-D integration/packaging,” IEEE Trans. Semicond. Manuf., 
vol. 26, no. 1, pp. 23–34, 2013, doi: 10.1109/TSM.2012.2236369. 
[46] C. C. Liu, I. Ganusov, M. Burtscher, and S. Tiwari, “Bridging the processor-memory 
performance gap with 3D IC technology,” IEEE Des. Test Comput., vol. 22, no. 6, pp. 
556–564, 2005, doi: 10.1109/MDT.2005.134. 
[47] G. Singh et al., “Near-memory computing: Past, present, and future,” Microprocess. 
Microsyst., vol. 71, p. 102868, 2019, doi: 10.1016/j.micpro.2019.102868. 
[48] W. R. Davis et al., “Demystifying 3D ICs: The pros and cons of going vertical,” IEEE 
Des. Test Comput., vol. 22, no. 6, pp. 498–510, 2005, doi: 10.1109/MDT.2005.136. 
[49] Y. J. Yoon et al., “An 1.1 V 68.2 GB/s 8Gb Wide-IO2 DRAM with non-contact 
microbump I/O test scheme,” IEEE Int. Solid-State Circuits Conf., pp. 320–322, 2016, 
doi: 10.1109/ISSCC.2016.7418036. 




equalization, offset cancellation and clock deskew,” Dig. Tech. Pap. - IEEE Int. Solid-
State Circuits Conf., vol. 47, no. 1, pp. 80–88, 2004, doi: 10.1109/isscc.2004.1332686. 
[51] J. L. Zerbe et al., “1.6 Gb/s/pin 4PAM signaling and circuits for a Multidrop Bus,” IEEE 
J. Solid-State Circuits, vol. 36, no. 5, pp. 752–760, 2001. 
[52] M. Gao, J. Pu, X. Yang, M. Horowitz, and C. Kozyrakis, “TETRIS: Scalable and efficient 
neural network acceleration with 3D memory,” Proc. Twenty-Second Int. Conf. Archit. 
Support Program. Lang. Oper. Syst., pp. 751–764, 2017, doi: 10.1145/3037697.3037702. 
[53] B. Song, K. Kim, and J. Lee, “A 0.18-µm CMOS 10-Gb/s Dual-Mode 10-PAM Serial 
Link Transceiver,” IEEE Trans. Circuits Syst. I Regul. Pap., vol. 60, no. 2, pp. 457–468, 
2013. 
[54] R. Farjad-Rad, “A CMOS 4-PAM Multi-Gbps Serial Link Transceiver,” Thesis, 
Aug. 2000. 
[55] G. Hariharan et al., “Reliability Evaluations on 3D IC Package beyond JEDEC,” Proc. - 
Electron. Components Technol. Conf., pp. 1517–1522, 2017, doi: 
10.1109/ECTC.2017.298. 
[56] K. Cho et al., “Signal Integrity Design and Analysis of Silicon Interposer for GPU-
Memory Channels in High-Bandwidth Memory Interface,” IEEE Trans. Components, 
Packag. Manuf. Technol., vol. 8, no. 9, pp. 1658–1671, 2018, doi: 
10.1109/TCPMT.2017.2779838. 
[57] S. Mick, J. Wilson, and P. Franzon, “4 Gbps high-density AC coupled interconnection,” 




[58] K. Kanda, D. D. Antono, K. Ishida, H. Kawaguchi, T. Kuroda, and T. Sakurai, 
“1.27Gb/s/pin 3mW/pin Wireless Superconnect (WSC) Interface Scheme,” IEEE Int. 
Solid-State Circuits Conf., pp. 133–140, 2003. 
[59] N. Miura, D. Mizoguchi, Y. B. Yusof, T. Sakurai, and T. Kuroda, “Analysis and design of 
transceiver circuit and inductor layout for inductive inter-chip wireless superconnect,” 
IEEE Symp. VLSI Circuits, Dig. Tech. Pap., vol. 40, pp. 246–249, 
2004, doi: 10.1109/vlsic.2004.1346575. 
[60] T. Sakurai and T. Kuroda, “A 195Gb/s 1.2W 3D-Stacked Inductive Inter-Chip Wireless 
Superconnect with Transmit Power Control Scheme,” IEEE Int. Solid-State Circuits 
Conf., vol. 23, no. 12, pp. 14–16, 2005. 
[61] B. Black, D. W. Nelson, C. Webb, and N. Samra, “3D processing technology and its 
impact on iA32 microprocessors,” Proc. - IEEE Int. Conf. Comput. Des. VLSI Comput. 
Process., pp. 316–318, 2004, doi: 10.1109/iccd.2004.1347939. 
[62] “3D Electromagnetic Field Simulator for RF and Wireless Design.” 
https://www.ansys.com/products/electronics/ansys-hfss. 
[63] K. Okada et al., “A 60-GHz 16QAM/8PSK/QPSK/BPSK direct-conversion transceiver for 
IEEE802.15.3c,” IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 2988–3004, 2011, doi: 
10.1109/JSSC.2011.2166184. 
[64] W. Volkaerts, N. Van Thienen, and P. Reynaert, “An FSK plastic waveguide 
communication link in 40nm CMOS,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits 




[65] H. Wang, M. H. Hung, Y. C. Yeh, and J. Lee, “A 60-GHz FSK transceiver with 
automatically-calibrated demodulator in 90-nm CMOS,” IEEE Symp. VLSI Circuits, Dig. 
Tech. Pap., pp. 95–96, 2010, doi: 10.1109/VLSIC.2010.5560338. 
[66] F. S. Lee and A. P. Chandrakasan, “A 2.5 nJ/bit 0.65 V pulsed UWB receiver in 90 nm 
CMOS,” IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2851–2859, 2007, doi: 
10.1109/JSSC.2007.908723. 
[67] X. Wang and G.-S. Byun, “A 3D Reconfigurable Memory I/O Interface Using a Quad-
band Interconnect,” IEEE Trans. Components, Packag. Manuf. Technol., vol. 3950, no. c, 
pp. 1–1, 2021, doi: 10.1109/tcpmt.2021.3073594. 
[68] Y. Kim et al., “Analysis of noncoherent ASK modulation-based RF-interconnect for 
memory interface,” IEEE J. Emerg. Sel. Top. Circuits Syst., vol. 2, no. 2, pp. 200–209, 
2012, doi: 10.1109/JETCAS.2012.2193511. 
[69] J. Kim, H. Hatamkhani, and C. K. K. Yang, “A large-swing transformer-boosted 
serial link transmitter with >VDD swing,” IEEE J. Solid-State Circuits, vol. 42, no. 5, pp. 
1131–1142, 2007, doi: 10.1109/JSSC.2007.894821. 
[70] Y. Tomita, M. Kibune, J. Ogawa, W. W. Walker, H. Tamura, and T. Kuroda, “A 10-Gb/s 
receiver with series equalizer and on-chip ISI monitor in 0.11-µm CMOS,” IEEE J. 
Solid-State Circuits, vol. 40, no. 4, pp. 986–993, 2005, doi: 10.1109/JSSC.2005.845563. 
[71] M. F. Chang et al., “CMP network-on-chip overlaid with multi-band RF-interconnect,” 





[72] N. Mirzaie, C. C. Lin, A. Alzahmi, and G. S. Byun, “Reliability-Aware 3-D Clock 
Distribution Network Using Memristor Ratioed Logic,” IEEE Trans. Compon. Packag. 
Manuf. Technol., vol. 9, no. 9, pp. 1847–1854, 2019, doi: 10.1109/TCPMT.2019.2900851. 
[73] S. W. Tam, E. Socher, M. C. F. Chang, J. Cong, and G. D. Reinman, Low Power 
Networks-on-Chip. Boston, MA: Springer, 2011. 
[74] “BERTScope, BERTScope S, BERTScope Si, and BERTScope SPG Signal Integrity 
Instruments.” https://www.ntecusa.com/docs/BERTScope_TS.pdf. 
[75] J. W. Poulton et al., “A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach 
Serial Link in 28 nm CMOS for Advanced Packaging Applications,” IEEE J. Solid-State 
Circuits, vol. 48, no. 12, pp. 3206–3218, 2013, doi: 10.1109/JSSC.2013.2279053. 
[76] X. Zheng, C. Zhang, F. Lv, F. Zhao, and S. Yuan, “A 40-Gb/s Quarter-Rate SerDes 
Transmitter and Receiver Chipset in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, 
no. 11, pp. 2963–2978, 2017, doi: 10.1109/JSSC.2017.2746672. 
[77] M. S. Lin et al., “A 7-nm 4-GHz Arm-Core-Based CoWoS Chiplet Design for High-
Performance Computing,” IEEE J. Solid-State Circuits, vol. 55, no. 4, pp. 956–966, 2020, 
doi: 10.1109/JSSC.2019.2960207. 
[78] W. Gomes et al., “Lakefield and Mobility Compute: A 3D Stacked 10nm and 22FFL 
Hybrid Processor System in 12×12mm2, 1mm Package-on-Package,” IEEE Int. Solid-
State Circuits Conf., pp. 144–146, 2020, doi: 10.1109/ISSCC19947.2020.9062957. 
[79] R. Farjad-rad, C. K. Yang, M. A. Horowitz, and T. H. Lee, “A 0.3um CMOS 8Gb/s 4-





[80] J. Park et al., “A Novel Eye-Diagram Estimation Method for Pulse Amplitude Modulation 
with N-Level (PAM-N) on Stacked Through-Silicon Vias,” IEEE Trans. Electromagn. 
Compat., vol. 61, no. 4, pp. 1198–1206, 2019, doi: 10.1109/TEMC.2018.2843813. 
[81] J. Kim et al., “A 4-Gb/s/pin Low-Power Memory I/O Interface Using 4-Level 
Simultaneous Bi-Directional Signaling,” IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 
89–101, 2005. 
[82] R. Chandrasekaran, Y. Lian, and R. S. Rana, “A High-Speed Low-Power D Flip-Flop,” 
IEEE 6th Int. Conf. ASIC, vol. 1, pp. 152–155, 2005. 
[83] R. Singh, Y. Audet, Y. Gagnon, Y. Savaria, É. Boulais, and M. Meunier, “A laser-
trimmed rail-to-rail precision CMOS operational amplifier,” IEEE Trans. Circuits Syst. II 
Express Briefs, vol. 58, no. 2, pp. 75–79, 2011, doi: 10.1109/TCSII.2010.2104011. 
[84] M. Jalalifar and G. S. Byun, “An Energy-Efficient Mobile Memory I/O Interface Using 
Simultaneous Bidirectional Multilevel Dual-Band Signaling,” IEEE Trans. Circuits Syst. 
II Express Briefs, vol. 64, no. 8, pp. 897–901, 2017, doi: 10.1109/TCSII.2016.2614989. 
[85] B. Razavi, Principles of data conversion system design. 1994. 
[86] G. S. Byun and M. M. Navidi, “A low-power 4-PAM transceiver using a dual-sampling 
technique for heterogeneous latency-sensitive network-on-chip,” IEEE Trans. Circuits 
Syst. II Express Briefs, vol. 62, no. 6, pp. 613–617, 2015, doi: 
10.1109/TCSII.2014.2387615. 




CMOS,” IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 310–316, 2002, doi: 
10.1109/4.987082. 
 
 
