A LPDDR4 MEMORY CONTROLLER DESIGN WITH EYE CENTER DETECTION ALGORITHM by 홍기문
 
 
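Attribution-NonCommercial-NoDerivs 2.0 Korea
Users are free to copy, distribute, transmit, display, perform, and broadcast this work,
provided they follow the conditions below.
Attribution. You must attribute the work to the original author.
NonCommercial. You may not use this work for commercial purposes.
NoDerivs. You may not alter, transform, or build upon this work.
For any reuse or distribution, you must make clear the license terms applied to this work.
Any of these conditions can be waived with permission from the copyright holder.
Your rights under copyright law are not affected by the above.
This is a human-readable summary of the Legal Code.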
 
PH.D. DISSERTATION 
 
 
 
A LPDDR4 MEMORY CONTROLLER DESIGN 
WITH EYE CENTER DETECTION ALGORITHM 
 
 
(Korean title: Design of an LPDDR4 Memory Controller Using an Eye Center Finding Method)
 
 
 
BY 
 
 
GI-MOON HONG 
 
 
 
 
 
 
 
FEBRUARY 2016 
 
 
 
 
 
 
DEPARTMENT OF ELECTRICAL ENGINEERING AND 
COMPUTER SCIENCE 
COLLEGE OF ENGINEERING 
SEOUL NATIONAL UNIVERSITY 
 
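A LPDDR4 MEMORY CONTROLLER DESIGN
WITH EYE CENTER DETECTION ALGORITHM

Advisor: Suhwan Kim

Submitted as a dissertation for the degree of Doctor of Philosophy in Engineering

February 2016

Graduate School of Seoul National University
Department of Electrical Engineering and Computer Science
Gi-Moon Hong

The Ph.D. dissertation of Gi-Moon Hong is hereby approved

February 2016

Chair:         (seal)
Vice Chair:    (seal)
Member:        (seal)
Member:        (seal)
Member:        (seal)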
ABSTRACT 
 
 
 
 
 
The demand for higher bandwidth at lower power consumption in mobile memory is
increasing. In this thesis, the architecture of an LPDDR4 memory controller that operates
with LPDDR4 memory is proposed and designed, and an efficient training algorithm suited
to this architecture is proposed for memory training and verification.
The LPDDR4 memory specification covers operating speeds from 533Mbps to 4266Mbps,
and the LPDDR4 memory controller is designed to support that range. The phase-locked
loop in the controller is designed to operate between 1333MHz and 2133MHz. To cover the
full range of the LPDDR4 memory, a selectable frequency divider provides the operation
clock, so the output frequency of the phase-locked loop with the divider ranges from
266MHz to 2133MHz. The delay-locked loop in the controller is designed to operate between
266MHz and 2133MHz with 180˚ phase locking. The delay-locked loop is used in each
training operation: command training, data read training, and data write training. An eye
center detection algorithm is used to complete each training stage. The circuits for the
proposed eye center detection algorithm, such as the delay line, phase interpolator, and
reference generator, are designed and validated. The proposed 1x2y3x eye center detection
algorithm is 23 times faster than the conventional two-dimensional eye center detection
algorithm and is simple to implement.
Implemented in a 65nm CMOS process, the proposed LPDDR4 memory controller occupies
12mm². The LPDDR4 memory controller is verified with commodity LPDDR4 memory. The
entire training sequence, consisting of power on, initialization, boot up, command training,
write leveling, read training, and write training, is verified in this environment. The low
voltage swing terminated logic driver and several other functions, including write leveling
and data transmission, are verified at 4266Mbps, and the entire operation of the LPDDR4
memory controller is verified from 566Mbps to 1600Mbps. The proposed eye center
detection algorithm is verified from 566Mbps to 2843Mbps.
 
 
Keywords: LPDDR4, mobile memory, memory controller, memory interface, transceiver, 
training algorithm, eye center detection 
 
Student Number: 2011-3026 
CONTENTS 
 
 
 
ABSTRACT

CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 INTRODUCTION
1.3 THESIS ORGANIZATION

CHAPTER 2 LPDDR4 MEMORY CONTROLLER DESIGN
2.1 DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY
2.1.1 ARCHITECTURAL DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY
2.1.2 SOURCE SYNCHRONOUS MATCHED SCHEME AND UNMATCHED SCHEME
2.1.3 LOW VOLTAGE SWING TERMINATED LOGIC DRIVER AND TERMINATION SCHEME
2.2 LPDDR4 MEMORY CONTROLLER SPECIFICATION
2.3 DESIGN PROCEDURE

CHAPTER 3 LPDDR4 MEMORY CONTROLLER ARCHITECTURE BASED ON MEMORY TRAINING
3.1 LPDDR4 MEMORY TRAINING SEQUENCE
3.2 LPDDR4 MEMORY TRAINING EYE DETECTION ALGORITHM
3.2.1 EYE CENTER DETECTION
3.2.2 1X2Y3X EYE CENTER DETECTION ALGORITHM
3.3 LPDDR4 MEMORY CONTROLLER DESIGN BASED ON MEMORY TRAINING
3.3.1 ARCHITECTURE FOR MEMORY BOOT UP AND POWER UP
3.3.2 CLOCK PATH ARCHITECTURE AND CLOCK TREE
3.3.3 COMMAND TRAINING AND COMMAND PATH ARCHITECTURE
3.3.4 WRITE LEVELING AND DATA STROBE TRANSMISSION PATH ARCHITECTURE
3.3.5 READ TRAINING AND READ PATH ARCHITECTURE
3.3.6 WRITE TRAINING AND WRITE PATH ARCHITECTURE
3.3.7 NORMAL READ/WRITE OPERATION AND MARGIN TEST

CHAPTER 4 LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING AND CIRCUIT DESIGN
4.1 OVERALL LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING
4.2 SIMULATION RESULT OF LPDDR4 MEMORY CONTROLLER MODELING
4.3 LPDDR4 MEMORY CONTROLLER CIRCUIT DESIGN
4.3.1 PHASE-LOCKED LOOP
4.3.2 DELAY-LOCKED LOOP
4.3.3 TRANSMITTER OF LPDDR4 MEMORY CONTROLLER: WRITE PATH
4.3.4 DE-SERIALIZER WITH CLOCK DOMAIN CROSSING

CHAPTER 5 MEASUREMENT RESULT OF LPDDR4 MEMORY CONTROLLER
5.1 LPDDR4 MEMORY CONTROLLER MEASUREMENT SETUP
5.1.1 LPDDR4 MEMORY CONTROLLER FLOOR PLAN AND LAYOUT
5.1.2 PACKAGE AND TEST BOARD
5.2 LPDDR4 MEMORY CONTROLLER SUB-BLOCK MEASUREMENT
5.2.1 PHASE-LOCKED LOOP
5.2.2 DELAY-LOCKED LOOP
5.2.3 200PS AND 800PS DELAY LINE
5.2.4 VOLTAGE REFERENCE GENERATOR
5.2.5 PHASE INTERPOLATOR
5.3 LPDDR4 MEMORY SYSTEM OPERATION MEASUREMENT

CHAPTER 6 CONCLUSION

APPENDIX OPERATION FLOW CHART OF THE PROPOSED LPDDR4 MEMORY CONTROLLER

BIBLIOGRAPHY

KOREAN ABSTRACT
 
 
 
 
LIST OF FIGURES 
 
 
 
Fig. 1.1.1  What are the main areas of improvement for smartphone users in Korea
Fig. 1.1.2  Roadmap of LPDDRx memory with per pin data rate
Fig. 1.1.3  Roadmap of mobile memory device with total data bandwidth
Fig. 2.1.1  Architecture of mobile memory (a) LPDDR4 and (b) LPDDR3 memory
Fig. 2.1.2  Source synchronous matched scheme and unmatched scheme
Fig. 2.1.3  Schematic of (a) high speed unterminated logic in the LPDDR3, (b) LVSTL driver in the LPDDR4
Fig. 2.2.1  Data rate and clock speed of the LPDDR4 memory
Fig. 2.2.2  Simplified bus interface state diagram of LPDDR4 memory
Fig. 2.3.1  Design procedure flow chart of the LPDDR4 memory controller
Fig. 3.1.1  Training sequence of the proposed LPDDR4 memory controller
Fig. 3.2.1  Various eye diagrams observed in simulations and measurements
Fig. 3.2.2  Simplified eye diagrams of Fig. 3.2.1
Fig. 3.2.3  Two-dimensional eye detection
Fig. 3.2.4  The proposed 1x2y3x eye center detection algorithm
Fig. 3.2.5  Comparison of two-dimensional eye detection center and 1x2y3x eye detection center
Fig. 3.2.6  Exception handling example of 1x2y3x eye detection
Fig. 3.3.1  LPDDR4 memory initialization sequence
Fig. 3.3.2  Boot up path circuit block diagram
Fig. 3.3.3  Clock path circuit block diagram
Fig. 3.3.4  CS training circuit block diagram
Fig. 3.3.5  CA training circuit block diagram
Fig. 3.3.6  CS training timing diagram
Fig. 3.3.7  CA training timing diagram
Fig. 3.3.8  Write leveling circuit block diagram
Fig. 3.3.9  Write leveling timing diagram
Fig. 3.3.10  Read training circuit block diagram
Fig. 3.3.11  Read training timing diagram
Fig. 3.3.12  Write training circuit block diagram
Fig. 3.3.13  Write training timing diagram
Fig. 3.3.14  Supported option list of normal operation and margin test method
Fig. 4.1.1  Block diagram of the proposed LPDDR4 memory controller
Fig. 4.1.2  Modeling diagram of the proposed LPDDR4 memory controller
Fig. 4.2.1  Boot up sequence - initialization step 1
Fig. 4.2.2  Boot up sequence - initialization step 2
Fig. 4.2.3  Boot up sequence - initialization step 3
Fig. 4.2.4  Timing of command training entry
Fig. 4.2.5  Setup and hold timing margin for reference voltage sweep at command training
Fig. 4.2.6  Training pattern of CS training
Fig. 4.2.7  Reference voltage sweep in CS training
Fig. 4.2.8  1x2y3x eye detection algorithm in CA training
Fig. 4.2.9  Training pattern of CA training: 0 → A → 0 → B → 0 → C → 0 → D → 0 → E → 0 → A ···
Fig. 4.2.10  Result of the command training
Fig. 4.2.11  Exit timing of the command training
Fig. 4.2.12  Operation speed change in the end of command training
Fig. 4.2.13  Entry timing of the write leveling
Fig. 4.2.14  Write leveling
Fig. 4.2.15  Training code sweep of the read training
Fig. 4.2.16  Environment of the read training
Fig. 4.2.17  Result of the read training
Fig. 4.2.18  Training code sweep of the write training
Fig. 4.2.19  Write training pattern
Fig. 4.3.1  Block diagram of the phase-locked loop
Fig. 4.3.2  Digital loop filter architecture
Fig. 4.3.3  Digitally controlled oscillator architecture
Fig. 4.3.4  Delay-locked loop architecture and coarse and fine delay cell of the delay-locked loop
Fig. 4.3.5  Operation flow chart of the delay-locked loop
Fig. 4.3.6  Block diagram of the digital window phase detector in the local delay-locked loop and timing diagram and operation of the digital window phase detector
Fig. 4.3.7  Phase interpolator
Fig. 4.3.8  Block diagram of the 200ps delay line
Fig. 4.3.9  Block diagram of a 16:1 serializer
Fig. 4.3.10  Block diagram of LVSTL driver of the LPDDR4 memory controller
Fig. 4.3.11  Pull down calibration circuit
Fig. 4.3.12  Pull up calibration circuit
Fig. 4.3.13  Block diagram and timing diagram of the de-serializer with clock domain crossing
Fig. 5.1.1  Microphotograph and layout of the LPDDR4 memory controller
Fig. 5.1.2  Packaging and test plan of LPDDR4 memory and memory controller
Fig. 5.1.3  Photo of PCB
Fig. 5.2.1  Measurement results of phase-locked loop integrated jitter
Fig. 5.2.2  Measurement results of phase-locked loop jitter
Fig. 5.2.3  Measured waveforms illustrating delay-locked loop locking behavior at (a) 0.11GHz and (b) 2.5GHz
Fig. 5.2.4  Measured long-term jitter performance of the proposed delay-locked loop at 2.5GHz and measured phase noise plot of the delay-locked loops using bang-bang phase detector and the proposed delay-locked loop at 2.5GHz
Fig. 5.2.5  Measurement results of 200ps delay line
Fig. 5.2.6  Measurement results of 800ps delay line
Fig. 5.2.7  Measurement results of reference generator
Fig. 5.2.8  Measured monotonicity of phase interpolator
Fig. 5.2.9  Measured DNL of phase interpolator
Fig. 5.3.1  Measurement results of LPDDR4
Fig. 5.3.2  Measurement results of LPDDR4 at 533Mbps operation
Fig. 5.3.3  Measurement results of LPDDR4 at 1066Mbps operation
Fig. A.1  LPDDR4 memory controller operation flow chart 1
Fig. A.2  LPDDR4 memory controller operation flow chart 2
Fig. A.3  LPDDR4 memory controller operation flow chart 3
Fig. A.4  LPDDR4 memory controller operation flow chart 4
Fig. A.5  LPDDR4 memory controller operation flow chart 5
Fig. A.6  LPDDR4 memory controller operation flow chart 6
Fig. A.7  LPDDR4 memory controller operation flow chart 7
Fig. A.8  LPDDR4 memory controller operation flow chart 8
Fig. A.9  LPDDR4 memory controller operation flow chart 9
Fig. A.10  LPDDR4 memory controller operation flow chart 10
Fig. A.11  LPDDR4 memory controller operation flow chart 11
Fig. A.12  LPDDR4 memory controller operation flow chart 12
Fig. A.13  LPDDR4 memory controller operation flow chart 13
Fig. A.14  LPDDR4 memory controller operation flow chart 14
Fig. A.15  LPDDR4 memory controller operation flow chart 15
 
 
 
 
LIST OF TABLES 
 
 
 
Table 1.2.1  Comparison of the mobile memory specifications
Table 2.1.1  Comparison of the LPDDR3 and LPDDR4 memory specifications
Table 5.3.1  Simulated power consumption of LPDDR4 memory controller at 4266Mbps operation
 
 
 
 
CHAPTER 1 
 
 
 
INTRODUCTION 
 
 
 
1.1 MOTIVATION 
 
The first commercial automated cellular network was launched in Japan by Nippon
Telegraph and Telephone in 1979. Mobile phones began to spread in the early 1990s, and
about 20 years later the age of "one phone per person" has arrived. The International
Telecommunication Union forecast that the number of mobile phone users in the world
would reach 7 billion by the end of 2014 [1.1.1]. The number of mobile phone users in 2005
was 2.2 billion; only 10 years later, the penetration rate has exceeded 96.8 percent [1.1.2].
Today, the demand for portable electronic devices, including mobile phones and tablet PCs,
is increasing rapidly throughout the world. If the mobile phone market keeps growing this
fast, the time of two mobile phones per person will come soon, and competition for market
share in the IT industry is expected to become increasingly intense. Because smart phones
account for a high percentage of the mobile market, the development and sales of smart
phones are also becoming important. Furthermore, end users require various smart phones
with diverse specifications, small size, and high performance. Fig. 1.1.1 shows the result of
a survey asking Korean smart phone users what needs to be improved [1.1.3]. Although
cost, user interface, and size, especially screen size, are considered important, 42% of the
smart phone users responded that speed is the most important specification, and 30%
answered that battery capacity is the most important requirement. In practice, a fully
charged smart phone is completely discharged after an average of 28 hours of normal use,
so it must be charged every day. In addition, the operating speed of the memory directly
affects how long the user waits for a response. For a mobile device using mobile memory,
low power consumption, small chip area, and high speed operation are clearly important.
Thus, low power consumption together with high operation bandwidth is the most
important requirement for a mobile memory.
Many mobile memory and system architectures have been introduced to meet these
demands. In particular, low power double data rate synchronous dynamic random access
memory (LPDDR SDRAM, LPDDR memory), which has developed from LPDDR1 to
LPDDR4, and Wide I/O, which has developed from Wide I/O 1 to Wide I/O 2, both proposed
by JEDEC, are widely discussed these days. Figs. 1.1.2 and 1.1.3 show the roadmap of
LPDDRx memory with per pin data rate and the roadmap of mobile memory with total data
bandwidth [1.1.4]. The figures show that, year after year, the per pin data rate becomes
faster and the total data bandwidth becomes larger. The first LPDDR memory devices,
proposed in 2007, operated at 400Mbps/pin with a 1.8V supply voltage [1.1.5] [1.1.6],
LPDDR2 memory devices operated at 1066Mbps/pin with a 1.2V supply voltage [1.1.7], and
LPDDR3 memory devices operated at 2133Mbps/pin [1.1.8] [1.1.9]. The LPDDR4 memory is
specified to operate at a maximum speed of 4266Mbps/pin with 1.1V [1.1.10]. In contrast,
Wide I/O, proposed in 2011, operated at 200Mbps/pin [1.1.11], and the maximum data rate
of Wide I/O 2 is 1066Mbps/pin [1.1.12] [1.1.13].

Fig. 1.1.1 What are the main areas of improvement for smartphone users in Korea
(Needs of improvement: speed 42%, battery 30%, price 18%, UI 8%, size 2%)
[Ref] “Value-driven Memory Technology for the Future Semiconductor Market”,
Samsung Electronics, Dr. Oh-Hyun Kwon, 2010 GSA MEMORY CONFERENCE

Fig. 1.1.2 Roadmap of LPDDRx memory with per pin data rate
(Horizontal axis: year, 2007 to 2015. Vertical axis: IO data rate (Mbps), 1000 to 5000.
Series: LPDDR1, LPDDR2, LPDDR3, LPDDR4 target, and PC-DDR4 data rate.)
[Ref] JEDEC, "LPDDRx data rate roadmap"

Fig. 1.1.3 Roadmap of mobile memory device with total data bandwidth
(Horizontal axis: year, 2009 to 2015. Vertical axis: memory max bandwidth (Gbyte/s), 12.8
to 76.8. Devices: 2ch LPDDR2-800, 2ch LPDDR2-1066, 4ch Wide IO-200, 2ch LPDDR3-1600,
2ch LPDDR3-2133, 4ch Wide IO2-800, 2ch LPDDR4-3200, 2ch LPDDR4-4300, 8ch Wide
IO2-800, 8ch Wide IO2-1066.)
[Ref] JEDEC, "Mobile memory device roadmap"
1.2 INTRODUCTION 
 
Currently, the most actively discussed mobile memories are LPDDR4 memory and Wide
I/O 2. Table 1.2.1 shows the main specifications of LPDDR3 [1.1.8], LPDDR4 [1.1.10], and
Wide I/O 2 [1.1.12]. The total bandwidth of the LPDDR3 memory is 12.8GBps, and the total
bandwidth of the early versions of both the LPDDR4 memory and Wide I/O 2 is targeted at
25.6GBps. The number of I/O pins of the LPDDR4 memory is equal to that of the LPDDR3
memory, and the two memories use the same type of chip package. However, the supply
voltage of the LPDDR4 memory is 0.1V lower than that of the LPDDR3 memory, and its data
rate is higher. The total bandwidth of the LPDDR4 memory is ultimately targeted at
34.1GBps. Unlike the LPDDR4 memory, Wide I/O 2 lowers the per pin data rate to 0.8Gbps
and increases the number of I/O pins to 256, and ultimately to 512. The total bandwidth of
Wide I/O 2 is targeted at 51.2GBps. However, this approach has yet to overcome design
constraints such as high cost, low reliability from low cell efficiency, wafer stacking
[1.2.1]-[1.2.4], micro-bump reliability, and difficulty in backend failure analysis. The
LPDDR4 memory, on the other hand, achieves a 30% power reduction per bandwidth
without the high-cost process overhead of Wide I/O with through-silicon vias [1.2.1] [1.2.2].
In this thesis, the architecture of an LPDDR4 memory controller that operates with LPDDR4
memory is proposed and designed, and an efficient training algorithm suited to this
architecture is proposed for memory training and verification. The thesis also presents the
design flow that derives the memory controller architecture from the LPDDR4 memory.

Table 1.2.1 Comparison of the mobile memory specifications
                  LPDDR3             LPDDR4                  Wide I/O 2
                                     Phase1      Phase2      Phase1      Phase2
Total bandwidth   12.8GBps           25.6GBps    34.1GBps    25.6GBps    51.2GBps
Data rate/pin     1.6Gbps            3.2Gbps     4.3Gbps     0.8Gbps     0.8Gbps
# of IO           x64                x64         x64         x256        x512
VDDQ (V)          1.2                1.1         1.1         1.2         1.2
Package           *POP/**MCP/***DSC  *POP/**MCP/***DSC       ****SIP/*****TSV
* Package on package, ** Multi-chip package, *** Discrete component,
**** Silicon interposer, ***** Through silicon via
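
The total bandwidth values in Table 1.2.1 follow from multiplying the per pin data rate by the number of I/O pins and converting bits to bytes. The short Python sketch below reproduces that arithmetic. The function name is illustrative, and the LPDDR4 phase 2 rate is assumed to be the 4266Mbps/pin maximum quoted in Section 1.1 rather than the rounded 4.3Gbps shown in the table.

```python
def total_bandwidth_gbytes(rate_gbps_per_pin: float, io_pins: int) -> float:
    """Total bandwidth in GB/s: per pin rate (Gb/s) times pin count, 8 bits per byte."""
    return rate_gbps_per_pin * io_pins / 8


# (per pin data rate in Gb/s, number of I/O pins), taken from Table 1.2.1.
# 4.266 is an assumed unrounded value for the LPDDR4 phase 2 rate.
configs = {
    "LPDDR3": (1.6, 64),
    "LPDDR4 phase 1": (3.2, 64),
    "LPDDR4 phase 2": (4.266, 64),
    "Wide I/O 2 phase 1": (0.8, 256),
    "Wide I/O 2 phase 2": (0.8, 512),
}

for name, (rate, pins) in configs.items():
    print(f"{name}: {total_bandwidth_gbytes(rate, pins):.1f} GB/s")
```

Under these assumptions the two approaches reach similar bandwidth by opposite means: LPDDR4 raises the per pin rate on 64 pins, while Wide I/O 2 widens the interface at a low per pin rate.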
 
1.3 THESIS ORGANIZATION 
 
This thesis is organized as follows. Chapter 1 is an introductory chapter that describes
the necessity of the LPDDR4 memory. Chapter 2 introduces the LPDDR4 memory and its
major specifications, especially the differences between LPDDR3 and LPDDR4 memory, and
presents the design procedure. Chapter 3 discusses the architecture of the LPDDR4 memory
controller based on the training sequence, together with the training method that uses the
1x2y3x eye center detection algorithm. Chapter 4 explains the overall LPDDR4 memory
controller modeling and its sub-blocks. The measurement setup and experimental results
are given in Chapter 5. Finally, Chapter 6 summarizes the proposed LPDDR4 memory
controller.
 
 
 
 
CHAPTER 2 
 
 
 
LPDDR4 MEMORY CONTROLLER DESIGN 
 
 
 
2.1 DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY 
 
Unlike the memory controller in a personal computer or laptop computer, which exists
independently in the computing system, a mobile memory controller is integrated with the
mobile processor. The design of the LPDDR4 memory controller starts from a comparison
of LPDDR3 and LPDDR4 memory, because there is no prior research on LPDDR4 memory
controllers.
The key feature of LPDDR4 memory, the next generation mobile memory standard, is
lower power consumption with higher operation bandwidth than LPDDR3 memory.
Existing LPDDR3 memory consumes large power to meet the demand for high speed
operation. LPDDR4 memory, on the other hand, targets low power consumption with high
operation bandwidth. As shown in Table 2.1.1, the VDDQ voltage of the LPDDR4 memory
has dropped to 1.1V. The LPDDR4 memory supports a burst length of 16 (BL16) to keep the
core speed of the dynamic random access memory (DRAM) the same as that of LPDDR3
memory. Other features include 16 data (DQ) pins per channel, support for BL32 as an
extension of BL16, ZQ calibration, a single RESET pin, and 6 command-address (CA) pins. In
addition, LPDDR4 memory has a 2 KB page size, reduced from the 4 KB of LPDDR3 memory
to reduce active power consumption, at the cost of a die penalty of over 4%. Small swing
signaling is adopted for faster data transmission and extra power savings. The number of
CA pins is reduced from 10 to 6. The CA data rate and transmission scheme are changed
from 1600MT/s with double data rate (DDR) to 2133MT/s with single data rate (SDR). The
termination and signaling topology is changed to VSSQ termination with a low voltage
swing terminated logic (LVSTL) driver to achieve low power consumption with high speed
operation. The biggest change on the controller side is that the write data strobe (DQS)
scheme is changed from a source synchronous matched scheme to a source synchronous
unmatched scheme. This means that the memory controller must adjust the DQS timing
relative to DQ to compensate for the tDQS2DQ delay in the memory [1.1.13].

Table 2.1.1 Comparison of the LPDDR3 and LPDDR4 memory specifications
                   LPDDR3                                 LPDDR4
VDD1               1.8V                                   1.8V
VDD2/VDDQ          1.2V                                   1.1V
Channel & IO       1-Channel, x32                         2-Channel, x16
BL                 8                                      16 or 32
CA pin count       10                                     6
CA rate            ~1600MT/s (DDR)                        ~2133MT/s (SDR)
Data rate          ~1600Mbps (DDR)                        ~4266Mbps (DDR)
Termination        VDDQ termination                       VSSQ termination
Signaling          HSUL (high speed unterminated logic)   LVSTL (low voltage swing terminated logic)
Write DQS scheme   Source synchronous matched scheme      Source synchronous unmatched scheme
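
The statement that BL16 keeps the DRAM core speed at the LPDDR3 level can be checked with simple arithmetic. The sketch below assumes a simplified model in which one column access supplies one full burst to each DQ pin, so the required column access rate is the per pin data rate divided by the burst length. The function name is illustrative, and the data rates are the maximum per pin rates quoted in Section 1.1.

```python
def core_access_rate_mhz(data_rate_mbps: float, burst_length: int) -> float:
    """Column accesses per microsecond needed to keep one DQ pin at full rate.

    Simplified model (an assumption for illustration): each column access
    delivers burst_length bits to every DQ pin.
    """
    return data_rate_mbps / burst_length


lpddr3 = core_access_rate_mhz(2133.33, 8)    # LPDDR3: 2133Mbps/pin, BL8
lpddr4 = core_access_rate_mhz(4266.67, 16)   # LPDDR4: 4266Mbps/pin, BL16

print(f"LPDDR3 core access rate: {lpddr3:.1f} MHz")
print(f"LPDDR4 core access rate: {lpddr4:.1f} MHz")
```

In this model, doubling the burst length lets the per pin data rate double while the core access rate stays unchanged.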
 
2.1.1 ARCHITECTURAL DIFFERENCE BETWEEN LPDDR3 AND 
LPDDR4 MEMORY 
 
One of the most apparent changes in the LPDDR4 memory is the 2-channel architecture
per die, as shown in Fig. 2.1.1. Two independent devices exist in a single die, each with an
identical set of I/O and power pins, with the exception of the ZQ and RESET pins, which are
shared by the two devices. In LPDDR3 memory, the CA bus is placed on the top side and the
DQ bus on the bottom. Consequently, the control signals that handle write and read data
run the entire length of the chip, which results in a large latency and a wide variation over
process, voltage, and temperature (PVT). This in turn degrades timing parameters such as
tDQSCK (clock to DQS delay). The tDQSCK is the access time of the DQS output measured
relative to the external clock, and it is present in mobile DRAMs because they omit the
power-hungry delay-locked loop. The LPDDR4 memory adopts a different placement of the
CA and DQ buses to avoid the performance degradation that stems from the stretched
control signals. As illustrated in Fig. 2.1.1, this change leads to shorter signal trees and
removes constraints placed on the timing parameters [1.1.13].

Fig. 2.1.1 Architecture of mobile memory (a) LPDDR4 and (b) LPDDR3 memory
((a) Channel A and Channel B, each with CA, DQ x8 groups, and banks; (b) CA, DQ x32, and
banks)
 
2.1.2 SOURCE SYNCHRONOUS MATCHED SCHEME AND UNMATCHED 
SCHEME 
  
The LPDDR4 memory adopts a source synchronous unmatched scheme with different
signal paths for DQ and DQS. The unmatched signal paths inevitably give rise to an
unmatched delay between the DQ and DQS signals, as illustrated in Fig. 2.1.2, and this time
difference is expressed as tDQS2DQ (DQS buffering delay to DQ). Fig. 2.1.2 also shows the
matched structure used in LPDDR2 and LPDDR3 memory. The source synchronous
unmatched scheme removes one of the critical design constraints of the DRAM, namely the
setup and hold time margins of the DQ receiver in the memory [1.1.13]. On the memory
controller side, on the other hand, the source synchronous unmatched scheme is one of the
most critical design constraints. Thus, a more sophisticated design technique is needed in
the LPDDR4 memory controller.

Fig. 2.1.2 Source synchronous matched scheme and unmatched scheme
(Panels: LPDDR2 & 3 memory, source synchronous matched scheme, with a DQS tree
matching delay; LPDDR4 memory, source synchronous unmatched scheme, with tDQS2DQ
delay. Labeled signals and blocks: DQ, VREF_DQ, DQS, DQSB, DQS TREE, DES.)
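
To give a feel for why the unmatched scheme is demanding on the controller side, the sketch below expresses a DQS-to-DQ skew in unit intervals (UI, one bit time on a DQ pin) at several LPDDR4 data rates. The skew values of 200ps and 800ps are assumptions chosen purely for illustration and are not tDQS2DQ limits taken from this chapter; the function name is likewise hypothetical.

```python
def delay_in_ui(delay_ps: float, data_rate_mbps: float) -> float:
    """Express a delay in unit intervals; one UI is one bit time on a DQ pin."""
    ui_ps = 1e6 / data_rate_mbps  # bit time in picoseconds
    return delay_ps / ui_ps


# Hypothetical tDQS2DQ values, for illustration only
for tdqs2dq_ps in (200.0, 800.0):
    for rate_mbps in (533.33, 2133.33, 4266.67):
        ui = delay_in_ui(tdqs2dq_ps, rate_mbps)
        print(f"tDQS2DQ={tdqs2dq_ps:.0f}ps at {rate_mbps:.0f}Mbps -> {ui:.2f} UI")
```

Under these assumed values the skew is a small fraction of a bit time at the lowest data rate but spans several bit times at the highest one, which suggests why the controller needs training rather than a fixed timing margin.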
 
2.1.3 LOW VOLTAGE SWING TERMINATED LOGIC DRIVER AND 
TERMINATION SCHEME 
 
Pre-LPDDR4 memory devices, such as LPDDR2 and LPDDR3 memory, adopted
high-speed unterminated logic (HSUL), a backward-compatible, unterminated interface
with low power consumption. In response to the growing demand for high-performance
DRAM, the interface in LPDDR3 memory also supports termination [1.1.8], [1.1.9]. The
performance target of the LPDDR4 memory standard, however, cannot be satisfied with the
conventional interface scheme, so the standard adopts a small swing interface called LVSTL
with ground termination (VSSQ termination). This interface consists of a pull-down NMOS
driver and a pull-up NMOS driver, which operates in the saturation region. The fast current
provided by the pull-up driver, together with the lower I/O capacitance that results from the
absence of PMOS devices, enables faster data transmission. Also, in unterminated mode,
the output of the LVSTL driver does not swing rail-to-rail because of the threshold voltage
drop across the pull-up NMOS [2.1.1]. Moreover, this interface is terminated to VSSQ rather
than to VDDQ or VDDQ/2. This improves the signal integrity of the interface, because
ground is the lowest impedance supply in most systems and has the strongest noise
immunity [2.1.2], [2.1.3]. The output swing level of the LPDDR4 memory, referred to as
VOH, can be selected between VDDQ/3 and VDDQ/2.5 in on-die termination (ODT) mode
[2.1.4]-[2.1.6]. For instance, under a high speed operating condition, the memory controller
can set a 120 ohm strength for the pull-up driver and the DRAM can set 60 ohm for the ODT,
in favor of impedance matching. At an intermediate frequency, the memory controller can
set a 240 ohm strength and the DRAM 120 ohm for the ODT, in favor of current consumption.
In both cases the VOH value equals VDDQ/3. The ODT operation in the DRAM is carried out
automatically upon a write command. If ODT is enabled via the mode register settings, a
write command triggers the internal ODT operation, and the termination is turned on
before the write DQS signals start toggling. Upon the arrival of the last data strobe pulse,
the DRAM turns off the ODT to reduce power consumption. There is an asynchronous delay
in the ODT control, which is referred to as ODT uncertainty. The reference voltage of the
receiver is adaptive and is around VDDQ/6 or VDDQ/5; its optimum value is determined by
a training sequence. The memory controller can set the VOH value to VDDQ/3 or VDDQ/2.5,
depending on the operating environment and application [1.1.13].

Fig. 2.1.3 Schematic of (a) high speed unterminated logic in the LPDDR3, (b) LVSTL driver in
the LPDDR4
(Labeled nodes in each panel: DATA, VREF, DOUT)
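
The VOH values in the two examples above can be reproduced with a resistive divider, treating the pull-up driver as a resistor from VDDQ to the line and the ODT as a resistor from the line to VSSQ. This linear model is a simplifying assumption, since the pull-up NMOS is described above as operating in the saturation region. The function name and the mid-swing reference estimate are illustrative.

```python
def voh_fraction(pull_up_ohm: float, odt_ohm: float) -> float:
    """High-level output voltage as a fraction of VDDQ for a VSSQ-terminated line.

    Assumes a purely resistive divider: pull-up to VDDQ, ODT to ground.
    """
    return odt_ohm / (pull_up_ohm + odt_ohm)


VDDQ = 1.1  # volts, LPDDR4 VDDQ from Table 2.1.1

for pull_up, odt in ((120.0, 60.0), (240.0, 120.0)):
    frac = voh_fraction(pull_up, odt)
    voh = frac * VDDQ
    vref = voh / 2  # midpoint of a 0-to-VOH swing, used here only as a starting guess
    print(f"pull-up {pull_up:.0f} ohm, ODT {odt:.0f} ohm: "
          f"VOH = VDDQ/{1 / frac:.1f} = {voh:.3f}V, mid-swing VREF = {vref:.3f}V")
```

In this model the midpoint of a VDDQ/3 swing is VDDQ/6, which is consistent with the reference voltage range given above; the trained optimum may differ from the midpoint.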
2.2 LPDDR4 MEMORY CONTROLLER SPECIFICATION 
 
As shown in Fig. 2.2.1, the per pin speed range of the LPDDR4 memory is from
533Mbps to 4266Mbps. Specifically, it supports 533Mbps, 1066Mbps, 1600Mbps,
2133Mbps, 2666Mbps, 3200Mbps, 3733Mbps, and 4266Mbps, in steps of 533Mbps. The
clock speed of the LPDDR4 memory ranges from 266MHz to 2133MHz. To support these
LPDDR4 operating speeds, 266MHz, 533MHz, 800MHz, 1066MHz, 1333MHz, 1600MHz,
1866MHz, and 2133MHz clocks are needed. The operation clocks are spaced at 266MHz
intervals, and they can all be generated from a 1333MHz, 1600MHz, 1866MHz, or
2133MHz clock with an integer divider.

Fig. 2.2.1 Data rate and clock speed of the LPDDR4 memory
Data rate    Bit time    Clock      Clock period    Divider
4266Mbps     234ps       2133MHz    469ps           /1
3733Mbps     268ps       1866MHz    536ps           /1
3200Mbps     312.5ps     1600MHz    625ps           /1
2666Mbps     375ps       1333MHz    750ps           /1
2133Mbps     468ps       1066MHz    938ps           /2
1600Mbps     625ps       800MHz     1250ps          /2
1066Mbps     938ps       533MHz     1876ps          /4
533Mbps      1876ps      266MHz     3759ps          /8
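
The clock plan described above can be sketched as a small search: for each required clock, find a divider ratio that maps it onto one of the four phase-locked loop output frequencies. The sketch assumes that the nominal clock step is exactly 800/3 MHz (quoted as 266MHz in the text) and that the available divider ratios are the /1, /2, /4, and /8 values shown in Fig. 2.2.1. The function and constant names are illustrative.

```python
BASE_MHZ = 800 / 3             # assumed exact clock step (quoted as 266MHz)
PLL_MULTIPLES = (5, 6, 7, 8)   # PLL outputs: 1333, 1600, 1866, 2133MHz
DIVIDERS = (1, 2, 4, 8)        # divider ratios shown in Fig. 2.2.1


def clock_plan():
    """Map each required clock (as a multiple of the base step) to (PLL multiple, divider)."""
    plan = {}
    for k in range(1, 9):      # k = 1..8 corresponds to 266MHz .. 2133MHz
        for div in DIVIDERS:
            if k * div in PLL_MULTIPLES:
                plan[k] = (k * div, div)
                break
    return plan


for k, (pll_k, div) in sorted(clock_plan().items(), reverse=True):
    clock_mhz = k * BASE_MHZ
    data_rate_mbps = 2 * clock_mhz  # double data rate: two bits per clock cycle
    print(f"{data_rate_mbps:7.1f}Mbps  clock {clock_mhz:6.1f}MHz"
          f" = PLL {pll_k * BASE_MHZ:6.1f}MHz / {div}"
          f"  (bit time {1e6 / data_rate_mbps:6.1f}ps)")
```

With these assumptions every required clock has exactly one valid combination, and the resulting divider ratios match those listed in Fig. 2.2.1.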
A memory channel of the LPDDR4 memory consists one uni-directional differential 
CK pins, 7 uni-directional single ended command pins, 2 bi-directional differential DQS 
 
Fig. 2.2.2 Simplified bus interface state diagram of LPDDR4 memory 
Power
On
Reset
Per
Bank
Refresh
MRW
MRR
All
Bank
Refresh
Command
Bus
Training
Self
Refresh
Idle
Power
Down
MRW
MRW
MRR
MRR
Command
Bus
Training
Bank
Active
Read
Write
or
MWR
Active
Power
Down
Per
Bank
Refresh
Read
With Auto-
Precharge
Pre-
charging
Write
With Auto-
Precharge
MPC
Based
Training
Activating
Idle
MPC
Based
Training
MPC
Based
Training
MPC
Based
Training
SR
Power
Down
Command sequence
Automatic sequence
17 
 
pins, 16 bi-directional single ended DQ pins, 2 bi-directional single ended DMI pins and 
other control pins such as RESET, ODT_CA, and ZQ_CAL. 
The LPDDR4 memory controller follows the state diagram depicted in Fig. 2.2.2 to
communicate with the LPDDR4 memory [1.1.10]. The LPDDR4 memory starts in the power-on
state; to operate normally, it must pass through the power-on, reset, and boot-up states
and then the training sequence. The training sequence consists of command training, write
leveling, read DQ training, and write DQ training. After training, the LPDDR4 memory enters
the activation state to prepare for normal operation. Margin tests are then performed to
evaluate the operating performance of the LPDDR4 memory. For proper operation, the LPDDR4
memory controller should compensate tDQS2DQ, tDQSCK, tDQSS (clock to DQS delay), and
tDQSQ (DQS to DQ delay). The LPDDR4 memory controller also performs ZQ calibration,
per-pin de-skewing, read and write latency checks, clock domain crossing, and eye center
detection. To perform these functions, a phase-locked loop, delay-locked loops, a
serializer/de-serializer, LVSTL drivers, a clock distribution circuit with skew
minimization, and a continuous-time linear equalizer are required.
The proposed LPDDR4 memory controller supports power on, reset, idle, activating,
bank active, read, write, command training, and MPC-based training. Other functions, such
as refresh and power down, are excluded to simplify the realization of the LPDDR4 memory
controller. These functions could be executed by simple command transmission from the
memory controller to the memory.
 
2.3 DESIGN PROCEDURE 
 
The design of the proposed memory controller started from the preliminary version of the
LPDDR4 memory specification from SK hynix. Thus, the specification of the proposed
memory controller is based on the SK hynix version of the LPDDR4 memory specification.
The first JEDEC version of the LPDDR4 specification was published in August 2014, after
tape-out [2.3.1].
The architecture of the LPDDR4 memory controller was determined so as to support the
functions and specification of the LPDDR4 memory. The design of the LPDDR4 memory
controller was carried out in three parts. With circuit design as the focus, one-to-one
matched modeling proceeded in parallel, and the test method was considered at the same
time. To ensure that the memory controller works correctly, the LPDDR4 memory controller
model was simulated with the LPDDR4 memory model provided by SK hynix. To test the
proposed LPDDR4 memory controller with the LPDDR4 memory, LPDDR4 memory was also
provided by SK hynix. Generally, the LPDDR4 memory controller is stacked with the LPDDR4
memory in a package-on-package structure [2.3.1] [2.3.2]. However, a package-on-package
structure is not easy to build for academic research. Thus, a thin quad flat package is
used for testing, and this package type was taken into account in the floor plan and layout.

Fig. 2.3.1 Design procedure flow chart of the LPDDR4 memory controller (spec: SK hynix
LPDDR4 v0.85 preliminary, Jan. 2014; JEDEC LPDDR4 v1.0, Aug. 2014; process: 65nm CMOS;
modeling verified with the SK hynix LPDDR4 model; tape-out: Jun. 2014)
 
 
 
 
CHAPTER 3 
 
 
 
LPDDR4 MEMORY CONTROLLER 
ARCHITECTURE BASED ON MEMORY TRAINING 
 
 
 
3.1 LPDDR4 MEMORY TRAINING SEQUENCE 
 
The LPDDR4 memory controller trains the LPDDR4 memory in the sequence shown in
Fig. 3.1.1. When power is applied to the memory controller, the phase-locked loop and the
other internal circuits of the memory controller are made ready to train the memory. In
addition, the controller sends the signals required to power up the memory according to
the reset sequence so that the memory can prepare for training. The boot-up sequence
includes initializing the registers in the memory and setting them according to the
operation speed and training mode, both of which are done by the memory controller. When
the boot-up of the memory is done, the memory controller starts the command training. The
operating frequency during boot-up is 33MHz, which is half of the reference frequency of
the phase-locked loop. The command training, which starts after boot-up, consists of the
chip select (CS) training and the CA training. First, in the CS training mode, only the CS
signal is transmitted to the memory, and the sampled value of the CS signal is returned to
the memory controller via the DQ feedback path of the memory controller. Then, in the CA
training mode, the CA and CS signals are transmitted to the memory, and the sampled values
of the CA and CS signals are returned to the memory controller. Both processes receive
their feedback through the DQ pins, and the DQ feedback path must exist independently of
the normal receiver path, since the receiver path is not yet trained. When the command
training is completed, the operation speed is lowered to 33MHz to write the result of the
training to the mode register of the memory.

Fig. 3.1.1 Training sequence of the proposed LPDDR4 memory controller (boot up, CS
training, CA/CS training, write leveling, read training with EQ training, write training,
normal operation check, margin test)

The write leveling is the DQS timing training that aligns, at the LPDDR4 memory side,
the rising edge of the DQS with the rising edge of the CK signal, which is the clock
delivered from the LPDDR4 memory controller to the LPDDR4 memory. It operates at the
normal operation speed of the LPDDR4 memory, between 266MHz and 2133MHz. The timing of
DQS[0] and DQS[1] is controlled to align the rising edge of the DQS with the rising edge
of the CK. After the write leveling, the controller enters the read training. The read
training must precede the write training, because the value written to the LPDDR4 memory
is checked with a read command.
In the read training, the memory controller transmits the read command to the memory,
and the memory then transmits to the memory controller the data that was saved in the mode
register during boot-up. The receiver of the memory controller samples the DQ with the
DQS from the memory, after which clock domain crossing must be performed from the DQS
domain to the clock domain of the memory controller. In the read training, the internal
timing and the reference voltage of the memory controller are controlled, whereas in the
write training, the internal timing of the memory controller and the reference voltage of
the memory are controlled. Accordingly, in the write training, the memory controller
transmits the write command, the read command to check the written data, and control
signals to change the training values. Finally, if the written data and the read data
match, all trainings are completed. The 1x2y3x eye center detection algorithm is used to
find the center of the eye in all trainings, namely the command training, read training,
and write training.
In all of the aforementioned trainings, the written data does not reach the memory cells
of the LPDDR4 memory; all written data is saved in registers located at the I/O of the
LPDDR4 memory. Therefore, the write and read functions should also be tested with the real
memory cells, which are used in normal read and write operation. After all trainings have
ended, the proposed memory controller verifies the write and read commands with the real
memory cells in the LPDDR4 memory. The read and write margins are tested after the normal
operation has been verified. The normal operation and the margin test are discussed in
section 3.3.7.
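The order of the stages and the side on which each stage adjusts its settings can be summarized in a small table. The sketch below only restates the sequence described in this chapter; the `Stage` type, its field names, and the helper `runs_before` are hypothetical and are not part of the controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    timing_side: str  # side whose timing setting is adjusted ("-" if none)
    vref_side: str    # side whose reference voltage is adjusted ("-" if none)


# Order follows Fig. 3.1.1; the adjusted settings follow the text of this chapter.
TRAINING_SEQUENCE = (
    Stage("boot up", "-", "-"),
    Stage("command training", "controller", "memory"),
    Stage("write leveling", "controller", "-"),
    Stage("read training", "controller", "controller"),
    Stage("write training", "controller", "memory"),
    Stage("normal operation check", "-", "-"),
    Stage("margin test", "-", "-"),
)


def runs_before(first, second):
    """True if stage `first` is executed before stage `second`."""
    names = [stage.name for stage in TRAINING_SEQUENCE]
    return names.index(first) < names.index(second)
```

For example, `runs_before("read training", "write training")` is true, which reflects the requirement that written data can only be checked once reads are trained.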
 
3.2 LPDDR4 MEMORY TRAINING EYE DETECTION ALGORITHM 
 
3.2.1 EYE CENTER DETECTION 
 
To train the LPDDR4 memory, the LPDDR4 memory controller needs to check the data sent
to and received from the memory at each training stage. Eye center detection is required
to verify the functions of memory training. Many eye detection algorithms have been
introduced to verify eyes in many applications [3.2.1]-[3.2.8]. In section 3.2, the
conventional two-dimensional eye center detection algorithm is compared with the proposed
1x2y3x eye center detection algorithm, which is adopted in the proposed LPDDR4 memory
controller.

Fig. 3.2.1 Various eye diagrams observed in simulations and measurements

Fig. 3.2.2 Simplified eye diagrams of Fig. 3.2.1

Eye patterns take various shapes when data is sent and received between chips. Fig.
3.2.1 shows various eye diagrams observed in simulations and measurements. The eye
diagram can have a variety of shapes under the influence of input load, cross-talk,
ambient noise, impedance mismatch, and inter-symbol interference. Fig. 3.2.2 shows
simplified versions of the eye diagrams in Fig. 3.2.1. The simplified eye patterns are
used to keep the subsequent description brief. As shown in Fig. 3.2.3, the most common way
to find the eye center across these various eye diagrams is two-dimensional eye detection,
which checks every point of the two-dimensional plane. It moves one point up, down, left,
or right from the starting point and checks whether each point passes or not, until the
whole eye diagram is found. For example, if the timing axis (x-axis) has 64 steps and the
reference voltage axis (y-axis) has 72 steps, there are 4608 test points in total. Each
point is judged as eye open or not, and one of the points with the largest margin is taken
as the eye center. If these 4608 steps are reduced, the time for the entire memory
training is reduced. Moreover, the eye center detection algorithm is used for the 7 CA and
CS pins in the command training, the 2 DQSs in the write leveling, the 18 DQs and DMIs at
the memory controller side in the read training, and the 18 DQs and DMIs at the memory
side in the write training. A shorter eye center detection time is therefore very
effective in reducing the total training time, because there are 45 pins to train.

Fig. 3.2.3 Two-dimensional eye detection
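The cost of the exhaustive approach can be illustrated with a small model. The following sketch is not the implementation used in this work: the elliptical pass/fail model `eye_open` is synthetic, and the margin of a point is assumed to be the shortest run of open points to its left, right, top, and bottom. Only the grid size of 64 by 72 points is taken from the text.

```python
def eye_open(x, y):
    """Synthetic pass/fail model (assumption): an elliptical eye centered at (32, 36)."""
    return ((x - 32) / 20) ** 2 + ((y - 36) / 25) ** 2 <= 1.0


def run_length(grid, x, y, dx, dy):
    """Count consecutive open points next to (x, y) in direction (dx, dy)."""
    ny, nx = len(grid), len(grid[0])
    count = 0
    x, y = x + dx, y + dy
    while 0 <= x < nx and 0 <= y < ny and grid[y][x]:
        count += 1
        x, y = x + dx, y + dy
    return count


def scan_2d(test_point, nx=64, ny=72):
    """Test every (timing, vref) point and return (center, number of tests)."""
    grid = [[test_point(x, y) for x in range(nx)] for y in range(ny)]
    tests = nx * ny  # one test, and one stored result, per grid point

    best, best_margin = None, -1
    for y in range(ny):
        for x in range(nx):
            if not grid[y][x]:
                continue
            margin = min(run_length(grid, x, y, dx, dy)
                         for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
            if margin > best_margin:  # the first point wins on ties
                best, best_margin = (x, y), margin
    return best, tests


if __name__ == "__main__":
    center, tests = scan_2d(eye_open)
    print(center, tests, 45 * tests)
```

With this synthetic eye the scan needs 4608 tests for a single pin, so training 45 pins one at a time would need 45 x 4608 = 207360 tests if every pin were scanned exhaustively.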
 
3.2.2 1X2Y3X EYE CENTER DETECTION ALGORITHM 
 
Instead of the two-dimensional eye center detection algorithm, the 1x2y3x eye center
detection algorithm is proposed to reduce the training time. First, the 1x sweep is
performed. As shown in Fig. 3.2.4, the 1x eye detection sweeps the sampling timing along
the x-axis to find the eye opening in the x direction while the reference voltage on the
y-axis is fixed. Then, the 2y eye detection sweeps the reference voltage along the y-axis
to find the eye opening in the y direction while the timing on the x-axis is fixed at the
center point of the 1x sweep. Finally, the reference voltage on the y-axis is fixed at the
center point of the 2y sweep, and the timing on the x-axis is swept again for the 3x eye
detection. The eye center is set to the center points of the 2y and 3x sweeps.
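The three sweeps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the controller logic itself: `eye_open` is a synthetic elliptical eye, the starting reference voltage code `y_start` is a free parameter, and each sweep is assumed to find a single contiguous open range (exception handling is left out). All function names are hypothetical.

```python
def eye_open(x, y):
    """Synthetic pass/fail model (assumption): an elliptical eye centered at (32, 36)."""
    return ((x - 32) / 20) ** 2 + ((y - 36) / 25) ** 2 <= 1.0


def sweep_center(test, n):
    """Sweep one axis, keep only the first and last open codes, and average them."""
    start = end = None
    for code in range(n):
        if test(code):
            if start is None:
                start = code
            end = code
    if start is None:
        raise ValueError("no open point found in this sweep")
    return (start + end) // 2


def detect_1x2y3x(test_point, nx=64, ny=72, y_start=36):
    """Return (eye center, number of tests) using the 1x, 2y, 3x sweep order."""
    tests = 0

    def counted(x, y):
        nonlocal tests
        tests += 1
        return test_point(x, y)

    x1 = sweep_center(lambda x: counted(x, y_start), nx)  # 1x: timing sweep
    y2 = sweep_center(lambda y: counted(x1, y), ny)       # 2y: vref sweep at x1
    x3 = sweep_center(lambda x: counted(x, y2), nx)       # 3x: timing sweep at y2
    return (x3, y2), tests


if __name__ == "__main__":
    print(detect_1x2y3x(eye_open))
    print(detect_1x2y3x(eye_open, y_start=20))
```

On this synthetic eye the sketch uses 64 + 72 + 64 = 200 tests instead of 4608, and it only needs to remember the start and end codes of each sweep.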
 
Fig. 3.2.4 The proposed 1x2y3x eye center detection algorithm
 
The reason for this order is that the maximum swing level (VOH level) of the LPDDR4
memory and controller is fixed in the LPDDR4 memory specification at VDDQ/2.5 or
VDDQ/3 [1.1.10] [3.2.9]. The reference voltage of the 1x sweep can therefore be started
effectively at half of the VOH, that is, VDDQ/5 or VDDQ/6. In addition, an algorithm that
saves two points, the start and end points of the eye opening, and averages them is
simpler than an algorithm that saves the values of all points in two dimensions and finds
the two-dimensional center point. For example, if the timing axis (x-axis) has 64 steps
and the reference voltage axis (y-axis) has 72 steps, 4608 registers are required, and an
algorithm that finds the point with the largest x and y margins in the two-dimensional
table is also needed. In the 1x2y3x algorithm, on the other hand, only 4 registers are
required to save the start and end points on the x- and y-axes, and the averaging
algorithm is very simple.

Fig. 3.2.5 Comparison of two-dimensional eye detection center and 1x2y3x eye detection center

Fig. 3.2.5 shows the eye centers found in a variety of eye diagrams by the two-
dimensional eye detection method and the 1x2y3x eye detection method. As the example in
the figure shows, if the reference voltage of the first 1x sweep is very low, the two
algorithms can, in unusual cases, find different center points. In most cases, the eye
centers found by the two eye detection algorithms are identical, and the reference voltage
of the first 1x sweep is unlikely to be far off because the VOH level of the LPDDR4 memory
is defined as VDDQ/2.5 or VDDQ/3 [3.2.9] [3.2.10]. Fig. 3.2.6 shows examples of exception
handling in the 1x2y3x eye center detection algorithm. The upper example shows that if the
1x sweep detects more than one eye-open range, the widest range is selected and the others
are ignored. If two or more ranges of the same width are detected, the first one is
selected. The lower example shows that if there is no eye-open range in the first x-axis
sweep, the reference voltage on the y-axis is changed until an eye-open range is found. A
moving average algorithm is applied to prevent a wrong decision caused by an error, noise,
or metastability. A point is judged open only when it is detected as open consecutively,
and each point is tested multiple times. The exception handling methods above are also
applied to the y-axis sweep. The proposed 1x2y3x eye detection algorithm is applied in the
training operations between the memory controller and the memory: the CA and CS in the
command training, the DQSs in the write leveling, the DQs and DMIs at the memory
controller side in the read training, and the DQs and DMIs at the memory side in the write
training.

Fig. 3.2.6 Exception handling example of 1x2y3x eye detection
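The two exception cases can be sketched as follows. This is an illustrative sketch only: the text does not state in which order the reference voltage is changed when no open range is found, so the search order used here (alternating one step above and below the starting code) is an assumption, and all function names are hypothetical.

```python
def open_ranges(flags):
    """Return (start, end) index pairs for every run of consecutive open points."""
    ranges, start = [], None
    for i, is_open in enumerate(flags):
        if is_open and start is None:
            start = i
        elif not is_open and start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, len(flags) - 1))
    return ranges


def widest_range(flags):
    """Select the widest open range; the first one wins when widths are equal."""
    best = None
    for start, end in open_ranges(flags):
        if best is None or end - start > best[1] - best[0]:
            best = (start, end)
    return best


def first_x_sweep(test_point, nx, ny, y_start):
    """1x sweep with fallback: change y until an open range is found.

    Returns (y used, x center), or None if no row contains an open point.
    """
    # Assumed search order: y_start, then alternately one step above and below.
    candidates = [y_start]
    for step in range(1, ny):
        candidates += [y for y in (y_start + step, y_start - step) if 0 <= y < ny]
    for y in candidates:
        found = widest_range([test_point(x, y) for x in range(nx)])
        if found is not None:
            return y, (found[0] + found[1]) // 2
    return None
```

For instance, `widest_range([False, True, True, False, True, True, True, False, True, True, True])` selects the range (4, 6): it is as wide as the range (8, 10) but is detected first.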
 
3.3 LPDDR4 MEMORY CONTROLLER DESIGN BASED ON MEMORY
TRAINING
 
Fig. 3.3.1 shows the initialization and training sequence of the LPDDR4 memory by the
LPDDR4 memory controller, as given in the LPDDR4 memory specification [1.1.10]. Section
3.3 discusses the structure and operation of the proposed LPDDR4 memory controller and
the LPDDR4 memory in accordance with this sequence.
 
3.3.1 ARCHITECTURE FOR MEMORY BOOT UP AND POWER UP 
 
To test the LPDDR4 memory, the initialization and training of the LPDDR4 memory must
be performed by the LPDDR4 memory controller. As shown in Fig. 3.3.1, the LPDDR4 memory
can operate normally after finishing the initialization and training process, from the
power ramp state at Ta to the DQ training state at Tj [1.1.10]. The power ramp state is
the process of supplying power to the memory. The memory controller must be powered like
the memory; in addition, all circuits in the memory controller, such as the phase-locked
loop and the delay-locked loops, must be prepared to train the memory. A prepared circuit
here means that the entire circuit is clocked, settled, and reset. In other words, the
power ramp state of the memory controller means readiness for transmitting signals to and
receiving signals from the memory. Therefore, the power ramp state of the memory
controller comprises supplying power, resetting the digital control circuits, and bringing
the analog circuits to a ready-to-operate state.

Fig. 3.3.1 LPDDR4 memory initialization sequence (tINIT0 = 20ms max, tINIT1 = 200us min,
tINIT2 = 10ns min, tINIT3 = 2ms min, tINIT4 = 5tCK min, tINIT5 = 2us min, tZQCAL = 1us
min, tZQLAT = MAX(30ns, 8tCK) min)

The states of the LPDDR4 memory from time Ta to Tg in Fig. 3.3.1 belong to the low-speed
operation range. The memory operates at low speed after the power ramp, and at the high
speed of normal operation, which is defined in the LPDDR4 memory specification [1.1.10],
after the command training; part of the command training itself also runs at high speed.
A timer circuit is used to count the wait and response time of each state.
As shown in Fig. 3.3.2, the CK, CA, and CS signals are transmitted from the digital
circuit of the memory controller via the Boot Mux at a low speed of 33MHz. To place the
rising edge of the transmitted boot clock at the margin center of the transmitted command,
the boot clock is inverted.

Fig. 3.3.2 Boot up path circuit block diagram

For convenience, the series of processes operated at low speed is called the boot-up
operation. The boot-up operation ends after the command training, part of which runs at
high speed, is completed. The whole initialization sequence ends after the command
training, write leveling, and read and write training. In low-speed operation, as shown in
Fig. 3.3.1, the timing values from tINIT1 to tZQLAT are generated for the Reset_n, CKE,
and other signals by a counter-based timer in the digital control circuit of the memory
controller. The Reset_n signal follows the timing shown in Fig. 3.3.1. The value of the
ODT_CA is "LOW" when operating at 33MHz during boot-up and "HIGH" when operating at
266MHz and above in the other training modes, including the command training and DQ
training. The value of the ODT_CA is controlled by the MRW-CA_ODT command. The low-speed
boot-up operation continues until the command training is finished: after the command
training, the boot-up operation resumes in order to write the result of the command
training to the mode register in the LPDDR4 memory.
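The terminal counts of a counter-based timer follow directly from the minimum times in Fig. 3.3.1. The sketch below is a worked calculation rather than the actual timer design: it assumes that the timer is clocked by the 33MHz boot clock and that one tCK equals one timer cycle during boot-up. The names `cycles_for` and `timer_counts` are hypothetical.

```python
import math
from fractions import Fraction

BOOT_CLOCK_MHZ = 33  # boot-up frequency from the text; assumed to clock the timer

# Minimum times read from Fig. 3.3.1, in nanoseconds.
MIN_TIMES_NS = {
    "tINIT1": 200_000,
    "tINIT2": 10,
    "tINIT3": 2_000_000,
    "tINIT5": 2_000,
    "tZQCAL": 1_000,
}


def cycles_for(time_ns, clock_mhz=BOOT_CLOCK_MHZ):
    """Smallest whole number of clock cycles that lasts at least time_ns."""
    return math.ceil(Fraction(time_ns) * clock_mhz / 1000)


def timer_counts(clock_mhz=BOOT_CLOCK_MHZ):
    """Terminal count for each wait time, in timer clock cycles."""
    counts = {name: cycles_for(t, clock_mhz) for name, t in MIN_TIMES_NS.items()}
    counts["tINIT4"] = 5                                  # 5tCK (min)
    counts["tZQLAT"] = max(cycles_for(30, clock_mhz), 8)  # MAX(30ns, 8tCK)
    return counts


if __name__ == "__main__":
    for name, count in sorted(timer_counts().items()):
        print(name, count)
```

Under these assumptions tINIT1 corresponds to 6600 cycles and tINIT3 to 66000 cycles, while tZQLAT is bounded by its 8tCK term because 30ns is shorter than one boot clock period.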
 
3.3.2 CLOCK PATH ARCHITECTURE AND CLOCK TREE 
 
Before the command training is described, the clock path, which is related to the
overall operation of the LPDDR4 memory, is explained. As shown in Fig. 3.3.3, the clock
path starts from the phase-locked loop and passes through the clock tree. The clock path
consists of the local delay-locked loop, the phase interpolator, and the 200ps delay line.
The 200ps delay line is used in write leveling, and the Boot Mux is used in the boot-up
operation. The clock signal, which is shifted by 180˚ and delayed by the delay-locked loop
and the 200ps delay line, reaches the LPDDR4 memory via the CK_t and CK_c channels. The CK
signals must be shifted by 180˚ because the rising edge of the CK must be located at the
center of the transmitted command.
The clock signal from the phase-locked loop is distributed through the clock tree to
the circuits that need it, such as the global delay-locked loop, TX_DQS, and the
de-serializer. The clock tree is designed so that the clock signal arrives at every
destination at the same time. The digital clock is delivered to the digital control
circuit after being divided by 8.
 
 
 
 
Fig. 3.3.3 Clock path circuit block diagram
 
3.3.3 COMMAND TRAINING AND COMMAND PATH ARCHITECTURE 
 
The command training finds the timing at which the rising edge of the CK falls at the
timing center of the CS and CA signals, and also finds the center of the reference
voltage, so that commands can be transmitted reliably. The command training consists of
the CS training and the CA training. In the CS training, as shown in Fig. 3.3.4, the CS
signal is transmitted to the memory while the CA is held at a fixed value, and the
feedback result is returned through the DQs. The CA training refers to this feedback.

Fig. 3.3.4 CS training circuit block diagram

Fig. 3.3.5 CA training circuit block diagram

The command signal is transmitted to the memory with a 0˚ shift for timing margin,
because the transmitted clock signal is shifted by 180˚ as mentioned in section 3.3.2. The
local delay-locked loop is used for the 0˚ shift, and comparators are used to sample the
feedback signals of the CS and CAs. The CA training, shown in Fig. 3.3.5, is the same as
the CS training except that the CA value changes. The unit interval of the CS and CA
signals is 1tCK in normal operation. The command training, on the other hand, uses CS and
CA signals masked to 0.75tCK to prevent consecutive samples.

Fig. 3.3.6 CS training timing diagram

Fig. 3.3.7 CA training timing diagram

The command training using the 1x2y3x eye detection algorithm operates as follows.
First, the CS signal is sampled with a timing sweep in the CS training state. Second, the
CS and CA signals are sampled with the 1x timing sweep in the CA training state, referring
to the sampling timing found in the CS training. After the 1x timing sweep, the 2y and 3x
sweeps are performed. Fig. 3.3.6 and Fig. 3.3.7 show the timing diagrams of the timing
sweeps of the CS and CA training. The controlled variables of the command training are the
control code of the phase interpolator of the memory controller and the reference voltage
of the memory. The x-axis timing sweep is performed by the phase interpolator, and the
y-axis voltage sweep is performed by the reference voltage generator located in the LPDDR4
memory. The reference voltage of the memory can be changed through the DQ and DQS. DQ[0]
to DQ[6] are used to change the reference voltage, and DQ[8] to DQ[13] are used to receive
the sampled feedback values of the command training. The received values are detected and
judged by the comparators. The receiver path used in normal operation cannot be used in
the command training because it is not initialized yet. Five kinds of transmission
patterns are used to reduce pattern dependency, and a dummy pattern, which takes the
signal characteristics into account, is sent between the training patterns to initialize
the feedback path. If all 5 patterns are fed back properly, the point is judged as "PASS",
which means that the eye is open at that point. Otherwise, it is judged as "FAIL", which
means that the eye is closed at that point. Bubble correction using a moving average
algorithm is used to reduce the errors caused by noise.
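The PASS/FAIL decision and the bubble correction can be sketched as follows. This is an illustration under assumptions, not the implemented logic: the five pattern values are placeholders, the window length of 3 is assumed because the text does not give it, and the moving average is realized here as a majority vote over the window. The names `judge_point` and `bubble_correct` are hypothetical.

```python
# Placeholder 6-bit CA patterns (assumption); the actual patterns are not given in the text.
PATTERNS = (0b101010, 0b010101, 0b110011, 0b001100, 0b111000)


def judge_point(send_and_sample, patterns=PATTERNS):
    """PASS (True) only if every training pattern is fed back unchanged."""
    return all(send_and_sample(p) == p for p in patterns)


def bubble_correct(flags, window=3):
    """Moving-average filter over PASS/FAIL flags that removes isolated flips."""
    half = window // 2
    corrected = []
    for i in range(len(flags)):
        neighbors = flags[max(0, i - half):min(len(flags), i + half + 1)]
        # A point stays open only if more than half of its window is open.
        corrected.append(2 * sum(neighbors) > len(neighbors))
    return corrected


if __name__ == "__main__":
    raw = [bool(v) for v in (0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0)]
    print(bubble_correct(raw))
```

In the example, the isolated PASS at index 2 is removed and the isolated FAIL at index 7 is filled, so the corrected sweep contains a single open range from index 5 to index 9.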
 
3.3.4 WRITE LEVELING AND DATA STROBE TRANSMISSION PATH 
ARCHITECTURE 
 
The write leveling is the training that aligns the rising edges of the DQS and the CK.
Fig. 3.3.8 shows the circuit block diagram of the write leveling. The DQS must be shifted
by 180˚ so that it arrives at the same time as the CK signal. The local delay-locked loop
is used to shift the DQS signal, and the delay line is used to compensate the skew between
the DQS and the CK. The sampled value is returned through the feedback path of the
receiver. The feedback values of DQS[0] and DQS[1] are detected by comparators at DQ[0]
and DQ[15], respectively. Before the write leveling, a timing difference exists between
the DQS and the CK. After the write leveling, the timing difference between the DQS and
the CK is reduced to 0.75 - 1.25tCK; ideally, the difference is 1tCK. The 1tCK shift is
carried out by the clock, and the ±0.25tCK adjustment is carried out by the delay line.
The write leveling of DQS[0] and DQS[1] proceeds in parallel. Fig. 3.3.9 shows the timing
diagram of the write leveling. Two consecutive DQS pulses are transmitted through the DQS
path, and the CK signal is sampled at the rising edge of the DQS. When the feedback value
changes from "LOW" to "HIGH", the write leveling ends. The moving average algorithm is
used for the same reason as in the command training. To confirm the result, the DQS
transmission sequence is performed twice for each timing code.

Fig. 3.3.8 Write leveling circuit block diagram
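The search for the LOW-to-HIGH transition can be sketched as below. The memory-side model `ck_sampled_by_dqs`, the 10ps delay step, and the initial 150ps offset are assumptions made only for this example, and the moving average filter is omitted for brevity. The function names are hypothetical.

```python
def ck_sampled_by_dqs(dqs_edge_ps, tck_ps):
    """Assumed memory-side model: CK level at the DQS rising edge (CK rises at t = 0)."""
    return (dqs_edge_ps % tck_ps) < tck_ps / 2


def write_leveling(sample, n_codes, repeats=2):
    """Step the DQS delay code until the feedback changes from LOW to HIGH.

    Each code is transmitted `repeats` times; a code whose results disagree is
    treated as undecided and skipped. Returns the first HIGH code after a LOW
    code, or None if no transition is found.
    """
    previous = None
    for code in range(n_codes):
        results = [sample(code) for _ in range(repeats)]
        if len(set(results)) != 1:
            continue
        current = results[0]
        if previous is False and current is True:
            return code
        previous = current
    return None


if __name__ == "__main__":
    TCK_PS = 468.75  # clock period at 2133MHz
    STEP_PS = 10     # assumed delay step per code
    # Assumed starting point: the DQS edge arrives 150ps before the CK rising edge.
    print(write_leveling(lambda c: ck_sampled_by_dqs(-150 + STEP_PS * c, TCK_PS), 64))
```

In this example the feedback is "LOW" for codes 0 to 14 and becomes "HIGH" at code 15, where the DQS rising edge coincides with the CK rising edge.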
 
 
Fig. 3.3.9 Write leveling timing diagram
 
3.3.5 READ TRAINING AND READ PATH ARCHITECTURE 
 
The read training finds, at the memory controller side, the eye center of the DQ
transmitted from the memory. It places the rising edge of the DQS signal, which is
transmitted from the memory, at the timing center of the DQ, and finds the center of the
reference voltage at the receivers of the memory controller. Fig. 3.3.10 shows the circuit
block diagram of the read training. The DQS is shifted by 90º and the DQ by 0º so that the
rising edge of the DQS is located at the center of the DQ. The local delay-locked loops
are used for the phase shift in each read path. The phase interpolator is used to shift
the timing of the receiver, and the reference voltage is used to control the sampling
voltage. In addition, the delay line is used to compensate the per-pin skew. The clock
domain crossing point from the DQS to the clock of the memory controller is in the 1:16
de-serializer. The clock domain crossing, read latency training, and byte alignment are
performed in the 1:16 de-serializer with the clock domain crossing circuit.
To find the eye center in the read training, a clock pattern is used to find the initial
control codes of the phase interpolator, reference generator, and delay line. The training
of the phase interpolator and reference generator is performed in parallel for DQS[0] and
DQS[1]. The delay line control for per-pin de-skewing, on the other hand, is performed per
DQ. After the eye center detection, predetermined test patterns are used to find the read
latency. Fig. 3.3.11 shows the read training timing diagram.

Fig. 3.3.10 Read training circuit block diagram
 
Fig. 3.3.11 Read training timing diagram (assumption: MAX(DQS, DQ, DMI skew) < 100ps)
 
3.3.6 WRITE TRAINING AND WRITE PATH ARCHITECTURE 
  
The write training is the training, which is find eye center of the DQ transmitted from 
memory controller to memory at memory side. In addition tDQS2DQ delay is caused by 
the source synchronous unmatched scheme, and per pin skew are should be compensated. 
Fig 3.3.12 shows the block diagram of the write training circuit. The write path consists of 
the 16:1 serializer, 200ps and 800ps delay line, and 90º shift local delay-locked loop. The 
 
Fig. 3.3.12 Write training circuit block diagram 
LPDDR4
memory 
controller
PLL
local DLL+PI
180º DL200
CKt
CKc
Boot
Mux
local DLL+PI
REF 180º
DQ DQS timing
DL200
16:1
SER
DQS[1]t
DQS[1]
DQS[1]c
LPDDR4 memory
LPDDR4 memory controller
local DLL+PI
REF 90º
DQ DQS timing
DL800
DL200
16:1
SER
DQ[8:15],
DMI[1]
local DLL+PI
REF 180º
DQ DQS timing
DL200
16:1
SER
DQS[0]t
DQS[0]
DQS[0]c
local DLL+PI
REF 90º
DQ DQS timing
DL800
DL200
16:1
SER
DQ[0:7],
DMI[0]
local DLL+PI
REF 0º 45º 135º
CMD timing
16:1
SER
.75UIM
ask
Boot
Mux
CA[5:0],
CS
WRITE CMD
READ CMD
READ
circuit
READ
circuit
tD
Q
S2
D
Q
 d
el
ay
tD
Q
S2
D
Q
 d
el
ay
C
L
K
T
R
E
E
44 
 
DQ signal should be shift 90º, so the DQS rising edge locates margin center of the DQ 
timing, because the DQS signal is 180º shifted for write leveling as mentioned in section 
3.3.4. The 1x2y3x eye center detection algorithm is used in the write training. Fig. 3.3.13 
shows the timing diagram of the write training. In the same way as command and read 
training, the write training adopts the bubble correction using moving average algorithm to 
reduce the error caused by noise. If all the 5 times of write attempts are properly feedback, 
it is judged as "PASS" which means that point of eye is open. Otherwise, it is judged as 
"FAIL" which means that point of eye is closed. The 800ps delay line is used to compensate 
the tDQS2DQ delay and the 200ps delay line is used to compensate per pin skew. The write 
 
Fig. 3.3.13 Write training timing diagram 
[Timing diagram of CLK, DQS, and DQ at the memory controller chip and at the memory chip, showing the DQS and DQ flight times, the tDQS2DQ delay, DL800, and DL200. Assumption: MAX(DQ skew) < 100ps and 200ps < tDQS2DQ < 800ps.]
 
latency is preset according to the operation frequency. The read command is used to check 
the written values. Therefore, unlike the other training steps, the write training uses both read and write 
commands. 
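The PASS/FAIL judgement and the bubble correction described above can be sketched as follows. This is only an illustrative model, not the controller's logic: the function names, the 3-point window, and the majority threshold are assumptions, since the text states only that all five write attempts must succeed and that a moving average is used.

```python
def judge_point(attempts):
    """Return True (PASS) only if all five write attempts read back correctly."""
    if len(attempts) != 5:
        raise ValueError("expected exactly five attempts")
    return all(attempts)


def bubble_correct(raw, window=3):
    """Smooth a PASS/FAIL sweep with a moving average.

    Assumption: a point becomes PASS when more than half of the points in
    the window centered on it are PASS; the edges use a truncated window.
    """
    half = window // 2
    out = []
    for i in range(len(raw)):
        seg = raw[max(0, i - half):min(len(raw), i + half + 1)]
        out.append(sum(seg) * 2 > len(seg))
    return out


def eye_center(passes):
    """Return the middle index of the longest PASS run, or None if there is none."""
    best_start, best_len, start = None, 0, None
    for i, p in enumerate(list(passes) + [False]):  # sentinel closes the last run
        if p and start is None:
            start = i
        elif not p and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2
```

With this window, a single FAIL surrounded by PASS points (a "bubble") is filled in, so it does not split the eye into two shorter pass zones.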
 
3.3.7 NORMAL READ/WRITE OPERATION AND MARGIN TEST 
 
The normal read/write operation and margin test are confirmation procedures performed 
after the command training, write leveling, read training, and write 
training. The normal test and margin test are not included in the specification of the 
LPDDR4 memory [1.1.10], but the proposed LPDDR4 memory controller supports them as 
optional tests. Fig. 3.3.14 shows the supported option list of the normal operation and margin 
test method. The bank active command is first transmitted to the memory to activate all banks. 
After that, normal write and read commands are executed to confirm that the training 
was performed correctly. Several options are selectable. The command sequence 
option is a choice between two sequences: 8 consecutive write-read command pairs, or 8 consecutive 
writes followed by 8 consecutive reads. For example, write → write → … → write → read → read → 
 
Fig. 3.3.14 Supported option list of normal operation and margin test method 
• After write training, begin normal operation test
• CMD (RD/WR)
  • (WR → RD) × 8
  • (WR × 8) → (RD × 8)
• BANK address from 0 to 7
  • 0 and 7
  • 0 → 1 → 2 → … → 6 → 7
  • x selected by i2c
• LOW address: MSB 3bit
  • 0
  • 0 → 1 → 2 → … → 6 → 7
• DQ: from 0 to 15
  • DQ[0:15]
  • DQ[0:7] or DQ[8:15]
  • DQ[x]
• DQ pattern
  • 2^7-1 PRBS
  • 16 bit special pattern by i2c
• Margin test
  • CMD (CMD PI sweep, memchip VrefCMD) @ CMD TR
  • Write (TX DQS PI sweep, memchip VrefDQ) @ NOR
  • Read (RX DQS PI sweep, memcon VrefDQ) → manual
 
… → read, or write → read → write → read → … → write → read is selectable. There are 
three selection options for the bank address: one particular address, 000 → 001 
→ 010 → ... → 111, or 000 → 111 → 000 → 111 → … → 000 → 111. There are two selection 
options for the 3 MSBs of the low address: fixed at 000, or 000 → 001 → 
010 → … → 111. The three DQ selection options are 2 bytes, which is DQ[0:15]; 1 byte, 
which is DQ[0:7] or DQ[8:15]; or one particular DQ pin, DQ[x]. 
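The selectable command and bank address sequences, together with the 2^7-1 PRBS data pattern listed in Fig. 3.3.14, can be illustrated with a short sketch. The function names and mode strings are hypothetical, and the PRBS feedback taps (stages 7 and 6, often written x^7 + x^6 + 1) are an assumption, since the text gives only the sequence length.

```python
def command_sequence(interleaved, n=8):
    """Return (WR -> RD) x n when interleaved, else (WR x n) -> (RD x n)."""
    if interleaved:
        return ["WR", "RD"] * n
    return ["WR"] * n + ["RD"] * n


def bank_sequence(mode, count=8, fixed=0):
    """Bank address sequence for one of the three options (mode names are hypothetical)."""
    if mode == "fixed":        # one particular bank, selected by the user
        return [fixed] * count
    if mode == "increment":    # 000 -> 001 -> ... -> 111
        return [i % 8 for i in range(count)]
    if mode == "toggle":       # 000 -> 111 -> 000 -> 111 ...
        return [0 if i % 2 == 0 else 7 for i in range(count)]
    raise ValueError("unknown mode: " + mode)


def prbs7(seed=0x7F, nbits=127):
    """Generate a 2^7-1 PRBS with a 7-bit LFSR (taps 7 and 6 are an assumption)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(nbits):
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        bits.append(newbit)
    return bits
```

For example, `command_sequence(False)` yields eight writes followed by eight reads, and `bank_sequence("toggle", count=4)` alternates between banks 0 and 7.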
The write margin test is a feature added to check the write margin after 
training. Starting from the eye center code, the timing and voltage margins are detected by sweeping 
the timing and voltage codes. The timing is shifted by the phase interpolator of the DQS, and the 
voltage is shifted by the reference generator of the memory. The mode register write (MRW) 
command is used to transmit the voltage shift command to the memory. The write margin 
test is executed automatically after the normal operation test. The read margin test is 
operated manually at the receiver of the controller to find the read margin around the 
eye center obtained from the read training. The read margin test works in almost the same way as 
the write margin test. The command margin is tested during the command training.  
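The margin measurement around the eye center can be sketched as a count of contiguous passing codes on each side of the center. This is a minimal illustration under assumed names; the source does not specify how the controller stores or reports the sweep results.

```python
def margin_from_center(passes, center):
    """Count contiguous PASS codes below and above the eye-center code.

    passes: list of booleans indexed by sweep code (timing or voltage).
    center: index of the eye-center code found by training.
    Returns (codes_below, codes_above); (0, 0) if the center itself fails.
    """
    if not passes[center]:
        return (0, 0)
    below = 0
    while center - below - 1 >= 0 and passes[center - below - 1]:
        below += 1
    above = 0
    while center + above + 1 < len(passes) and passes[center + above + 1]:
        above += 1
    return (below, above)
```

The same function applies to a timing sweep (phase interpolator codes) and to a voltage sweep (reference voltage codes), because both are one-dimensional sweeps from the trained center.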
 
 
 
 
CHAPTER 4 
 
 
 
LPDDR4 MEMORY CONTROLLER ARCHITECTURE 
MODELING AND CIRCUIT DESIGN 
 
 
 
4.1 OVERALL LPDDR4 MEMORY CONTROLLER ARCHITECTURE 
MODELING 
 
Fig. 4.1.1 shows the architectural block diagram of the proposed LPDDR4 memory 
controller. It consists of the phase-locked loop, the delay-locked loop, the clock and command 
 
Fig. 4.1.1 Block diagram of the proposed LPDDR4 memory controller 
[Block diagram labels: LPDDR4 controller with ADPLL, global DLL, DQ local DLL, DQS local DLL, CLK & CMD local DLL, test circuit, I2C, and ZQ resistor; channel (CH) signals DQ[15:0], DMI[1:0], DQS[1:0], CMD (CS, CA[5:0]), CLK, RESET, ODT_CA, and CKE to the LPDDR4 memory.]
 
transmitters, which form the command (CS and CA) and clock (CK) path, the DQ and DQS 
transceivers, which form the read and write path, the digital circuit of the memory controller, and 
other test and control circuits. The read and write path of the transceiver consists of a 
transmission path of the DQs and DQSs to transmit write data, a receive path of the DQs 
and DQSs to receive read data, and a clock and command path to transmit command and 
clock signals to the memory. The clock signal of the memory controller is generated by the 
phase-locked loop and distributed to each delay-locked loop and transceiver through the 
clock tree. The read strobe signal is distributed to each DQ through the receiver strobe 
delay-locked loop. 
The digital block operates with the output clock of the phase-locked loop divided by 8, 
and transmits the data and the commands to each transceiver. Each 
transmitter and receiver of the DQ, DMI, and command paths is connected by 16 data 
 
Fig. 4.1.2 Modeling diagram of the proposed LPDDR4 memory controller 
[Modeling diagram labels: Main controller.v (training, CMD generation, DQ(S) generation, adaptation, calibration, RD/WR control); Boot controller.v (DRAM initialization, command control); LPDDR4 controller PHY.sv (DQ, DMI, and DQS read and write paths, CMD path, CLK path, PLL, DLL); channel (CH) models; LPDDR4 memory.sv; signals DQ[15:0], DMI[1:0], DQS(B)[1:0], CS, CA[5:0], CLK(B), CKE, RESET, ODT_CA, CLKREF, CLKPHY/8, RL/WL, DRV EN.]
 
signal lines from the digital control circuit. A resistor for ZQ calibration exists on the 
memory side. Similarly, a resistor exists on the memory controller side for ZQ calibration of 
the controller. The value of the ZQ calibration resistor is 240Ω. The phase-locked 
loop operates with the 66.6MHz reference clock. The output frequency of the phase-locked 
loop varies from 266MHz to 2133MHz in 266MHz steps. The global delay-locked 
loop receives the output clock of the phase-locked loop and transmits the lock code to the local 
delay-locked loops, which are located at the transceivers. The global delay-locked loop stops 
operating after lock to reduce the power consumption. The local delay-locked loops exist 
in the transmission and receiver paths of the DQS and in the command and clock path. 
As shown in Fig. 4.1.2, the digital control circuits, such as the main controller and the 
boot controller, are coded in Verilog. For verification of the digital control circuits, 
all analog circuits, such as the phase-locked loop, delay-locked loop, and transceivers, are 
modeled in SystemVerilog. As described before, the main controller operates with 
the LPDDR4 operation clock divided by 8, and the boot controller operates 
with the reference clock divided by 2. The linked operation between these two digital 
blocks and the operation flow chart of the entire training sequence are given in the appendix. In 
addition, a SystemVerilog model of the LPDDR4 memory was provided by SK hynix in 
order to verify the proposed LPDDR4 memory controller. The channel is also modeled in 
SystemVerilog so that the channel delay, and therefore the per-pin skew, can be 
reproduced. The timing blocks, such as the serializer and de-serializer, are modeled at gate 
level with one-to-one matching to reduce the error. 
 
 
4.2 SIMULATION RESULT OF LPDDR4 MEMORY CONTROLLER 
MODELING 
 
Section 4.2 presents the simulation results of the memory controller model. The simulation 
follows the operation sequence of the proposed memory controller shown in Fig. 3.1.1. Fig. 4.2.1 
to Fig. 4.2.3 show the initialization steps of the boot-up operation. First, the MRW 
commands are sent sequentially to the memory. After that, ZQ calibration and ZQ latch are 
 
Fig. 4.2.1 Boot up sequence - initialization step 1 
 
Fig. 4.2.2 Boot up sequence - initialization step 2 
 
performed. 
Fig. 4.2.4 to Fig. 4.2.12 show the modeling simulation results of the command training. 
Fig. 4.2.4 shows the entry timing of the command training. When the CKE signal goes 
low, the command training starts after tCAENT. The operation speed is changed from the 
boot frequency to the normal operation speed of the LPDDR4 memory. For example, the speed 
 
Fig. 4.2.3 Boot up sequence - initialization step 3 
 
Fig. 4.2.4 Timing of command training entry 
 
is changed from 33MHz to 2133MHz. Fig. 4.2.5 shows the setup and hold timing margins 
for the reference voltage sweep in the command training. Both margins are 2ns. Fig. 
4.2.6 shows the training pattern of the CS training. The values of the CA are fixed to a 
predefined value, and only the CS signal toggles, with a width of 0.75tCK, during the CS training. The 
 
Fig. 4.2.6 Training pattern of CS training 
 
Fig. 4.2.5 Setup and hold timing margin for reference voltage sweep at command training 
 
CS training is a timing (x axis) sweep training, but if the timing pass zone is not found at 
the predefined reference voltage, the reference voltage is changed until 
the timing pass zone is found. Likewise, the x axis value is changed if the voltage pass zone is 
not found at the fixed timing value. Fig. 4.2.7 shows this exceptional case of the reference voltage 
 
Fig. 4.2.8 1x2y3x eye detection algorithm in CA training 
 
Fig. 4.2.7 Reference voltage sweep in CS training 
 
change. In the CS training, if there is no pass zone at a particular time code, the time code 
is changed and the voltage sweep is performed again to find the pass zone. Fig. 4.2.8 shows 
the CA training results. The 1x2y3x eye center detection algorithm is performed. First, in 
the 1x sweep, the reference voltage code is fixed at 23.2, which means 23.2% of VDDQ, 
and the timing training is performed. Second, in the 2y sweep, the timing code is fixed at 
1001001, which is the binary value of the phase interpolator code 73/128, and the 
reference voltage training is performed. Finally, in the 3x sweep, the reference voltage code 
is fixed at 23.2% of VDDQ, and the timing training is performed. Fig. 4.2.9 shows the 
training patterns of the CA training. The training patterns are changed in the order of A → 
0 → B → 0 → C → 0 → D → 0 → E → 0 → A···. The 5 kinds of data patterns are used 
to test various conditions, and the 0 pattern is inserted between data patterns to prevent 
timing errors. Fig. 4.2.10 shows the result of the command training. The value of the timing 
code is 0011110, which is the binary value of the phase interpolator code 30/128, and 
 
Fig. 4.2.9 Training pattern of CA training: 0A0B0C0D0E0A… 
[Pattern sequence: 0 → A (111001) → 0 → B (000110) → 0 → C (010001) → 0 → D (101110) → 0 → E (101101) → 0]
 
Fig. 4.2.10 Result of the command training 
 
the value of the voltage code is 14.8% of VDDQ.  
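The three sweeps of the 1x2y3x procedure described above can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: the pass/fail callback `is_pass`, the choice of the longest pass run as the eye, and the use of the 2y voltage center in the 3x sweep are assumed here.

```python
def pass_center(results):
    """Return the middle index of the longest run of True values, or None."""
    best_start, best_len, start = None, 0, None
    for i, p in enumerate(list(results) + [False]):  # sentinel closes the last run
        if p and start is None:
            start = i
        elif not p and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


def sweep_1x2y3x(is_pass, n_time, n_volt, v_init):
    """Return (timing_code, voltage_code) of the eye center, or None.

    is_pass(t, v) is a hypothetical callback reporting PASS/FAIL at one point.
    """
    # 1x: timing sweep with the reference voltage fixed at its initial code
    t1 = pass_center([is_pass(t, v_init) for t in range(n_time)])
    if t1 is None:
        return None
    # 2y: voltage sweep with the timing fixed at the 1x center
    v2 = pass_center([is_pass(t1, v) for v in range(n_volt)])
    if v2 is None:
        return None
    # 3x: timing sweep again, with the voltage fixed at the 2y center (assumption)
    t3 = pass_center([is_pass(t, v2) for t in range(n_time)])
    return (t3, v2)
```

For a diamond-shaped eye centered at timing code 40 and voltage code 15, starting from an off-center voltage code still converges to the center, because the second timing sweep is taken at the widest part of the eye.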
Fig. 4.2.11 and Fig. 4.2.12 show the exit timing of the command training and the change 
of the operation speed at the end of the command training. After the command training, the 
operation speed of the LPDDR4 memory is lowered to 33MHz to write the 
command training result to the LPDDR4 memory. As shown in Fig. 4.2.12, the speed is raised 
again for the write leveling. 
 
Fig. 4.2.11 Exit timing of the command training 
 
Fig. 4.2.12 Operation speed change in the end of command training 
 
Fig. 4.2.13 and Fig. 4.2.14 show the write leveling. Fig. 4.2.13 shows the entry timing of 
the write leveling. A predefined command is sent to the memory to enter the write leveling. 
After tWLDQSEN and tWLMRD, the DQS and other signals are sent to the memory. As 
shown in Fig. 4.2.14, DQS[1] is aligned with CK at DQS1_CODE 6, and DQS[0] 
is aligned with CK at DQS0_CODE 10. 
 
Fig. 4.2.13 Entry timing of the write leveling 
 
Fig. 4.2.14 Write leveling 
 
Fig. 4.2.15 to Fig. 4.2.17 show the simulation results of the read training. As 
shown in Fig. 4.2.15, the phase interpolator code is first swept to lock the phase 
interpolator code. Then each delay line in the DQ path is swept to check the skew of each DQ. 
Fig. 4.2.16 shows the command transmission of the read training. The DQs receive predefined 
clock patterns to lock the reference voltage. Fig. 4.2.17 shows the result of the read training. 
DQ[0] to DQ[7] and DMI[0] have a common eye open window, and the 
DQS[0] signal leads every DQ to compensate the DQS buffering delay in the memory controller. 
 
Fig. 4.2.15 Training code sweep of the read training 
 
Fig. 4.2.16 Environment of the read training 
 
Fig. 4.2.18 and Fig. 4.2.19 show the write training. In Fig. 4.2.18, the common minimum code is 
reflected in each DQ code to compensate the tDQS2DQ skew. For example, the DQ[0] code is 
 
Fig. 4.2.18 Training code sweep of the write training 
 
Fig. 4.2.17 Result of the read training 
 
Fig. 4.2.19 Write training pattern 
 
 
reduced from 77 to 24 by subtracting the common minimum code of 53. Fig. 4.2.19 shows the write 
training patterns. Five consecutive write commands are sent to the LPDDR4 memory. After 
that, 5 consecutive DQS and DQ bursts are sent to the memory. The DQS[0] signal leads every 
DQ to compensate the DQS buffering delay, called the "tDQS2DQ delay", in the LPDDR4 
memory. 
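The subtraction of the common minimum code can be illustrated with a small sketch. Only the DQ[0] values (77, 53, 24) come from the text; the other per-pin codes and the function name are hypothetical.

```python
def split_common_code(dq_codes):
    """Split per-pin delay codes into a common part and per-pin residuals.

    The common part is the minimum over all pins; each residual is what
    remains for that pin after the common minimum code is subtracted.
    """
    common = min(dq_codes)
    residuals = [code - common for code in dq_codes]
    return common, residuals


# DQ[0] = 77 is from the text; the remaining codes are made-up examples.
common, residuals = split_common_code([77, 53, 60, 58])
```

In this example the common code is 53 and the DQ[0] residual is 24, matching the reduction from 77 to 24 described above.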
 
 
4.3 LPDDR4 MEMORY CONTROLLER CIRCUIT DESIGN 
 
Section 4.3 describes the sub-block design of the LPDDR4 memory controller, including 
the phase-locked loop and delay-locked loop, which are related to clocking, and the transceiver 
circuits. 
 
4.3.1 PHASE-LOCKED LOOP 
 
Fig. 4.3.1 shows the block diagram of the phase-locked loop. It consists of the phase 
frequency detector, digital loop filter, delta-sigma modulator, digitally controlled oscillator, 
and feedback divider. A phase-frequency detectable time-to-digital converter [4.3.1], acting 
as the phase frequency detector, detects the phase information. The digital loop filter and 
delta-sigma modulator provide the operation code of the digitally controlled oscillator. The digitally 
controlled oscillator generates the output frequency. The divider circuit divides the output 
clock of the digitally controlled oscillator to provide the feedback clock to the phase-frequency 
 
Fig. 4.3.1 Block diagram of the phase-locked loop 
[Block diagram labels: reference clock (66.6MHz), PFDTDC, D[5:0], DLF, DSM, FCW[9:0], DCO, lock detect, divider (/1, /2, /4, /5, /6, /7, /8), PHY CLK, SYS CLK, DLF CLK.]
 
detectable time-to-digital converter. The phase-frequency detectable 
time-to-digital converter has a dynamic range of 300ps and was designed with a time 
resolution of 10ps. The digital loop filter word consists of 10 integer bits and 25 
fractional bits. 
The frequency of the reference clock is 66.6MHz. The operation range of the phase-locked 
loop is from 1333MHz to 2133MHz in 266MHz steps. As shown in Fig. 4.3.1, the 
operation frequency is determined by the dividing factor of the divider. The 
dividing factor is selected from 20, 24, 28, and 32, which are made by combining a 
divide-by-4 stage with a divide-by-5, 6, 7, or 8 stage. To provide the operation clock of 
the LPDDR4 memory from 266MHz to 2133MHz in 266MHz steps, the output of the phase-locked 
loop is divided by 1, 2, 4, or 8. The clock for the digital circuit of the memory 
controller is the operation clock of the LPDDR4 memory divided by 8. 
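The frequency plan above can be checked with a short calculation. The sketch below assumes that the 66.6MHz reference is exactly 200/3 MHz and that any feedback factor may be paired with any output divider; the function names are hypothetical.

```python
from fractions import Fraction

REF_MHZ = Fraction(200, 3)  # 66.6MHz reference, taken here as 66.67MHz (assumption)


def pll_frequency_plan():
    """Map (feedback factor, output divider) to the memory clock in MHz."""
    plan = {}
    for second in (5, 6, 7, 8):      # feedback factor = 4 x (5, 6, 7, or 8)
        n = 4 * second
        for post in (1, 2, 4, 8):    # output divider
            plan[(n, post)] = REF_MHZ * n / post
    return plan


def step_settings(step_mhz=REF_MHZ * 4):
    """Return {k: (n, post)} for every clock equal to k x 266.67MHz."""
    settings = {}
    for (n, post), freq in sorted(pll_frequency_plan().items()):
        k = freq / step_mhz
        if k.denominator == 1:       # keep only exact multiples of the step
            settings[int(k)] = (n, post)
    return settings
```

Under these assumptions all eight multiples of 266.67MHz from 266MHz to 2133MHz are reachable; for example, 800MHz is obtained as 1600MHz divided by 2.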
Fig. 4.3.2 shows the architecture of the digital loop filter. The digital loop filter consists 
of the clock domain crossing circuit, thermometer-to-binary decoder, digital loop 
calculation, and first order delta-sigma modulator. The clock of the digital loop filter is 
the output clock of the digitally controlled oscillator divided by 5 or 6.  
At the front-end of the digital loop filter, the 31-bit thermometer code of the time-to-digital 
 
Fig. 4.3.2 Digital loop filter architecture 
[Block diagram labels: thermo code, TDC CLK, DLF CLK, flip-flop chain (clock domain crossing), Th2D, gains α and β, adders and flip-flops (digital loop filter), proportional control, delta sigma modulation, DCO control code.]
 
converter output is clock domain crossed and transformed to a 6-bit signed code. The 
transformed code is transmitted to the proportional and integral paths. The proportional path 
and integral path gains are designed to be adjustable from 2^1 to 2^-14 and from 2^0 to 2^-15, 
respectively. The integer value of the delta-sigma modulator output is sent to the digitally controlled 
oscillator to control its frequency. 
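The role of the first order delta-sigma modulator can be illustrated with an accumulator model: the fractional bits of the filter word are accumulated, and each overflow adds one to the integer code sent to the oscillator. This is a generic first order modulator sketch, assuming the 25 fractional bits mentioned in Section 4.3.1; it is not taken from the thesis's RTL.

```python
FRAC_BITS = 25  # fractional bits of the digital loop filter word


def first_order_dsm(control_word, cycles):
    """Return the integer codes produced over `cycles` clock cycles.

    control_word is a non-negative fixed-point value with FRAC_BITS
    fractional bits. The average of the outputs approaches
    control_word / 2**FRAC_BITS.
    """
    mask = (1 << FRAC_BITS) - 1
    integer = control_word >> FRAC_BITS
    frac = control_word & mask
    acc = 0
    out = []
    for _ in range(cycles):
        acc += frac
        carry = acc >> FRAC_BITS   # 1 when the accumulator overflows
        acc &= mask
        out.append(integer + carry)
    return out
```

For a word equal to 100.25, the output toggles to 101 once every four cycles, so the time-averaged code is 100.25 even though the oscillator only accepts integer codes.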
A metastability problem [4.3.2]-[4.3.5] can occur at the clock domain crossing 
point at the front-end of the digital loop filter. Generally, sequentially placed flip-flops can 
be used to solve this problem. As shown in Fig. 4.3.2, the output of the time-to-digital converter 
is sampled by three sequentially placed flip-flops with the digital clock to reduce the 
metastability problem. The delta-sigma modulator is used to reduce in-band noise by 
shaping the quantization noise. In addition, it has the advantage of increasing the effective resolution 
of the digitally controlled oscillator. As shown in Fig. 4.3.2, the proposed phase-locked loop 
 
Fig. 4.3.3 Digitally controlled oscillator architecture 
[Schematic labels: proportional control, DCO control code, DCR, VDD PLL, VDD DCO, differential ring stages, mode sel.]
 
uses a 1st order delta-sigma modulator in front of the digitally controlled oscillator. A wide 
range digitally controlled oscillator is needed to cover the range from 1333MHz to 
2133MHz for the LPDDR4 data rate specification of 533Mbps to 4266Mbps. Fig. 
4.3.3 shows the architecture of the proposed ring type oscillator [4.3.6]-[4.3.8]. The voltage 
of the VDD DCO node is changed to obtain the target frequency, and the voltage of the VDD 
DCO node is controlled by the digitally controlled resistor. If a margin of 50% is taken in order to 
satisfy all of the corner conditions, the frequency range of the digitally controlled oscillator in 
the nominal condition is from 500MHz to 3000MHz, approximately. However, if the oscillator is 
designed in this way, its KDCO will be very high, so the operating 
characteristics of the phase-locked loop deteriorate. A 2-bit mode selection is adopted 
to solve this problem. The mode selection is designed to change the operation frequency 
with a control code. The center frequency of the digitally controlled oscillator is changed by 
the mode selection code. Therefore, a wide tuning range is achieved with a low KDCO. 
 
 
4.3.2 DELAY-LOCKED LOOP AND PHASE INTERPOLATOR 
 
The overall architecture of the delay-locked loop is described in Fig. 4.3.4 [4.3.9]. The 
delay-locked loop consists of a global delay-locked loop, located near the phase-locked 
loop, for fast-locking and a local delay-locked loop in each channel to compensate for PVT 
variations and to reduce the high-frequency jitter. The global delay-locked loop is 
composed of a coarse time-to-digital converter and a fine time-to-digital converter for fast-
locking, a digital block to handle delay codes transmitted from the time-to-digital converter, 
and a delay line which is made up of a coarse delay line and a fine delay line. The local 
delay-locked loop of each channel has a digital window phase detector to reduce the high-
frequency jitter, a digital loop filter and its own delay line. As shown in Fig. 4.3.4, the delay 
line consists of coarse delay lines and fine delay lines. 
 
Fig. 4.3.4 Delay-locked loop architecture and coarse and fine delay cell of the delay-locked loop 
[Block diagram labels: global DLL (CKIN, coarse TDC, fine TDC, digital block, coarse delay line, fine delay line); local DLL (CDL, FDL, RFDL, DLF, lock detector, /N dividers, CKDLL); delay unit.]
 
Fig. 4.3.5 shows operation flow chart of the delay-locked loop. The global delay-
locked loop uses a coarse time-to-digital converter and a fine time-to-digital converter to 
lock quickly when the chip starts up. When the phase-locked loop locks, the coarse time-
to-digital converter locks coarse delay lines, and when this is accomplished, the coarse 
time-to-digital converter issues a lock detection signal that triggers the fine time-to-digital 
converter. After the fine delay lines are locked, the fine time-to-digital converter issues its own 
lock detection signal. All the circuits in the global delay-locked loop are powered down, 
and then the delay codes, generated by the time-to-digital converter, are transmitted to the 
digital loop filter in the local delay-locked loop. Powering down all the circuits in the global 
delay-locked loop when locking is complete mitigates the impact of the high power 
consumption associated with the time-to-digital converter based delay-locked loop. 
The phase detector based local delay-locked loop responds to the delay codes issued 
 
Fig. 4.3.5 Operation flow chart of the delay-locked loop 
[Flow chart: PLL lock → global DLL power on → coarse TDC operation → fine TDC operation → lock complete → global DLL power off → local DLL power on → PVT tracking.]
 
by the global delay-locked loop to compensate for the PVT variations in each strobe 
channel. This local delay-locked loop uses a window phase detector, as shown in Fig. 4.3.6, 
rather than a bang-bang phase detector, to reduce the dithering and high-frequency jitter. 
The window phase detector is composed of replica fine delay lines, a lock detector that 
judges the success of the 180° phase-shift lock, and a frequency divider that provides a 
low-frequency clock signal to activate and drive the window phase detector after the lock 
flag has been received. Fig. 4.3.6 shows how the window phase detector operates. The 
replica fine delay lines change delay codes to control the values of the signals dCKIN + nΔt, 
 
Fig. 4.3.6 Block diagram of the digital window phase detector in the local delay-locked loop and 
timing diagram and operation of the digital window phase detector 
[Diagram labels: CKIN, CKDLL, CKDIG, RFDL, /N dividers, dCKIN+nΔt, dCKDLL, dCKDLL+2nΔt, window of width 2nΔt, lock flag, lock complete (n ↓), lock fail (n ↑).]
 
dCKDLL and dCKDLL + 2nΔt, where n is the value of delay codes sent to the replica fine 
delay lines, and Δt is their delay resolution. The rising edges of dCKDLL and dCKDLL + 2nΔt 
form a window with a width of 2nΔt. If the falling edge of dCKIN + nΔt is caught by this 
window, then the PD<1:0> signals have different values and a locking state is entered: The 
lock detector sets the lock flag to ‘1’ and sends it to the replica fine delay lines, frequency 
divider, and MUX. When the lock flag is transmitted to the replica fine delay lines, delay 
codes of the replica fine delay lines are decreased so as to narrow the window, and the 
window phase detector is now operated by the CKIN/N and CKDLL/N signals from the 
frequency divider to lower dynamic power consumption. If some combination of CKIN jitter, 
supply and ground noise, and PVT variations cause the local delay-locked loop to break 
the locking state, then the window phase detector increases delay codes of the replica fine 
delay lines to widen the width of its window, and it is again operated by CKIN and CKDLL 
instead of CKIN/N and CKDLL/N, and it then re-enters the locking state. Repeatedly 
controlling window size to adjust loop gain of the local delay-locked loop prevents the 
 
Fig. 4.3.7 Phase interpolator 
[Schematic labels: phase sel, slew rate control, clock sel.]
 
dithering phenomenon and reduces both the high-frequency jitter and the dynamic power 
consumption of the digital window phase detector which is proportional to the clock 
frequency. 
Fig. 4.3.7 shows the phase interpolator. The phase interpolator is coupled to the local 
delay-locked loop to support each training operation and to evaluate the performance of the 
LPDDR4 memory controller and memory system by generating clock signals in steps of 1UI/64. 
The clock signal from the phase interpolator is transmitted to the serializer or de-serializer 
for data serializing or de-serializing. 
 
4.3.3 TRANSMITTER OF LPDDR4 MEMORY CONTROLLER : WRITE 
PATH 
 
As shown in the write path of Fig. 3.3.12, the transmitter consists of a serializer, a delay 
line, and a driver that supports LVSTL. This section describes the sub-circuits of the 
transmitter. 
Two kinds of delay lines are used in the LPDDR4 memory controller. The 200ps delay line 
has 250ps of delay coverage for per-pin de-skewing of write and read data. The 
800ps delay line has 1000ps of delay coverage to compensate the tDQS2DQ delay of the DQS 
buffering. Fig. 4.3.8 shows the block diagram of the 200ps delay line. The delay line 
consists of coarse delay lines and a fine delay line. The coarse delay unit of the coarse delay 
line has a single-input single-output NAND chain structure. One coarse unit delay is two 
NAND delays, approximately 64ps. The fine delay line has a 1/16 phase interpolator structure, 
and its resolution is 4ps, which is 1/16 of the coarse unit delay. The 200ps 
delay line has four coarse delay units and one fine delay line, and the 800ps delay line has 
 
Fig. 4.3.8 Block diagram of the 200ps delay line 
[Block diagram labels: CLK IN, four coarse delay units (PASS0/PASS0b, PASS1/PASS1b), coarse delay replica, coarse out, fine delay line, CLK OUT.]
 
sixteen coarse delay units and one fine delay line. 
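The relation between the delay line codes and the resulting delay can be worked through numerically. The sketch below uses the nominal 64ps coarse unit and 4ps fine step from the text; the code format (a coarse index plus a fine index) and the function names are assumptions.

```python
COARSE_PS = 64    # nominal delay of one coarse unit (two NAND delays)
FINE_STEPS = 16   # the fine delay line interpolates one coarse unit in 16 steps
FINE_PS = COARSE_PS // FINE_STEPS  # 4ps resolution


def coverage_ps(n_coarse):
    """Nominal delay coverage of a line with n_coarse coarse units."""
    return n_coarse * COARSE_PS


def delay_ps(coarse, fine):
    """Nominal delay for a (coarse, fine) code pair."""
    return coarse * COARSE_PS + fine * FINE_PS


def code_for_delay(target_ps, n_coarse):
    """Nearest (coarse, fine) code for a target delay, clamped to the line range."""
    steps = round(target_ps / FINE_PS)
    steps = max(0, min(steps, n_coarse * FINE_STEPS - 1))
    return divmod(steps, FINE_STEPS)
```

With four coarse units the nominal coverage is 256ps, and with sixteen it is 1024ps, which is consistent with the stated 250ps and 1000ps coverage of the 200ps and 800ps delay lines.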
Fig. 4.3.9 shows the block diagram of the 16:1 serializer. The serializer receives 
sixteen parallel data bits and control signals from the digital control circuit. The serializer is 
designed in consideration of the LPDDR4 memory specification and the source 
synchronous scheme. A delay of more than 1UI at 4266Mbps operation is needed to meet 
the tDQS2DQ specification, which is from 200ps to 800ps. In addition, the same clock edges 
should be used to generate the DQS and DQ signals, which are source synchronous. The 
data before serialization, whose maximum rate is 266Mbps (data period 3.75ns), can be 
sampled with the tDQS2DQ-delayed clock. A driver enable signal is used to enable the driver 
because of the bi-directional scheme of the LPDDR4 memory. The driver should be turned off 
by the driver disable signal in read mode. The driver enable path is designed in the same way as 
the data path: the driver enable signal and the data signal pass through the same number of flip-flop 
stages to keep the same write latency. The driver is designed to turn on only while data is being 
sent, under control of the driver enable signal. 
The serializer receives the CLK EN, DRV EN, and CLK control signals. The CLK signal 
 
Fig. 4.3.9 Block diagram of the 16:1 serializer 
[Block diagram labels: DATA16, WL, WLD[15:0], DRV EN, CLK EN, DIV EN, CLK, CLK/2, CLK/4, CLK/8, flip-flops, four 2:1 SER stages (×8, ×4, ×2, ×1), DRIVER EN.]
 
is divided by 2, 4, and 8 in the divider circuit. Each stage of the serializer carries out 2:1 
serializing with these divided clocks. As shown in Fig. 4.3.9, clock buffers are added to all 
clock paths to reduce the sensitivity to PVT variation. The write latency control circuit adjusts 
the write latency in 2UI steps (2, 4, 6, and 8) by using a shift register. The proposed serializer is 
used in the CMD, DQ, and DQS paths for delay matching and write latency control.  
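The four cascaded 2:1 stages of the 16:1 serializer can be modeled behaviorally as repeated interleaving. This is a sketch of the data ordering only: the lane pairing and the assumption that bit 0 is transmitted first are illustrative choices, and clocking is not modeled.

```python
def ser_2to1(a, b):
    """Interleave two equal-length bit streams: a0, b0, a1, b1, ..."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def serialize_16to1(word):
    """Serialize 16 parallel bits through four 2:1 stages (16 -> 8 -> 4 -> 2 -> 1 lanes).

    Assumption: word[0] leaves first, so lane k is paired with lane k + half
    at each stage to keep the output in index order.
    """
    if len(word) != 16:
        raise ValueError("expected 16 parallel bits")
    lanes = [[bit] for bit in word]   # 16 lanes, one bit each (CLK/8 domain)
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [ser_2to1(lanes[k], lanes[k + half]) for k in range(half)]
    return lanes[0]
```

Each pass of the loop corresponds to one 2:1 stage running on a clock twice as fast as the previous stage, ending with a single lane at the full rate.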
The LPDDR4 memory controller has many I/O pins, and each pin is connected to the 
LPDDR4 memory through a low voltage-swing terminated logic (LVSTL) driver. Thirty-one drivers 
are used in the LPDDR4 controller to transmit signals including the DQ/DQS, DMI, command, 
and clock signals. Thus, low power consumption and small chip area are important. Fig. 
4.3.10 shows the block diagram of the LVSTL driver [4.3.10]. The LVSTL driver is adopted 
to achieve low power consumption and small chip area. The driver cells are placed in parallel, 
and each driver cell is turned on or off by the control signal. In the receiver mode of the 
LPDDR4 memory controller, impedance matching is required to avoid reflection caused 
 
Fig. 4.3.10 Block diagram of LVSTL driver of the LPDDR4 memory controller 
[Block diagram labels: data, pre-driver, main driver (pull-up ×6, pull-down ×6), OUT.]
 
by impedance mismatch between the channel and the receiver. The pull-down of the driver 
acts as ground termination in the receiver mode. The impedance values are 240Ω, 
120Ω, 80Ω, 60Ω, 48Ω, and 40Ω, which are 240Ω divided by the integers 1 to 6. As shown in Fig. 
4.3.10, six parallel pull-down NMOS transistors are placed to realize the six impedance values. 
The output impedance is determined by the number of turned-on NMOS transistors. When the input 
data is low, the pull-down NMOS is turned on and the pull-up NMOS is turned off; thus, the 
output voltage goes to 0V. Otherwise, when the input data is high, the pull-down NMOS is turned 
off and the pull-up NMOS is turned on, and the output voltage is determined by 
equation (4.3.1). 
$$\frac{V_{DD} \times R_{PD}}{R_{PU} + R_{PD}}, \qquad (4.3.1)$$
where VDD is the supply voltage of the LVSTL driver, RPU is the resistance of the pull-up, 
and RPD is the resistance of the pull-down. The output voltage swing is also given by equation 
(4.3.1). The output swing specification of the LPDDR4 memory is VDD/3 or VDD/2.5. 
The target values of the pull-up impedance are 480Ω, 360Ω, 240Ω, 180Ω, 160Ω, 120Ω, 
 
Fig. 4.3.11 Pull down calibration circuit 
[Schematic labels: ZQ pad, external 240Ω resistor, pull-down replica (PDctrl), VOH, comparator, counter.]
 
96Ω, and 80Ω. These pull-up impedance values are realized by six 
parallel 480Ω pull-up NMOS transistors or three parallel 360Ω pull-up NMOS transistors. 
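The pull-up targets can be derived from equation (4.3.1) by solving for RPU at the two swing specifications. The sketch below uses exact fractions; the 1.1V supply in the usage example is a hypothetical value, and the function names are illustrative.

```python
from fractions import Fraction

# Pull-down values: 240 ohm divided by 1 to 6
PULL_DOWN_OHMS = [Fraction(240, n) for n in range(1, 7)]


def output_high(vdd, r_pu, r_pd):
    """Output voltage from equation (4.3.1): VDD x RPD / (RPU + RPD)."""
    return vdd * r_pd / (r_pu + r_pd)


def pull_up_for_swing(r_pd, ratio):
    """Pull-up resistance that gives an output swing of VDD / ratio.

    Rearranging equation (4.3.1): RPU = RPD x (ratio - 1).
    """
    return r_pd * (ratio - 1)


vdd3_targets = [pull_up_for_swing(r, 3) for r in PULL_DOWN_OHMS]
vdd25_targets = [pull_up_for_swing(r, Fraction(5, 2)) for r in PULL_DOWN_OHMS[:3]]
```

A swing of VDD/3 requires RPU = 2·RPD, giving 480Ω down to 80Ω, while VDD/2.5 requires RPU = 1.5·RPD, giving 360Ω, 180Ω, and 120Ω for the three largest pull-down values. Together these reproduce the eight target values listed above.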
First, the pull-down impedance calibration is needed. As shown in Fig. 4.3.11, the 
impedance of a replica circuit of the pull-down NMOS is adjusted to 240Ω. To calibrate 
the pull-down NMOS, the pull-down replica cell is connected to an external 240Ω resistor 
through the ZQ pad. The DC level of the ZQ pad is compared with VOH in the comparator, 
and the counter changes the impedance by adjusting the number of turned-on pull-down NMOS 
transistors. The pull-down control information is transferred from the replica to the real driver. As 
shown in Fig. 4.3.12, the replica of the pull-up NMOS is connected to the pull-down NMOS 
to calibrate the pull-up impedance after the pull-down calibration. The number of turned-on 
pull-up NMOS transistors is determined by the comparator and counter in the same way as in the 
pull-down calibration. The default value of the pull-down is 240Ω, and that of the pull-up is 480Ω or 360Ω.  
 
 
Fig. 4.3.12 Pull up calibration circuits 
[Schematic labels: pull-up replica (PUctrl), pull-down replica (PDctrl), VOH, comparator, counter.]
 
4.3.4 DE-SERIALIZER WITH CLOCK DOMAIN CROSSING 
 
As shown in the read path of Fig. 3.3.10, the receiver consists of a continuous time linear 
equalizer, delay-locked loop, phase interpolator, 200ps delay line, sampler, and 1:16 
de-serializer with a clock domain crossing circuit. The delay-locked loop, phase interpolator, 
and 200ps delay line are described above. A conventional continuous time linear 
equalizer and sampler are used. 
The de-serializer is typically present in a receiver circuit. It receives data from 
 
Fig. 4.3.13 Block diagram and timing diagram of the de-serializer with clock domain crossing 
[Block diagram labels: DQ, DQS, sampler (1:2 DES), 2:4 DES, CDC, 4:16 DES, Do[0:15], Dmask, PHYCLK, control code from controller. Timing diagram labels: DQS with preamble and postamble, DQ bits D15 to D0, Ds, Di (D[15:12], D[11:8], D[7:4], D[3:0]), Dc, PHYCLK, Do (D[15:0]), Dmask.]
 
the transmitter and passes it to the digital block after de-serialization. In addition to this
common function, the de-serializer in the LPDDR4 memory controller performs clock domain
crossing and byte-aligning. The clock domain is changed from the DQS transmitted by the
LPDDR4 memory to the clock of the LPDDR4 memory controller. First, the DQ is sampled by
the DQS in a source-synchronous scheme. After this first sampling, clock domain crossing is
needed because the DQS toggles only while the DQ is being transferred. In addition, unlike a
general receiver, the data burst of the LPDDR4 memory is very short: the data toggles only
sixteen times in the worst case and then stops. Thus, byte-aligning is needed to deliver
aligned data to the digital circuit of the memory controller. Fig. 4.3.13 shows the block
diagram and timing diagram of the de-serializer for the LPDDR4 memory controller. The
de-serializer receives the DQS and DQ signals. The DQ is sampled on the rising and falling
edges of the DQS and 1:2 de-serialized at the sampler. The sampled signals, Ds, are re-sampled
by the divided-by-2 DQS and 2:4 de-serialized. The clock domain crossing is performed in the
clock domain crossing circuit, where the clock domain of the data, Di, is changed from the DQS
to the clock of the memory controller. In the read training, pre-defined data is transmitted
from the memory to cross the clock domain, and the clock domain crossing point is detected by
the digital control circuit. The 4:16 de-serializer in Fig. 4.3.13 receives the four-bit
signal, Dc, and de-serializes it into a sixteen-bit signal. The final de-serialized sixteen-bit
data is aligned to the divided-by-8 clock and transferred to the digital control circuits to
check byte-aligning. The digital control circuit automatically calculates the timing difference
between the clock and the de-serialized data and performs byte-aligning.
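
The staged de-serialization and the byte-aligning check described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit itself: the function names, the test pattern, and the idea that a misaligned clock domain crossing appears as a rotation by whole 4-bit groups are all hypothetical simplifications introduced for illustration.

```python
# Behavioral sketch of the 1:2 -> 2:4 -> 4:16 chain and byte-aligning.
# All names and the rotation model of misalignment are illustrative assumptions.

def one_to_two(bits):
    """Sampler: rising- and falling-edge samples give 2 bits per DQS cycle."""
    return [tuple(bits[i:i + 2]) for i in range(0, len(bits), 2)]


def two_to_four(pairs):
    """2:4 stage clocked by DQS/2: merge two consecutive 2-bit samples."""
    return [pairs[i] + pairs[i + 1] for i in range(0, len(pairs), 2)]


def four_to_sixteen(groups, start=0):
    """4:16 stage; `start` models an unknown word boundary (in 4-bit groups)
    left over after the clock domain crossing."""
    rotated = groups[start:] + groups[:start]
    return [bit for group in rotated for bit in group]


def find_alignment(word, expected):
    """Return the group offset that maps `word` back onto `expected`."""
    for k in range(4):
        # Rotate right by k groups; k = 0 leaves the word unchanged.
        if word[-4 * k:] + word[:-4 * k] == expected:
            return k
    return None


# Hypothetical pre-defined training pattern with four distinct 4-bit groups.
PATTERN = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]

groups = two_to_four(one_to_two(PATTERN))
for start in range(4):
    word = four_to_sixteen(groups, start)
    print(start, find_alignment(word, PATTERN))
```

The pattern is chosen so that no rotation by a whole group maps it onto itself, which is what lets the offset be identified uniquely in this model.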
 
 
 
 
CHAPTER 5 
 
 
 
MEASUREMENT RESULT OF LPDDR4 MEMORY 
CONTROLLER 
 
 
 
5.1 LPDDR4 MEMORY CONTROLLER MEASUREMENT SETUP 
 
5.1.1 LPDDR4 MEMORY CONTROLLER FLOOR PLAN AND LAYOUT 
 
 
Fig. 5.1.1 Microphotograph and layout of the LPDDR4 memory controller
(Labeled blocks: DQ0 to DQ7, DMI0, and DQS0 with DQDQS0_DLL; DQ8 to DQ15, DMI1, and DQS1 with
DQDQS1_DLL; CA0 to CA5, CS, and CLK with MEMCLK DLL and CMDCLK DLL; ADPLL; ADDLL; main control
digital block; boot control and timer; I2C; test block; DLL; delay line; Vrefgen.)
 
Fig. 5.1.1 shows the microphotograph and layout of the proposed LPDDR4 memory
controller. The total chip area of the LPDDR4 memory controller is 12mm². The DQS[0]
group of transceivers, which comprises the signal pins DQ[0:7], DMI[0], and DQS[0], is located
on the right side of the chip. The DQS[1] group of transceivers, which comprises the signal
pins DQ[8:15], DMI[1], and DQS[1], is located on the upper side of the chip. The command and
CK groups of transmitters, which comprise the signal pins CS, CA[0:5], and CK, are located
on the upper right side of the chip. The phase-locked loop and global delay-locked loop are
located in the lower part of the chip. The digital control circuits of the LPDDR4 memory
controller are located in the middle left of the chip. The I2C control circuit for testing is
located in the lower left of the chip. The size and location of each circuit in the layout
design were considered during floor planning. The microphotograph and layout are matched
one-to-one in Fig. 5.1.1.
 
 
5.1.2 PACKAGING AND TEST BOARD 
 
As shown in Fig. 5.1.2, four types of package placement are considered for
packaging and testing. In the package on package (PoP) case of Fig. 5.1.2, the
LPDDR4 memory is stacked on the LPDDR4 memory controller; PoP is used to stack
two chips. The other printed circuit board (PCB) cases are intended for use
with a thin quad flat pack or ball grid array package. In PCB case1, the
command and clock signals of the memory controller are routed close to the memory, but the
route length of the DQ[0] and DQ[8] signals is long. In PCB case2, DQ[0:7] is
routed closely, but the route length of DQ[8:15] is long. In PCB case3, the memory is
placed on the bottom side of the PCB, whereas the memory controller is placed
 
Fig. 5.1.2 Packaging and test plan of LPDDR4 memory and memory controller
(Panels: PoP case, PCB case1, PCB case2, and PCB case3, each showing the LPDDR4 controller chip
and the LPDDR4 memory chip with their CLK, DQ0, and DQ1 connections.)
 
on the top side of the PCB. The signals of the memory and memory controller are connected
through PCB vias. Unlike the other PCB cases, the layout floor plan for PCB case3 is
the reverse of that of the other cases.
In academic-level research, a PoP type package is hard to use, so a ball grid array type
package is used. Among the three PCB cases, PCB case2 and PCB case3 are excluded
because of the timing skew caused by the difference in channel length and the signal
attenuation caused by PCB vias. Fig. 5.1.3 shows the PCB of the LPDDR4 memory controller for
evaluation. As in PCB case1 of Fig. 5.1.2, the memory controller is located at the center of
the PCB and the memory is placed close to it.
 
Fig. 5.1.3 Photo of PCB
(Labeled parts: LPDDR4 memory controller and LPDDR4 memory.)
 
5.2 LPDDR4 MEMORY CONTROLLER SUB-BLOCK MEASUREMENT 
 
5.2.1 PHASE-LOCKED LOOP 
 
The phase-locked loop for the LPDDR4 memory controller must be able to generate an
output clock in the range of 1333MHz to 2133MHz. In addition, the phase-locked loop
must have low jitter because its jitter directly affects the eye margin
of signal transmission and reception. Fig. 5.2.1 and Fig. 5.2.2 show the jitter measurement
results of the phase-locked loop. Fig. 5.2.1 shows the measurement results of integrated
 
Fig. 5.2.1 Measurement results of phase-locked loop integrated jitter 
 
jitter according to operating frequency. The integrated jitter from 10kHz to 100MHz is
1.85ps, 2.22ps, 2.28ps, and 2.75ps at 2133MHz, 1866MHz, 1600MHz, and 1333MHz,
respectively. Fig. 5.2.2 shows the RMS jitter and peak-to-peak jitter measured with a
sampling oscilloscope. The RMS jitter is 2.42ps, 2.46ps, 2.70ps, and 3.16ps at 2133MHz,
1866MHz, 1600MHz, and 1333MHz, respectively, and the peak-to-peak jitter is 19.60ps,
21.20ps, 26.80ps, and 31.60ps at the same frequencies. The phase-locked loop
occupies 0.39mm² and consumes 17.47mW at 2133MHz
operation.
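
To relate these jitter values to the eye margin, they can be expressed as a fraction of the unit interval (UI). The sketch below is a worked illustration only. It assumes a double-data-rate link, so that the UI is half of the clock period, and the function names are hypothetical; the jitter values are the measured ones quoted above.

```python
# Measured PLL jitter quoted in the text, keyed by PLL output frequency (MHz).
RMS_PS = {2133: 2.42, 1866: 2.46, 1600: 2.70, 1333: 3.16}
PP_PS = {2133: 19.60, 1866: 21.20, 1600: 26.80, 1333: 31.60}


def unit_interval_ps(clock_mhz):
    """UI of a double-data-rate link: two bits per clock period (assumption)."""
    return 1e6 / (2.0 * clock_mhz)


def jitter_share(clock_mhz):
    """Return (UI in ps, RMS jitter / UI, peak-to-peak jitter / UI)."""
    ui = unit_interval_ps(clock_mhz)
    return ui, RMS_PS[clock_mhz] / ui, PP_PS[clock_mhz] / ui


for freq in sorted(RMS_PS, reverse=True):
    ui, rms, pp = jitter_share(freq)
    print(f"{freq}MHz: UI = {ui:.1f}ps, RMS = {rms:.2%} of UI, pk-pk = {pp:.2%} of UI")
```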
 
Fig. 5.2.2 Measurement results of phase-locked loop jitter 
 
 
5.2.2 DELAY-LOCKED LOOP 
 
Fig. 5.2.3 and Fig. 5.2.4 show the measurement results of the delay-locked loop. The
waveforms of Lock Start, Lock End, CKIN, and CKDLL in Fig. 5.2.3 show that the use of a
time-to-digital converter in the global delay-locked loop achieves fast locking from
0.11GHz to 2.5GHz. This scheme allows the 180° phase-shift delay-locked loop to lock
 
Fig. 5.2.3 Measured waveforms illustrating delay-locked loop locking behavior at (a) 0.11GHz
and (b) 2.5GHz
(Waveforms: CKIN, CKDLL, Lock Start, and Lock End; 180° lock in 6 cycles at 0.11GHz and in
17 cycles at 2.5GHz.)

Fig. 5.2.4 Measured long-term jitter performance of the proposed delay-locked loop at 2.5GHz
and measured phase noise plot of the delay-locked loops using bang-bang phase
detector and the proposed delay-locked loop at 2.5GHz
(Annotations: CKDLL Jrms = 2.64ps and Jpp = 20.6ps at 2.5GHz; CKIN Jrms = 2.86ps and
Jpp = 25.8ps at 2.5GHz; phase noise at 10MHz of -118dBc/Hz for CKIN, -121dBc/Hz for
CKDLL (WPD), and -112dBc/Hz for CKDLL (BBPD), a 9dB reduction; CKDLL integrated jitter
(10kHz-100MHz) = 953fs rms.)
 
within 6 clock periods at 0.11GHz and within 17 clock periods at 2.5GHz.
To verify the jitter-reducing effect of the window phase detector, the delay-locked
loop is operated in both the conventional bang-bang phase detector mode and the window
phase detector mode, and the phase noise and jitter are measured. At a 10MHz frequency
offset, the phase noise of CKDLL is -112dBc/Hz with the bang-bang phase detector and
-121dBc/Hz with the digital window phase detector, as shown in Fig. 5.2.4. Thus, the phase
noise of CKDLL with the digital window phase detector is 9dB better than with the bang-bang
phase detector at a 10MHz frequency offset, and the integrated jitter from 10kHz to 100MHz
of CKDLL with the digital window phase detector at 2.5GHz is 953fs rms. Fig. 5.2.4 also
shows the long-term jitter performance of the proposed delay-locked loop. At a clock
frequency of 2.5GHz, the RMS jitter and peak-to-peak jitter are 2.64ps and 20.6ps, respectively.
The global delay-locked loop and local delay-locked loop occupy areas of 0.047mm²
and 0.027mm², respectively. The global delay-locked loop is powered off after lock, and the
local delay-locked loop consumes 3.71mW at 2133MHz.
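
The lock times and the phase noise improvement quoted above follow from simple arithmetic, shown here as a short worked check. The helper names are hypothetical; the cycle counts, frequencies, and phase noise values are those given in the text.

```python
def lock_time_ns(cycles, clock_ghz):
    """Lock time in ns for a given number of clock periods."""
    return cycles / clock_ghz


def improvement_db(reference_dbc_hz, improved_dbc_hz):
    """Positive result means the second value is lower (better) phase noise."""
    return reference_dbc_hz - improved_dbc_hz


print(f"0.11GHz: {lock_time_ns(6, 0.11):.1f}ns")   # 6 clock periods
print(f"2.5GHz:  {lock_time_ns(17, 2.5):.1f}ns")   # 17 clock periods
print(f"WPD vs BBPD at 10MHz offset: {improvement_db(-112, -121)}dB")
```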
 
5.2.3 200PS AND 800PS DELAY LINE 
 
 
Fig. 5.2.5 Measurement results of 200ps delay line

Fig. 5.2.6 Measurement results of 800ps delay line
 
Fig. 5.2.5 and Fig. 5.2.6 show the measurement results of the 200ps delay line and the 800ps
delay line. The 200ps delay line has 64 steps of delay control code, and its average
resolution is 4ps. The dynamic range of the 200ps delay line is 274ps. The 800ps delay line
has 256 steps of delay control code, and its average resolution is 4ps. The dynamic range of
the 800ps delay line is 1106ps.
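
The average resolution can be cross-checked by dividing each measured dynamic range by the number of control steps. The sketch below does this under the assumption that the full range is spread evenly over the stated number of steps; the function name is hypothetical.

```python
def average_step_ps(dynamic_range_ps, steps):
    """Average delay per control step, assuming an even spread (assumption)."""
    return dynamic_range_ps / steps


step_200 = average_step_ps(274.0, 64)     # 200ps delay line
step_800 = average_step_ps(1106.0, 256)   # 800ps delay line
print(f"200ps delay line: {step_200:.2f}ps per step")
print(f"800ps delay line: {step_800:.2f}ps per step")
```

Both values come out slightly above 4ps, which is consistent with the quoted average resolution of 4ps after rounding.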
 
5.2.4 VOLTAGE REFERENCE GENERATOR 
 
Fig. 5.2.7 shows the measurement results of the voltage reference generator. The reference
generator has 127 steps of voltage control code, and its average resolution is 4mV. The
dynamic range of the reference generator is 508mV. The voltage reference generator covers
from 0% to 42.3% of VDDQ.
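
The relation between the control code, the output voltage, and the covered fraction of VDDQ can be sketched as follows. This is an idealized linear model, not the measured transfer curve. It assumes a code range of 0 to 127 and VDDQ = 1.2V; the latter is an assumption taken from the 1.2V supply value mentioned in the operation measurement, and the names are hypothetical.

```python
STEP_MV = 4.0      # average resolution from the measurement
MAX_CODE = 127     # 127 steps above code 0 (assumed code range 0..127)
VDDQ_MV = 1200.0   # assumed nominal VDDQ of 1.2V


def vref_mv(code):
    """Ideal linear reference voltage in mV for a control code."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError("code out of range")
    return code * STEP_MV


def vref_percent_of_vddq(code):
    """Reference voltage as a percentage of the assumed VDDQ."""
    return 100.0 * vref_mv(code) / VDDQ_MV


print(vref_mv(MAX_CODE), "mV")
print(f"{vref_percent_of_vddq(MAX_CODE):.1f}% of VDDQ")
```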
 
Fig. 5.2.7 Measurement results of reference generator 
 
5.2.5 PHASE INTERPOLATOR 
 
 
Fig. 5.2.8 Measured monotonicity of phase interpolator 
 
Fig. 5.2.9 Measured DNL of phase interpolator 
 
Fig. 5.2.8 and Fig. 5.2.9 show the measurement results of the phase interpolator. The
phase interpolator has 64 control codes and covers from 266MHz to 2133MHz without
phase inversion.
 
 
 
 
Fig. 5.3.1 Measurement results of LPDDR4
(Chart of test steps against data rates of 533, 1066, 1600, 2133, 2666, 3200, 3733, and 4266Mbps.
Steps from Start to End:
Boot: memory power on
CS training: CS – CLK↑
CA/CS training: CA/CS – CLK↑
Write leveling: DQS↑ - CLK↑
Read training: DQ - DQS↑ → FIFO CLK↑
Write training: DQ - memory DQS↑, read DQ
Normal operation check: read and write verify
Margin test
Chart notes:
1: Read EQ control code/rotator code fix
2: Read EQ control code/rotator code fix; RXDQS DLL/PI code fix
3: Read EQ control code/rotator code fix; RXDQS DLL/PI code fix; speed setup code 1066Mbps
(operation 1600Mbps); VDDQ = 1.2 -> 1.42V
4: Read latency check fail; read digital code fail; RX DQS DLL glitch - fail
5: External WR path check: DQ-DQ skew, DQ-DQS skew, driver function)
 
5.3 LPDDR4 MEMORY SYSTEM OPERATION MEASUREMENT 
 
Fig. 5.3.1 shows the overall measurement results of the LPDDR4 memory controller
operation. All operations of the LPDDR4 memory controller, including the normal read and
write operations, are confirmed up to 1600Mbps with the LPDDR4 memory. At 533Mbps
and 1066Mbps operation, the digital code of the read training is set manually to bypass the
digital control part of the read training. At 1600Mbps operation, the supply voltage is raised
from 1.2V to 1.42V for the delay-locked loop operation. The 533MHz setting of the
delay-locked loop and memory controller is applied at 800MHz operation. Under this condition,
the LPDDR4 memory system operates properly at 1600Mbps, including the normal read and
write operations.

Fig. 5.3.2 Measurement results of LPDDR4 at 533Mbps operation
write addr : 53 data : F0  //533Mbps operation
read addr : 9A 9B 9C 9D 9E 9F A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
read data : FF 40 24 00 39 02 09 2E 00 00 08 CD 1B 7F 49 22 03 50 34 00 3F 7F

Fig. 5.3.3 Measurement results of LPDDR4 at 1066Mbps operation
write addr : 53 data : E0  //1066Mbps operation
read addr : 9A 9B 9C 9D 9E 9F A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
read data : FF EF 24 00 3A 02 0A 30 00 00 08 CD 1B 77 49 40 01 50 01 00 1B 77

Fig. 5.3.2 and Fig. 5.3.3 show the training and operation results of the
 
Table 5.3.1 Simulated power consumption of LPDDR4 memory controller at 4266Mbps operation

Circuit                         Power (mW)   Number of circuits   Total power (mW)   Operation
ADPLL*                          17.4         1                    17.4               Always
Local DLL for CMD, TX DQS**     3.7          3                    11.1               Always
TX CMD                          7.2          7                    50.4               Always
CLK transmission                6            2                    12                 Always
TX DQ                           7.2          18                   129.6              Write
TX DQS                          18           2                    36                 Write
RX DQ                           7.2          18                   129.6              Read
RX DQS                          9            2                    18                 Read
CLK distribution                60           1                    60                 Always
Digital controller              6            1                    6                  Always
Local DLL for RX DQS**          3.7          2                    7.4                Read

Only transmit command                          156.9mW
Read operation with command transmission       311.9mW
Write operation with command transmission      322.5mW
* Measured, ** Measured value of test local DLL
 
LPDDR4 memory controller. The I2C register at address 9A holds the state of the state machine
in the digital control circuit. A state value of FF means the end of all
training and operation tests, including the normal read and write operation. The I2C registers
at addresses A2 and A3 hold the error detection result of the normal read and write operation.
The value 00_00 at addresses A2 and A3 indicates that there are no errors in the normal
operation.
From 2133Mbps to 4266Mbps, the boot training, command training, and write
leveling operations work properly. The read training fails because of an error in the digital
code of the read training and a glitch problem in the local delay-locked loops when the coarse
code is changed [5.3.1] [5.3.2]. The operation of the transmitters, including the LVSTL
driver, is checked from 2133Mbps to 4266Mbps by using the external bypass state.
The 1x2y3x eye center detection algorithm operates properly in the command training at
4266Mbps, which is a single data rate mode with a 0.75tCK mask. Therefore, its
operation is verified up to 2843Mbps.
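
The text does not spell out how the 2843Mbps figure is obtained. One reading that reproduces it to within rounding is sketched below: at 4266Mbps the clock is half the data rate, the command signals carry one bit per tCK, and a 0.75tCK mask leaves a window equal to the unit interval of a faster equivalent link. This interpretation and the function name are assumptions, not statements from the measurement.

```python
def equivalent_rate_mbps(dq_rate_mbps, mask_tck):
    """Rate whose unit interval equals a mask of `mask_tck` clock periods.

    Assumes DQ is double data rate (clock = rate / 2) and the command
    signals are single data rate (one bit per tCK).
    """
    clock_mhz = dq_rate_mbps / 2.0
    return clock_mhz / mask_tck


rate = equivalent_rate_mbps(4266.0, 0.75)
print(f"{rate:.0f}Mbps")
```

Under these assumptions the result is about 2844Mbps, within 1Mbps of the quoted value.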
Table 5.3.1 shows the simulated power consumption of the LPDDR4 memory controller
at 4266Mbps operation. The simulated power consumption is 156.9mW, 311.9mW, and 322.5mW
in the command-only transmission mode, the read operation with read command
transmission, and the write operation with write command transmission, respectively.
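
The three totals can be reproduced from the rows of Table 5.3.1 by summing the circuits that are active in each mode. The sketch below is a cross-check of that arithmetic; the data structure and function name are hypothetical, and the per-circuit values are those of the table.

```python
# (circuit, power per circuit in mW, number of circuits, operation)
ROWS = [
    ("ADPLL", 17.4, 1, "Always"),
    ("Local DLL for CMD, TX DQS", 3.7, 3, "Always"),
    ("TX CMD", 7.2, 7, "Always"),
    ("CLK transmission", 6.0, 2, "Always"),
    ("TX DQ", 7.2, 18, "Write"),
    ("TX DQS", 18.0, 2, "Write"),
    ("RX DQ", 7.2, 18, "Read"),
    ("RX DQS", 9.0, 2, "Read"),
    ("CLK distribution", 60.0, 1, "Always"),
    ("Digital controller", 6.0, 1, "Always"),
    ("Local DLL for RX DQS", 3.7, 2, "Read"),
]


def total_mw(*operations):
    """Sum power over all circuits whose operation is in `operations`."""
    total = sum(p * n for _, p, n, op in ROWS if op in operations)
    return round(total, 1)


print("Command only:", total_mw("Always"), "mW")
print("Read:", total_mw("Always", "Read"), "mW")
print("Write:", total_mw("Always", "Write"), "mW")
```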
 
 
 
 
CHAPTER 6 
 
 
 
CONCLUSION 
 
 
 
In this thesis, an LPDDR4 memory controller architecture that operates with an
LPDDR4 memory is proposed and designed, together with an efficient training algorithm
for the LPDDR4 memory training sequence. The proposed architecture of the LPDDR4
memory controller is designed based on the LPDDR4 memory specification in order to
compose the memory system. The proposed 1x2y3x eye center detection algorithm is 23
times faster than the conventional two-dimensional eye center detection algorithm. In
addition, the proposed algorithm is simple and uses a small memory. All circuits and
channels in the system are modeled to verify the system, which is necessary to reduce the
design and simulation time in a system design methodology.
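
The reason a three-sweep search is faster than a full two-dimensional scan can be illustrated with a toy model. The sketch below is not the implementation used in this thesis. It assumes, based on the state names in the appendix (CBT_CAT1, CBT_CAV2, CBT_CAT3, and the END states labeled Width Update, Width Compare, and Center Point Cal), that 1x2y3x means a timing sweep, then a voltage sweep, then a second timing sweep, each taking the center of the widest passing window. The synthetic diamond-shaped eye, the grid size, and all names are hypothetical.

```python
def widest_pass_center(results):
    """Center index of the longest run of passing points, or None."""
    best_start, best_len, start = 0, 0, None
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return None if best_len == 0 else best_start + best_len // 2


def eye_center_1x2y3x(passes, nx, ny, y0):
    """Sweep x at y0, then y at the found x, then x again at the found y."""
    x1 = widest_pass_center(passes(x, y0) for x in range(nx))
    if x1 is None:
        raise ValueError("no passing point in the first timing sweep")
    y2 = widest_pass_center(passes(x1, y) for y in range(ny))
    x3 = widest_pass_center(passes(x, y2) for x in range(nx))
    return x3, y2


def demo_eye(x, y):
    """Synthetic diamond-shaped eye centered at (40, 70); illustration only."""
    return abs(x - 40) / 20 + abs(y - 70) / 30 < 1


NX, NY = 64, 128                      # hypothetical timing and voltage codes
center = eye_center_1x2y3x(demo_eye, NX, NY, y0=60)
sweep_points = 2 * NX + NY            # points tested by the three sweeps
grid_points = NX * NY                 # points tested by a full 2-D scan
print(center, sweep_points, grid_points)
```

The ratio of tested points in this toy grid depends entirely on the assumed grid size and is not meant to reproduce the factor of 23 reported above.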
The operation speed range of the LPDDR4 memory is from 533Mbps to 4266Mbps,
and the LPDDR4 memory controller is designed to support this range. The phase-locked
loop in the LPDDR4 memory controller is designed to operate between
1333MHz and 2133MHz. To cover the range of the LPDDR4 memory, a selectable frequency
divider is used to provide the operation clock. The output frequency of the phase-locked loop
with the divider is from 266MHz to 2133MHz. The delay-locked loop in the LPDDR4 memory
controller is designed to operate between 266MHz and 2133MHz with 180° phase locking.
The delay-locked loop is used for each training operation, namely the command, data
read, and write training. Two types of delay line, the 200ps and 800ps delay lines, which
have 4ps resolution, and the phase interpolator are designed for the timing training along
the x axis. The reference generator, which has 4mV resolution, is designed for the voltage
training along the y axis.
Fabricated in a 65nm CMOS process, the proposed LPDDR4 memory controller
occupies 12mm². The operation of the LPDDR4 memory system, including all training
sequences and the normal read and write operation, is verified up to 1600Mbps. From
2133Mbps to 4266Mbps, the boot training, command training, and write leveling
operations work properly. The read training fails because of an error in the digital code
of the read training and a glitch problem in the local delay-locked loops when the coarse code
is changed. The operation of the transmitter, including the LVSTL driver, is checked from
2133Mbps to 4266Mbps by using the external bypass state. The 1x2y3x eye center detection
algorithm operates properly in the command training at 4266Mbps, which is a single data rate
mode with a 0.75tCK mask. Therefore, its operation is verified up to 2843Mbps.
The simulated maximum power consumption of the LPDDR4 memory controller at 4266Mbps
operation is 322.5mW.
A revised LPDDR4 memory controller that can operate at 4266Mbps has been
prepared to fix these problems.
 
 
 
 
 
APPENDIX 
 
 
 
OPERATION FLOW CHART OF THE PROPOSED 
LPDDR4 MEMORY CONTROLLER  
 
 
 
 
Fig. A.1 LPDDR4 memory controller operation flow chart 1
(States 8'h00 RESET, 8'h01 PWRON, 8'h02 TINIT1, 8'h03 TINIT3, 8'h04 TINIT5, 8'h05 PHY_PD_CAL,
and 8'h06 PHY_PU_CAL; the TINIT1, TINIT3, and TINIT5 waits use TIMER, and the PHY_PD_CAL
check uses ZQ CAL.)
 
 
 
Fig. A.2 LPDDR4 memory controller operation flow chart 2
(States 8'h06 PHY_PU_CAL, 8'h07 INITIALIZE, 8'h08 TCMDCKE, 8'h09 CKE_H2L, 8'h0a TCKELCK, and
8'h10 CBT_CS_INIT to 8'h17 CBT_CS_RECEIVE. LP4 boot branch: MRW_MA1, MRW_MA2, MRW_MA3,
MRW_MA11, MRW_MA12, MRW_MA13, MRW_MA14, and MRW_MA15, each followed by a TMRW wait.)
 
 
 
Fig. A.3 LPDDR4 memory controller operation flow chart 3
(Boot branch continued: MRW_MA20, MRW_MA22, MRW_MA32, MRW_MA40, MPC_ZQCAL, TZQCAL, MPC_ZQLAT,
TZQLAT, MRW_MA13_FSP1, and MRW_MA11_FSP1. States 8'h17 CBT_CS_RECEIVE to 8'h1e CBT_CHECK and
8'h20 CBT_CAT1_INIT to 8'h25 CBT_CAT1_READ.)
 
 
 
Fig. A.4 LPDDR4 memory controller operation flow chart 4
(Boot branch continued: MRW_MA11_FSP1, MRW_MA13_FSP0, MRW_MA13_CBT_EN, TMRD, WAIT_CBT_END,
TVREF_LONG, MRW_MA13_CBT_DEN, MRW_MA12_VREFCA, TVREF_LONG2, and BOOT END1. States
8'h25 CBT_CAT1_READ to 8'h2d CBT_CAT1_END3 and 8'h30 CBT_CAV2_INIT to 8'h35 CBT_CAV2_TADR.)
 
 
 
Fig. A.5 LPDDR4 memory controller operation flow chart 5
(States 8'h35 CBT_CAV2_TADR to 8'h3d CBT_CAV2_END3 and 8'h40 CBT_CAT3_INIT to
8'h45 CBT_CAT3_READ.)
 
 
 
Fig. A.6 LPDDR4 memory controller operation flow chart 6
(States 8'h3f UPDATE_CBT, 8'h45 CBT_CAT3_READ to 8'h4d CBT_CAT3_END3, 8'h4e CBT_TR_CHECK,
8'h4f TCKCKEH, 8'h2e CKE_L2H, 8'h2f TCKEHCMD, and 8'h50 UPDATE_TVREF_LONG.)
 
 
 
Fig. A.7 LPDDR4 memory controller operation flow chart 7
(States 8'h50 UPDATE_TVREF_LONG and 8'h51 WL_MRW2_START to 8'h5d WL_TMRD.)
 
 
 
Fig. A.8 LPDDR4 memory controller operation flow chart 8
(States 8'h5d WL_TMRD, 8'h60 RDTR_INIT to 8'h64 RDTR_WAIT_MRW, and 8'h66 RDTR_RL_INIT to
RDTR_RL_CHECK.)
 
 
 
Fig. A.9 LPDDR4 memory controller operation flow chart 9
(States RDTR_RL_CHECK, 8'hfd RDTR_FAIL, 8'h6b RDTR_RL_C_UPDATE to 8'h6f RDTR_RL_END, and
8'h70 WRTR_T1_INIT to 8'h79 WRTR_T1_UPDATE.)
 
 
 
Fig. A.10 LPDDR4 memory controller operation flow chart 10
(States 8'h79 WRTR_T1_UPDATE to 8'h7f WRTR_T1_END3 and 8'h80 WRTR_V2_INIT to
8'h86 WRTR_V2_WAIT_RDFIFO.)
 
 
 
Fig. A.11 LPDDR4 memory controller operation flow chart 11
(States 8'h86 WRTR_V2_WAIT_RDFIFO to 8'h8f WRTR_V2_END4 and 8'h90 WRTR_T3_INIT to
8'h94 WRTR_T3_MPC1_RDFIFO.)
 
 
 
Fig. A.12 LPDDR4 memory controller operation flow chart 12
(States 8'h94 WRTR_T3_MPC1_RDFIFO to 8'h9e WRTR_T3_END3, 8'ha0 NOR_INIT, and 8'ha1 NOR_MRW3.)
 
 
 
Fig. A.13 LPDDR4 memory controller operation flow chart 13
(States 8'ha1 NOR_MRW3 to 8'hac NOR_DQ_UPDATE.)
 
 
 
Fig. A.14 LPDDR4 memory controller operation flow chart 14
(States 8'hac NOR_DQ_UPDATE and 8'hb0 WMT_INIT to 8'hba WMT_CODE_UPDATE.)
 
 
Fig. A.15 LPDDR4 memory controller operation flow chart 15
(States 8'hba WMT_CODE_UPDATE, 8'hbb WMT_END, 8'hc0 RMT_INIT to 8'hc6 RMT_RD, and 8'hff END.)

State Definition 
RESET   =8'h00, // RESET : Reset 
PWRON  =8'h01, // PWRON : Power On 
TINIT1   =8'h02, // TINIT1 : Reset_n Low. Supply On. Wait 
TINIT3   =8'h03, // TINIT3 : Reset_n High. CKE Low. Wait 
TINIT5   =8'h04, // TINIT5 : CKE High. Exit PD. Wait 
PHY_PD_CAL  =8'h05, // PHY_PD_CAL : PHY PD Calibration   
PHY_PU_CAL  =8'h06, // PHY_PU_CAL : PHY PU Calibration   
INITIALIZE  =8'h07, // INITIALIZE : Boot Frequency Controller Operating.  
TCMDCKE  =8'h08, // CKE Low & Low Speed OP stay 
CKE_H2L  =8'h09, // CKE Low & Low Speed OP stay 
TCKELCK  =8'h0a, // CKE Low & Low Speed OP stay 
CBT_CS_INIT  =8'h10, // CBT_CS_INIT : INITIALIZE 
CBT_CS_VREF_PRE =8'h11, // CBT_CS_INIT : INITIALIZE 
CBT_CS_VREF  =8'h12, // CBT_CS_INIT : INITIALIZE 
CBT_CS_TVREF_LONG =8'h13, // CBT_CS_TVREF_LONG : TVrefLong wait  
CBT_CS_SEND  =8'h14, // CBT_CS_SEND : Pattern Send 
CBT_CS_TADR  =8'h15, // CBT_CS_TADR : TADR wait 
CBT_CS_READ  =8'h16, // CBT_CS_READ : READ_FB by o_RD_FIFO_en  
CBT_CS_RECEIVE =8'h17, // CBT_CS_RECEIVE : Pattern Receive 
CBT_CS_UPDATE =8'h18, // CBT_CS_UPDATE : Code Update 
CBT_CS_CAL  =8'h19, // CBT_CS_CAL : Calibration 
CBT_CS_END  =8'h1a, // CBT_CS_END : Width Update 
CBT_CS_END_1 =8'h1b, // CBT_CS_END_1 : Width Compare 
CBT_CS_END_2 =8'h1c, // CBT_CS_END_2 : Center Point Cal 
CBT_CS_END_3 =8'h1d, // CBT_CS_END_3 : SW_Value_Update  
CBT_CHECK  =8'h1e, // CBT_CHECK : CS Training Result Check 
CBT_CAT1_INIT =8'h20, // CBT_CAT1_INIT : INITIALIZE 
CBT_CAT1_VREF =8'h21, // CBT_CAT1_INIT : INITIALIZE 
CBT_CAT1_TVREF_LONG =8'h22, // CBT_CAT1_TVREF_LONG : TVrefLong wait
CBT_CAT1_SEND =8'h23, // CBT_CAT1_SEND : Pattern Send 
CBT_CAT1_TADR =8'h24, // CBT_CAT1_TADR : TADR wait 
CBT_CAT1_READ =8'h25, // CBT_CAT1_READ : READ_FB by o_RD_FIFO_en  
CBT_CAT1_RECEIVE =8'h26, // CBT_CAT1_RECEIVE : Pattern Receive 
CBT_CAT1_UPDATE =8'h27, // CBT_CAT1_UPDATE : Code Update 
CBT_CAT1_WAIT_UPDATE =8'h28, // CBT_CAT1_WAIT_UPDATE : Wait for updating
CBT_CAT1_CAL =8'h29, // CBT_CAT1_CAL : Calibration 
CBT_CAT1_END =8'h2a, // CBT_CAT1_END : Width Update 
CBT_CAT1_END_1 =8'h2b, // CBT_CAT1_END_1 : Width Compare 
CBT_CAT1_END_2 =8'h2c, // CBT_CAT1_END_2 : Center Point Cal, Length/2  
CBT_CAT1_END_3 =8'h2d, // CBT_CAT1_END_3 : SW Value Update 
CBT_CAV2_INIT =8'h30, // CBT_CAV2_INIT : INITIALIZE 
CBT_CAV2_VREF_PRE =8'h31, // CBT_CAV2_INIT : INITIALIZE 
CBT_CAV2_VREF =8'h32, // CBT_CAV2_INIT : INITIALIZE 
CBT_CAV2_TVREF_LONG =8'h33, // CBT_CAV2_TVREF_LONG : TVrefLong wait
CBT_CAV2_SEND =8'h34, // CBT_CAV2_SEND : Pattern Send 
CBT_CAV2_TADR =8'h35, // CBT_CAV2_TADR : TADR wait 
CBT_CAV2_READ =8'h36, // CBT_CAV2_READ : READ_FB by o_RD_FIFO_en  
CBT_CAV2_RECEIVE =8'h37, // CBT_CAV2_RECEIVE : Pattern Receive 
CBT_CAV2_UPDATE =8'h38, // CBT_CAV2_UPDATE : Code Update 
CBT_CAV2_CAL =8'h39, // CBT_CAV2_CAL : Calibration 
CBT_CAV2_END =8'h3a, // CBT_CAV2_END : Width Update 
CBT_CAV2_END_1 =8'h3b, // CBT_CAV2_END_1 : Width Compare 
CBT_CAV2_END_2 =8'h3c, // CBT_CAV2_END_2 : Center Point Cal, Length/2  
CBT_CAV2_END_3 =8'h3d, // CBT_CAV2_END_3 : SW Value Update 
CBT_CAT3_INIT =8'h40, // CBT_CAT3_INIT : INITIALIZE 
CBT_CAT3_VREF =8'h41, // CBT_CAT3_INIT : INITIALIZE 
CBT_CAT3_TVREF_LONG =8'h42, // CBT_CAT3_TVREF_LONG : TVrefLong wait
CBT_CAT3_SEND =8'h43, // CBT_CAT3_SEND : Pattern Send 
CBT_CAT3_TADR =8'h44, // CBT_CAT3_TADR : TADR wait 
CBT_CAT3_READ =8'h45, // CBT_CAT3_READ : READ_FB by o_RD_FIFO_en  
CBT_CAT3_RECEIVE =8'h46, // CBT_CAT3_RECEIVE : Pattern Receive 
CBT_CAT3_UPDATE =8'h47, // CBT_CAT3_UPDATE : Code Update 
CBT_CAT3_WAIT_UPDATE =8'h48, // CBT_CAT3_WAIT_UPDATE : Wait Updating
CBT_CAT3_CAL =8'h49, // CBT_CAT3_CAL : Calibration 
CBT_CAT3_END =8'h4a, // CBT_CAT3_END : Width Update 
CBT_CAT3_END_1 =8'h4b, // CBT_CAT3_END_1 : Width Compare 
CBT_CAT3_END_2 =8'h4c, // CBT_CAT3_END_2 : Center Point Cal, Length/2  
CBT_CAT3_END_3 =8'h4d, // CBT_CAT3_END_3 : SW Value Update 
CBT_TR_CHECK =8'h4e, // CBT_CAT3_END_3 : SW Value Update 
TCKCKEH  =8'h4f, // TCKCKEH: before CKE high. turn on low-speed op & WAIT 2 refclk
CKE_L2H  =8'h2e, // CKE Low to High Wait tVref Long to tCKEHCMD 
TCKEHCMD  =8'h2f, // TCKEHCMD DQS remain 
UPDATE_CBT  =8'h3f, // CBT end wait for Update. MRW, Vref_ca change. tref Long wait
UPDATE_TVREF_LONG =8'h50, // o_ODT_CA value change + Speed change. Waiting time
WL_MRW2_START =8'h51, // WL_MRW2_START : Write Leveling start. Drive DQS_t/c to Low/High
WL_TWLDQSEN =8'h52, // WL_enable command to DQSB high 20tCK 
WL_TWLMRD  =8'h53, // WL_TWLMRD : 40tCK wait  
WL_SEND  =8'h54, // WL_SEND : SEND DQS_t/c [0/1] 4 pulses 
WL_TWLO  =8'h55, // WL_TWLO : 20ns Wait 
WL_READ  =8'h56, // WL_READ : READ FB Value by COMPARATOR value
WL_UPDATE  =8'h57, // WL_UPDATE : Update Location  
WL_WAIT_UPDATE =8'h58,  
WL_CAL  =8'h59, // WL_CAL : Calibration 
WL_CAL2  =8'h5a, // WL_CAL2 : Calibration-2 
WL_CHECK  =8'h5b, // WL_CHECK : Sweep End Check  
WL_MRW2_END  =8'h5c, // WL_MRW2_END : WL End  
WL_TMRD  =8'h5d, // WL_TMRD : MRW wait time to Write FIFO MAX  
RDTR_INIT  =8'h60, // RDTR_INIT : Read Training Initialize 
RDTR_EQTR  =8'h61, // RDTR_EQTR : Read EQ Adaptation  
RDTR_WAIT_EQTR =8'h62, // RDTR_RL_INIT : Initialize RL Training 
RDTR_MRW  =8'h63, // RDTR_MRW 15 20 32 40 
RDTR_WAIT_MRW =8'h64, // RDTR_RL_INIT : Initialize RL Training 
RDTR_RL_INIT  =8'h66, // RDTR_RL_INIT : Initialize RL Training 
RDTR_RL_RD_DQ_CAL =8'h67, // RDTR_RL_RD_DQ_CAL: RD Cal pattern READ 
RDTR_RL_WAIT =8'h68, // RDTR_RL_WAIT: RD Cal pattern Wait until fifo_en 
RDTR_RL_WAIT_DES =8'h69, 
RDTR_RL_CHECK =8'h6a, 
RDTR_RL_C_UPDATE =8'h6b, 
RDTR_RL_F_UPDATE =8'h6c, 
RDTR_RL_C_UNLOCK =8'h6d, 
RDTR_RL_DQ_UPDATE =8'h6e, 
RDTR_RL_END  =8'h6f, 
WRTR_T1_INIT  =8'h70, // WRTR_T1_INIT : Write Training (DQ_DQS) Time Sweep Start
WRTR_T1_VREF =8'h71, // WRTR_T1_INIT : Write Training (DQ_DQS) Time Sweep Start
WRTR_T1_TVREF_LONG =8'h72, // WRTR_V2_TVREF_LONG : Wait Tvref Long
WRTR_T1_MPC1_WRFIFO =8'h73, // WRTR_T1_MPC1_WRFIFO : Write FIFO Pattern 1
WRTR_T1_TWRRD_FIFO =8'h74, // WRTR_T1_TWRRD_FIFO : 10ns (~21.2tCK) + 9tCK Write FIFO to Read FIFO Delay = 32tCK , Based on Max Speed Setting 56tCK
WRTR_T1_MPC1_RDFIFO =8'h75, // WRTR_T1_MPC1_RDFIFO : Read FIFO Pattern 
WRTR_T1_WAIT_RDFIFO =8'h76, // WRTR_T1_WAIT_RDFIFO : Extra State for Read Latency
WRTR_T1_WAIT_DES =8'h77, // WRTR_T1_WAIT_DES : Deserializer Latency Wait  
WRTR_T1_RECEIVE =8'h78, // WRTR_T1_RECEIVE : Check Read FiFo Results. WRTR_receive_cnt Update
WRTR_T1_UPDATE =8'h79, // WRTR_T1_UPDATE : Update Read FiFo time Code
WRTR_T1_WAIT_UPDATE  =8'h7a,  
WRTR_T1_CAL  =8'h7b, // WRTR_T1_CAL : Calibration  
WRTR_T1_END  =8'h7c, // WRTR_T1_END : Update Width  
WRTR_T1_END_1 =8'h7d, // WRTR_T1_END_1 : Center Point Cal,  
WRTR_T1_END_2 =8'h7e, // WRTR_T1_END_2 : Value Update, DQ_SEL 
Update 
WRTR_T1_END_3 =8'h7f, // WRTR_T1_END_3 : Value Update, DQ_SEL 
Update 
WRTR_V2_INIT  =8'h80, // WRTR_V2_INIT : Start Voltage Sweep. Initialize Tvref_DQ 
WRTR_V2_MRW_VREF =8'h81, // WRTR_V2_MRW_VREF : Mode Register Write Vref  
WRTR_V2_TVREF_LONG =8'h82, // WRTR_V2_TVREF_LONG : Wait Tvref Long 
WRTR_V2_MPC1_WRFIFO =8'h83, // WRTR_V2_MPC1_WRFIFO : Write FIFO Pattern 1 
WRTR_V2_TWRRD_FIFO =8'h84, // WRTR_V2_TWRRD_FIFO : 10ns (~21.2tCK) + 9tCK 
                           // Write FIFO to Read FIFO Delay = 32tCK, Based on Max Speed. 
                           // Setting 56tCK 
WRTR_V2_MPC1_RDFIFO =8'h85, // WRTR_V2_MPC1_RDFIFO : Read FIFO Pattern 
WRTR_V2_WAIT_RDFIFO =8'h86, // WRTR_V2_WAIT_RDFIFO : Extra State for Read Latency 
WRTR_V2_WAIT_DES =8'h87, // WRTR_V2_WAIT_DES : Deserializer Latency Wait  
WRTR_V2_RECEIVE =8'h88, // WRTR_V2_RECEIVE : Check Read FIFO Results. WRTR_receive_cnt Update 
WRTR_V2_UPDATE =8'h89, // WRTR_V2_UPDATE : Update Read FIFO Time Code  
WRTR_V2_CAL  =8'h8a, // WRTR_V2_CAL : Calibration  
WRTR_V2_END  =8'h8b, // WRTR_V2_END : Update Width  
WRTR_V2_END_1 =8'h8c, // WRTR_V2_END_1 : Center Point Cal 
WRTR_V2_END_2 =8'h8d, // WRTR_V2_END_2 : Value Update, DQ_SEL Update 
WRTR_V2_END_3 =8'h8e, // WRTR_V2_END_3 : Select Boundary 
WRTR_V2_END_4 =8'h8f, // WRTR_V2_END_4 : Change Boundary  
WRTR_T3_INIT  =8'h90, // WRTR_T3_INIT : Write Training (DQ_DQS) Time Sweep Start 
WRTR_T3_TVREF_LONG =8'h91, // WRTR_T3_TVREF_LONG : Wait Tvref Long 
WRTR_T3_MPC1_WRFIFO =8'h92, // WRTR_T3_MPC1_WRFIFO : Write FIFO Pattern 1 
WRTR_T3_TWRRD_FIFO =8'h93, // WRTR_T3_TWRRD_FIFO : 10ns (~21.2tCK) + 9tCK 
                           // Write FIFO to Read FIFO Delay = 32tCK, Based on Max Speed  
WRTR_T3_MPC1_RDFIFO =8'h94, // WRTR_T3_MPC1_RDFIFO : Read FIFO Pattern 
WRTR_T3_WAIT_RDFIFO =8'h95, // WRTR_T3_WAIT_RDFIFO : Extra State for Read Latency 
WRTR_T3_WAIT_DES =8'h96, // WRTR_T3_WAIT_DES : Deserializer Latency Wait  
WRTR_T3_RECEIVE =8'h97, // WRTR_T3_RECEIVE : Check Read FIFO Results. WRTR_receive_cnt Update 
WRTR_T3_UPDATE =8'h98, // WRTR_T3_UPDATE : Update Read FIFO Time Code  
WRTR_T3_WAIT_UPDATE  =8'h99,  
WRTR_T3_CAL  =8'h9a, // WRTR_T3_CAL : Calibration  
WRTR_T3_END  =8'h9b, // WRTR_T3_END : Update Width  
WRTR_T3_END_1 =8'h9c, // WRTR_T3_END_1 : Center Point Cal 
WRTR_T3_END_2 =8'h9d, // WRTR_T3_END_2 : Value Update, DQ_SEL Update 
WRTR_T3_END_3 =8'h9e, // WRTR_T3_END_3 : Value Update, DQ_SEL Update 
NOR_INIT   =8'ha0, // NOR_INIT : Normal Operation INIT  
NOR_MRW3  =8'ha1, // NOR_MRW3 : DBI_WR change from init value 1'b1 to 1'b0  
NOR_TMRD  =8'ha2, // NOR_TMRD : MRW to Other Command Delay. tMRD max(14ns, 10nCK). 
                  // 2 ref. CLK cycles = slowest * 16tCK 
NOR_BANK_ACTIVE =8'ha3, // NOR_BANK_ACTIVE : BANK Active Command 
NOR_BANK_ACTIVE_WAIT =8'ha4, // NOR_BANK_ACTIVE_WAIT : BANK Active Command wait for tFAW & tRRD 
NOR_INIT2  =8'ha5, // NOR_INIT2 : Normal Operation INIT2 + Bank Active to Write Wait  
NOR_WR  =8'ha6, // NOR_WR : Normal Operation. Write 
NOR_WTR_WAIT =8'ha7, // NOR_WTR_WAIT : Normal Operation. Wait tWTR (Write to Read)  
NOR_RD  =8'ha8, // NOR_RD : Normal Operation. Read 
NOR_WAIT_RD  =8'ha9, // NOR_WAIT_RD : Read Latency Wait 
NOR_WAIT_DES =8'haa, // NOR_WAIT_DES : Deserializer Latency Wait 
NOR_CHECK  =8'hab, // NOR_CHECK : Normal Op. PASS FAIL Checking 
NOR_DQ_UPDATE =8'hac, // NOR_DQ_UPDATE : Normal Op. DQ sel update 
WMT_INIT  =8'hb0, // WMT_INIT : Initialize  
WMT_MRW_VREF =8'hb1, // WMT_MRW_VREF : Mode Register Write Vref 
WMT_TVREF_LONG =8'hb2, // WMT_TVREF_LONG : Wait Tvref Long 
WMT_WR  =8'hb3, // WMT_WR : Normal Operation. Write 
WMT_WTR_WAIT =8'hb4, // WMT_WTR_WAIT : Normal Operation. Wait tWTR (Write to Read)  
WMT_RD  =8'hb5, // WMT_RD : Normal Operation. Read 
WMT_WAIT_RD =8'hb6, // WMT_WAIT_RD : Read Latency Wait 
WMT_WAIT_DES =8'hb7, // WMT_WAIT_DES : Deserializer Latency Wait 
WMT_CHECK  =8'hb8, // WMT_CHECK : PASS FAIL Checking 
WMT_DQ_UPDATE =8'hb9, // WMT_DQ_UPDATE : DQ sel update 
WMT_CODE_UPDATE =8'hba, // WMT_CODE_UPDATE : Code update 
WMT_END  =8'hbb, // WMT_END : WMT End 
RMT_INIT  =8'hc0, // RMT_INIT : Initialize  
RMT_MRW_VREF =8'hc1, // RMT_MRW_VREF : Mode Register Write Vref 
RMT_TVREF_LONG =8'hc2, // RMT_TVREF_LONG : Wait Tvref Long 
RMT_WR  =8'hc3, // RMT_WR : Normal Operation. Write 
RMT_WTR_WAIT =8'hc4, // RMT_WTR_WAIT : Normal Operation. Wait tWTR (Write to Read)  
RMT_CHECK  =8'hc5, // RMT_CHECK : PASS FAIL Checking 
RMT_RD  =8'hc6, // RMT_RD : Normal Operation. Read 
RDTR_FAIL  =8'hfd, // RDTR_RL sweep fail  
END   =8'hff; 
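
Several of the states listed above (CBT_CAT3_END_2 "Center Point Cal, Length/2", WRTR_T1_END "Update Width", WRTR_T1_END_1 "Center Point Cal") indicate that each sweep ends by measuring the width of the passing window and taking its midpoint. The following Python model is only a minimal sketch of that calculation under stated assumptions: the sweep result is assumed to be a list of pass/fail flags indexed by delay or Vref code, and the function names longest_pass_window and eye_center are hypothetical and do not appear in the RTL.

```python
from typing import Optional, Sequence, Tuple


def longest_pass_window(results: Sequence[bool]) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the longest run of passing codes, or None."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    last = len(results) - 1
    for i, ok in enumerate(results):
        if ok and start is None:
            start = i  # a new passing window opens at this code
        if start is not None and (not ok or i == last):
            # The window closes at a failing code or at the end of the sweep.
            end = i if not ok else i + 1
            length = end - start
            if best is None or length > best[1]:
                best = (start, length)
            start = None
    return best


def eye_center(results: Sequence[bool]) -> Optional[int]:
    """Center code = window start + width/2 (assumed "Length/2" step)."""
    window = longest_pass_window(results)
    if window is None:
        return None  # no passing code: the sweep failed
    start, length = window
    return start + length // 2


# Hypothetical sweep: codes 2..6 pass, so the width is 5 and the center is 4.
sweep = [False, False, True, True, True, True, True, False]
print(eye_center(sweep))  # prints 4
```

Integer division mirrors a right shift by one bit, which is how a Length/2 step would typically be realized in hardware; whether the controller rounds down in the same way is an assumption.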
 
 
 
 
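ABSTRACT IN KOREAN 
 
 
 
The demand for mobile memory that supports high-speed, low-power operation is 
steadily increasing. In this work, the architecture of an LPDDR4 memory controller that 
operates with an LPDDR4 memory is proposed and designed, and an efficient training 
algorithm suited to this architecture is proposed for memory training and verification. 
According to the specification, the LPDDR4 memory must operate from 533Mbps to 
4266Mbps, and the LPDDR4 memory controller is modeled and designed to operate over 
that speed range. A phase-locked loop operating from 1333MHz to 2133MHz is designed. 
A selectable clock divider is used to cover the operating speed range of the LPDDR4 
memory. The output frequency of the phase-locked loop ranges from 266MHz to 
2133MHz, which is the operating frequency range of the LPDDR4 memory. The 
delay-locked loop is designed to lock to a 180° phase over the range from 266MHz to 
2133MHz. The phase-locked loop is used in the data and command paths of the read and 
write operations at each training stage. An eye center detection method is used to 
complete each training stage. In addition, the delay circuit, phase interpolator, and 
reference generator required for the proposed eye center detection method are designed 
and verified. The proposed 1x2y3x eye center detection method achieves up to 23 times 
faster training than the conventional two-dimensional eye center detection method, and 
it can be implemented with a simple structure. 
The proposed memory controller is fabricated in a 65nm CMOS process and occupies 
12mm2. A commercial LPDDR4 memory is used to verify the operation of the designed 
LPDDR4 memory controller. All training sequences, including power-up, initialization, 
command training, write leveling, and read and write training, are verified in this 
environment. Several functions, including the low-voltage-swing terminated driver and 
write leveling, are verified up to 4266Mbps, and normal operation from 533Mbps to 
1600Mbps is confirmed with the LPDDR4 memory connected to the designed memory 
controller. The proposed eye center detection method is verified from 533Mbps to 
2843Mbps. 
 
 
Keywords: LPDDR4, mobile memory, memory controller, memory interface, 
transceiver, training algorithm, eye center detection 
 
Student Number: 2011-30263 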
