High Performance Computing on Fast Lock Delay Locked Loop with Low Power State and Simultaneous Switching Noise Reduction by V. J.S. Kumar & T. S. Karthik
Journal of Computer Science 8 (3): 305-309, 2012 
ISSN 1549-3636 
© 2012 Science Publications 
Corresponding Author:  Karthik, T.S., Department of Electronics and Communication Engineering, College of Engineering-Guindy, 
Anna University, Chennai-600 025, India 
 
High Performance Computing on Fast Lock Delay Locked
Loop with Low Power State and Simultaneous Switching Noise Reduction
 
Karthik, T.S. and V. Jawahar Senthil Kumar 
Department of Electronics and Communication Engineering, 
 College of Engineering-Guindy, Anna University, Chennai-600 025, India 
 
Abstract: Problem statement: In any multimedia processor, the controller may consume most of the on-chip memory resources. The memory requirement depends directly on the algorithm shared by different blocks, which can lead to failures in the system models. Approach: This study presents the implementation of a DLL unit used for memory optimization. Various aspects of the underlying coarse lock detector are explored, and modifications are made with a software reference implementation. The whole system is implemented in 0.18 µm CMOS technology; the DLL monitors the input reference clock against the outgoing data clock, and true locking is initialized with 50% duty-cycle correction. Results: From the measured DLL operation, the output clock jitter is analyzed. The power consumption of the DLL, including the large output buffer, is a few mW. Conclusion: The main challenge in this implementation is communication bandwidth, which motivates the process-variation and power-state reduction techniques. In addition, computing-capacity inefficiency and simultaneous switching noise are reduced in real-time applications.
 
Key words: Delay locked loop, voltage controlled delay line, coarse lock detector, process variation,
simultaneous switching noise
 
INTRODUCTION 
 
  In recent years, performance has become the most important concern in real-time image and video applications. As circuit speed increases with shrinking device dimensions, clock frequencies increase, and clock skew and jitter consume an increasingly large percentage of the data valid window (tDQV). If false lock occurs, it must be detected and the clock pulse corrected so that the data can still be captured. No existing system handles this type of issue in realistic implementations.
  A Double-Data-Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) is an example of an application that uses a Delay-Locked Loop (DLL) to maximize the data valid window (Shin et al., 2009b). When the data valid window shrinks, system integrity is degraded and performance suffers. If a fixed delay were used instead of a DLL to align the incoming clock and the output data, variations in process, voltage and temperature (PVT) would significantly increase clock skew (tDQSCK). This increase effectively shrinks the data valid window (tDQV) and makes the system more susceptible to timing errors. In Very Large Scale Integrated (VLSI) circuit design, it is advantageous to use a digital DLL (Garlepp et al., 1999) because it is portable across process nodes.

MATERIALS AND METHODS 
 
  This study proposes a new approach to achieve fast, true locking and evaluates the impact of timing jitter. In a PLL implementation, the local reference clock is phase-locked to an external clock; noise on the reference clock dominates, and self-induced jitter within the VCO is negligible. In multimedia (streaming) applications, the processor handles more data, which stresses communication bandwidth and raises computational complexity. A multistage clock buffer built from a long inverter chain is therefore needed to drive the heavy capacitive load. In such a long buffer it is difficult to hold the clock duty cycle at its ideal value of 50%, because of signal irregularities and PMOS/NMOS variations. As the duty cycle deviates, the pulse width shrinks or stretches, and the clock pulse may even vanish inside the clock buffer (Garlepp et al., 1999).
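  To make the duty-cycle drift concrete, the following is a minimal Python sketch, not the authors' model. It assumes a chain of identical non-inverting buffer stages, each delaying the rising edge by a fixed t_rise and the falling edge by a fixed t_fall (hypothetical values), so the high-pulse width changes by t_fall - t_rise per stage and can disappear entirely.

```python
def duty_cycle_after_chain(period_ps, duty_in, t_rise_ps, t_fall_ps, stages):
    """Output duty cycle (fraction 0..1) after `stages` identical buffers.

    Assumption: each non-inverting stage delays the rising edge by t_rise_ps
    and the falling edge by t_fall_ps, so the high time changes by
    (t_fall_ps - t_rise_ps) per stage.
    """
    width = duty_in * period_ps  # high time of the input clock
    for _ in range(stages):
        width += t_fall_ps - t_rise_ps
        if width <= 0.0:
            return 0.0  # high pulse vanished: output stuck low
        if width >= period_ps:
            return 1.0  # low phase vanished: output stuck high
    return width / period_ps
```

For a 1 GHz clock (1000 ps period), a 2 ps rise/fall mismatch per stage over 10 stages pulls a 50% input down to 48%; a 50 ps mismatch per stage makes the pulse vanish, which is the failure described above.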
  A delay buffer acts as a timing-adjusting element, and all delay buffers have an identical structure. Here a Current Starved Delay Element (CSDE) is used to realize the delay buffer (Stojcev and Jovanovic, 2008). The contributions of the design are the following:
 
·  Independent delay regulation of both rising and falling edges
·  Current variation in the p-type and n-type MOS transistors, which provides the independent delay regulation
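  The independent rise/fall regulation can be illustrated with the usual first-order estimate for a current-starved inverter, t ≈ C_L·(VDD/2)/I, where I is the starving current that charges or discharges the load. This Python sketch is a textbook approximation, not the CSDE sizing used here; the load capacitance, supply and currents are hypothetical, and which of Vbp/Vbn ends up controlling the rising or trailing edge of the clock pulse depends on the polarity of the stage.

```python
def csde_edge_delays(c_load_f, vdd_v, i_p_a, i_n_a):
    """First-order current-starved inverter delays in seconds (approximation).

    The output low-to-high transition is limited by the PMOS starving current
    i_p_a and the high-to-low transition by the NMOS starving current i_n_a;
    each delay is the time to slew the load to the VDD/2 switching point.
    """
    t_rise = c_load_f * (vdd_v / 2.0) / i_p_a
    t_fall = c_load_f * (vdd_v / 2.0) / i_n_a
    return t_rise, t_fall


# Hypothetical stage: 20 fF load, 1.8 V supply, 20 uA PMOS / 40 uA NMOS current.
t_r, t_f = csde_edge_delays(20e-15, 1.8, 20e-6, 40e-6)
```

Here t_r is about 900 ps and t_f about 450 ps; halving only the NMOS current doubles only the falling-edge delay, which is the independent-regulation property listed above.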
 
  In this approach, the control voltages drive the transistor gates directly, act as symmetric loads (Fig. 1) and serve two purposes: (a) linearizing the voltage-to-delay transfer function and (b) providing a correct start-up condition for DLL operation even when the control voltages are outside their regulation limits. If mismatches between taps are negligible, the tap delay is independent of device parameters even in the presence of temperature and process variations. DLLs have much more relaxed tradeoffs among gain, bandwidth and stability than PLLs, since a DLL can be designed as a first-order system with a simple capacitor as the loop filter.
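  The first-order behaviour can be sketched in discrete time. In this minimal Python model (an illustration under simplifying assumptions, not the circuit of Fig. 1), the loop-filter capacitor is the only state; each reference cycle the charge pump adds a fraction `gain` of the phase error to the VCDL delay, and the delay settles at exactly one clock period. The per-cycle `gain` is a hypothetical lumped parameter.

```python
def simulate_first_order_dll(t_clk_ps, d0_ps, gain, cycles):
    """Discrete-time model of a first-order DLL (illustrative only).

    Each reference cycle the phase detector sees e = T_clk - D (the delayed
    edge should land exactly one period later) and the charge pump plus loop
    capacitor integrate gain * e into the VCDL delay D.
    """
    d = d0_ps
    history = [d]
    for _ in range(cycles):
        error = t_clk_ps - d   # phase error measured by the PD
        d += gain * error      # integrated on the single loop capacitor
        history.append(d)
    return history


# 1000 ps period, VCDL starting at 600 ps, hypothetical gain of 0.25 per cycle.
h = simulate_first_order_dll(1000.0, 600.0, 0.25, 40)
```

With a single state variable the error shrinks by a factor (1 - gain) every cycle (600, 700, 775, ... ps here), so the loop is stable for any 0 < gain < 2; this is the relaxed gain/bandwidth/stability tradeoff mentioned above, compared with the second-order loop of a charge-pump PLL.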
  Here, the Current Starved Delay Element (CSDE) offers good delay stability with respect to temperature and voltage variations, but its range of delay regulation is relatively limited. The nominal delay, which corresponds to one clock period, determines the optimization of the delay elements.
  The proposed DLL architecture is shown in Fig. 2. The clock aligner is composed of a voltage-controlled delay line (VCDL), two phase detectors (PD1 and PD2), two charge pumps (CP1 and CP2), two first-order low-pass filters (LP1 and LP2) and a multistage clock buffer (CB). The negative feedback in the loop adjusts the delay through the VCDL by integrating the phase errors between the periodic reference input CLKref and the multistage output CLKout. The underlying idea of this approach is to provide delay regulation for both the rising and trailing edges of the output clock pulse CLKout. The VCDL building block implements the variable delay regulation. The control voltage Vbn (Vbp) sets the delay of the rising (trailing) clock pulse edge. Phase detector PD1 (PD2) compares the phase shift of the rising (trailing) edges between the input clock CLKin and the output clock CLKout.
  The output of LP1 (LP2), Vctrl1 (Vctrl2), is connected to the VCDL control input at node Vbn (Vbp). When the system reaches steady state, both edges of CLKout are synchronized and phase-shifted with respect to the reference clock CLKref.
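  The double-edge idea can be checked with two copies of the same first-order loop. In the hypothetical Python model below, the clock buffer adds different delays to the rising and falling edges (a duty-cycle distortion), and each loop adjusts its own VCDL delay, measured at CLKout, until its edge lands one period after the corresponding reference edge. The gain, cycle count and buffer delays are assumptions, and the limited delay range of the CSDE is ignored.

```python
def double_edge_align(t_clk_ps, ref_high_ps, buf_rise_ps, buf_fall_ps,
                      gain=0.3, cycles=60):
    """Hypothetical double-edge DLL: one first-order loop per clock edge.

    t_clk_ps:    reference period
    ref_high_ps: high time of the reference clock (500 ps = 50% at 1 GHz)
    buf_rise_ps, buf_fall_ps: fixed clock-buffer delays of the rising and
                 falling edges (their difference is the duty-cycle distortion)
    Returns the CLKout duty cycle in percent after `cycles` reference cycles.
    """
    d_rise = 0.0  # VCDL delay applied to the rising edge (one control voltage)
    d_fall = 0.0  # VCDL delay applied to the falling edge (the other one)
    for _ in range(cycles):
        # PD1/CP1/LP1: the CLKout rising edge should land one period after
        # the reference rising edge; integrate the remaining error.
        d_rise += gain * (t_clk_ps - (d_rise + buf_rise_ps))
        # PD2/CP2/LP2: the same for the falling edge.
        d_fall += gain * (t_clk_ps - (d_fall + buf_fall_ps))
    rise_at = d_rise + buf_rise_ps                # output rising-edge time
    fall_at = ref_high_ps + d_fall + buf_fall_ps  # output falling-edge time
    return 100.0 * (fall_at - rise_at) / t_clk_ps
```

With an 80 ps rise/fall skew in the buffer (300 ps vs. 380 ps), equal VCDL delays on both edges would give a 58% duty cycle; after the two loops settle, `double_edge_align(1000.0, 500.0, 300.0, 380.0)` returns approximately 50%, the duty cycle of the reference.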
 
 
 
Fig. 1: Block diagram of DLL 
  A false lock must be avoided when the maximum delay of the VCDL equals twice the input clock period. If the VCDL electrical length is 2T_CLK, the quadrature output lies T_CLK/2 from CLK-IN instead of T_CLK/4. Another way to implement a DLL uses all-digital techniques; it can then be designed as a 0th-order system, in which no integration takes place. The delay is changed by the first method, i.e., by changing the number of delay taps (Tu, 2006). The big benefit of all-digital DLLs is that they are easy to port to other processes and applications.
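  The false-lock condition can be checked numerically. This Python sketch (an illustration with a hypothetical delay range) lists every multiple k·T_CLK that the VCDL delay range can reach; more than one entry means the loop could settle on a harmonic. It also shows why the quadrature tap moves from T_CLK/4 to T_CLK/2 when the line locks at 2T_CLK, assuming the quadrature output is taken at one quarter of a line with equal tap delays.

```python
def lockable_multiples(d_min_ps, d_max_ps, t_clk_ps):
    """Multiples k of the clock period inside the VCDL range [d_min, d_max]."""
    ks = []
    k = 1
    while k * t_clk_ps <= d_max_ps:
        if k * t_clk_ps >= d_min_ps:
            ks.append(k)
        k += 1
    return ks


def quadrature_offset(vcdl_delay_ps):
    """Delay of the quarter-line tap, assuming equal tap delays."""
    return vcdl_delay_ps / 4.0


# Hypothetical VCDL range 300-2100 ps with a 1000 ps clock: k = 1 and k = 2
# are both reachable, so a harmonic (false) lock at 2T_CLK is possible.
print(lockable_multiples(300.0, 2100.0, 1000.0))
print(quadrature_offset(1000.0), quadrature_offset(2000.0))
```

Restricting the maximum VCDL delay below 2T_CLK (e.g. 300-1500 ps) leaves only k = 1, which is one way to rule out the harmonic lock; the coarse lock detector described next addresses the same problem without such a restriction.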
  The hysteresis coarse lock detector (HCLD) receives the input clock and the odd-numbered phases from the VCDL. The HCLD generates a clock at half the input frequency and counts edges in each of its evaluation phases (Chi et al., 2011); it can therefore avoid harmonic lock and stuck problems without requiring any external reset. The conventional CLD has shortcomings in speed and area. To overcome these problems, the clock is delayed, before entering a flip-flop, by the same amount as the delay of the counting logic, which provides some timing margin.
  At first, the HCLD locks in a narrow mode. After the coarse lock has lasted 3 cycles, it switches the coarse lock range to a wide mode.
  The specifications also limit the allowable clock jitter to a Gaussian (normal) distribution; if the clock jitter is not Gaussian in nature, the clock violates the timing specifications. In a DLL design, the effect of SSN ripple must be considered along with the harmonic lock and stuck problems. As system bandwidth increases, power and ground distribution for high-speed systems becomes critical (Oklobdzija, 2003). In these high-performance systems, Delay Locked Loops (DLLs) and Phase Locked Loops (PLLs) are usually used to generate the clock signal required for clock deskewing circuits in RF transceivers, inter-chip communication interfaces and clock distribution. Variations in this timing reference (i.e., clock jitter) force designs to include additional margins that degrade performance and can cause bit errors in communication systems.
  When a DLL circuit operates in a real on-chip environment, it suffers from several noise sources. The major noise sources of the DLL circuit include Simultaneous Switching Noise (SSN) from external circuits, phase detection noise, Voltage Controlled Delay Line (VCDL) internal noise coupled from the substrate, external reference clock input noise and VCDL control voltage noise. The data rate of current DDR3/4 systems is expected to move from 2 Gbps to 4 Gbps. At such high data rates, Simultaneous Switching Output (SSO) noise introduced by the output drivers becomes the major bottleneck in designing memory channels. GDDR3/4 is based on single-ended Pseudo Open Drain Logic (PODL) signaling, which generates substantial AC current peaks when the output drivers switch simultaneously. These current peaks generate a large amount of SSO noise in the system if the impedance of the power distribution system is not sufficiently low.
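  The link between current peaks and SSO noise is often estimated to first order as ΔV ≈ N·L·di/dt, where N drivers switch together through an effective power/ground inductance L. The Python sketch below uses that textbook approximation with a linear current ramp; it treats the power distribution system as a single lumped inductance, ignores its resistance, capacitance and resonance, and all numbers are hypothetical.

```python
def ssn_voltage(n_drivers, l_eff_h, delta_i_a, t_edge_s):
    """First-order SSN (ground bounce) estimate in volts: N * L * di/dt.

    Assumes each driver's current ramps linearly by delta_i_a over t_edge_s
    and that all drivers share one lumped inductance l_eff_h.
    """
    di_dt = delta_i_a / t_edge_s  # A/s per driver
    return n_drivers * l_eff_h * di_dt


# Hypothetical: 8 drivers, 10 mA each in 200 ps, 0.5 nH effective inductance.
v_bounce = ssn_voltage(8, 0.5e-9, 10e-3, 200e-12)
```

This gives roughly 0.2 V of bounce; halving the effective inductance (a lower-impedance power distribution system) halves it, which is why the PDN impedance matters as much as the driver count.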
  While SSO noise by itself can be simulated without much difficulty, its impact on channel voltage and timing margins is much harder to characterize. Co-simulation of the power distribution network (PDN) and the channel model requires long simulation times and often results in convergence issues (Chun et al., 2001). Furthermore, supply noise in the system strongly depends on the switching data pattern. The worst-case data pattern for supply noise is a function of the PDN resonance, whereas the worst-case pattern for signal noise, such as crosstalk and Inter-Symbol Interference (ISI), depends on the channel transfer function. Therefore, finding the excitations that model the worst-case system voltage and timing margin, considering both signal and power integrity effects, is a challenging task.
  Conventional charge-pump Delay Locked Loops (DLLs) have usually been applied as multiphase clock generators in transceiver systems (Shin et al., 2009a). The core loop consists of eight pseudo-differential delay elements including dummy elements, a self-bias circuit for a regulated bias voltage, a differential charge pump and a linear phase detector. There are two kinds of filter capacitors in the DLL. The loop filter capacitor determines the loop bandwidth of the DLL, and the bias filter capacitor is inserted to reduce output jitter. Each filter capacitance can be changed by an external control switch.
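  For a first-order charge-pump DLL, a commonly used approximation for the loop bandwidth is ω_N = I_CP·K_VCDL·F_REF/C, where I_CP is the charge-pump current, K_VCDL the delay-line gain in seconds per volt, F_REF the reference frequency and C the loop-filter capacitance. The Python sketch below evaluates it; the formula is the standard continuous-time approximation rather than a result from this design, and the numbers are hypothetical.

```python
import math


def dll_loop_bandwidth(i_cp_a, k_vcdl_s_per_v, f_ref_hz, c_loop_f):
    """Approximate loop bandwidth (rad/s) of a first-order charge-pump DLL."""
    return i_cp_a * k_vcdl_s_per_v * f_ref_hz / c_loop_f


def loop_bandwidth_hz(omega_rad_s):
    """Convert an angular bandwidth to Hz."""
    return omega_rad_s / (2.0 * math.pi)


# Hypothetical: 50 uA charge pump, 1 ns/V VCDL gain, 1 GHz reference, 20 pF.
w = dll_loop_bandwidth(50e-6, 1e-9, 1e9, 20e-12)  # 2.5e6 rad/s
```

Doubling the loop-filter capacitance halves the bandwidth, which is why a switchable loop capacitor changes the DLL loop dynamics; the approximation only holds while the bandwidth stays well below F_REF.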
  The effects of SSN include:

·  Fluctuation of the on-chip VDD and VSS
·  Reduced noise margins of digital circuits
·  Shifted operating points of analog circuits
·  Increased timing jitter of oscillators and clocks

These effects can be reduced with the help of a low-impedance power distribution network.
 
  Inter-die and intra-die variations present significant power-speed-yield trade-offs. Process variations become increasingly predominant as devices are scaled with each new generation, and the problem is even more complicated in analog circuits (Kinget, 2005), where variations considerably affect the bias conditions, gain, frequency response and bandwidth of the circuit.
  Another technique is a design methodology that develops circuits which compensate for process variations without any post-fabrication effort. The main advantage of this methodology is that it can be used to optimize the circuit to reduce the variation of a parameter we consider important, such as gain or bandwidth.
 
 
Fig. 2: Proposed DLL architecture 
 
The timing error in a DLL accumulates over only one cycle of the input frequency. Hence the random timing error in one cycle is independent of (uncorrelated with) the random timing error of the next cycle. Thus the phase noise plot has a flat region and rolls off as the two timing indices approach each other within the period of the reference crystal frequency.
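  The non-accumulating behaviour can be contrasted with a free-running oscillator (the jitter a PLL's VCO accumulates between loop corrections) in a small Python simulation. This is an illustrative sketch with hypothetical Gaussian per-cycle noise, not a model of the measured circuit: each DLL edge is derived from a fresh reference edge, so its error is only that cycle's noise, whereas each oscillator period starts where the previous one ended, so the errors add up as a random walk.

```python
import random


def edge_errors(cycles, sigma_ps, seed=1):
    """Per-edge timing error (ps) of a DLL vs. a free-running oscillator.

    The same per-cycle noise sample e drives both models:
      DLL:        error_n = e_n                (reset by each reference edge)
      oscillator: error_n = e_1 + ... + e_n    (errors carry forward)
    """
    rng = random.Random(seed)  # fixed seed for a reproducible example
    dll, osc = [], []
    accumulated = 0.0
    for _ in range(cycles):
        e = rng.gauss(0.0, sigma_ps)
        dll.append(e)            # no memory of earlier cycles
        accumulated += e
        osc.append(accumulated)  # random walk
    return dll, osc


dll, osc = edge_errors(1000, 1.0)
print(max(abs(x) for x in dll), max(abs(x) for x in osc))
```

The DLL error stays within a few sigma, while the accumulated oscillator error grows roughly with the square root of the number of cycles.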
  A DLL suffers from simultaneous switching noise. Hence a DLL used in memory controllers must be designed so that it is not affected by SSN or by the harmonic lock and stuck problems. The DLL for the memory controller is therefore designed with a Hysteresis Coarse Lock Detector (HCLD). With the proposed HCLD, the DLL becomes immune to SSN, is free from harmonic lock and stuck problems without a reset signal, and locks faster than one using a conventional Coarse Lock Detector (CLD).
  In an SSN environment in a memory controller, the control voltage is unstable even in the lock state. In a conventional CLD, this environment breaks the lock state, and the CLD quickly recovers the coarse lock again; this happens continuously. This can be a jitter source, because the frequency tracking loop and the phase tracking loop may interfere with each other while the CLD transfers control to the PD and vice versa.
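  The difference between a conventional CLD and a hysteresis CLD under SSN can be illustrated with an abstract behavioural model. In the Python sketch below (hypothetical; it does not model the edge counting on a half-rate clock used by Chi et al., 2011), the coarse detector declares lock when the phase error is inside a narrow window, and the hysteresis version widens the window once lock has lasted three cycles, as described for the HCLD. The function counts control handoffs between the coarse loop and the PD, each of which is a potential jitter event.

```python
def count_handoffs(phase_errors, narrow, wide=None, settle_cycles=3):
    """Count control transfers between the coarse loop (CLD) and the PD.

    phase_errors:  per-cycle phase error (hypothetical fraction of a period)
    narrow:        coarse-lock window used while acquiring lock
    wide:          window used once lock has lasted `settle_cycles` cycles;
                   None models a conventional CLD with a fixed window
    """
    locked = False
    in_window = 0  # consecutive cycles inside the current window
    window = narrow
    handoffs = 0
    for err in phase_errors:
        if abs(err) <= window:
            in_window += 1
            if not locked:
                locked = True
                handoffs += 1  # CLD hands control to the PD
        else:
            in_window = 0
            window = narrow  # fall back to the narrow acquisition window
            if locked:
                locked = False
                handoffs += 1  # PD hands control back to the CLD
        if wide is not None and locked and in_window >= settle_cycles:
            window = wide  # hysteresis: widen the coarse lock range
    return handoffs


# Acquisition, then SSN ripple that keeps pushing the error just past 0.1.
errs = [0.3, 0.2, 0.05, 0.04, 0.06, 0.03] + [0.12, 0.02] * 5
```

With this ripple, `count_handoffs(errs, narrow=0.1)` reports 11 handoffs for the conventional detector, while `count_handoffs(errs, narrow=0.1, wide=0.25)` reports a single one: the hysteresis detector locks once and leaves the PD in control.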
 
RESULTS 
 
  Threshold-voltage variations can be reduced by sizing and calibrating the delays, which makes the DLL process-invariant. Fig. 3 gives the analysis of the percentage duty-cycle error. In a Hysteresis CLD (HCLD), once a lock state is entered (Lin and Huang, 2004), the coarse lock range becomes wide, as shown in Fig. 4. The PD therefore keeps control, and jitter is reduced. Moreover, by using hysteresis, the HCLD controls the coarse lock range, which further reduces jitter. The DLL neither suffers from harmonic lock and stuck problems nor needs an external reset or start-up signal.
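  The duty-cycle error plotted in Fig. 3 and listed in Table 1 can be computed from measured edge timestamps. This Python sketch assumes the metric is the deviation of the mean duty cycle from 50%, in percent; the exact definition used for the measurement is not stated, and the timestamps in the example are synthetic.

```python
def duty_cycle_error(rising_ps, falling_ps):
    """Mean duty cycle and its deviation from 50%, both in percent.

    Assumes rising_ps[i] < falling_ps[i] < rising_ps[i + 1] for every full
    cycle, i.e. falling_ps has at least len(rising_ps) - 1 entries.
    """
    if len(rising_ps) < 2:
        raise ValueError("need at least two rising edges to measure a period")
    duties = []
    for i in range(len(rising_ps) - 1):
        period = rising_ps[i + 1] - rising_ps[i]
        high = falling_ps[i] - rising_ps[i]
        duties.append(100.0 * high / period)
    mean_duty = sum(duties) / len(duties)
    return mean_duty, abs(mean_duty - 50.0)


# Synthetic 1 GHz clock whose high time is 508 ps.
mean_duty, err = duty_cycle_error([0.0, 1000.0, 2000.0, 3000.0],
                                  [508.0, 1508.0, 2508.0])
```

A 508 ps high time in a 1000 ps period gives a 50.8% duty cycle, i.e. a 0.8% error under this definition.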
 
 
 
Fig. 3: Duty cycle-error free estimation

Fig. 4: (a) Output of HCLD on a LeCroy oscilloscope; (b) Phase-shifted locking timing diagram
  
DISCUSSION 
 
  The memory interface operates in the active state, in which all signals are in the active mode of operation. The high-speed reference clocks are distributed across both the controller and DRAM interfaces. Based on the command traffic from the memory host controller, the appropriate low-power states are employed (Leibowitz et al., 2010; Balamurugan et al., 2008; Lee et al., 2009; Poulton et al., 2007).
  When the command/address (CA) controller queue becomes empty, the last command triggers the clock pause state. The controller synchronously pauses the PLL output, halting all switching activity. Table 1 summarizes the performance of the proposed DLL. Finally, the duty cycle of CLKout is maintained at 50%.
 
 
Fig. 5: Memory transactions: supply current vs. data rate
 
Table 1: Summary report
Parameter                        Value
Frequency range                  500-1200 MHz
Locking time                     200 ns
Duty-cycle error                 0.8%
Edge correction                  Double edge
Duty-cycle correction            20-80%
Reference clock                  External
Coarse lock                      Hysteresis type
Process variation                Invariant to threshold voltages
Measured jitter (RMS and p-p)    3.2 ps at 500 MHz and 20.5 ps at 1100 MHz (p-p)
 
  This operation disables the front-end DQ transceiver circuits, since they do not receive any signal transitions. The clock pause operation (Balamurugan et al., 2008) responds to a programmable number of successive No-Operation (NOP) commands. When the host controller requests a memory transaction, the interface exits the pause state (Lee et al., 2009), synchronously un-pausing the interface clocks. The front-end DQ transceiver circuits (Balamurugan et al., 2008) are initialized to the appropriate read or write configuration after the first CA command is communicated, as shown in Fig. 5. This is similar to the front-end power configuration change during a normal read/write bus turnaround in active operation.
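  The clock-pause policy can be sketched as a small state machine. In this hypothetical Python model, `nop_threshold` stands for the programmable number of successive NOP commands; clocks stop once that many NOPs arrive in a row and restart on the next real command. The class and command names are illustrative and do not correspond to any real controller interface.

```python
class ClockPauseController:
    """Hypothetical model: pause the interface clocks after N successive NOPs."""

    def __init__(self, nop_threshold):
        self.nop_threshold = nop_threshold  # programmable NOP count
        self.nop_run = 0                    # successive NOPs seen so far
        self.paused = False

    def step(self, command):
        """Process one command slot ("NOP", "RD", "WR", ...).

        Returns True while the interface clocks are running.
        """
        if command == "NOP":
            self.nop_run += 1
            if self.nop_run >= self.nop_threshold:
                self.paused = True  # enter the clock pause state
        else:
            # A real transaction: exit the pause state and un-pause clocks.
            self.nop_run = 0
            self.paused = False
        return not self.paused


ctl = ClockPauseController(3)
trace = [ctl.step(c) for c in ["RD", "NOP", "NOP", "NOP", "NOP", "WR"]]
```

With a threshold of three, the clocks stop on the third successive NOP and restart on the following write, so `trace` is `[True, True, True, False, False, True]`.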
 
CONCLUSION 
 
  A prototype circuit is designed in 0.18 µm technology; the duty-cycle error is reduced to 0.8%, which gives good stability and a fast response. The DLL is optimized to reduce the effect of threshold-voltage variation. Moreover, by using hysteresis, it controls the coarse lock range and thus reduces jitter. The actual power consumption of a burst-mode interface will depend heavily on real-world read and write activity patterns, so the improvement enhances the theoretical power and bandwidth scaling capabilities provided by these power states. Therefore, judicious use of low-power states can maintain good overall power efficiency. The inefficiency of computing capacity is also reduced. Finally, good overall power efficiency is maintained over more than two orders of magnitude of effective interface bandwidth. The proposed work can also be applied to clock distribution networks within SoCs, high-speed DRAM and MEMS devices.
 
REFERENCES 
 
Balamurugan, G., J. Kennedy,  G. Banerjee,  J.E. Jaussi 
and M. Mansuri et al., 2008. A scalable 5-15 Gbps, 
14-75  mW  low-power  I/O  transceiver  in  65  nm 
CMOS. IEEE J. Solid-State Circ., 43: 1010-1019. 
DOI: 10.1109/JSSC.2008.917522 
Chi, H.K., M.S. Hwang, B.J. Yoo, W.J. Choe and T.H. 
Kim et al., 2011. A 500 MHz-to-1.2 GHz reset free 
delay  locked  loop  for  memory  controller  with 
hysteresis  coarse  lock  detector.  J.  Semiconductor 
Technol.  Sci.,  11:  73-79.  DOI: 
10.5573/JSTS.2011.11.2.073 
Chun, S., M. Swaminathan,  L.D. Smith,  J. Srinivasan 
and Z. Jin et al., 2001. Modeling of simultaneous 
switching noise in high speed systems. IEEE Trans. 
Adv.  Packag.,  24:  132-142.  DOI: 
10.1109/6040.928747 
Garlepp, B.W., K.S. Donnelly, J. Kim, P.S. Chau and 
J.L. Zerbe et al., 1999. A portable digital DLL for 
high-speed CMOS interface circuits. IEEE J. Solid-
State Circ., 34: 632-644. DOI: 10.1109/4.760373 
Kinget, P.R., 2005. Device mismatch and tradeoffs in the 
design of analog circuits. IEEE J. Solid-State Circ., 
40: 1212-1224. DOI: 10.1109/JSSC.2005.848021 
 
 
 
 
 
 
Lee, H., K.Y.K. Chang, J.H. Chun, T. Wu and Y. Frans 
et al., 2009. A 16 Gb/s/link, 64 GB/s bidirectional 
asymmetric memory interface. IEEE J. Solid-State 
Circ.,  44:  1235-1247.  DOI: 
10.1109/JSSC.2009.2014199 
Leibowitz, B., R. Palmer, J. Poulton, Y. Frans and S. Li 
et al., 2010. A 4.3 GB/s mobile memory interface 
with  power-efficient  bandwidth  scaling.  IEEE  J. 
Solid-State  Circ.,  45:  889-898.  DOI: 
10.1109/JSSC.2010.2040230 
Lin, W.M. and H.Y. Huang, 2004. A low-jitter mutual-
correlated pulsewidth control loop circuit. IEEE J. 
Solid-State  Circ.,  39:  1366-1369.  DOI: 
10.1109/JSSC.2004.831499 
Oklobdzija,  V.G.,  2003.  Digital  System  Clocking: 
High-Performance  and  Low-Power  Aspects.  1st 
Edn.,  John  Wiley  and  Sons,  New  York,  ISBN: 
047127447X, pp: 245.  
Poulton,  J.,  R.  Palmer,  A.M.  Fuller,  T.  Greer  and  J. 
Eyles et al., 2007. A 14-mW 6.25-Gb/s Transceiver 
in  90-nm  CMOS.  IEEE  J.  Solid-State  Circ.,  42: 
2745-2757. DOI: 10.1109/JSSC.2007.908692 
Shin, D., K.J. Na, D. Kwon, J.H. Kang and T. Song et 
al.,  2009a.  Wide-range  fast-lock  duty-cycle 
corrector  with offset-tolerant duty-cycle detection 
scheme for 54nm 7Gb/s GDDR5 DRAM Interface. 
Proceedings of the Symposium on VLSI Circuits, 
Jun. 16-18, IEEE Xplore Press, Kyoto, Japan, pp: 
138-139.  
Shin, D., C. Kim, J. Song and H. Chae, 2009b. A 7 ps
Jitter 0.053 mm² Fast Lock All-Digital DLL with a
Wide Range and High Resolution DCC. IEEE J.
Solid-State Circ., 44: 2437-2451. DOI:
10.1109/JSSC.2009.2021447
Stojcev,  M.  and  G.  Jovanovic,  2008.  Clock  aligner 
based  on  delay  locked  loop  with  double  edge 
synchronization. Microelect. Reliab., 48: 158-166. 
DOI: 10.1016/j.microrel.2007.02.025  
Tu, S.H.L., 2006. A differential pulsewidth control loop 
for  high-speed  VLSI  systems.  IEEE  Trans.  Circ. 
Syst. II: Exp. Briefs, 53: 417-421. DOI:
10.1109/TCSII.2006.869911 