On-Chip time measurement architectures and implementation by Collins, Matthew
University of Southampton Research Repository
ePrints Soton
Copyright © and Moral Rights for this thesis are retained by the author and/or other 
copyright owners. A copy can be downloaded for personal non-commercial 
research or study, without prior permission or charge. This thesis cannot be 
reproduced or quoted extensively from without first obtaining permission in writing 
from the copyright holder/s. The content must not be changed in any way or sold 
commercially in any format or medium without the formal permission of the 
copyright holders.
  
 When referring to this work, full bibliographic details including the author, title, 
awarding institution and date of the thesis must be given e.g.
AUTHOR (year of submission) "Full thesis title", University of Southampton, name 
of the University School or Department, PhD Thesis, pagination
http://eprints.soton.ac.uk
Figure 3-3: Programmable Interface Block (PIB) .........................................................42 
Figure 3-4: Switch controller block schematic..............................................................44 
Figure 3-5: Switch controller block simulations............................................................45 
Figure 3-6: Comparator control logic circuitry..............................................................45 
Figure 3-7: Comparator control simulations..................................................................46 
Figure 3-8: Adjacent switching during rise time measurement.....................................47 
Figure 3-10: Block diagram of a high speed comparator [91].......................................48 
Figure 3-11: Decision circuit.........................................................................................48 
Figure 3-12: Window comparator [57]..........................................................................49 
Figure 3-13: Switched-capacitor input sampling network [85].....................................49 
Figure 3-14: Non-overlapped clock generator [11].......................................................50 
Figure 3-15: PMOS and NMOS differential amplifiers. ...............................................50 
Figure 3-16: Rail-to-Rail Comparator ...........................................................................51 
Figure 3-17: Simulation of comparator with input difference of 10ps..........................53 
Figure 3-18: Bias circuit with start-up circuitry [11].....................................................55 
Figure 3-19: Output current versus supply voltage........................................................55
Figure 3-20: Typical process corner (TT, 1.2V, 27 degC) ............................................57 
Figure 3-21: Worst process corner (FF, 1.08V, -40 degC)............................................57 
Figure 3-22: Best process corner (SS, 1.32V, 125 degC)..............................................58 
Figure 3-23: TVC operation...........................................................................................58 
Figure 3-24: Current TVC implementations for embedded memory characterization..59 
Figure 3-25: VTC of configurations (b) and (c). ...........................................................60 
Figure 3-26: Current steering time-to-voltage converter (TVC) ...................................62 
Figure 3-27: Simulations of TVC..................................................................................62 
Figure 3-28: Simplified view of processing block.........................................................63 
Figure 3-29: 800ps propagation delay time measurement.............................................64 
Figure 3-30: Propagation delay measurement ...............................................................66 
Figure 3-31: Rise time versus output count from the PTMA.........................................67
Figure 3-32: Pulse width measurement..........................................................................68 
Figure 3-33: Fall time versus output count from the PTMA .........................................69
Figure 3-34: The effect of rise and fall time on inputs when switching from one 
measurement to another.........................................................................................70 
Figure 3-35: Conversion time........................................................................................71 
Figure 4-1: Back end design flow..................................................................................74
Figure 4-2: Top level hierarchy of prototype chip.........................................................75 
Figure 4-3: On-chip reference generator.......................................................................76
Figure 4-4: Time measurement core..............................................................................77 
Figure 4-5: Rail-to-rail comparator................................................................................78 
Figure 4-6: Comparator Layout.....................................................................................79 
Figure 4-7: Time-to-voltage converter (TVC)...............................................................80 
Figure 4-8: Capacitor array............................................................................................80 
Figure 4-9: Time-to-voltage converter (TVC) layout....................................................81 
Figure 4-10: On-chip clock generation..........................................................................82 
Figure 4-11: Clock generator circuit..............................................................................82 
Figure 4-12: Generation of the 2 GHz and 2.5 GHz clocks...........................................83 
Figure 4-13: Layout of the clock generator module......................................................83 
Figure 4-14: Layout of the programmable input block..................................................84 
Figure 4-15: Layout of the digital processing block......................................................84 
Figure 4-16: Full chip layout .........................................................................................85 
Figure 4-17: Top level schematic with input and output tri-state buffers......................86 
Figure 4-18: Modules of the PTMA..............................................................................86 
Figure 4-19: Optical picture of fabricated chip..............................................................87 
Figure 4-20: Experimental test setup.............................................................................88 
Figure 4-21: Level translator .........................................................................................88 
Figure 5-1: Flip-Flop Setup and Hold Violations..........................................................93 
Figure 5-2: Low resolution time measurement architecture (LRTMA) with time 
amplifier.................................................................................................................94 
Figure 5-3: MUTEX [99]...............................................................................................94 
Figure 5-4: Time amplifier [98].....................................................................................95 
Figure 5-5: Proposed Time Measurement Architecture.................................................96 
Figure 5-6: Single balanced mixer [101].......................................................................98 
Figure 5-7: Double balanced Gilbert multiplier [104]...................................................99 
Figure 5-8: Dual-Gate Mixer [110]..............................................................................101 
Figure 5-9: Dual-gate cascode mixer...........................................................................102 
Figure 5-10: Dual Gate Mixer Simulations .................................................................103 
Figure 5-11: Conversion Gain vs Input Frequency......................................................103
Figure 5-12: Signal flow graph representation of high Q SC LPF..............................105 
Figure 5-13: A 2nd order low pass switched-capacitor filter with switch sharing. ......105
Figure 5-14: On–resistance of transmission gate.........................................................106 
Figure 5-15: LPF Frequency Response........................................................................106 
Figure 5-16: SC LPF input and output waveforms......................................................107 
Figure 5-17: Total output noise of mixer and LPF......................................................107 
Figure 5-18: Successive Approximation ADC [8].......................................................108 
Figure 5-19: 3-bit Flash ADC......................................................................................110 
Figure 5-20: Dual Slope ADC .....................................................................................111 
Figure 5-21: N-bit dual slope ADC [8]........................................................................112 
Figure 5-22: Block diagram of the ΔΣ ADC ...............................................................113
Figure 5-23: Block diagram of the 1st Order ΔΣ Modulator.......................................113
Figure 5-24: 1st Order ΔΣ Modulator using Simulink®.............................................114
Figure 5-25: Simulink Simulations..............................................................................114
Figure 5-26: 1st order switched-capacitor ΔΣ modulator.............................................115
Figure 5-27: Non-overlapping clock generator............................................................115 
Figure 5-28: Non-overlapped clock timing..................................................................116 
Figure 5-29: Non-overlap clock simulation.................................................................116 
Figure 5-30: Folded cascode operational amplifier.....................................................118 
Figure 5-31: Amplifier gain across process corners....................................................119 
Figure 5-32: Amplifier Phase response across process corners...................................119 
Figure 5-33: Amplifier step response ..........................................................................120 
Figure 5-34:  High performance comparator...............................................................121 
Figure 5-35: Clock, Input and output waveforms of the integrator.............................121 
Figure 5-36: Zoomed in version ..................................................................................122 
Figure 5-37: Delta modulated output of the ΔΣ modulator .........................................122
Figure 5-38: Frequency spectrum of the ΔΣ modulator...............................................123
Figure 5-39: Decimation Filter....................................................................................126 
Figure 5-40: Filter output response..............................................................................126 
Figure 5-41: Proposed time measurement architecture ...............................................127 
Figure 5-42: Simulated relationship between timing resolution and the output of the 
LPF.......................................................................................................................128 
Figure 5-43: Input and output waveforms of the proposed architecture......................128 
Figure 5-44: Propagation delay versus output count....................................................129
Figure 5-45: Time measurement architecture current consumption............................130 
Figure A-1: IEEE Standard 1500 wrapper architecture [120].....................................137
OSR    Over-Sampling Ratio 
PCB    Printed Circuit Board 
PIB    Programmable Input Block 
PSD    Power Spectral Density 
PTMA   Programmable Time Measurement Architecture 
PTMB   Programmable Time Measurement Block 
RF    Radio Frequency 
RMS    Root Mean Square 
SAR    Successive Approximation Register 
SC    Switched Capacitor 
SoB    System on Board 
SoC    System on Chip 
TAP    Test Access Port 
TDC    Time-to-Digital Converter 
TDI    Test Data Input 
TDO    Test Data Output 
TMA    Time Measurement Architectures 
TVC    Time-to-Voltage Converter 
VCDL   Voltage Controlled Delay Line
VDL    Vernier Delay Line 
VLSI    Very Large Scale Integration 
WBR    Wrapper Boundary Register 
WBY    Wrapper Bypass Register 
WIR    Wrapper Instruction Register 
WPI    Wrapper Parallel Input 
WPO    Wrapper Parallel Output 
WSO    Wrapper Serial Output 
WSI    Wrapper Serial Input 
 Introduction    19 
signal. This additional phase delay will appear as an error in the measurement of the
timing performance parameters [10].
 
Observability: In addition, modern integrated circuits (ICs) are densely packed and
highly integrated. Routing embedded nodes from deeply buried cores out to the pins at
the chip boundary for observation is often impractical, if not impossible. The resistive
and capacitive parasitics of such routing not only increase the electrical distance, but
also attenuate and skew the timing results [12].
 
The difficulties in integrated circuit (IC) testing have arisen from the shortage of I/O
points. VLSI SoC devices have a limited number of input and output pins and, in
addition, usually contain multiple cores integrated onto the same silicon [4].
Therefore, the test points for a particular core may be deeply buried and inaccessible
from external ATE. Signal distortion and noise in the interface connections from the
ATE to the device under test (DUT) may introduce timing errors in the measurement.
In addition, there may be difficulty in synchronising the timing of the test object with
that of the tester. There is also the cost of processing the large volume of test data,
although research into data compression techniques is being carried out [7] to
minimise this cost. The external ATE may also have limited performance compared to
the DUT, which limits the capability of the tester to make timing measurements on
today's high performance VLSI devices. As a result, the cost of running the tests, as
well as of the ATE itself, is high. The ITRS [1] predicts that costs will rise further
towards $20 million [1, 13]. If the cost of test is not lowered, testing will have a
negative impact on the cost of design, leading to an increase in the overall production
cost [1, 14-16]. These problems with the quality and cost of external ATE will
continue to get worse for high speed, high density VLSI devices, rendering external
ATE expensive, inaccurate and ultimately unacceptable [13].
 
This has led to research into on-chip test solutions such as built-in self test (BIST),
where the tester or part of the tester is situated on the same silicon as the
device/embedded core under test. There have been a number of BIST solutions
 Introduction    22
Time measurement architectures (TMAs) based on the time-to-digital conversion
(TDC) technique have been the focus of much work in on-chip time measurement
testing. Although researchers have developed numerous time measurement
architectures [10], each can perform only a limited number of types of time
measurement unless circuitry is duplicated or added. Chapter 2 presents the current
time measurement techniques.
Chapter 3 proposes a new programmable time measurement architecture that
can be programmed to make four types of measurement: rise time, fall time, pulse
width and propagation delay.
To analyse the practical performance of the time measurement architecture, a
prototype chip has been fabricated. Chapter 4 describes how the proposed
programmable time measurement architecture is implemented in a CMOS process.
Post-silicon results from the fabricated test chip are presented and the test setup is
described in detail.
The International Technology Roadmap for Semiconductors (ITRS) predicts
that by 2010 the clock frequencies of high performance VLSI devices will increase into
the tens of GHz. To measure the timing performance of such devices, timing
measurement architectures with resolutions of tens of femtoseconds will be required
[1]. Whilst there are various architectures capable of achieving timing resolutions in
the region of picoseconds [21], no single timing measurement architecture has been
reported that is capable of achieving the femtosecond resolution needed to verify the
timing performance of future VLSI devices.
In Chapter 5, a new time measurement architecture for femtosecond-resolution
time measurement is proposed and simulation results are presented.
Finally, Chapter 6 summarises the presented work and concludes this thesis. The
contributions outlined in Chapters 3, 4 and 5 resulted in original work published in
[22-24] and are itemised in Appendix E.
 Literature review    37 
being required on-chip to make the required time measurement for a single CUT, and
this may not be acceptable for some applications. The time measurement architecture
proposed in Chapter 3 addresses this problem by making multiple types of time
measurement with a single architecture, thereby eliminating the need to duplicate or
add circuitry in order to obtain different types of time measurement. Many of the
architectures proposed in the literature have based their results on simulation alone.
Therefore, practical validation is needed for these techniques to mature.
 
 Programmable Time Measurement Architecture (PTMA)  72 
  In the next chapter, the physical and practical validation of the proposed
programmable time measurement architecture is described.
 
 Implementation of PTMA    91 
One limitation of the fabricated chip is that it can only make time measurements
sequentially rather than in parallel; making them in parallel would reduce the overall
test time. The other limitation is that the resolution of the time measurement
architecture is limited by the propagation delay of the comparator, which is simulated
to be 175ps at the typical process corner, as shown in Chapter 3, Section 3.3.2. To
improve the resolution of time measurement architectures, a new design is presented
in Chapter 5.
 
 Homodyne Time-to-Digital Conversion    124 
not affect the shape of the response. IIR filters have a unit sample response that is of
infinite duration. FIR filters are less sensitive to rounding errors in the coefficients and
computation. They cannot become unstable, whereas IIR filters can produce
oscillations as a result of non-linearity caused by overloading or quantization errors
[116, 117].
The advantages of using a FIR topology as opposed to an IIR filter are as
follows [116, 117]. FIR filters can easily be designed for "linear phase". Linear-phase
filters delay the input signal but do not distort its phase. They are relatively easy to
implement: on most DSP microprocessors, the FIR calculation can be done by looping
a single instruction. They are suited to multi-rate applications, whether "decimation"
(reducing the sampling rate), "interpolation" (increasing the sampling rate), or both.
Whether decimating or interpolating, the use of FIR filters allows some of the
calculations to be omitted, which provides an important gain in computational
efficiency. In contrast, if IIR filters are used, every output must be calculated
individually, even one that will be discarded, because it is fed back into the filter.
FIR filters also have desirable numeric properties. In practice, all DSP filters must be
implemented using "finite-precision" arithmetic, that is, a limited number of bits. The
use of finite-precision arithmetic in IIR filters can cause significant problems because
of the feedback, but FIR filters have no feedback and can usually be implemented
using fewer bits. The FIR filter can also be implemented using fractional arithmetic:
unlike IIR filters, it is always possible to implement a FIR filter using coefficients with
a magnitude of less than 1, and the overall gain of the FIR filter can be adjusted at its
output if desired. This is an important consideration when using fixed-point DSPs, as
it makes the implementation easier.
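
As a minimal sketch of the multi-rate point above (not the decimation filter used in this work), the following Python fragment filters a hypothetical 1-bit stream with a FIR filter and decimates it by M. Because each FIR output depends only on input samples, only every M-th output has to be evaluated. The tap values, the input stream and all function names are illustrative assumptions.

```python
def fir_output(x, h, n):
    """One FIR output y[n] = sum_k h[k] * x[n - k], taking x[n] = 0 for n < 0."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)


def fir_full_rate(x, h):
    """Evaluate every output sample, including those decimation would discard."""
    return [fir_output(x, h, n) for n in range(len(x))]


def fir_decimate(x, h, m):
    """Evaluate only the outputs that survive decimation by m."""
    return [fir_output(x, h, n) for n in range(0, len(x), m)]


# Hypothetical modulator bit stream mapped to +1/-1 and a 4-tap moving average.
bits = [1, 1, -1, 1, 1, 1, -1, 1, -1, 1, 1, 1]
taps = [0.25, 0.25, 0.25, 0.25]
M = 4

decimated = fir_decimate(bits, taps, M)
```

Here `fir_decimate` evaluates 3 output samples where `fir_full_rate` evaluates 12, and the retained samples are identical to every fourth sample of the full-rate result.
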
The disadvantages of using an infinite impulse response (IIR) filter are as follows.
IIR filters are more susceptible to problems of finite-length arithmetic, such as noise
generated by calculations, and limit cycles. This is a direct consequence of feedback:
when the output is not computed perfectly and is fed back, the imperfection can
compound. It is more difficult to implement an IIR filter using fixed-point arithmetic.
IIR filters do not offer the computational advantages of FIR filters for multi-rate
(decimation and interpolation) applications.
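
The limit-cycle effect can be illustrated with a small sketch, under the assumption of a first-order recursion y[n] = Q(-0.9 y[n-1]) with zero input, where Q rounds to the nearest integer. The coefficient, the initial state and the rounding rule are hypothetical choices made for illustration and are not taken from this work.

```python
def round_half_away(num, den):
    """Round num/den to the nearest integer, ties away from zero (den > 0)."""
    sign = -1 if num < 0 else 1
    return sign * ((2 * abs(num) + den) // (2 * den))


def quantised_iir(y0, steps):
    """Zero-input response of y[n] = Q(-0.9 * y[n-1]) with an integer state."""
    y = [y0]
    for _ in range(steps):
        y.append(round_half_away(-9 * y[-1], 10))
    return y


def ideal_iir(y0, n):
    """Sample n of the same recursion without quantisation: y0 * (-0.9)**n."""
    return y0 * (-0.9) ** n


response = quantised_iir(10, 200)
```

With these assumptions the unquantised response decays towards zero, whereas the quantised state stops decaying and keeps alternating in sign at a constant amplitude, because the rounding error is fed back on every iteration.
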
This latter condition can be satisfied by using a linear phase finite impulse
response (FIR) LPF. Since the output signal of the modulator is a single bit stream, it
may be practical to use a single-stage high order linear-phase FIR filter, since there are
 Homodyne Time-to-Digital Conversion    132
gigahertz. This has been made possible through appropriate selection of mixer, filter
and data conversion techniques. Simulations using SPECTRE models based on the
0.12µm CMOS process show that measurements with a resolution of 42fs are possible,
which is the highest resolution reported to date.
 
 Conclusions and Further Research Directions  134 
measurements. These are rise time, fall time, pulse width and propagation delay
measurements. Simulations based on a 1.2V CMOS process show that it is possible to
obtain these multiple measurements with high resolution.
Chapter 4 presents a detailed implementation and verification of a prototype
chip of the programmable time measurement architecture, fabricated using a 1.2V
CMOS process. The high performance of the time measurement architecture is
achieved through careful use of mixed-signal layout techniques that minimise noise
and device mismatch, namely common centroid layout and shielding of carefully
selected components.
The advantage of this programmability and flexibility lies in the ability to
perform different types of on-chip time measurement and the potential to reduce the
overall time cost of chip testing. Another advantage of this architecture is that it can be
easily automated: on-chip time measurement testing with the proposed PTMA can be
automated using either an on-chip embedded microcontroller or an off-chip
programmable device such as a field programmable gate array (FPGA).
Building on the potential of the fabricated programmable time measurement
architecture, Chapter 5 proposes a novel time measurement architecture that is
capable of timing measurements with a resolution of tens of femtoseconds. This is
achieved with a TDC method based on the homodyne technique, which takes a
different approach to the problem of sub-picosecond resolution by means of an
analogue/RF solution. It has been shown that a TDC incorporating a frequency
domain method can achieve higher resolutions than time domain methods.
With regard to the practicality of these two time measurement architectures,
the number of time measurement architectures required on a chip will depend on how
large the chip is and how much of it is to be tested. With small designs, only one or
two time measurement blocks may be required and the connections can easily be
multiplexed. However, in larger SoC devices where the CUTs are placed far apart,
from one side of the chip to the other, more time measurement blocks may be
required simply to minimise the parasitics between the CUT and the time
measurement block.
Although setup and hold times on register files are important time
measurements in digital circuitry, rise and fall time specifications are often important
in analogue and mixed-signal circuits. Therefore, the Programmable Time
 Appendix A    138
 
The IEEE standard 1500 wrapper architecture contains an instruction register known
as the wrapper instruction register (WIR). The WIR configures the 1500 wrapper into
different modes of operation, determined by the instruction shifted into it. A wrapper
boundary register (WBR) provides access to the core terminals, where data can be
shifted, captured, updated and transferred. A wrapper bypass register (WBY) is
provided as a bypass in serial mode. It is intended for use when several IEEE 1500
wrappers are chained together, and provides a minimum length scan path through
the wrapper.
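
To illustrate why the bypass matters when wrappers are chained, the sketch below counts the bits in the serial path between WSI and WSO. It assumes a single-bit WBY and hypothetical WBR lengths, and it does not model the WIR or any instruction of the standard; the function name and the numbers are illustrative only.

```python
def serial_path_length(wbr_lengths, target, wby_length=1):
    """Bits between WSI and WSO when only `target` selects its WBR.

    Every other wrapper in the chain is assumed to select its bypass
    register (WBY), taken here to be wby_length bits long.
    """
    return sum(
        length if index == target else wby_length
        for index, length in enumerate(wbr_lengths)
    )


# Hypothetical chain of four wrapped cores and their WBR lengths in bits.
wbr_lengths = [24, 130, 16, 48]

all_boundary = sum(wbr_lengths)                        # no wrapper bypassed
bypassed = serial_path_length(wbr_lengths, target=2)   # only core 2 under test
```

Under these assumptions, bypassing the three cores that are not under test shortens the scan path to the WBR of the target core plus one bit per bypassed wrapper.
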
 
 
 Appendix C    142 
; Date : 22/03/06  REV: 1 
; For PIC18FXXX 
; Function 
; -------- 
; Program MC01TDC Module 
 
    list p=18f452 
 
; Include file, change directory if needed 
#include <p18f452.inc> 
 
; Start at the reset vector 
Reset_Vector  code 0x000 
 
; Start application beyond vector area 
 code  0x002a 
 
  goto  init
 
;******** THE SUBROUTINES START HERE ******** 
init   ; the Program starts here 
 
;******** MEMORY EQUATES ******** 
; You define names and allocation as required by the program 
 
  goto  start 
 
 
MODE1 ; Rise Time Measurement 
 
  bcf    PORTB,RB3 ;set PORTB BIT3 Low (MODE0)
  bcf    PORTB,RB4 ;set PORTB BIT4 Low (MODE1)

  bsf    PORTB,RB5 ;set PORTB BIT5 High (START)

; Vin_1
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bsf    PORTB,RB0 ;set PORTB BIT0 High
 
 
MODE2 ; Fall Time Measurement 
 
  bcf    PORTB,RB3 ;set PORTB BIT3 Low (MODE0)
  bsf    PORTB,RB4 ;set PORTB BIT4 High (MODE1)

  bsf    PORTB,RB5 ;set PORTB BIT5 High (START)

  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
 
MODE3 ; Pulse Width Measurement 
   
  bsf    PORTB,RB3 ;set PORTB BIT3 High (MODE0)
  bcf    PORTB,RB4 ;set PORTB BIT4 Low (MODE1)

  bsf    PORTB,RB5 ;set PORTB BIT5 High (START)

; Vin_1
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bsf    PORTB,RB0 ;set PORTB BIT0 High
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low

  bcf    PORTB,RB5 ;set PORTB BIT5 Low (START)
 
MODE4 ; Propagation Delay Measurement 
 
  bsf    PORTB,RB3 ;set PORTB BIT3 High (MODE0)
  bsf    PORTB,RB4 ;set PORTB BIT4 High (MODE1)

  bsf    PORTB,RB5 ;set PORTB BIT5 High (START)

; Vin_1
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bcf    PORTB,RB0 ;set PORTB BIT0 Low
  bsf    PORTB,RB0 ;set PORTB BIT0 High

; Vin_2
  bcf    PORTB,RB1 ;set PORTB BIT1 Low
  bsf    PORTB,RB1 ;set PORTB BIT1 High
 
 
;******** MAIN PROGRAM ******** 
 
start 
  clrf  PORTB  ; Clear PORTB 
  clrf  TRISB   ; PORTB all outputs
  clrf  PORTA ; Clear PORTA 
  clrf  TRISA  ; PORTA all outputs 
 
 
PWRUP   bcf  PORTB,RB0 ;set PORTB BIT0 Low (PWRUP)
BGAP_EN  bcf  PORTB,RB1 ;set PORTB BIT1 Low (BGAP_EN)
N0_MEM  bsf  PORTB,RB2 ;set PORTB BIT2 High (No_MEM) 
    CALL   MODE1 
 
  end   ;and ends here - this must appear on the last line of the program so the assembler knows where to stop
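
The mode selection carried out by the listing above can be summarised with a short sketch. It is written in Python purely for illustration; the bit positions (MODE0 on RB3, MODE1 on RB4, START on RB5) and the mapping from mode to measurement are read from the bcf/bsf instructions and routine labels of the listing, while the function and dictionary names are hypothetical.

```python
MODE0_BIT, MODE1_BIT, START_BIT = 3, 4, 5  # RB3, RB4, RB5 in the listing

# (MODE0, MODE1) levels driven by each routine in the listing.
MODES = {
    "rise_time": (0, 0),          # MODE1 routine
    "fall_time": (0, 1),          # MODE2 routine
    "pulse_width": (1, 0),        # MODE3 routine
    "propagation_delay": (1, 1),  # MODE4 routine
}


def portb_control(measurement, start=1):
    """PORTB value carrying only the MODE0, MODE1 and START bits."""
    mode0, mode1 = MODES[measurement]
    return (mode0 << MODE0_BIT) | (mode1 << MODE1_BIT) | (start << START_BIT)
```

For example, `portb_control("pulse_width")` gives the control bits that the MODE3 routine drives before it toggles the input, and passing `start=0` gives the state after that routine clears START.
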
 
 