Matrix: a 15 ps resistive interpolation TDC ASIC based on a novel regular structure by Mauricio, J. et al.
2016 JINST 11 C12047
Published by IOP Publishing for Sissa Medialab
Received: November 7, 2016
Accepted: December 9, 2016
Published: December 19, 2016
Topical Workshop on Electronics for Particle Physics,
26–30 September 2016,
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
MATRIX: a 15ps resistive interpolation TDC ASIC based
on a novel regular structure
J. Mauricio,1 D. Gascón, D. Ciaglia, S. Gómez, G. Fernández and A. Sanuy
FQA, ICC, Universitat de Barcelona,
Avinguda Diagonal 647, Barcelona, Spain
E-mail: jmauricio@icc.ub.edu
Abstract: This paper presents a 4-channel TDC ASIC with the following features: 15-ps LSB
(9.34 ps after calibration), 10-ps jitter, < 4-ps time resolution, up to 10 MHz of sustained input rate
per channel, 45 mW of power consumption and very low area (910 × 215 µm²) in a commercial
180 nm technology. The main contribution of this work is the novel design of the clock interpolation
circuitry based on a resistive interpolation mesh circuit (patented), a two-dimensional regular
structure with very good properties in terms of power consumption, area and low process variability.
Keywords: Digital electronic circuits; Instrumentation and methods for time-of-flight (TOF)
spectroscopy; Timing detectors; VLSI circuits
1Corresponding author.
© 2016 IOP Publishing Ltd and Sissa Medialab srl. doi:10.1088/1748-0221/11/12/C12047
Contents
1 Introduction
2 Matrix TDC design overview
2.1 Building blocks
2.2 RIMC
2.3 PLL
2.4 Front-End Readout
2.5 Back-End Readout
2.6 Serializer
2.7 SPI slave
3 Chip measurements
3.1 Code density test
3.2 Linearity
3.3 Jitter
3.4 Power consumption
4 Conclusions
1 Introduction
Time-of-Flight (ToF) measurement is one of the major challenges in high energy physics experiments,
medical imaging and even in Laser Imaging Detection and Ranging (LiDAR). Precise
timing measurements make it possible to compute the distance that a particle has traveled and thus to identify tracks,
perform coincidence measurements or determine the distance to objects.
Our research group has been working for years on fast-timing ASIC designs for Positron Emission
Tomography (PET) applications [1]. The FlexToTv2 ASIC provides very good timing performance:
100-ps Single Photon Time Resolution (SPTR) and 128-ps full width at half maximum (FWHM)
Coincidence Time Resolution (CTR). The outputs of this chip are discrete in amplitude but continuous
in time, so external equipment is required to perform fine timing measurements. Our
mid-term goal is to integrate a Time-to-Digital Converter (TDC) into a System-on-Chip (SoC) to
provide timing measurements in the digital domain with low power consumption (10 mW per channel).
Moreover, the timing resolution of the TDC should be good enough to avoid degrading the
timing performance of the analog readout; in other words, the Least Significant Bit (LSB) should be
in the range of 10 to 20 ps, and the combination of jitter and non-linearity should not exceed 1 LSB.
The TDC architecture is heavily constrained by the target LSB and the available manufacturing
technology, but power consumption also plays an important role in the choice of the architecture.
Several works [2, 3] exhibit very high performance in terms of timing resolution (< 10 ps) but in all
these cases power consumption would not fulfill our application requirements when implemented
in 180 nm technology.
Figure 1. MATRIX block diagram. [Blocks shown: PLL (PFD, CP and /M divider), RIMC (VCO), coarse counter, front-end readout, back-end readout, serializer and SPI slave.]
In this work, we present a 4-channel TDC ASIC prototype called Multichannel Architecture
Tdc with Resistive Interpolation matriX (MATRIX). The main contribution of this work is the
Resistive Interpolation Mesh Circuit (RIMC), which features 15-ps LSB (typical LSB of 9.3 ps
after calibration), 10-ps jitter, < 4-ps time resolution, 1280 ns dynamic range, dead time < 20 ns,
up to 10 MHz of sustained input rate per channel, 45 mW of power consumption and very low area
(910 × 215 µm²) in a commercial 180 nm technology.
This paper is organized as follows: in section 2 the building blocks of the chip are described,
section 3 shows the preliminary chip measurement results, and finally in section 4 the conclusions
are drawn.
2 Matrix TDC design overview
2.1 Building blocks
MATRIX is a multilevel TDC which consists of a fine clock interpolator and a coarse
counter block (see figure 1). Sub-clock resolution is achieved by producing copies of an 800 MHz
internal clock spaced in steps of 15 picoseconds. The synthesis of these clocks is performed by
the RIMC. The clock signals are captured by the front-end readout block on the rising edge of the
input TIME signal, and thus the sub-clock phase is measured. The coarse counter block is in charge of
counting entire clock periods (in natural binary) to extend the TDC dynamic range to 1280 ns. The back-end
readout acquires synchronized data coming from both TDC levels (coarse counter and fine interpolator),
builds the event and stores it in a 4-word FIFO. Finally, the serializer block transmits data at
200 Mbps in a serial protocol.
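As an illustration of how the two TDC levels combine, the following minimal sketch reconstructs a timestamp from a coarse count and a fine time. It assumes the 10-bit coarse counter width given in section 2.6 and a fine value already converted to picoseconds; the function name `timestamp_ps` and its interface are hypothetical and are not part of the chip or its software.

```python
CLOCK_PERIOD_PS = 1250.0  # period of the 800 MHz internal clock
COARSE_BITS = 10          # coarse counter width taken from the 18-bit frame

def timestamp_ps(coarse: int, fine_ps: float) -> float:
    """Combine a coarse count and a fine time (hypothetical interface)."""
    if not 0 <= coarse < 2 ** COARSE_BITS:
        raise ValueError("coarse count out of range")
    if not 0.0 <= fine_ps < CLOCK_PERIOD_PS:
        raise ValueError("fine time must lie within one clock period")
    return coarse * CLOCK_PERIOD_PS + fine_ps

# The coarse counter wraps after 2**10 clock periods, which sets the dynamic range.
dynamic_range_ns = (2 ** COARSE_BITS) * CLOCK_PERIOD_PS / 1000.0
print(dynamic_range_ns)  # 1280.0
```

The computed range matches the 1280 ns quoted for the coarse counter.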
2.2 RIMC
The circuit shown in figure 2 is a novel clock synthesizer composed of an array of ring oscillators
coupled by means of resistors, which provides 56 clock phases of the 800 MHz internal clock.
These phases are organized in 7 rows by 8 columns of Delay Elements (DEs). Note that oscillation
is achieved by using an odd number of rows and connecting the outputs of the last DEs to the
inputs of the first DEs. The DE (see figure 2.b) contains a current-starved inverter which, by means of the
Phase-Locked Loop (PLL) control voltage (VCTL), fixes the row delay to 89 ps (1/14 of the clock period),
while the resistor introduces a 15-ps sub-gate delay between adjacent columns (from left
to right). The typical end-to-end delay between the first and the last column nodes of a given row
is 120 ps since the number of columns is 8 (the first column on the left is used as a dummy).
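The timing figures quoted for the RIMC can be cross-checked with a short calculation. The sketch below uses the standard relation for a ring of N inverting stages (an edge travels around the ring twice per period) together with the row, column and step values given in the text; the variable names are illustrative only.

```python
N_ROWS = 7                # inverting stages in each ring (odd, so it oscillates)
N_COLS = 8                # columns coupled through the resistors
CLOCK_PERIOD_PS = 1250.0  # 800 MHz internal clock
COLUMN_STEP_PS = 15.0     # nominal resistive sub-gate delay per column

# In a ring of N inverters the period is 2 * N * (stage delay).
row_delay_ps = CLOCK_PERIOD_PS / (2 * N_ROWS)

# Nominal span covered by the columns of a single row.
row_span_ps = N_COLS * COLUMN_STEP_PS

n_phases = N_ROWS * N_COLS

print(round(row_delay_ps, 1), row_span_ps, n_phases)
```

The stage delay of about 89.3 ps corresponds to the 1/14 of the clock period stated above, and the 8 columns give the 120 ps row span and the 56 phases.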
Figure 2. (a) RIMC schematic. (b) DE schematic. (c) Starved inverter schematic. [Labels shown: rows ROW0 to ROW6, row nodes R0<0> to R0<7>, input DIN0, and DE pins IN, COUPLE, OUT and VCTL.]
Figure 3 shows a chronogram of the nominal clock phases (plotted per row) from all the
nodes in the RIMC. It can be seen that clock edges alternate between adjacent rows since a
starved inverter is used instead of a buffer. This has clear benefits in terms of area and power
consumption, since each starved inverter needs only three transistors (two for the inverter and one for
biasing). The phase information of the clocks is used by the readout system to halve the number of
DEs: only half a clock period has to be covered by sub-gate delays, since the edge of the
clock where the transition occurs determines whether the conversion belongs to the first or to the second
half of the fine TDC counter. Observe that there is a 35% overlap between transitions in adjacent
rows since, as already stated, the end-to-end delay from the first to the last column is 120 ps while
the row-to-row delay is 89 ps. This overlap is used to accommodate local or global process
and temperature variations which, according to simulations, never exceed this 35%.
Figure 3. Chronogram of the RIMC nodes sorted by rows. [Plot title: TDC sub-delay normalized transfer function; vertical axis: normalized TDC counts; horizontal span: TCycle / 2; traces: ROW 0 to ROW 6.]
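The row overlap can be illustrated with a simplified nominal model, which is an assumption made here for illustration and not the simulated behaviour of the chip: each row is taken to cover a 120 ps window starting at its row delay, and windows wrap around the half clock period. The helper `rows_with_transition` is hypothetical.

```python
HALF_PERIOD_PS = 625.0        # half of the 1250 ps clock period
N_ROWS = 7
ROW_DELAY_PS = 1250.0 / 14.0  # nominal row-to-row delay (about 89.3 ps)
ROW_SPAN_PS = 120.0           # nominal end-to-end delay of one row

def rows_with_transition(phase_ps: float) -> list:
    """Rows whose window contains the given phase (idealised model)."""
    phase = phase_ps % HALF_PERIOD_PS
    return [row for row in range(N_ROWS)
            if (phase - row * ROW_DELAY_PS) % HALF_PERIOD_PS < ROW_SPAN_PS]

# Fraction of each row delay that is also covered by the neighbouring row.
overlap = ROW_SPAN_PS / ROW_DELAY_PS - 1.0

print(rows_with_transition(50.0))   # [0]
print(rows_with_transition(100.0))  # [0, 1]
print(round(overlap, 3))            # 0.344
```

In this model every phase is seen by one or two rows, and the overlap of about 34% is consistent with the 35% figure quoted in the text.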
2.3 PLL
The internal clock is generated on-chip by the RIMC described above, which also outputs an
800 MHz clock sample. The PLL divides this clock sample by M, where M can be set to 4, 8
or 16, and compares the divided clock with an external reference clock (200, 100 or 50 MHz, respectively).
The result of this comparison drives a Charge Pump which, in turn, generates the control voltage of the synthesizer (VCTL).
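The relation between the divider setting and the reference clock is simple enough to state as code. The following sketch only encodes the arithmetic 800 MHz / M; the function name `reference_mhz` is hypothetical.

```python
VCO_MHZ = 800.0         # internal clock generated by the RIMC
ALLOWED_M = (4, 8, 16)  # divider settings listed in the text

def reference_mhz(m: int) -> float:
    """Reference frequency at which the PLL locks for a given divider M."""
    if m not in ALLOWED_M:
        raise ValueError("M must be 4, 8 or 16")
    return VCO_MHZ / m

for m in ALLOWED_M:
    print(m, reference_mhz(m))
```

The three settings correspond to reference clocks of 200, 100 and 50 MHz.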
2.4 Front-End Readout
This TDC has four channels. An event occurs when a rising edge is produced in any of the
TIME<3:0> inputs. This rising edge triggers the Time Capture Registers (TCRs), which are the
first stage of the front-end readout (see a row example in figure 4). The group of 7 TCRs (one per
row) is called the Time Capture Matrix (TCM). This first stage of the readout captures the logic level of
the 7 by 8 clock matrix coming from the RIMC using D-type Flip-Flops (FFs). These full-custom
FFs are optimized to reduce mismatch variability without increasing power consumption
substantially. One nanosecond after the TCM captures the clock phases, the data is stored in a second set of FFs as
long as no event is being processed by the Back-End Readout block (!BUSY disabled). Once the data is
in the storage FFs, combinational logic computes the row and column where the transition
occurred, and the edge type (either 0/1 or 1/0, see figure 4 top right). Note that two consecutive rows
may detect the same transition due to the 35% overlap between contiguous rows, as shown in
figure 3.
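A software model of the per-row transition search may help to follow the logic. The sketch below XORs adjacent captured bits, as the T0<7:1> and T0<6:0> signals in figure 4 suggest, and reports the position of the single transition. The function name, the column indexing convention and the rejection of captures with more than one transition are assumptions made for illustration, not a description of the on-chip logic.

```python
from typing import Optional, Sequence, Tuple

def locate_transition(row_bits: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (xor_index, first_bit) for one captured row, or None if no clean hit.

    xor_index is the position i where row_bits[i] != row_bits[i + 1];
    first_bit is the captured level of the first column, used here as the
    clock-phase indicator (an assumption based on CLK_PHASE in figure 4).
    """
    xor_bits = [row_bits[i] ^ row_bits[i + 1] for i in range(len(row_bits) - 1)]
    if sum(xor_bits) != 1:
        return None  # no transition in this row, or an ambiguous capture
    return xor_bits.index(1), row_bits[0]

print(locate_transition([1, 1, 1, 0, 0, 0, 0, 0]))  # (2, 1)
print(locate_transition([0, 0, 0, 0, 0, 0, 0, 0]))  # None
```

With 8 captured bits per row there are 7 XOR outputs, which is why a 7-to-3 encoder is enough to produce the 3-bit column value.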
2.5 Back-End Readout
The Back-End block computes the fine interpolation value when a new event occurs. Moreover, it
stores the events in the 4-word FIFO (one per channel), allowing peak event rates of up to 50 MHz (20-ns dead
time), and it synchronizes data coming from the fine and coarse counters. Synchronization works
as follows: the coarse counter block generates an auxiliary signal which indicates whether the captured
counter value has recently changed. In those cases where the coarse counter reading has recently changed
and the fine interpolator has not yet overflowed, the fine counter is corrected by subtracting 1 LSB.
Figure 4. Schematic of Row #0 of the Front-End Readout block. [Signals shown: TIME, TIMEDELAYED, !BUSY, R0<7:0>, T0<7:0>, XOR0<6:0>, COL0<2:0> (7:3 encoder), HIT<0>, CLK_PHASE<0> and IS_EVEN.]
The fine data encoding algorithm works as follows: the node identifier is computed as 8 × NRow + NCol
for each row that has detected a transition (two at most). This identifier is
a 6-bit value where the 3 MSBs correspond to the row number and the 3 LSBs to the column
number. If the event is detected by two consecutive rows, both node identifiers are averaged,
thus improving resolution through the arithmetic expansion (soft bins). Otherwise, resolution does not
improve. Finally, if the captured clock phase is high, the fine counter is increased by 110 LSBs
(half full scale). In summary, the average LSB is smaller than the
typical LSB (15 ps) thanks to the row overlap that produces soft bins.
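The encoding steps above can be sketched in a few lines. This is one possible reading of the algorithm and not the chip's actual implementation: the fine code is assumed to be expressed in half-node units, so that the average of two node identifiers stays an integer, which is consistent with a half full scale of 110 LSBs (2 × 55). The function name `encode_fine` is hypothetical, and the wrap-around between the last and the first row is not handled.

```python
HALF_FULL_SCALE = 110  # offset added when the captured clock phase is high

def encode_fine(hits, clock_phase_high: bool) -> int:
    """Fine code from the (row, col) pairs of the rows that saw the transition.

    Assumption: codes are in half-node units, so a single hit contributes
    2 * node_id and two hits contribute the sum of both node identifiers
    (twice their average).
    """
    node_ids = [8 * row + col for row, col in hits]
    if len(node_ids) == 1:
        code = 2 * node_ids[0]
    elif len(node_ids) == 2:
        code = node_ids[0] + node_ids[1]
    else:
        raise ValueError("expected a transition in one or two rows")
    if clock_phase_high:
        code += HALF_FULL_SCALE
    return code

print(encode_fine([(1, 2)], False))          # 20
print(encode_fine([(1, 7), (2, 0)], False))  # 31
print(encode_fine([(0, 3)], True))           # 116
```

The second call shows a soft bin: the odd code 31 can only be produced when two rows detect the same transition.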
2.6 Serializer
Once events are built and buffered, each TDC channel has its own transmitter which serializes data
at 200 Mbps. The frame width is 18 bits: 10 for the coarse counter and 8 for the fine counter. Thus,
the maximum sustained event rate allowed by this block is 10 MHz per channel.
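The frame layout can be modelled as follows. The bit ordering (coarse counter in the most significant bits) and the function names are assumptions, since the text only gives the field widths. The payload alone bounds the event rate at 200 Mbps / 18 bits; the 10 MHz figure quoted above lies below that bound, and any framing overhead that would explain the difference is not described in the text.

```python
COARSE_BITS = 10
FINE_BITS = 8
LINK_MBPS = 200.0

def pack_frame(coarse: int, fine: int) -> int:
    """Pack one event into an 18-bit word (coarse in the MSBs, by assumption)."""
    if not 0 <= coarse < 2 ** COARSE_BITS or not 0 <= fine < 2 ** FINE_BITS:
        raise ValueError("field out of range")
    return (coarse << FINE_BITS) | fine

def unpack_frame(frame: int):
    """Inverse of pack_frame: return (coarse, fine)."""
    return frame >> FINE_BITS, frame & (2 ** FINE_BITS - 1)

# Upper bound on the event rate if only the 18 payload bits were transmitted.
payload_rate_bound_mhz = LINK_MBPS / (COARSE_BITS + FINE_BITS)

print(unpack_frame(pack_frame(1023, 255)))
print(round(payload_rate_bound_mhz, 2))
```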
Serialized data can be sent in two modes: single-ended and Low-Voltage Differential Signaling
(LVDS). The LVDS driver supports four differential voltage modes to optimize power consumption.
When the distance between the chip outputs and the external receiver is short and
power consumption is critical, the LVDS transmitter can be switched off and data transmitted in
single-ended mode.
2.7 SPI slave
This block allows the chip to be configured via software. One of the main functions of the
SPI control is to change the power consumption profile of the chip by disabling unused timing
IOs or modifying the output current of the LVDS drivers. The SPI slave also allows the
PLL multiplier selector to be modified and provides PLL debugging functions (VCTL reset and monitoring).
3 Chip measurements
This section shows the preliminary linearity, jitter and power consumption measurements of the
first MATRIX chip version. Calibration of the ASIC is required in order to achieve the best
performance, since time bins at the interface between two consecutive rows have finer resolution
than those coming from a single row, as explained in section 2.5.
3.1 Code density test
Calibration is performed by means of a code density test. This test consists of applying M random
pulse shots (100K in the current test), uniformly distributed in time, to the TDC input channels.
The binary codes corresponding to wider TDC bins appear more often than those of narrower bins.
Figure 5 shows the code density test results for one of the MATRIX TDC channels.
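The principle of the test can be shown with a small simulation. The sketch below estimates bin widths from a histogram of codes and builds a lookup table that maps each code to the centre of its bin; the bin-centre mapping is a common choice assumed here, since the text does not state how calibrated values are derived. The three-bin TDC used in the demonstration is synthetic, not MATRIX data.

```python
import random

def bin_widths_from_counts(counts, period_ps=1250.0):
    """Each bin width is proportional to how often its code appears."""
    total = sum(counts)
    return [period_ps * c / total for c in counts]

def calibration_lut(widths):
    """Map each code to the centre of its estimated bin (assumed convention)."""
    lut, edge = [], 0.0
    for width in widths:
        lut.append(edge + width / 2.0)
        edge += width
    return lut

# Synthetic three-bin TDC with true widths of 500, 250 and 500 ps.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(100_000):
    t = rng.uniform(0.0, 1250.0)
    code = 0 if t < 500.0 else 1 if t < 750.0 else 2
    counts[code] += 1

widths = bin_widths_from_counts(counts)
lut = calibration_lut(widths)
print([round(w) for w in widths])
```

With 100K uniformly distributed shots the estimated widths land within a few picoseconds of the true ones, and a code that never appears is assigned zero width.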
TDC channel calibration may vary dynamically due to the thermal dependence of the interpolation
resistors. These variations amount to up to ±7% over the full operating temperature range (−20 °C to
+80 °C), which corresponds to a ±1 ps variation.
It can be seen that most of the fine counter bins (97.1%) are in the range between 0 and 25 ps,
although some outliers are observed in some TDC channels. Note that there are several bins
with no sensitivity (zero bin width). These bins correspond to those RIMC regions where row
overlap never occurs, i.e. without node identifier averaging. The physical number of bins is 112
(7 rows × 8 columns × 2 clock edge types), which provides the 35% overlap described in
section 2.2, whereas the number of 15-ps bins required to cover the clock period (1250 ps) without row overlap
is 83 (1250/15). It is important to highlight that the typical number of sensitive bins (width > 0 ps)
achieved by the averaging algorithm is 134 (see figure 5), which corresponds to a typical bin width
(LSB size) of 9.34 ps, with a standard deviation of 7 ps (before calibration).
Figure 5. Code density test results for one MATRIX TDC channel. [Vertical axis: bin width (ps), 0 to 35; horizontal axis: bin number (TDC fine LSB count), 0 to 220.]
3.2 Linearity
Linearity is measured by injecting N pulse shots (40K in the current test) synchronized with the
external reference clock and by sweeping the relative phase between 0 and 1250 ps in steps of 5 ps.
Linearity is then measured by averaging the N calibrated measurements for each step and then by
subtracting the step size of 5 ps. Further details of the testbench setup can be found in figure 6.
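The processing of the phase sweep can be sketched as follows. The definitions used here are assumptions in the usual sense of these terms: DNL is taken as the difference between consecutive averaged measurements minus the 5 ps step, and INL as the deviation of each averaged measurement from the ideal straight line through the first point. The function name `dnl_inl` and the sample values are illustrative only.

```python
from statistics import mean

def dnl_inl(samples_per_step, step_ps=5.0):
    """Return (dnl, inl) in ps from the calibrated measurements of each step."""
    means = [mean(samples) for samples in samples_per_step]
    # DNL: measured increment between consecutive steps minus the ideal step.
    dnl = [(b - a) - step_ps for a, b in zip(means, means[1:])]
    # INL: deviation from the ideal line starting at the first averaged value.
    inl = [m - (means[0] + k * step_ps) for k, m in enumerate(means)]
    return dnl, inl

# Three phase steps with made-up measurements (not chip data).
dnl, inl = dnl_inl([[0.0], [6.0], [10.0]])
print(dnl)  # [1.0, -1.0]
print(inl)  # [0.0, 1.0, 0.0]
```

The sketch ignores the wrap-around at the end of the 1250 ps clock period, which a full analysis of the 250 steps would need to handle.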
Figure 7 shows the Differential Non-Linearity (DNL) and Integral Non-Linearity (INL) measurements
of a single MATRIX TDC channel (after calibration). The measured DNL in the 12
available MATRIX prototype samples (48 channels) is always within ±4.7 ps, with an RMS lower
than 1.1 ps. The INL error is always within ±10.2 ps and its RMS is lower than 3.7 ps. Hence,
the total precision due to non-linearities of the TDC, estimated as $\sqrt{\sigma_{\mathrm{DNL}}^2 + \sigma_{\mathrm{INL}}^2}$, is less than
4 ps for the MATRIX TDC.
Figure 6. Setup for linearity and jitter measurements. [Items shown: 81110A pulse pattern generator (PPG) driving TIMEi and CLK REF, E3631A supply (5.0 V, 0.4 A), MATRIX PCB, and N acquisitions for each of the 250 phase steps from 0 ps to 1245 ps.]
Figure 7. DNL (top) and INL (bottom) measurements in picoseconds for a given MATRIX channel sample
after calibration. [Horizontal axis: phase (ps), 0 to 1250.]
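The quadrature sum used for the total non-linearity can be checked numerically with the RMS bounds reported for DNL and INL; the variable names below are illustrative.

```python
import math

SIGMA_DNL_PS = 1.1  # upper bound on the DNL RMS reported in the text
SIGMA_INL_PS = 3.7  # upper bound on the INL RMS reported in the text

# Quadrature sum of the two contributions.
linearity_ps = math.hypot(SIGMA_DNL_PS, SIGMA_INL_PS)
print(round(linearity_ps, 2))  # 3.86
```

The result stays below the 4 ps quoted for the MATRIX TDC, and it is dominated by the INL term.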
3.3 Jitter
Jitter is measured by computing the standard deviation in each of the 5-ps steps of the
linearity test and taking the typical value over these 250 steps. Table 1 shows that the MATRIX TDC
jitter increases with $\sqrt{\mathrm{PLL}_M}$, where $\mathrm{PLL}_M$ is the PLL multiplication factor. This indicates that
the jitter produced by the PLL is the dominant contribution. The PLL natural frequency (bandwidth)
decreases with large multiplication factors (square root dependence), which in turn affects the
phase error, settling time and jitter [4]. Thus, jitter can be improved in future MATRIX versions
by increasing the bandwidth of the internal PLL.
Table 1. TDC jitter measurements for different PLL multiplier configurations.
PLL M    TDC jitter, uncalibrated (ps)    TDC jitter, calibrated (ps)
4        9.7                              9.3
8        13.4                             12.9
16       21.2                             20.6
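The square-root trend can be compared against the calibrated values of Table 1. The sketch below scales the M = 4 measurement by the square root of M/4; using M = 4 as the baseline is a choice made here for illustration.

```python
import math

# Calibrated jitter values from Table 1, in ps, indexed by PLL M.
measured_ps = {4: 9.3, 8: 12.9, 16: 20.6}

BASE_M = 4
predicted_ps = {m: measured_ps[BASE_M] * math.sqrt(m / BASE_M) for m in measured_ps}

for m in measured_ps:
    ratio = measured_ps[m] / predicted_ps[m]
    print(m, round(predicted_ps[m], 1), measured_ps[m], round(ratio, 2))
```

Under this scaling the predictions are about 13.2 ps and 18.6 ps for M = 8 and M = 16, so the measurements follow the trend to within roughly 10%.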
3.4 Power consumption
MATRIX has several configuration modes that allow power consumption to be optimized. When
the chip is in standby mode, the power consumption is 0.76 mW when the reference input clock
is disabled, and 30.1 mW when it is enabled. Note that most of the power consumption is due to the
continuous oscillation of the RIMC and the continuous switching of the D-inputs of the FFs in the
TCM block. When the chip is fully operative, the power consumption ranges from
45.2 mW (LVDS differential voltage of 90 mV) to 67.7 mW (LVDS differential voltage of 600 mV).
Data transmission in low power mode has been tested successfully in our testbench setup.
Table 2 shows a comparison between several proposals with performance similar to that of MATRIX.
Although it is difficult to compare different technology nodes, it can be seen that the power
consumption per channel of MATRIX is clearly lower than that of the other proposals, even when compared with
smaller technology nodes.
Table 2. A comparison between state of the art proposals and MATRIX.
Proposal                        Technology   Bin size   Linearity¹   Power/channel
S. Russo et al. (2011) [5]      180 nm       41 ps      35 ps        25 mW
L. Perktold et al. (2014) [2]   130 nm       5 ps       3 ps         43 mW
P. Keranen et al. (2015) [3]    350 nm       0.6 ps     5 ps         80 mW
MATRIX (2016)                   180 nm       9.3 ps     4 ps         11.3 mW
¹ Linearity estimated by $\sqrt{\sigma_{\mathrm{DNL}}^2 + \sigma_{\mathrm{INL}}^2}$.
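The per-channel power listed for MATRIX follows from the total consumption in the lowest LVDS mode divided by the four channels. The short calculation below reproduces it from the figures given in section 3.4 and also shows the share of the operating power already drawn in standby with the reference clock enabled; the variable names are illustrative.

```python
N_CHANNELS = 4
TOTAL_LOW_LVDS_MW = 45.2    # fully operative, LVDS differential voltage of 90 mV
STANDBY_CLOCKED_MW = 30.1   # standby with the reference clock enabled

per_channel_mw = TOTAL_LOW_LVDS_MW / N_CHANNELS
standby_share = STANDBY_CLOCKED_MW / TOTAL_LOW_LVDS_MW

print(round(per_channel_mw, 1))  # 11.3
print(round(standby_share, 2))   # 0.67
```

About two thirds of the operating power is present in standby with the clock running, in line with the remark that the RIMC oscillation and the TCM inputs dominate the consumption.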
4 Conclusions
A novel TDC concept has been designed, prototyped and tested. The most attractive feature of
MATRIX is the low power consumption required to achieve an average time bin of 9.3 ps. The power
consumption (11.3 mW per channel) of this new generation of 2-D resistive interpolation TDCs makes it suitable
for applications with tight power constraints. The linearity error is very low
(4 ps) thanks to the low variability that the RIMC offers. This linearity could be improved further by using
low-Vth transistors in the TCM. However, one of the weak points of this first chip version is jitter,
which varies from 9.3 ps to 20.6 ps depending on the frequency of the input reference clock.
A second MATRIX version with improved jitter has already been submitted and will be
available in early 2017. In the near future we expect to design a System-on-Chip (SoC) called High
Resolution FlexToT (HR-FlexToT), which will integrate the MATRIX TDC and a new FlexToT [1] version with improved
energy and timing resolution.
References
[1] A. Comerma et al., FlexToT-Current mode ASIC for readout of common cathode SiPM arrays, IEEE
Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2013, pp. 1–2.
[2] L. Perktold and J. Christiansen, A multichannel time-to-digital converter ASIC with better than 3 ps
RMS time resolution, 2014 JINST 9 C01060.
[3] P. Keränen and J. Kostamovaara, A Wide Range, 4.2 ps(rms) Precision CMOS TDC With Cyclic
Interpolators Based on Switched-Frequency Ring Oscillators, IEEE Trans. Circuits Syst. I, Reg.
Papers 62 (2015) 2795.
[4] D. Fischette, Practical Phase-Locked Loop Design, talk given at International Solid-State Circuits
Conference, 2004.
[5] S. Russo et al., A 41 ps ASIC time-to-digital converter for physics experiments, Nucl. Instrum. Meth. A
659 (2011) 422.
