Design and Performance of the CMS Pixel Detector Readout Chip by Kaestli, H. Chr. et al.
arXiv:physics/0511166v2 [physics.ins-det] 24 Feb 2006
Design and Performance of the CMS Pixel Detector Readout Chip
H.Chr. Kästli a,∗, M. Barbero b,a, W. Erdmann a, Ch. Hörmann d,a, R. Horisberger a,
D. Kotlinski a, B. Meier c
a Paul Scherrer Institut, 5232 Villigen PSI, Switzerland
b Institut für Physik der Universität Basel, 4056 Basel, Switzerland
c Institut für Teilchenphysik, ETH Zürich, 8093 Zürich, Switzerland
d Physik-Institut der Universität Zürich, 8057 Zürich, Switzerland
Abstract
The readout chip for the CMS pixel detector has to deal with an enormous data rate. On-chip zero suppression is
inevitable and hit data must be buffered locally during the latency of the first level trigger. Dead-time must be
kept at a minimum. It is dominated by contributions coming from the readout. To keep it low an analog readout
scheme has been adopted where pixel addresses are analog coded.
We present the architecture of the final CMS pixel detector readout chip with special emphasis on the analog
readout chain. Measurements of its performance are discussed.
Key words: CMS, Pixel Detector, Readout Chip
PACS: 29.40.Wk, 29.40.Gx, 85.40.-e
1. Introduction
CMS is a general purpose detector at the Large
Hadron Collider (LHC) at CERN. Its innermost
tracking device is the pixel detector, which consists
of three barrel layers and two endcap disks both up
and downstream[1]. Its basic building blocks are
highly segmented silicon sensors with correspond-
ing readout chips (ROCs). Each sensor segment
(pixel cell) is connected to a charge sensitive am-
plifier in the corresponding pixel unit cell (PUC)
of the ROC via an indium bump of about 20µm
diameter. In the first barrel layer, the active area
is 4.3 cm from the interaction point. At this short
distance the density of produced particles will be
as high as 4 · 10⁷ s⁻¹cm⁻² at the full LHC design
luminosity of L = 10³⁴ s⁻¹cm⁻². Due to inclined
tracks and Lorentz drift of the produced signal
charge in the 4 T magnetic field, several pixels might
be hit by a single track, forming hit clusters. In order
to improve position reconstruction for clusters
through signal interpolation, the amount of charge
produced in each pixel is measured. The ROCs
have to record position and charge for all hit pixels
with a time resolution of 25 ns, which is the time
between two LHC bunch crossings. This information
has to be stored on-chip during the CMS first
level trigger latency of 3.2 µs, after which it is
either read out or discarded. The ROCs are read
out serially via 40 MHz analog links.
∗ Corresponding author.
Email: hans-christian.kaestli@psi.ch
Preprint submitted to Elsevier Science 28 August 2018
The first prototype of the readout chip has been
developed in the radiation hard DMILL process
[2,3]. In order to improve performance and yield
and to reduce costs, the design has been migrated
to a commercial process with much smaller feature
size[4]. The second iteration of the translated chip,
PSI46V2, is going to be the production version and
is described in this article.
In section two the architecture of PSI46V2 is de-
scribed. Section three focusses on the analog read-
out scheme and in section four measurements of
the performance are presented.
2. Chip architecture
The readout chip PSI46V2 has been fabricated
in a commercial 5-metal layer 0.25µm process
available through CERN. The design has been
made radiation tolerant by following special layout
rules as proposed in [5]. The chip integrates 1.3
million transistors in an area of 7.9mm× 9.8mm.
It can be divided into 3 functional blocks: a control
and supply block in the chip periphery, an array of
pixel unit cells organized in double columns, and
the double column peripheries which control read-
out and trigger validation within double columns.
The total number of pixels is 80×52 with a pixel
size of 100µm ×150µm.
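The geometry figures above, together with the total power consumption of about 120 mW quoted in section 2.1, can be cross-checked with a few lines of arithmetic. The following is only a sketch; the variable names are illustrative and the result does not depend on which pixel dimension runs along which array direction.

```python
# Figures quoted in the text (sections 2 and 2.1).
N_ROWS, N_COLS = 80, 52            # pixel array, quoted as 80 x 52
PIXEL_A_UM, PIXEL_B_UM = 100, 150  # pixel size in micrometres
CHIP_W_MM, CHIP_H_MM = 7.9, 9.8    # chip outline in millimetres
TOTAL_POWER_MW = 120               # "about 120 mW" per ROC

n_pixels = N_ROWS * N_COLS
active_area_mm2 = n_pixels * (PIXEL_A_UM * 1e-3) * (PIXEL_B_UM * 1e-3)
chip_area_mm2 = CHIP_W_MM * CHIP_H_MM
power_per_pixel_uw = TOTAL_POWER_MW * 1e3 / n_pixels

print(n_pixels)                                   # 4160
print(round(active_area_mm2, 1))                  # 62.4
print(round(active_area_mm2 / chip_area_mm2, 2))  # 0.81
print(round(power_per_pixel_uw, 1))               # 28.8
```

The pixel array thus covers roughly 80% of the chip area, and the power per pixel agrees with the quoted 29 µW.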
2.1. Control and supply block
The chip periphery houses various control and
supply circuits. It contains
– A serial programming interface. This is an I2C-
like protocol, modified to run at a speed of 40
MHz. To accommodate this high speed the pos-
sibility to read back configuration data had to be
given up. Nevertheless, there is a limited read-
back possibility through the analog data stream
as will be shown below. Two low voltage differ-
ential signal (LVDS) pairs are needed for clock
and data lines.
– A fast signal decoder. First level triggers and
commands for reset and calibration signal injec-
tion are coded into a single LVDS signal. This
signal has to be decoded and distributed over
the chip.
– 21 8-bit digital-to-analog converters (DACs),
five 4-bit DACs and one 3-bit DAC to adjust off-
sets, gains, thresholds, supply voltages, timings,
etc.
– 2 control registers to set the trigger latency, read-
out speed (40 MHz/20MHz) and range for cali-
bration pulses and to enable/disable the ROC.
– A bandgap voltage reference and 6 voltage reg-
ulators. Three of them can be wire-bonded to
external filter capacitors. This leads to a good
immunity against power ripple and reduces chip-
to-chip cross talk. The ROC needs two external
power supplies, 1.5V for the analog section and
2.5V for the digital part. It consumes a total of about
120 mW which corresponds to only 29µW per
pixel. The voltage regulators are programmable
and hence the voltages can be set for each chip
separately.
– An analog event generator. This is a circuit that
collects the pixel hit information from the double
columns and generates the output data stream
as described in section 3.
– An on-chip temperature sensor for monitoring
cooling performance.
– A cluster multiplicity counter. This is a fast trig-
ger signal that could be used by the CMS first
level trigger or for self-triggering in the lab when
no external trigger is available. Two thresholds
can be set for this mechanism: one sets the min-
imal number of hits within a double column be-
low which hits are considered as background and
are ignored, whereas the other one tunes the
number of hit double columns above which a
trigger signal is issued. Note that even below
this threshold a signal proportional to the num-
ber of clusters is available.
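The two-threshold logic of the cluster multiplicity counter described in the last item can be sketched as follows. This is a behavioural illustration only, not the on-chip implementation; the function name and its arguments are hypothetical.

```python
def cluster_multiplicity(hits_per_double_column, min_hits, trigger_threshold):
    """Return (number of hit double columns, trigger flag) for one bunch crossing.

    hits_per_double_column: number of hit pixels in each double column
    min_hits:               double columns with fewer hits are treated as background
    trigger_threshold:      a trigger is issued if more double columns than this are hit
    """
    # First threshold: ignore double columns with too few hits.
    n_hit = sum(1 for n in hits_per_double_column if n >= min_hits)
    # Second threshold: decide whether a trigger signal is issued.
    # The count itself remains available even when no trigger is issued.
    return n_hit, n_hit > trigger_threshold


print(cluster_multiplicity([0, 3, 1, 0, 2], min_hits=2, trigger_threshold=1))
# (2, True)
```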
2.2. Pixel unit cell
A sketch of the PUC is shown in figure 1. It
can be divided into an analog part (top) and the
digital logic (bottom). The thick lines on the right
are bus lines running along the double column.
The signal from the sensor enters a two stage
charge sensitive pre-amplifier/shaper system. Alternatively,
calibration signals can be injected
through either a 4.8 fF injection capacitor connected
directly to the amplifier input node, or
via the sensor through the air gap between a top
metal plate in the ROC and the sensor. The former
is used for trimming the comparator threshold
while the latter can be used to test the bump bond
connections[6]. A globally programmable current
source at the input node compensates for sensor
leakage current.
Fig. 1. Schematic view of a pixel unit cell
Zero suppression is performed with a compara-
tor. A global threshold can be programmed for
the whole chip. In order to compensate for local
transistor mismatches each pixel has a 4-bit DAC
to trim the threshold. Furthermore, a mask bit
allows noisy pixels to be disabled. Once the comparator
is above threshold, the shaper output signal is
stored in a sample-and-hold circuit (see below).
The double column periphery is notified immedi-
ately through a fast hard-wired column OR. The
pixel becomes insensitive and waits for a column
readout token. When it arrives the analog signal is
sent to the periphery together with the pixel row
address. The token is then passed on and the pixel
resumes data taking. Thus, dead-time is short but
depends on the hit rate. The time needed for a
column scan to finish is ≤ 50ns + (50ns× number
of hits).
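The upper bound on the column scan time can be written as a small helper. The split into a token overhead and a per-hit term follows the description in section 2.3 (token traversal in less than 50 ns, one hit pixel per two clock cycles); the names are illustrative.

```python
TOKEN_OVERHEAD_NS = 50  # upper bound for the token to pass through the double column
TIME_PER_HIT_NS = 50    # two 25 ns clock cycles per transferred hit


def max_column_scan_time_ns(n_hits):
    """Upper bound on the duration of one column drain with n_hits hit pixels."""
    if n_hits < 0:
        raise ValueError("n_hits must be non-negative")
    return TOKEN_OVERHEAD_NS + TIME_PER_HIT_NS * n_hits


for n in (1, 2, 4):
    print(n, max_column_scan_time_ns(n))
# 1 100
# 2 150
# 4 250
```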
The shaper output signal is sampled after a fixed
time delay which is controlled by a global register
(i.e. set once per ROC). The signal within the
sample-and-hold circuit follows the rising edge of
the shaper output quickly, while it decays much
more slowly. This makes the peak of the signal sufficiently
flat, so that the circuit is not very sensitive
to this time delay. Due to timewalk and device
mismatch the exact peaking time depends somewhat
on the pulse height and varies a bit from
pixel to pixel. The relative error in the measured
pulse height due to a non-perfect sampling point
has been measured and is shown in figure 2. If
the delay is optimised for low signal charges (3500
electrons in figure 2), the relative error for large
signals is only a few per cent.
Fig. 2. Relative error of peak detection across a full chip
for 2 different pulse heights (histogram of number of pixels vs. relative sampling error [%]; legend: 7k electrons, 25k electrons)
Downloading the full detector configuration at run
start will take ≲ 0.5 minute [7]. If needed, pixels
can be reprogrammed during LHC machine orbits
reserved for sub-detector calibration, however only
at a limited rate. Thus single event upset (SEU) in
the trim and mask bit storage cells has to be con-
sidered. A simple trick has been adopted to protect
storage cells from SEU. This is illustrated in fig-
ure 3. If node (a) collects a large enough amount of
charge, produced by a nuclear reaction, the stored
logic level can be changed. The presence of the capacitor,
aided by the Miller effect, increases the critical
charge and thus reduces the probability of an upset.
The SEU cross section for both protected and
unprotected storage cells has been measured at
PSI for 300 MeV/c pions. The result is σ =
2.6 · 10⁻¹⁶ cm² per storage cell for protected and
σ = 2.4 · 10⁻¹⁴ cm² for unprotected cells. This
is an improvement of 2 orders of magnitude. In
PSI46V2 all storage cells in the PUCs are protected
(4 trim bits and one mask bit). In case of
an SEU the flipped bits have to be reprogrammed
through control links. Each link serves several
modules/panels. In the worst case one link controls
4 modules plus 4 half-modules in each of
the two innermost barrel layers, i.e. a total of 192
ROCs or 96 ROCs per layer. At high luminosity
LHC operation the expected SEU rate for this
control link is < 3 · 10⁻² Hz. This means that in
about 8 hours 0.1% of the pixels will have changed their
threshold. The occupancy of each pixel can be
monitored online and pixels that show significant
changes will be reprogrammed.
Fig. 3. SEU protected storage cell
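The SEU numbers quoted above can be checked with a short calculation. The sketch assumes that every upset hits a different pixel and uses the upper limit of the quoted rate; the variable names are illustrative.

```python
SIGMA_PROTECTED_CM2 = 2.6e-16    # measured cross section, protected cell
SIGMA_UNPROTECTED_CM2 = 2.4e-14  # measured cross section, unprotected cell
improvement = SIGMA_UNPROTECTED_CM2 / SIGMA_PROTECTED_CM2

SEU_RATE_HZ = 3e-2               # upper limit for the worst-case control link
ROCS_PER_LINK = 192
PIXELS_PER_ROC = 80 * 52
HOURS = 8

upsets = SEU_RATE_HZ * HOURS * 3600
# Assumption: each upset affects a different pixel.
affected_fraction = upsets / (ROCS_PER_LINK * PIXELS_PER_ROC)

print(round(improvement))          # 92
print(round(upsets))               # 864
print(f"{affected_fraction:.2%}")  # 0.11%
```

A factor of about 92 corresponds to the quoted two orders of magnitude, and the affected fraction after 8 hours is consistent with the quoted 0.1%.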
2.3. Double column periphery
The double column periphery controls the trans-
fer of hit information from the pixels to the storage
buffers (column drain mechanism) and performs
trigger verification. A schematic view of the logic
is shown in figure 4.
When the asynchronous fast column OR signal ar-
rives at the periphery three actions are performed:
1. The present value of a bunch crossing counter
(WBC) is latched into one of 12 time stamp
buffer cells. This is needed later on for trigger
validation.
2. The hits for this bunch crossing are acknowl-
edged by the periphery. This tells the pix-
els to associate any later hits with another
column drain and its corresponding bunch
crossing. There are one active and up to two
pending column drains allowed. Hits of any
further bunch crossing are lost.
3. A column drain is initiated by sending out a
readout token to the first pixel.
A pixel without a hit belonging to the active col-
umn drain just bypasses the token. This has been
measured to happen at a rate of about 3.3 GHz,
i.e. the token arrives at the first hit pixel within
less than 50ns or 2 clock cycles. Hit information
is transferred in parallel into the data buffer, one
pixel per two clock cycles. The data buffer has a
depth of 32 units. Each unit consists of a marker bit,
one analog and nine digital storage cells for pulse
height and row address respectively. The marker
bit indicates the beginning of a new event and thus
is used to synchronize entries in the time stamp
and the data buffers.
Fig. 4. Schematic view of the double column periphery
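The statement that the token reaches the first hit pixel within 2 clock cycles follows from the measured bypass rate. The sketch below assumes that a double column holds 2 × 80 pixels (i.e. that 80 is the number of rows), which is not stated explicitly in the text.

```python
BYPASS_RATE_GHZ = 3.3               # measured token bypass rate (pixels per ns)
PIXELS_PER_DOUBLE_COLUMN = 2 * 80   # assumption: 80 rows in each of the two columns
CLOCK_PERIOD_NS = 25.0              # one LHC bunch crossing

# Worst case: the token has to bypass every pixel of the double column.
full_traversal_ns = PIXELS_PER_DOUBLE_COLUMN / BYPASS_RATE_GHZ

print(round(full_traversal_ns, 1))              # 48.5
print(full_traversal_ns < 2 * CLOCK_PERIOD_NS)  # True
```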
The oldest entry in the time stamp buffer is per-
manently compared to a second bunch crossing
counter (SBC) which is delayed with respect to
the WBC by a programmable amount which cor-
responds to the first level trigger latency. In case of
agreement the time stamp is deleted and the pres-
ence of the CMS first level trigger signal is checked.
Either the corresponding data is discarded or the
column is set into readout mode. In this mode, the
double column stops data acquisition in order to
prevent overwriting of valid data. It waits for the
readout token to arrive and sends data to the chip
periphery. Afterwards the double column resets it-
self and hence cannot have further valid hits for the
duration of the trigger latency. Data loss also oc-
curs when one of the buffers is completely filled up.
In case of a full data buffer a double column gets
reset. If the time stamp buffer is full, data acquisi-
tion is paused until the next buffer cell is freed.
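The trigger validation described above can be illustrated with a toy model of the time stamp buffer. This is a simplified behavioural sketch, not the chip logic: the class and method names are hypothetical, the 8-bit counter width is an assumption, and a hit arriving while the buffer is full is simply dropped.

```python
from collections import deque


class TimeStampBuffer:
    """Toy model of the time stamp buffer of one double column."""

    DEPTH = 12         # time stamp cells, as quoted in the text
    COUNTER_MOD = 256  # assumption: 8-bit bunch crossing counters

    def __init__(self, latency_bx):
        self.latency = latency_bx  # trigger latency in bunch crossings
        self.wbc = 0               # write bunch crossing counter
        self.stamps = deque()      # oldest entry on the left

    @property
    def sbc(self):
        """Search counter: lags the WBC by the trigger latency."""
        return (self.wbc - self.latency) % self.COUNTER_MOD

    def clock(self, column_or, trigger):
        """Advance one bunch crossing; return 'readout', 'discard' or None."""
        outcome = None
        # The oldest time stamp is compared with the delayed counter.
        if self.stamps and self.stamps[0] == self.sbc:
            self.stamps.popleft()
            outcome = "readout" if trigger else "discard"
        # A column OR latches the current WBC value if a cell is free.
        if column_or and len(self.stamps) < self.DEPTH:
            self.stamps.append(self.wbc)
        self.wbc = (self.wbc + 1) % self.COUNTER_MOD
        return outcome


# 3.2 us latency at 25 ns per bunch crossing corresponds to 128 crossings.
buf = TimeStampBuffer(latency_bx=128)
outcomes = [buf.clock(column_or=(bx == 5), trigger=(bx == 133)) for bx in range(200)]
print(outcomes.index("readout"))  # 133
```

A hit in bunch crossing 5 is validated 128 crossings later, when the delayed counter reaches the stored time stamp and the trigger signal is present.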
3. Readout scheme and chip readout format
A couple of design inherent data loss mechanisms
have been mentioned so far. They are summarized
in table 1 for barrel modules, a luminosity
of 10³⁴ s⁻¹cm⁻² and 100 kHz first level trigger
rate. A large contribution, labeled 'readout', comes
from the fact that a double column is insensitive
between the time it validates a trigger and the time
it finishes the readout. This stresses the importance
of having short readout times. Therefore an
analog readout scheme running at 40 MHz (alternatively
it can be switched to 20 MHz) has been
adopted where the pixel address is coded into 6 discrete
levels. Pixel address and signal pulse height
can be transferred in 6 clock cycles.
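As a consistency check, the individual loss contributions listed in Table 1 can be summed per barrel radius. The sketch assumes that the losses are small enough to add linearly; the dictionary layout is only for illustration.

```python
# Individual contributions from Table 1 in per cent, keyed by barrel radius in cm.
# Order: pixel busy, column drain busy, time stamp buffer overflow,
# data buffer overflow, 1-buffer, readout, double column reset.
losses_percent = {
    4:  [0.21, 0.25, 0.17, 0.17, 1.0, 1.0, 1.0],
    7:  [0.078, 0.020, 0.001, 0.081, 0.40, 0.20, 0.44],
    11: [0.044, 0.004, 0.0, 0.065, 0.26, 0.16, 0.26],
}

totals = {radius: sum(values) for radius, values in losses_percent.items()}
for radius, total in totals.items():
    print(radius, f"{total:.2f}%")
# 4 3.80%
# 7 1.22%
# 11 0.79%
```

The sums reproduce the totals of 3.8%, 1.2% and 0.79% given in the table.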
In the endcap part of the detector several sen-
sor tiles of different size are organized in so called
blades which contain 24 ROCs on the front side and
21 ROCs on the back side. In the barrel, sensors
connected to 16 (8) ROCs are called modules (half
modules). The ROCs of each side of a blade or of a
(half) module are read out in a daisy chained way.
A token bit, controlled by a special token bit man-
ager chip (TBM,[8]), is sent to the first ROC. The
ROC sends its data to the TBM and passes the to-
ken bit to the next ROC, until the token bit gets
back to the TBM. The TBM amplifies the signals
from the ROCs, adds a header and a trailer and
drives the signals out to the detector supply tube.
Radius [cm]                  4       7       11
pixel busy                   0.21%   0.078%  0.044%
column drain busy            0.25%   0.020%  0.004%
time stamp buffer overflow   0.17%   0.001%  0
data buffer overflow         0.17%   0.081%  0.065%
1-buffer                     1.0%    0.40%   0.26%
readout                      1.0%    0.20%   0.16%
double column reset          1.0%    0.44%   0.26%
total                        3.8%    1.2%    0.79%
Table 1
Summary of simulated data loss for barrel modules at a
luminosity of 10³⁴ s⁻¹cm⁻² and 100 kHz trigger rate. The
upper part shows column drain losses, in the lower part
readout related data losses are summarized
Fig. 5. Readout sequence of a ROC with one pixel hit
Fig. 6. Readout signal output of a barrel module for two
different address sequences
The readout of a single pixel hit is shown in figure
5. It starts with a header of 3 clock cycles. A large
negative signal level well outside of the range of
pixel data (ultra-black) followed by a zero differential
level (black) separates the individual ROCs in
the blade/module data stream. No other identification
is sent. In order to get the ROC ID the DAQ
unit has to count the chip headers. In the third
clock cycle of the chip header a signal inversely proportional
to the value of the last addressed DAC
is sent. This is the only way information about the
configuration of the ROCs can be read back. The header
is followed by the double column address (2 cycles), the row
address (3 cycles) and the analog pulse height for
each hit pixel. Figure 6 shows the signal output of
a barrel module as it arrives at the electro-optical
converter. Shown are two different address level sequences.
As can be seen, the risetime of the signals
is short enough to allow for a tolerance of the sampling
point in the DAQ unit of ≈ 6 ns. The measured
sampling distribution for a ROC is shown in
figure 7. The address levels are clearly separated
and allow a reliable reconstruction of the pixel addresses.
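The readout format described above can be illustrated with a short sketch. The text does not specify how addresses are mapped onto the 6 levels, so the plain base-6 mapping used here is an assumption, as are the function names and the index ranges (26 double columns, 160 pixels per double column).

```python
N_LEVELS = 6
HEADER_CYCLES = 3           # ultra-black, black, last addressed DAC
CYCLES_PER_HIT = 2 + 3 + 1  # double column address, row address, pulse height
CLOCK_NS = 25               # 40 MHz readout


def to_levels(value, n_digits):
    """Base-6 digits of value, most significant first (illustrative mapping)."""
    if not 0 <= value < N_LEVELS ** n_digits:
        raise ValueError("value does not fit into the given number of cycles")
    digits = []
    for _ in range(n_digits):
        value, digit = divmod(value, N_LEVELS)
        digits.append(digit)
    return digits[::-1]


def from_levels(digits):
    """Inverse of to_levels."""
    value = 0
    for digit in digits:
        value = value * N_LEVELS + digit
    return value


def encode_hit(double_column, row):
    """Five address levels for one hit: 2 for the double column, 3 for the row."""
    return to_levels(double_column, 2) + to_levels(row, 3)


def readout_time_ns(n_hits):
    """Length of the data stream of one ROC with n_hits hit pixels."""
    return (HEADER_CYCLES + CYCLES_PER_HIT * n_hits) * CLOCK_NS


print(encode_hit(25, 159))  # [4, 1, 4, 2, 3]
print(readout_time_ns(1))   # 225
```

Two cycles of 6 levels give 36 combinations and three cycles give 216, which is enough for 26 double columns and 160 pixels per double column under the assumptions above.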
Fig. 7. Histogram of the sampled address levels of a ROC.
The levels are cleanly separated (x-axis: address levels [ADC counts]; y-axis: number of entries, logarithmic; peaks: ultra black, black / level 0, level 1 to level 5)
Fig. 8. Analog signal transmission (x-axis: injected charge [ke-]; y-axis: differential output (arbitrary units); data sets: internal calibrate, X-ray)
Fig. 9. Threshold distribution of a ROC before (a) and after (b) trimming (x-axis: threshold [electrons]; y-axis: number of pixels; Gaussian fits: (a) mean 4001 ± 4.9, sigma 293.2 ± 3.1; (b) mean 2948 ± 1.3, sigma 79.7 ± 0.8)
4. Measurements
In this section lab measurements of the perfor-
mance of the ROC are shown. Figure 8 shows the
pulse height response to calibration signal injec-
tion. It is slightly non-linear at low charges and
saturates for high charges above about 2 MIPs.
Injection capacitances have been calibrated with
X-ray sources. Kα lines of 6 different isotopes be-
tween 8.04keV and 44.2keV have been used. These
measurements are also shown in figure 8.
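For orientation, the X-ray energies can be converted into signal charge. The sketch uses the commonly quoted mean energy of about 3.6 eV per electron-hole pair in silicon, which is an assumption not taken from this paper; the function name is illustrative.

```python
EV_PER_PAIR_SI = 3.6  # assumption: mean energy per electron-hole pair in silicon


def xray_charge_electrons(energy_kev):
    """Approximate signal charge for a fully absorbed X-ray photon."""
    return energy_kev * 1e3 / EV_PER_PAIR_SI


for energy_kev in (8.04, 44.2):
    print(energy_kev, round(xray_charge_electrons(energy_kev)))
# 8.04 2233
# 44.2 12278
```

Under this assumption the lines used for the calibration span roughly 2.2k to 12.3k electrons.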
It is important to have a threshold distribution as
uniform as possible. On one hand the threshold
must be set well above the amplifier noise in order
not to flood the system with random hits. On
the other hand the position resolution depends
almost linearly on the threshold (see [1]). The
width of a Gaussian fit to the untrimmed threshold
distribution of a ROC is around 300 electrons
as shown in figure 9 (a). Adjusting the trim bits
is one of the most important steps in the module
characterization process [6]. After trimming, the
threshold dispersion can be improved considerably
to typically 80 electrons as shown in figure 9 (b).
The intrinsic noise of the preamp/shaper has been
measured to be ≲ 190 electrons [6].
The threshold described so far is not directly relevant.
The time between a particle crossing the
sensor and the comparator going above threshold
depends on the created ionization charge. Therefore
it might happen that for low pulses the hit
signal is delayed to the next bunch crossing and
hence is lost during trigger validation. This timewalk
can be measured by delaying the injected
calibration pulses with respect to the chip clock.
The result of such scans is shown in figure 10 for
three different thresholds. For an absolute threshold
of 2500 electrons it is shown that the effectively
usable threshold (in-time threshold) is at about
3200 electrons, i.e. the time difference between
a signal of 3200 electrons and very high charges
above 100k electrons is below 25 ns.
Fig. 10. Timewalk for 3 different thresholds. The offsets of
the 3 curves are arbitrary for better readability (x-axis: charge [ke-]; y-axis: delay (ns); curves for thresholds of 1500, 2500 and 3000 electrons; markers at 25 ns and 3200 electrons)
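The definition of the in-time threshold can be expressed as a small search over a measured timewalk curve. The following is a sketch under the assumption that the curve is given as charges in ascending order with monotonically decreasing delays; the function name is hypothetical and the numbers in the usage example are invented for illustration, not measured values.

```python
def in_time_threshold(charges, delays_ns, window_ns=25.0):
    """Smallest charge whose delay exceeds that of the largest charge by less than window_ns.

    charges:   signal charges in ascending order
    delays_ns: comparator delay for each charge (decreasing with charge)
    Returns None if no point of the curve lies within the window.
    """
    reference_ns = delays_ns[-1]  # delay of the highest charge
    for charge, delay in zip(charges, delays_ns):
        if delay - reference_ns < window_ns:
            return charge
    return None


# Invented toy curve (charge in ke-, delay in ns), for illustration only.
charges_ke = [2.6, 3.0, 3.5, 5.0, 20.0, 100.0]
delays_ns = [48.0, 37.0, 31.0, 22.0, 13.0, 10.0]
print(in_time_threshold(charges_ke, delays_ns))  # 3.5
```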
Test structures containing the analog section of
the PUC have been irradiated with gamma rays
up to a total dose of 13.2 Mrad. This corresponds
to the absorbed dose in 2 years at high luminosity
LHC operation for the 7cm barrel layer or 1 year
for the innermost layer. Figure 11 shows the out-
puts of the shaper and the sample-and-hold circuit
in transparent mode. Curves (a) show the response
for unirradiated structures. After irradiation, the
analog signals become somewhat slower (b), but
can be brought back close to the original waveform
by reprogramming the feedback settings (c).
Fig. 11. Output of sample-and-hold circuit (top) and shaper
output (bottom) before and after irradiation.
5. Summary
The final version of the CMS pixel readout chip
has been designed in a 0.25µm process. It is fully
functional and measurements of the performance
have been presented. The threshold can be tuned
to values below 2500 electrons with a uniformity of
80 electrons. Due to timewalk the in-time threshold
is somewhat higher at 3200 electrons. The readout
uses an analog scheme with pixel addresses coded
into 6 levels. This has been proven to work reliably
at 40 MHz as address decoding was 100% correct.
References
[1] The CMS collaboration, CMS Tracker Technical
Design Report, CERN/LHCC 98-6
[2] M. Dentan et al., IEEE Trans. Nucl. Sci. 43 (1996)
1763
[3] M. Barbero et al., Design and test of the CMS pixel
readout chip, Nucl. Instr. and Meth. A517 (2004) 349
[4] W. Erdmann, The 0.25µm front-end for the CMS pixel
detector, Nucl. Instr. and Meth. A 549 (2005) 153
[5] W. Snoeys et al., Layout techniques to enhance the
radiation tolerance of standard CMOS technologies
demonstrated on a pixel detector readout chip, Nucl.
Instr. and Meth. A439 (2000) 349
[6] A. Starodumov et al., Qualification Procedures of the
CMS Pixel Barrel Modules, these proceedings
[7] D. Kotlinski, The Control and Readout System of the
CMS Pixel Barrel Detector, these proceedings
[8] E. Bartz, The 0.25µm Token Bit Manager Chip
for the CMS Pixel Readout, Proceedings of the
11th Workshop on Electronics for LHC and Future
Experiments, Heidelberg, Germany
