Long-term Running Experience with the Silicon Micro-strip Tracker at the DØ detector by Jung, Andreas W. et al.
arXiv:1202.5996v1 [physics.ins-det] 27 Feb 2012
Physics Procedia 00 (2018) 1–6
Physics
Procedia
www.elsevier.com/locate/procedia
TIPP 2011 - Technology and Instrumentation for Particle Physics 2011
Long-term Running Experience with the Silicon Micro-strip
Tracker at the DØ detector
Andreas W. Jung1,a,
M. Cherry, D. Edmunds, M. Johnson, M. Matulik, M. Utes, T. Zmuda and the SMT group
aFermilab, Batavia, IL, 60510, USA
Abstract
The Silicon Micro-strip Tracker (SMT) at the DØ experiment in the Fermilab Tevatron collider has been operating since
2001. In 2006, an additional layer, referred to as ’Layer 0’, was installed to improve the impact parameter resolution and
to compensate for detector degradation due to radiation damage to the original innermost SMT layer. The SMT detector
provides valuable tracking and vertexing information for the experiment. This contribution highlights aspects of the
long-term operation of the SMT, including the impact of the silicon readout test-stand. Because the test-stand is fully
integrated into the DØ trigger framework, it is an advantageous tool for training new experts and for studying subtle
effects in the SMT while minimizing the impact on the global data acquisition.
© 2011 Published by Elsevier BV. Selection and/or peer-review under responsibility of the organizing committee for
TIPP 2011.
Keywords: Silicon, micro-strip, long-term operational experience
1. Introduction
The Run II DØ detector has been operating since 2001. It consists of two central tracking detectors
inside a 2 T solenoidal magnet; central and forward preshower systems; liquid argon calorimeters; and
muon spectrometers including a 1.8 T toroidal magnet. The Silicon Micro-strip Tracker (SMT) is part of
the central tracking system of the DØ detector and is its innermost layer of instrumentation [1]. Radiation
damage is therefore a potential issue that needs to be monitored and addressed [2]. The SMT layout is
shown in Figure 1. It consists of six barrels, each with four layers. These barrels are interspersed with six
small-radius disks, so-called ’F-disks’. There are another six ’F-disks’ beyond the ends of the barrels.
Two (originally four) large-radius detectors, so-called ’H-disks’, are located at the ends of the detector to
enhance tracking at large pseudorapidities. The barrels provide tracking for particles with high transverse
momentum in the central region, |η| < 1.5, while the disk detectors allow for the precise reconstruction of
particles with pseudorapidity up to |η| < 3. A major SMT upgrade took place in 2006 with the installation of
an innermost layer, Layer 0 [3]. This single-layer detector consists of eight barrels and was installed to
mitigate the degradation of the first layer of the original SMT due to radiation damage. One ’H-disk’ was
removed from each end of the detector in 2006 to accommodate the Layer 0 readout channels.

Fig. 1. The upgraded SMT detector consists of 6 barrels, 12 ’F-disks’ and 2 ’H-disks’. The barrels are interspersed with ’F-disks’.
Additional ’F-disks’ and ’H-disks’ are placed at both ends of the detector.

1 Primary author, email contact: ajung@fnal.gov.
There are two different types of readout chips used in the SMT: SVX-IIe chips [4] for the original SMT and
SVX4 chips [5] for Layer 0. The SVX-IIe readout chips are mounted on so-called HDIs (High Density
Interconnects), made from Kapton flex circuits laminated to beryllium substrates. The silicon sensors are
glued to them, and the resulting assembly is referred to as a ’module’ [1]. There are 432 such modules in the
barrels, 288 in the ’F-disks’ and 96 in the ’H-disks’, with a total of 5712 SVX-IIe chips installed. For Layer 0
the HDIs are ceramic hybrids made of beryllium oxide. Including Layer 0, there are a total of 730k readout
channels, providing the largest data flow of all DØ sub-detectors.
2. Long-term Operational Experience
The SMT has generally operated 24 hours a day, 7 days a week since 2001, which required dedicated
shift personnel and experts. In general, operation is very stable and the response to and recovery from
problems is usually quick. Shift personnel are supported by on-call experts. Furthermore, senior experts
are available for support in various aspects. Over time many tools have been developed to monitor low-
and high-level information, such as the voltages of power supplies, on-line cluster charge and size
histograms, and on-line track efficiencies.

[Figure 2 diagram labels: Sensor, Chip, HDI 3/6/8/9, 8’ Low Mass Cable, Adapter card, 25’ and ~19’−30’ High Mass Cables (3M / 50 conductor, 3M / 80 conductor), Interface board, HV/LV, SEQ, SEQ Controller, Serial Command Link, CLKs, Optical Links (1 Gb/s), VRB, VRBC, SBC, VME Power PC, 1553 Monitoring, PDAQ (L3), SDAQ; areas: Horse Shoe, Cathedral, Platform, MCH2, MCH3]
Fig. 2. The hardware and readout chain of the original SMT, from the sensor-HDI level to the movable counting house (MCH)
level via the horse shoe, cathedral and platform levels.
The hardware and readout chain of the SMT, as sketched in Figure 2, is distributed over several physical
locations. These locations are not all accessible on a daily basis: the ’Horse shoe’ and ’Cathedral’ areas
are accessible only during longer shutdowns, whereas the ’Platform’ area can be accessed between stores
with the agreement of Tevatron operations.
A failure in the hardware and readout chain needs to be understood and then traced to its physical location
with the monitoring information at hand. To do that, it is very important to have monitoring capabilities for
both low- and high-level information. This includes the monitoring of the voltages and current draws of the
detector power supplies (PS). For example, a failure of a power supply in the ’Platform’ area typically causes
lower efficiency for a couple of hours until the failure is addressed. On average, all power supply failures,
including those in the ’Cathedral’ area, compromised about 2.5% of the collected data. The electrical crew
developed a robotic remote switch, ’R2DØ’, to switch to a spare PS within minutes in the case of ’Platform’
PS failures. All SMT power supplies in the ’Platform’ area have since been equipped with ’R2DØ’ units, and
the resulting data quality losses are minimized.
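The voltage and current-draw monitoring described above can be illustrated with a minimal limit check. This is a sketch under stated assumptions, not the DØ monitoring software: the channel name and the voltage and current limits are hypothetical, and the function only reports readings that leave their allowed window. The decision to switch to a spare PS (for example with an ’R2DØ’ unit) is left to the operator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Limits:
    """Allowed operating window for one power-supply channel (hypothetical values)."""
    v_min: float  # lowest acceptable voltage [V]
    v_max: float  # highest acceptable voltage [V]
    i_max: float  # highest acceptable current draw [A]


@dataclass
class Reading:
    """One monitoring sample of a power-supply channel."""
    channel: str
    voltage: float  # [V]
    current: float  # [A]


def check_readings(readings: List[Reading], limits: Dict[str, Limits]) -> List[str]:
    """Return a human-readable alarm for every reading outside its limits."""
    alarms: List[str] = []
    for r in readings:
        lim = limits[r.channel]
        if not lim.v_min <= r.voltage <= lim.v_max:
            alarms.append(f"{r.channel}: voltage {r.voltage:.2f} V outside "
                          f"[{lim.v_min:.2f}, {lim.v_max:.2f}] V")
        if r.current > lim.i_max:
            alarms.append(f"{r.channel}: current {r.current:.2f} A above {lim.i_max:.2f} A")
    return alarms


if __name__ == "__main__":
    # Hypothetical channel name and limits, for illustration only.
    limits = {"platform-ps-01": Limits(v_min=4.8, v_max=5.2, i_max=3.0)}
    samples = [Reading("platform-ps-01", 5.0, 2.1), Reading("platform-ps-01", 4.1, 0.2)]
    for alarm in check_readings(samples, limits):
        print(alarm)
```

Keeping the limits in a per-channel table means the same check can cover supplies in different areas with different nominal values.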
3. SMT test-stand activities
The SMT test-stand provides a small ’copy’ of the full hardware and readout chain of the SMT. In
contrast to the SMT itself, all parts are easily accessible, allowing for detailed studies of single components.
Figure 3 shows parts of the test-stand setup for signal-to-noise studies, with the HV supply (top left), a sensor
with a light emitting diode (bottom left) and an optional signal delay generator with respect to the Tevatron
clock (right).

Fig. 3. The picture shows part of the SMT test-stand setup: sensor and a light emitting diode (lower left), PS (upper left),
optional signal delay generator with respect to the Tevatron clock (right).

Fig. 4. The graph shows the fraction of enabled HDIs for Barrel, ’F-disk’ and ’H-disk’ sensors versus time. The shaded yellow
bands indicate the shutdown periods of the DØ experiment. For a more detailed explanation of the steps in the fraction of enabled
HDIs, see the text.
In general, spare boards are tested at the test-stand before they are installed. This is also true during the
longer shutdowns, when problematic boards are replaced. During these shutdowns every effort is made
to improve the system stability and efficiency. For a complex system like the SMT it is difficult to cite a
single quantity characterizing global performance. A good measure is the fraction of enabled HDIs as a
function of time, as shown in Figure 4. Prior to the 2009 shutdown there was a gradual decrease of this
fraction due to hardware issues during continued operation. There are many different types of failures; the
most common ones are individual chip failures and bad cable connections at various levels, as listed in
Table 1 (larger steps in the fraction of enabled HDIs are explained later in the text). In order to trace such a
failure to its underlying cause, every failure is characterized and a record of previously attempted
interventions is maintained.

Type of HDI failure                      # HDIs (2009)    # HDIs (2010)
Adapter card                             16               1
Clock cables                             9                10
Interface board                          15               −
Reseating cables                         ≈ 6              2
Bad/dead (problem inside detector)       20               Not re-visited
Disabling bad chips                      24               4
Total # HDIs worked on                   90               17
Table 1. Detailed list of HDI defects for the years 2009–2010.

The shutdown periods are highlighted with shaded bands in Figure 4. They
allowed for the time-consuming task of investigating and fixing these individual failures. These efforts
resulted in higher fractions of enabled HDIs after each shutdown. Another type of failure was broken
wire-bonds, which interrupted the distribution of the digital power lines to the readout chips. As the readout
chips on an individual HDI are daisy-chained, a single chip failure caused the ’loss’ of all subsequent chips of
a module consisting of up to 9 chips. The most prominent occurrence was in late 2006, but it is likely that this
type of failure also contributed to the losses of enabled HDIs prior to that incident. The test-stand facilitated
the development of a solution to this problem that uses an alternative path to distribute the digital power via
a special hardware board (new adapter card). Thus the initially failing chip was bypassed and the readout of
the remaining chips on the module could be fully restored. This was implemented during the shutdown in
2007 and increased the fraction of enabled HDIs by about 10%. Furthermore, the test-stand facilitated the
development of an improved sequencer firmware version as well as a modified version of the adapter card to
fix a noise problem. Both were installed during the shutdown in 2008 and increased the fraction of enabled
HDIs. An intensive and thorough investigation of all known types of failures took place during the shutdown
in 2009 and resulted in the largest fraction of enabled HDIs. The tireless efforts during past shutdowns
allowed HDIs to be re-enabled and led to an all-time high number of enabled HDIs.
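The daisy-chain failure mode and the adapter-card bypass described above can be illustrated with a toy model of a single HDI. This is a simplified sketch under stated assumptions, not a description of the actual SVX-IIe electronics: chips are numbered along the chain; without the bypass, the first broken chip cuts off itself and every later chip; with the alternative digital-power path, only the broken chips themselves are lost. The function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def readable_chips(n_chips: int, broken: Set[int], bypass: bool = False) -> List[int]:
    """Indices of chips on one daisy-chained HDI that can still be read out.

    Toy model (assumption): without the bypass the chain stops at the first
    broken chip; with the alternative power path only broken chips are lost.
    """
    if bypass:
        return [i for i in range(n_chips) if i not in broken]
    # Without the bypass, everything from the first broken chip onwards is lost.
    first_bad = min((i for i in broken if 0 <= i < n_chips), default=n_chips)
    return list(range(first_bad))


def readable_fraction(hdis: Iterable[Tuple[int, Set[int]]], bypass: bool = False) -> float:
    """Fraction of all chips read out, summed over (n_chips, broken) pairs."""
    total = readable = 0
    for n_chips, broken in hdis:
        total += n_chips
        readable += len(readable_chips(n_chips, broken, bypass))
    return readable / total if total else 0.0
```

For a 9-chip module with chip 2 broken, this model reads out only chips 0 and 1 without the bypass, but recovers 8 of the 9 chips with it. That is the kind of gain that showed up as a step in the fraction of enabled HDIs.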
The test-stand was also used for detailed firmware studies aimed at improving the signal-to-noise ratio (S/N)
of the sensors controlled and read out by the SVX-IIe chips. The pedestal distributions for the ’old’ firmware
and the ’new’ (improved) firmware are shown in Figure 5a) and b). By moving certain activities on control
lines to a different point in time, a significant reduction of the noise level was achieved. The biggest noise
reduction was achieved by moving the control signals for ’PreAmp’-reset and ’RampReference’-select further
away from the start of digitization. For n-side sensors the noise was reduced by approximately 20%, whereas
for p-side sensors the noise level remained unchanged. The noise source does not couple in the same way to
all channels, as can be seen in Figure 5a). Our interpretation is that the control signal pulse generates noise on
the chip. The previously persistent second band structure is now completely removed, as can be seen by
comparing Figure 5a) and b). This firmware is now used for the entire SMT.

[Figure 5 panels: a) ’Old Firmware’, b) ’New Firmware’]
Fig. 5. Pedestal distribution for 6 chips with 128 channels per chip. The first three chips are p-side and the last three chips are n-side.
The pedestal distribution for the ’Old Firmware’ is shown on the left, whereas the one for the ’New Firmware’ is shown on the right.
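The p-side/n-side noise comparison above can be made concrete with a short sketch. It assumes that the noise of a channel is estimated as the spread (population standard deviation) of its ADC counts over pedestal events, which is a common convention but not one stated in this paper. It also assumes the Fig. 5 layout, with the first three chips p-side and the rest n-side. The function names and data layout are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def channel_noise(events: Sequence[Sequence[float]]) -> List[float]:
    """Noise of each channel as the population standard deviation of its
    ADC counts over pedestal events; events[e][c] is channel c in event e."""
    n_channels = len(events[0])
    return [statistics.pstdev([ev[c] for ev in events]) for c in range(n_channels)]


def side_means(noise: Sequence[float], channels_per_chip: int = 128,
               n_p_chips: int = 3) -> Tuple[float, float]:
    """Mean noise of the p-side and n-side channels, assuming the first
    n_p_chips chips are p-side and the remaining ones n-side (Fig. 5 layout)."""
    cut = channels_per_chip * n_p_chips
    return statistics.fmean(noise[:cut]), statistics.fmean(noise[cut:])


def relative_change(old: float, new: float) -> float:
    """Fractional change (new - old) / old; -0.20 means a 20% noise reduction."""
    return (new - old) / old
```

Applying `side_means` to pedestal runs taken with the old and new firmware, then `relative_change` to the n-side values, yields a figure directly comparable with the approximately 20% reduction quoted above.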
The DØ data acquisition (DAQ) is a buffered system; consequently the dead time, or front-end busy (FEB) rate,
is driven by the amount of data and the ability to process it. Figure 6 shows a simplified sketch of the
data flow in the DØ experiment. Data are processed by a multi-level trigger system (L1, L2, L3).
The red arrows indicate a busy signal at the different levels if no free buffer is available.
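The buffered data flow described above can be illustrated with a toy model of one crate's event buffers. This is a hypothetical sketch, not the VRBC firmware; the class and method names are invented. It assumes that an L1 accept claims a free buffer, an L2 reject frees it, an L2 accept leaves it waiting for L3, and completion of the L3 step frees it again. The front-end busy condition is simply "no free buffer left". The per-state counts correspond to the free / waiting-for-L2 / waiting-for-L3 categories used in the buffer monitoring.

```python
from collections import Counter
from enum import Enum
from typing import List, Optional


class BufState(Enum):
    FREE = "free"
    WAIT_L2 = "waiting for L2 decision"
    WAIT_L3 = "waiting for L3 decision"


class BufferPool:
    """Toy model of the event buffers of one readout crate (hypothetical)."""

    def __init__(self, n_buffers: int) -> None:
        self.states: List[BufState] = [BufState.FREE] * n_buffers

    def busy(self) -> bool:
        """Front-end busy: asserted while no free buffer is left."""
        return BufState.FREE not in self.states

    def l1_accept(self) -> Optional[int]:
        """Store a new L1-accepted event; return the buffer index, or None if busy."""
        for i, state in enumerate(self.states):
            if state is BufState.FREE:
                self.states[i] = BufState.WAIT_L2
                return i
        return None

    def l2_decision(self, i: int, accept: bool) -> None:
        """An L2 accept keeps the event for L3; an L2 reject frees the buffer."""
        self._expect(i, BufState.WAIT_L2)
        self.states[i] = BufState.WAIT_L3 if accept else BufState.FREE

    def l3_done(self, i: int) -> None:
        """The L3 step is complete, so the buffer becomes free again."""
        self._expect(i, BufState.WAIT_L3)
        self.states[i] = BufState.FREE

    def occupancy(self) -> Counter:
        """Number of buffers per state, like the free / L2 / L3 monitoring plots."""
        return Counter(self.states)

    def _expect(self, i: int, state: BufState) -> None:
        if self.states[i] is not state:
            raise ValueError(f"buffer {i} is {self.states[i].value}, expected {state.value}")
```

In this model, any mechanism that keeps buffers from returning to the FREE state (for example, a buffer-handling bug) raises the busy fraction even if the data volume is unchanged.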
[Figure 6 diagram labels: Data, L1 Buffer, L2 Buffer, L3 Farm, L1 Busy, L2 Busy]
Fig. 6. Simplified sketch of the data flow from left to right. The red arrows indicate a busy signal at different levels if no free buffer is
available.

Individual SMT crates showed a very peculiar FEB pattern. One would expect the SMT crate leading in FEB to be the
one with the highest data processing load, as it takes more time to process more data. Instead, the FEB-leading SMT
crate seemed to appear randomly, as shown in Figure 7a). It shows the FEB rates [%] of all SMT crates (different
colors) as a function of time, with the two crates showing the peculiar FEB pattern highlighted by the red circles.
This happened on an apparently random basis, but more frequently at higher trigger rates. The buffer handling is
organized by a VME read-out board controller (VRBC) [6], which controls the VME read-out boards (VRB) [6], which
in turn gather the data from the sequencer level as sketched in Figure 2. The VRBC firmware was extended with
monitoring capabilities for buffer
management. The two plots in Figure 7b) show the available buffers (blue) and the buffers waiting for L2 (green)
or L3 decisions (red) as a function of time. The buffer distribution is shown for the two SMT crates that exhibit the
increased FEB rate (SMT crates 0x65 and 0x67), as highlighted by the red ellipses and arrows.

[Figure 7 panels: a) FEB rates / crate [%] vs. Time [minutes]; b) # Buffers vs. Time [minutes] for two SMT crates; c) FEB rates / crate [%] vs. Time [minutes]]
Fig. 7. a) FEB rates [%] of all SMT crates (different colors) as a function of time without the improved buffer handling firmware.
b) The two plots show the available buffers (blue) and the buffers waiting for L2 (green) or L3 decisions (red) as a function of time. As
an example, the buffer distribution is shown for the two SMT crates that exhibit an increased FEB rate (top plot, crate 0x65 & 0x67), as
highlighted by the red ellipses and arrows. c) FEB rates [%] of various detector subsystem crates grouped by colors (SMT crates are
colored in red). In addition, the global DØ L1 busy rate (green), consisting of all L1 subsystems, is shown. More details in the text.
A good correlation between the number of available buffers and the FEB rate was seen. In general, fewer
buffers are available at higher trigger rates. The red circles connected by arrows in Figure 7a)-b) highlight the
peculiar FEB pattern shown by two different SMT crates. This effect was due to a sudden reduction of
available buffers (blue), causing increased dead times for the affected SMT crates. Figure 7c) shows the FEB
rates [%] of various detector subsystem crates (SMT crates are colored in red). In addition, the global DØ L1
busy rate, consisting of all L1 subsystems, is plotted (green). The yellow ellipses highlight an increase of the
global L1 busy rate caused by raised FEB rates of particular SMT crates. This illustrates how the sudden
reduction of available buffers in SMT crates affected the global L1 busy rate. At higher trigger rates (around
−50 minutes) the effect is modest: the global L1 busy rate increases by approximately 2% at the same time as
the FEB rate of an SMT crate jumps from 10% to approximately 12%. At lower trigger rates (around −5 minutes)
the effect is smaller and the global L1 busy rate increases only by about 0.4%. This can be understood because
a reduced number of buffers has the largest impact at high data-taking rates.
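The statement that a reduced number of buffers matters most at high data-taking rates can be made quantitative with a standard queueing model. The sketch below swaps in the Erlang-B loss formula, which is not the method used in this analysis. It assumes Poisson-distributed triggers and a crate with n identical buffers. The offered load is taken to be the trigger rate times the mean time an event occupies a buffer. Under those assumptions, the fraction of time all buffers are occupied equals the Erlang-B blocking probability, used here as a stand-in for the FEB fraction. All numbers are hypothetical.

```python
def erlang_b(offered_load: float, n_buffers: int) -> float:
    """Erlang-B blocking probability of an M/G/c/c loss system.

    offered_load: trigger rate times the mean time an event occupies a buffer.
    n_buffers:    number of buffers (servers) available in the crate.
    With Poisson triggers this equals the fraction of time all buffers are
    occupied, used here as a stand-in for the front-end busy fraction.
    """
    if offered_load < 0 or n_buffers < 0:
        raise ValueError("offered_load and n_buffers must be non-negative")
    b = 1.0  # B(E, 0) = 1: without buffers every trigger is blocked
    for k in range(1, n_buffers + 1):
        # Numerically stable recursion B(E, k) = E*B(E, k-1) / (k + E*B(E, k-1))
        b = offered_load * b / (k + offered_load * b)
    return b


if __name__ == "__main__":
    for load in (2.0, 8.0):  # hypothetical low and high trigger rates
        full, reduced = erlang_b(load, 16), erlang_b(load, 8)
        print(f"offered load {load}: busy {full:.4f} with 16 buffers, {reduced:.4f} with 8")
```

In this model, halving the buffers from 16 to 8 keeps the busy fraction below 0.1% at an offered load of 2. At an offered load of 8, the same change raises it from below 1% to above 20%. This qualitatively mirrors the larger effect observed at higher trigger rates.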
[Figure 8 panels: a) FEB rates / crate [%] vs. Time [minutes]; b) # Buffers vs. Time [minutes] for two SMT crates]
Fig. 8. a) shows the FEB rates [%] of all SMT crates (different colors) as a function of time
with the improved buffer handling firmware. The two plots in b) show again available buffers
(blue), buffers waiting for L2 (green) or L3 decisions (red) as an example for two different
SMT crates. More details in the text.
The SMT test-stand allowed new versions of the VRBC firmware handling buffer management to be tested at high
rates. A more robust VRBC firmware version was developed, and it no longer showed this problem. Monitoring data
for this modified VRBC firmware version are shown in Figure 8a)-b). Figure 8a) shows the FEB rates [%] of all SMT
crates (different colors) as a function of time with the improved buffer handling firmware. No SMT crate shows a
significantly higher FEB rate. The increased FEB rates visible at the end of the distribution are due to a change in
prescale settings, which increased the event rates. Figure 8b) shows the available buffers (blue) and the buffers
waiting for L2 (green) or L3 decisions (red) as a function of time for two different SMT crates. There are no longer
any sudden drops in the number of available buffers.
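A check for sudden drops in the number of available buffers, like those discussed above, could be applied to the VRBC monitoring data along the following lines. This is a hypothetical sketch. It assumes that the available-buffer count is sampled at regular intervals. The window length and drop threshold are invented tuning parameters, not values used at DØ.

```python
from typing import List, Sequence


def sudden_drops(available: Sequence[int], window: int = 5, min_drop: int = 10) -> List[int]:
    """Return sample indices where the number of available buffers falls at
    least `min_drop` below the maximum of the preceding `window` samples.

    `window` and `min_drop` are hypothetical tuning parameters.
    """
    flagged: List[int] = []
    for i in range(1, len(available)):
        recent_max = max(available[max(0, i - window):i])  # never empty for i >= 1
        if recent_max - available[i] >= min_drop:
            flagged.append(i)
    return flagged
```

Comparing against the recent maximum rather than only the previous sample also catches a drop that happens over two or three consecutive samples. A slow decline, such as the one expected when the trigger rate rises, stays below the threshold.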
4. Conclusions
The SMT has been operating since 2001. Its performance and efficiency have been enhanced using new
tools such as the ’R2DØ’ units. The SMT test-stand is a unique piece of equipment for training new experts
as well as for reproducing and understanding subtle effects in the SMT while minimizing the impact on global
data taking. Three examples have been presented: the HDI recovery effort, the optimization of signal-to-noise
and the buffer management problem. In each case the results from the test-stand led to improved performance
of the entire SMT system. The training of new experts at the test-stand provided new insights into the
operation of the SMT, which in turn increased its stability and performance.
The SMT detector is performing very well, providing good tracking and vertexing capabilities for the DØ
experiment, which are vital for high-efficiency b-tagging and electron/photon identification.
References
[1] S.N. Ahmed et al., The DØ Silicon Microstrip Tracker, NIM A 634 (2011) 8, arXiv:1005.0801.
[2] Z. Ye, Radiation Damage to DØ Silicon Microstrip Detector, TIPP 2011 talk, 2011.
[3] R. Angstadt et al., The L0 Inner Silicon Detector of the DØ experiment, NIM A 622 (2010) 298, arXiv:0911.2522.
[4] I. Kipnis, S. Kleinfelder, L. Luo, O. Milgrome, M. Sarraj, R. Yarema, T. Zimmerman, A Beginners Guide to the SVXIIE,
FERMILAB-TM-1892, version from 10/96.
[5] M. Garcia-Sciveres et al., The SVX4 integrated circuit, NIM A 511 (2003) 171.
[6] E. Barsotti, M. Bowden, H. Gonzalez, M. Johnson, D. Mendoza, T. Zmuda, VME Readout Buffer, Fermilab Document Nr
ESE-SVX-950719, 10/12/2001.
