A Radiation Tolerant 4.8 Gb/s Serializer for the Giga-Bit Transceiver by Çobanoglu, Ö et al.
Ö. Çobanoğlu^a, P. Moreira^a, F. Faccio^a
^a CERN, PH-ESE-ME, 1211 Geneva 23, Switzerland
ozgur.cobanoglu@cern.ch
Abstract
This paper describes the design of a full-custom 120:1 data seri-
alizer for the GigaBit Transceiver (GBT) which has been under
development for the LHC upgrade (SLHC). The circuit operates
at 4.8 Gb/s and is implemented in a commercial 130 nm CMOS
technology. The serializer occupies an area of 0.6 mm² and its
power consumption is 300 mW. The paper focuses on the techniques
used to achieve radiation tolerance and on the simulation
method used to estimate the sensitivity to single event transients.
I. INTRODUCTION
The GBT project aims at developing a radiation tolerant op-
tical transceiver operating at 4.8 Gb/s within the framework
of the LHC luminosity upgrade. Links implemented using the
GBT will replace the three types of communication links cur-
rently in use, namely the timing, trigger and control (TTC) links,
the data acquisition (DAQ) links and the slow control (SC) links,
therefore providing a single solution for all the communication
needs at the SLHC.
The GBT chip set will include a radiation tolerant serial-
izer (SER) which converts 120-bit wide data words into a 4.8
Gb/s serial stream. Operating from a single 1.5 V supply, the
circuit accepts CMOS-level data and control signals. The seri-
alizer outputs a differential signal with a worst-case simulated
pattern-dependent jitter smaller than 6 ps at 4.8 Gb/s.
In the following section, the serializer architecture is de-
tailed and a brief overview of its operation is provided. Section
III deals with the circuit design of the major functional blocks.
Section IV introduces the method used to estimate the perfor-
mance of each circuit under radiation. Some relevant simulation
results are also provided within this section. Finally, Section V
summarizes the work.
II. ARCHITECTURE
Fig. 1 shows the overall architecture of the serializer. It
consists of a 120-bit input register, three 40-bit shift registers,
a frequency synthesizer and a 3:1 multiplexer shown as three
switches. The frequency synthesizer is a phase-locked loop (PLL)
whose feedback divider is composed of two stages, one dividing
by 3 and the other dividing by 40, for a total division ratio of 120.
The SER architecture is based on dividing the 120-bit frame
into three 40-bit words which are serialized at 1.6 Gb/s and then
time division multiplexed to form the final 4.8 Gb/s serial bit
stream. This architecture reduces the number of components
operating at full speed.
Figure 1: The architecture of the serializer.
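The data path described above can be illustrated with a short behavioral sketch in Python. It is not the authors' implementation: the function name and, in particular, the mapping of frame bits onto the three shift registers are assumptions made for illustration, since the bit ordering is not stated here.

```python
def serialize_frame(frame_bits):
    """Split a 120-bit frame into three 40-bit words and interleave them."""
    if len(frame_bits) != 120:
        raise ValueError("expected a 120-bit frame")
    # Assumption: word i holds frame bits i, i+3, i+6, ... so that
    # round-robin multiplexing restores the original bit order.
    words = [frame_bits[i::3] for i in range(3)]
    stream = []
    for k in range(40):       # each register shifts once per 1.6 Gb/s period
        for word in words:    # the 3:1 multiplexer visits the registers in turn
            stream.append(word[k])
    return words, stream
```

Under this assumed mapping only the multiplexer handles bits at the full 4.8 Gb/s rate, while each shift register advances 40 times per frame.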
A fully integrated programmable charge-pump phase-locked
loop (CP-PLL) synthesizes a 4.8 GHz clock from the 40 MHz
LHC clock reference. To optimize the output jitter, the values
of the loop filter resistor and the charge-pump current are programmable
with 2-bit and 4-bit resolution, ranging from 1.5 kΩ to
6.0 kΩ and from 1 µA to 100 µA, respectively.
The CP-PLL is designed to be tolerant to SETs by a combi-
nation of techniques: i) The voltage controlled oscillator (VCO)
transistors are designed as triple-well devices for better isola-
tion, to reduce the active volume where charge is collected and
finally to promote a quick drift of charge due to the electrical
field established by the bias voltages of the P and N wells. ii)
Triple Modular Redundancy (TMR) is used in the feedback divider
of the PLL to mitigate single event upsets (SEUs). The
design targets the temperature range of [-20 °C, 100 °C] and
operates at 1.5 V, tolerating a power supply variation of 10%.
Fig. 1 and Fig. 2 sketch the overall operation of the se-
rializer as follows: at every rising edge of the master clock
(Clock40MHz) the 120-bit frame is loaded into the input register.
At every rising edge of the load<i> signal, a 40-bit word is
loaded into the respective 40-bit shift register.
Since the PLL locks to the 40 MHz reference clock, it generates
a bit clock (fBIT) with frequency equal to 120 times
40 MHz, that is, 4.8 GHz from which three non-overlapping
clock phases (Q0, Q1, and Q2) are obtained. As shown in Fig.
2, these three clock phases are used to clock three shift registers
and to control the fast multiplexer in order to time division mul-
tiplex the three 1.6 Gb/s serial streams into a single 4.8 Gb/s
serial stream.
Figure 2: The timing diagram of the serializer operation.
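The clock relationships above reduce to simple arithmetic, sketched below in Python. The constant names and the `phase_select` helper are hypothetical, and modeling the non-overlapping phases as a one-hot selection per bit slot is a simplifying assumption rather than a description of the actual clock divider.

```python
F_REF_HZ = 40e6          # LHC reference clock
DIVIDE_RATIO = 3 * 40    # two-stage feedback divider

f_bit_hz = F_REF_HZ * DIVIDE_RATIO   # bit clock generated by the PLL
f_word_hz = f_bit_hz / 3             # rate of each clock phase and shift register

def phase_select(bit_index):
    """Return (Q0, Q1, Q2) for a bit slot; exactly one phase is active."""
    return tuple(int(bit_index % 3 == i) for i in range(3))
```

With these values the bit clock is 4.8 GHz and each shift register runs at 1.6 GHz, matching the rates quoted in the text.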
A. Radiation Issues
In deep sub-micron technologies, the performance of high-speed
circuits depends on many layout-related effects to a much
greater extent than in older CMOS processes.
Therefore in relatively recent technologies, the layout
work should be introduced in a very early stage since it has a
large impact on the final performance. For accurate simulation,
some of the loop parameters, which play important roles in the
loop behavior such as the VCO gain, must be extracted from the
actual layout implementation.
As reported in [8, 9] and [11], the charge-pump and the VCO
are the most sensitive components of PLL circuits to SETs, and
their design has to take into account the increased sensitivity of
modern deep submicron technologies to SETs. In such tech-
nologies the integrated devices are located closer to each other,
thus an ionizing particle can affect simultaneously several de-
vices. Additionally the response of the parasitic devices to SETs
can lead to charge collection exceeding that deposited by the
ionizing particle. Examples are the PNPN parasitic structures
in CMOS devices which can even lead to latch-up and the par-
asitic bipolar junction transistors which cause enhanced charge
collection [3]. In this work, such conditions are addressed in the
VCO differential delay cells and the fast multiplexer (fMUX)
where triple-well transistors are used. The triple-well structure
is expected to better isolate the devices from radiation-induced
charge collection.
Considering the registers within the serializer, a triple modular
redundancy (TMR) scheme is used in the clock generator to
increase SET immunity. This technique, however, limits the maximum
frequency to values much lower than would otherwise be achievable
with this technology.
The techniques followed to minimize such penalties are
summarized in the next section.
III. CIRCUIT DESIGN
The delay cell [7] chosen for the ring-oscillator is a standard
differential-pair with symmetric loads as shown in Fig. 3. The
low-pass filter (LPF) voltage, shown as bn in Fig. 3, is used to
control the differential pair tail current and thus to control the
VCO oscillation frequency. The bias of the symmetrical loads
is generated through the replica-bias circuit represented in Fig.
3-B.
Figure 3: The differential delay cell (A) and its replica-bias (B).
Figure 4: The fast-multiplexer with the history clearing scheme.
The fast multiplexer, shown in Fig. 4, is implemented by an 8-input
nMOS logic And-Or-Invert (AOI) structure driven both
by the clock phases Qi and the pseudo-complementary shift register
signals SRi. The required clock phases Q0, Q1, and Q2
are generated within the PLL clock divider. The clock phases
Q0, Q1 and Q2 are non-overlapping so at any time only one of
the fMUX branches is active.
A straightforward AOI multiplexer has a drawback
that the circuit presented in Fig. 4 addresses: depending
on the history of logic levels of the SRi inputs driving the
branches which are disabled by logic-low Qi signals, the output
node experiences different amounts of charge sharing between
the nodes n1 to n3 leading to different delays and thus pattern-
dependent jitter. In order to solve this problem, relatively small
transistors driven by the next Qi phase are connected in parallel
with those driven by SRi to clear the effect of signal history.
When a branch is selected by the corresponding Qi phase, these
small transistors ensure that the node in between the two tran-
sistors is pre-discharged to the ground so that all the transitions
start with identical initial conditions. In this way, the pattern-
dependent timing ambiguity is minimized.
A. SEU Tolerance
Radiation tolerance of the feedback divider is obtained by
TMR techniques. Due to the extra logic employed, these
techniques limit the maximum achievable operating frequency.
Circuits voting the inputs or outputs of D-FFs increase
the logic propagation delays and cannot be used for high speed
applications. Instead, a novel voted dynamic D-FF was de-
signed which embeds the voter. Its schematic can be seen in
Fig. 5.
Figure 5: Improved TMR dynamic D-FF.
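The principle behind TMR can be sketched behaviorally in Python. This is a generic 2-out-of-3 majority vote applied to a divide-by-3 counter, not a model of the voted dynamic D-FF of Fig. 5, whose transistor-level details are not given here; the function names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote of three integer state copies."""
    return (a & b) | (b & c) | (a & c)

def tmr_divider_step(copies, modulus=3):
    """Vote the three copies, then advance all of them from the voted state."""
    voted = majority(*copies)
    next_state = (voted + 1) % modulus
    return [next_state] * 3, voted

copies = [0, 0, 0]
copies[1] ^= 0b10                         # a single upset flips one bit in one copy
copies, voted = tmr_divider_step(copies)  # the vote masks the corrupted copy
```

Because the next state is derived from the voted value, a single corrupted copy is overwritten on the following clock edge instead of propagating through the divider.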
IV. SIMULATION RESULTS
The PLL architecture adopted can be modeled by a second-
order type-II negative feed-back loop for which an analytic
model can be found in [5] and [4]. Fig. 6 shows the possible
operating points (circles) of the CP-PLL on the stability map
which plots the normalized forward loop gain as a function of
the normalized reference input. The overload and z-plane stability
limits [4] are also shown. The desired operating points are
located in the vicinity of 10% of the overload limit, which sets in
at lower values.
Figure 6: CP-PLL stability plot.
A practical issue in designing PLLs is the fact that not all the
loop parameters can be arbitrarily chosen, requiring some build-
ing blocks to be laid out before the actual model based simula-
tions can take place. As examples, the time constant of the loop
filter or the charge-pump current can be freely chosen, however
the designer cannot set arbitrarily the VCO gain since it very
much depends on the circuit topology and the semiconductor
process used. The VCO gain contributes to the forward gain of
the control loop and is a very important parameter for the loop
behavior. The VCO gain and its variations can be known only
once the circuit is laid out and the parasitics are extracted. Only
then can the model based simulations mentioned above be performed.
It is thus necessary to start the PLL design with the
implementation of the VCO down to the layout level before re-
alistic model based simulations of the loop itself can be done.
Circuit design in these cases is thus an iterative optimization
process between the schematic and the layout levels.
A. Single-Event Transient Simulations
In the simulation results presented in this section, the charge
released within the silicon by an ionizing particle is modeled as
ideal rectangular current pulses [6] of different amplitudes with
a fixed duration of 10 ps. Even though a double-exponential waveform
with a relatively long tail better resembles the actual waveform,
it must be extracted from process simulations to correspond
to real conditions. At the time of this writing, however,
such process simulation results were not available. Consequently,
the effect of the waveform of the injected pulse was
not modeled and only that of the magnitude of the injected net
charge was considered.
An incoming ionizing particle releases charge that is col-
lected by the microelectronic devices nearby. Fig. 7 sketches
how the ionizing particle passages are modeled as ideal cur-
rent pulses applied to SET vulnerable nodes of the circuit un-
der study. For the VCO differential cell shown in Fig. 7, the
charge released is sensed by the drain and/or the source of the
transistors causing an effective phase shift at the VCO signal.
The simulation result of Fig. 8 shows the low-swing differen-
tial VCO signal and the corresponding large-swing single-ended
output when an ionizing particle releases charge in the circuit
approximately 300 ps after the beginning of the simulation.
The injected charge is relatively small, causing only a
small phase shift. However, if the amount of charge released
by the ionizing particle is large enough, the VCO can even cease
oscillating for a while and then recover nominal operation. Such
a condition is shown in Fig. 9.
Figure 7: Differential delay cell (D) and the two vulnerable points to
be affected by an ionizing particle passage, denoted as A and B.
Figure 8: Moderate SET-induced disturbance: an ionizing particle strike
depositing 0.1 pC perturbs the VCO (small-amplitude signals) and shifts
the D2S phase by 10 ps from its nominal evolution (large-amplitude
unperturbed signal and its shifted copy).
Figure 9: Due to excessive charge deposition (50 pC) by ionizing par-
ticles, the VCO oscillation can be temporarily interrupted.
Figure 10: The effect of the ionizing particle’s arrival instant on the
amount of delay it causes.
The phase shift induced on the VCO signal by the ionizing
particle does not only depend on the magnitude of the charge
deposited but also on the instant the charge is collected by the
circuit in relation to the VCO cycle. It is possible to find the
worst-case sensitivity to the collection of charge via simulation
by sweeping the "arrival time" of the ionizing radiation.
Fig. 10 sketches conceptually a simulation result where the
output of the differential ring-oscillator is plotted at the top and
the delay caused by 0.1 pC of charge injected is plotted as a
function of the arrival instant at the bottom. The injection in-
stant is swept over a single VCO cycle. Simulations show that
there are two time intervals where the sensitivity is the highest:
these correspond to the periods where the VCO output changes
at a faster rate. The instants marked as ta and tb correspond
to the maximum phase shift (PSmax) and the sensitivity is 100
s/C, or equivalently 1.6 × 10^-17 s/e−. The SET performance
of the VCO is evaluated based on these worst-case time instants.
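The two sensitivity figures quoted above are related by the elementary charge, as the following Python sketch shows. It assumes that the 100 s/C value corresponds to the 10 ps shift for 0.1 pC of injected charge reported in the caption of Fig. 8; the function name is hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # charge of one electron, in coulombs

def sensitivity_s_per_c(phase_shift_s, injected_charge_c):
    """Timing sensitivity: phase shift (seconds) per unit of injected charge."""
    return phase_shift_s / injected_charge_c

# Assumption: 10 ps of worst-case shift for 0.1 pC of injected charge.
s_per_c = sensitivity_s_per_c(10e-12, 0.1e-12)
s_per_electron = s_per_c * ELEMENTARY_CHARGE_C
```

Evaluating these expressions gives approximately 100 s/C and 1.6 × 10^-17 s per electron, consistent with the values in the text.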
The worst-case phase error as a function of injected charge
is plotted in Fig. 11. The design criterion used for the 4.8 GHz
VCO was that a 30 mA current pulse with 10 ps width, corresponding
to 0.3 pC of charge release, injected into or sunk from
nodes A and/or B of Fig. 7 should cause a maximum phase shift
of approximately 20 ps. Intuitively considering closed loop PLL
operation, the amount of timing error per reference clock cycle
that the ionizing particles generate should not be bigger than the
amount of correction that the loop can perform. This limits the
maximum phase error and prevents bursts of errors that other-
wise will lead to a significant increase in serializer bit-error rate
(BER). This is an issue specific to the design of radiation toler-
ant PLLs. To achieve such a robustness, we adopted the solution
of keeping the current flowing through the transistors just large
enough so that the charge released by an ionizing particle does
not significantly affect the circuit biasing and oscillation cycle.
To accommodate the higher currents while keeping a specific
oscillation frequency, transistor widths have to be increased accordingly.
This helps achieve tolerance to SETs due to the increased
circuit capacitances. The disadvantage of the technique
is the increased power consumption of the VCO which might
have to be biased with currents several times higher than those
that would be normally required to achieve low phase noise op-
eration at the given operating frequency. Running the VCO at
high currents does not however impair its phase noise perfor-
mance.
Figure 11: The worst-case phase error versus the injected current.
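For the ideal rectangular pulses used in these simulations, the injected charge is simply the product of pulse amplitude and duration. The Python sketch below, with hypothetical helper names, converts between the two for the fixed 10 ps pulse width.

```python
PULSE_WIDTH_S = 10e-12   # fixed pulse duration used in the simulations

def pulse_charge_c(amplitude_a, width_s=PULSE_WIDTH_S):
    """Charge carried by an ideal rectangular current pulse (Q = I * t)."""
    return amplitude_a * width_s

def pulse_amplitude_a(charge_c, width_s=PULSE_WIDTH_S):
    """Pulse amplitude needed to inject a given charge in the fixed width."""
    return charge_c / width_s
```

A 30 mA pulse therefore carries 0.3 pC, as stated in the design criterion, and the 0.1 pC injection of Fig. 8 corresponds to a 10 mA pulse.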
The phase shifts considered so far occur only when an ionizing
particle releases charge within the VCO delay cell. However
if the charge is deposited at the charge-pump node connected
to the filter, the voltage difference it causes on the VCO con-
trol signal modulates the VCO frequency. The VCO frequency
difference will integrate over 120 VCO cycles until the phase-
frequency detector (PFD) generates the next correction signal.
In order to minimize this effect, a large filter capacitance must
be employed. In the serializer PLL, segmented nMOS transistors
in accumulation mode were used with a total capacitance of
300 pF. The LPF itself occupies a total area of less than 400 × 200
µm².
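The benefit of a large filter capacitance can be estimated with a first-order Python sketch. It assumes that the deposited charge ends up entirely on the filter capacitor and that the resulting frequency offset persists for one full reference period; the VCO gain is not given in this paper, so `kvco_hz_per_v` is a purely hypothetical parameter, as are the function names.

```python
def control_voltage_step_v(charge_c, filter_cap_f):
    """Voltage step on the VCO control node for a deposited charge (dV = Q / C)."""
    return charge_c / filter_cap_f

def accumulated_time_error_s(delta_v, kvco_hz_per_v, f_vco_hz=4.8e9, f_ref_hz=40e6):
    """Timing error built up over one reference period (120 VCO cycles)."""
    delta_f_hz = kvco_hz_per_v * delta_v        # frequency offset of the VCO
    return (delta_f_hz / f_vco_hz) / f_ref_hz   # fractional offset times 25 ns

dv = control_voltage_step_v(0.3e-12, 300e-12)   # 0.3 pC on the 300 pF filter
```

With the 300 pF filter, 0.3 pC of deposited charge moves the control voltage by about 1 mV; a smaller capacitance would scale both this step and the accumulated timing error up proportionally.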
V. SUMMARY
The BER performance of high-speed links depends strongly
on the jitter characteristics of the serializer and deserializer circuits.
For the serializer, jitter in the transmitted signal has two
main origins: random jitter, generated by the VCO phase noise
and the tracking behavior of the clock multiplying PLL, and pattern
dependent jitter, essentially due to bandwidth limitations and
clock skew in the parallel-to-serial conversion circuits. For serializer
circuits operating under radiation, ionizing particles can
contribute significantly to the increase of the BER [13]. This can
take the form of single errors or bursts of errors. Single errors can
happen, for example, when one of the bits of the serializer shift
registers suffers a single event upset. However, circuits like the VCO
and the PLL loop filter, when disturbed, can lead to bursts of errors
that adversely affect the BER and can even cause loss
of link synchronization, resulting in relatively large dead
times in the data transmission system. It is thus particularly important
to minimize the effects of SETs on these last two circuits
since they keep a "long term" memory of the disturbing event.
For the VCO, in the best case, a SET will appear as a phase
jump that will stay uncorrected until the PLL action restores the
steady state conditions. In the case of the loop filter, any disturbance
will be integrated, resulting in large phase errors which
again need to be compensated by the PLL. Since in serializer
PLLs the loop bandwidth is typically several orders of magni-
tude lower than the VCO oscillation frequency, the loop action
alone is not fast enough to fully compensate for the effects of
SETs. It is thus important to use SET robust circuits in the PLL.
This paper described the approach adopted to achieve this goal.
In particular, the design criteria and simulation method used to
design a SET robust VCO were detailed. There it was shown
that for SEU tolerance, running the VCO with relatively high
currents is an advantage. Although low-power consumption is
always desirable, our study shows that ring oscillators can only
be made low power at the cost of high sensitivity to SETs.
Another critical component in a PLL working under radiation
is the feedback counter. Any upset in this circuit might appear
to the PLL as a large phase shift, resulting in a long settling
time or even in a full locking cycle. In any case, such an event
will almost certainly desynchronize the receiver PLL resulting
in a long dead time. To avoid such behavior the clock divider
must use a triple modular redundancy architecture. However,
due to the high speed operation of the counter, it became evident
that the common scheme of using a flip-flop preceded by
a majority voter would not allow a high-yield circuit to be designed
for the specified range of process, temperature and voltage variations.
To overcome this obstacle a new dynamic flip-flop with
embedded voter was developed and is used in the ASIC for the
digital circuits that operate at the highest clock frequencies.
Also with the aim of achieving high yield, the parallel-to-serial
converter uses three shift registers operating at 1/3 of the
bit clock frequency. The full data rate serial stream is obtained
by time division multiplexing those three serial streams using a
single fast multiplexer. This multiplexer uses a special architec-
ture to minimize pattern dependent jitter and it is described in
detail in the paper.
A serializer/de-serializer ASIC that contains the serializer
described in this work was designed in a commercial 130 nm
CMOS technology. Fig. 12 shows the serializer layout.
Figure 12: Layout of the serializer occupying 0.6 mm² of die area.
REFERENCES
[1] A. I. Chumakov et al., Elsevier Radiation Measurements,
Vol. 30, 1999, pp. 547-552
[2] G. Anelli et al., IEEE Transactions on Nuclear Science,
Vol. 49, no. 4, 2002
[3] G. Bruguier et al., IEEE Transactions on Nuclear Science,
Vol. 44, pp. 522-532, April 1996
[4] F. M. Gardner, IEEE Journal of Solid-State Circuits, Vol.
com. 28, no. 11, 1980
[5] F. M. Gardner, Phaselocking Techniques, John Wiley and
Sons, 2005
[6] H. H. Chung et al., IEEE Transactions on Nuclear Science,
Vol. 53, no. 6, 2006
[7] J. G. Maneatis, IEEE Journal of Solid-State Circuits, Vol.
31, no. 11, November 1996, pp. 1723-1732
[8] T. D. Loveless et al., IEEE Transactions on Nuclear Science,
Vol. 53, no. 6, 2006
[9] T. D. Loveless et al., IEEE Transactions on Nuclear Science,
Vol. 54, no. 6, 2007
[10] W. Chen et al., IEEE Transactions on Nuclear Science, Vol.
50, no. 6, 2003
[11] Y. Boulghassoul et al., IEEE Transactions on Nuclear Science,
Vol. 52, no. 6, 2005
[12] Z. Cao et al., IEEE Journal of Solid-State Circuits, Vol. 43,
no. 9, 2008
[13] T. Toifl, P. Moreira and A. Marchioro, Proceedings of
the Sixth Workshop on Electronics for LHC Experiments,
Cracow, Poland, 11-15 Sept. 2000, pp. 226-230
