Abstract: A 3.2-Gbit/s serializer prototype has been fabricated in a 0.13-µm CMOS technology to demonstrate its applicability within future Large Hadron Collider (LHC) data readout and trigger systems. The IC includes a clock-multiplying phase-locked loop (PLL), a 50-Ω line driver, internal self-testing features, and data-pattern generation. The serial output stream is 8B/10B encoded for compatibility with commercial receivers. Radiation-hardening layout techniques have been adopted, which guarantee radiation-tolerant operation inside the innermost LHC detectors for more than 10 years. This paper describes the circuit architecture and reports the experimental results. Signal quality (jitter, noise floor, eye opening) and bit-error rate (BER) are measured at different transmission rates using laboratory instrumentation and dedicated test beds.
I. INTRODUCTION

In modern high-energy physics (HEP) colliders, the high interaction rate multiplied by the large number of electronic channels generates a massive amount of data that must be transmitted from the particle detectors to the counting rooms, over a typical distance of a few hundred meters. Limited power, space, and material budget, as well as cost constraints, dictate the use of high-speed optical links and fast serializers and deserializers. Commercially available components meet the bandwidth requirements of most HEP applications; however, they generally cannot operate reliably in the specific radiation environment of particle detectors.
Custom solutions are therefore often required. In particular, it has been shown that specific chip-design rules can be applied to commercial CMOS technologies to produce sufficiently radiation-tolerant devices [1], [2]. In the first generation of ASICs for the Large Hadron Collider (LHC), many chips were developed using a 0.25-µm CMOS technology and these radiation-tolerant design rules. In particular, the Gigabit Optical Link (GOL) chip [3]-[5] was designed to implement a gigabit serializer for large-volume data transmission in the experiments. This serializer can operate at two transmission speeds (800 Mbps and 1.6 Gbps) and implements two standard line-encoding schemes (G-Link and Gigabit Ethernet). The GOL is currently being integrated in the readout systems of all LHC detectors.
Technologies with ever smaller feature sizes are being introduced at a fast pace by the semiconductor industry. To profit from these developments, and to cover needs such as replacing, upgrading, and developing new parts for current and future experiments, it is important to identify and qualify a potential alternative to the 0.25-µm CMOS technology. For these reasons, we have started a thorough evaluation of a 0.13-µm CMOS technology, available in production since 2002. We have implemented a wide range of test structures, from simple transistors to basic circuit blocks and complete applications [6]-[8]. This effort has two goals: a) to qualify the technology for application in the various LHC environments; and b) to assess the potential performance improvements if the current building blocks were translated into this newer technology. In this context, a multigigabit serializer has been implemented that exploits the very high device speed available. This paper is organized as follows: Section II presents the circuit architecture. Section III gives information about the technology and layout implementation. Section IV presents the measurement setup and results. Section V compares circuit behavior before, during, and after ionizing irradiation. Section VI summarizes the results.
II. CIRCUIT ARCHITECTURE
The circuit includes various high-frequency blocks, as illustrated in Fig. 1. The core function is the gigabit serializer, which receives data in parallel and pipelines them toward the 50-Ω output line driver at very high speed. The clock-multiplying phase-locked loop (PLL) provides a low-jitter master clock, which is scaled to different frequencies and distributed to the other circuit blocks by the clock divider. In this circuit demonstrator, for ease of testing and debugging, test data patterns are generated internally by the pseudorandom sequence generator (PRSG) and supplied to the serializer inputs through the 8B/10B encoder. The following signals must be provided (or can be monitored) externally: the input clock (InClk), a reference clock at a frequency 20 times lower than the transmission frequency; the output clock (OutClk), the PLL-synthesized clock whose phase and frequency are locked to the input clock; an external strobe signal that toggles between transmission of pseudorandom data characters and conventional idle characters; and other digital control levels used for configuring the PLL and enabling various test signals. Serial data is transmitted off chip over 50-Ω transmission lines.
A. PLL
The PLL is a charge-pump PLL [9], [10] built from the following blocks, shown in Fig. 2: the three-state phase and frequency detector (PFD), the charge pump (CP), the RC loop filter, the voltage-controlled oscillator (VCO), and the VCO output buffer (BUF).
The CP current can be varied over more than one decade (12-192 µA) to allow good control over the loop parameters (damping factor and loop bandwidth) in spite of process variations. The loop-filter capacitor (20 pF) and resistor have been implemented on chip. The resistance can be set to one of two values (3.5 or 5 kΩ). The loop bandwidth has been designed to be approximately 5 MHz (2-4% of the reference clock frequency, which is generally considered a good compromise when the different noise sources are accounted for).
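To illustrate how the CP current and filter resistor set the loop dynamics, the sketch below evaluates the standard linearized second-order charge-pump PLL model (series-RC filter, divide ratio N = 20): ω_n = sqrt(I_cp·K_VCO/(2π·N·C)) and ζ = R·C·ω_n/2. The capacitor, resistor values, CP range, and divide ratio come from the text; the VCO gain K_VCO is a hypothetical placeholder because the paper does not give it, so the printed numbers only show trends, not the authors' design values. The last line checks the 5-MHz design value quoted in Section IV against the 125-200 MHz reference range implied by a 2.5-4 GHz VCO.

```python
import math

def cp_pll_params(i_cp, r, c, k_vco_hz_per_v, n_div):
    """Linearized second-order charge-pump PLL with a series-RC loop filter.

    Returns (natural frequency f_n [Hz], damping factor zeta, -3 dB bandwidth [Hz]).
    """
    k_vco = 2 * math.pi * k_vco_hz_per_v                  # VCO gain in rad/s/V
    w_n = math.sqrt(i_cp * k_vco / (2 * math.pi * n_div * c))
    zeta = r * c * w_n / 2
    a = 1 + 2 * zeta ** 2
    w_3db = w_n * math.sqrt(a + math.sqrt(a ** 2 + 1))   # closed-loop -3 dB point
    return w_n / (2 * math.pi), zeta, w_3db / (2 * math.pi)

C_LOOP = 20e-12          # loop-filter capacitor from the text
N_DIV = 20               # VCO / reference frequency ratio from the text
K_VCO_HZ_PER_V = 1e9     # HYPOTHETICAL VCO gain; not given in the paper

for i_cp in (12e-6, 48e-6, 192e-6):      # within the 12-192 uA CP range
    for r in (3.5e3, 5e3):               # the two selectable resistor values
        f_n, zeta, f_3db = cp_pll_params(i_cp, r, C_LOOP, K_VCO_HZ_PER_V, N_DIV)
        print(f"Icp={i_cp * 1e6:4.0f} uA  R={r / 1e3:.1f} kOhm  "
              f"f_n={f_n / 1e6:5.2f} MHz  zeta={zeta:4.2f}  f_3dB={f_3db / 1e6:5.2f} MHz")

# A 5 MHz loop bandwidth as a fraction of the reference (VCO of 2.5-4 GHz divided by 20):
print([f"{5e6 / (f_vco / N_DIV):.1%}" for f_vco in (2.5e9, 4e9)])
```

Because ω_n grows as the square root of I_cp while ζ is proportional to R·ω_n, a 16× change in CP current moves both the natural frequency and the damping by a factor of 4, which is why a wide CP range gives useful control over the loop in spite of process spread.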
The VCO is a four-cell ring oscillator. The VCO cells are differential delay cells with symmetric loads, featuring good supply-noise rejection and a large tuning range [11]. Four cells in a loop are enough to meet the speed requirements while relaxing the gain and phase-shift constraints on the design of the individual cells. The overall jitter is independent of the number of cells [12].
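As a rough sizing aid, the first-order ring-oscillator relation links the oscillation frequency to the number of cells N and the per-cell delay t_d (a four-stage differential ring oscillates by using one cross-wired, inverting connection). This is a textbook approximation, not a figure from the paper; with N = 4 it gives the per-cell delay implied by the 2.5-4 GHz operating range:

```latex
f_{\mathrm{osc}} = \frac{1}{2\,N\,t_d}
\quad\Longrightarrow\quad
t_d = \frac{1}{2\,N\,f_{\mathrm{osc}}}
    = \frac{1}{2 \cdot 4 \cdot 3.2\ \mathrm{GHz}} \approx 39\ \mathrm{ps},
\qquad
t_d \in [31\ \mathrm{ps},\ 50\ \mathrm{ps}] \ \text{for}\ f_{\mathrm{osc}} \in [4,\ 2.5]\ \mathrm{GHz}.
```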
The VCO buffer converts the high-speed clock from differential to single-ended while preserving the 50% duty cycle required by the dual-phase serializer (see Section II-C). However, because the analog transistor models available to us at design time were poor [7], it would have been risky to base the design performance on a precise duty cycle at the output of this buffer. The high-frequency clock is therefore processed by a fast frequency divider before reaching the serializer. The PLL block has been optimized for low-jitter operation in the 2.5-4 GHz range.
B. Clock Divider
The clock divider derives the different clocks required for circuit operation from the highest-frequency VCO clock, namely the bit clock, the byte clock, and the output clock.
The bit clock has half the frequency of the VCO clock. Both its rising and falling edges are used to trigger the dual-phase serializer, so that two bits of data are transmitted in every clock cycle. A symmetrical clock divider is used to generate this 50%-duty-cycle, half-frequency clock from the VCO clock.
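The reason a divide-by-2 stage restores a 50% duty cycle can be shown with a small behavioral model: if the divider toggles only on rising edges of the VCO clock, the output stays high for exactly one input period and low for the next, regardless of the input duty cycle. The sketch below assumes a generic toggle flip-flop (the paper does not detail the symmetrical divider circuit) and ignores delay mismatch between edges.

```python
def vco_clock(period, high, n_periods):
    """Sampled VCO clock: 1 for the first `high` samples of each period, else 0."""
    return [1 if (t % period) < high else 0 for t in range(period * n_periods)]

def divide_by_two(clk):
    """Toggle flip-flop triggered on rising edges only (behavioral divide-by-2)."""
    out, q, prev = [], 0, 0
    for level in clk:
        if level == 1 and prev == 0:   # rising edge of the input clock
            q ^= 1
        out.append(q)
        prev = level
    return out

def duty_cycle(signal):
    """Fraction of samples at logic 1."""
    return sum(signal) / len(signal)

for high in (3, 5, 7):                 # input duty cycles of 30%, 50%, 70%
    vco = vco_clock(period=10, high=high, n_periods=40)
    print(f"VCO duty {duty_cycle(vco):.2f} -> "
          f"bit-clock duty {duty_cycle(divide_by_two(vco)):.2f}")
```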
The byte clock triggers the low-frequency sequential operations: the pseudorandom byte generation, the 8-bit to 10-bit data encoding, and the parallel loading of the serializer. Since 10 bits are transmitted in every byte-clock cycle, the byte-clock frequency is 1/10 of the VCO frequency.
The output clock is derived from the byte clock by a further division by 2. The resulting clock, at 1/20 of the VCO frequency, is the PLL-synthesized clock whose phase is locked to the external reference clock (input clock). This output signal can be used to characterize the PLL jitter performance or to synchronize a receiver.
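The divider chain above can be summarized as a frequency plan. Since one bit is sent per VCO cycle, the VCO frequency in GHz equals the line rate in Gbps; the minimal sketch below simply applies the ratios given in the text (2, 10, and 20) to a few VCO frequencies in the PLL's 2.5-4 GHz optimization range.

```python
def clock_plan(f_vco_hz):
    """Clocks derived from the VCO clock by the divider chain in the text."""
    return {
        "bit_clock": f_vco_hz / 2,       # both edges used: 2 bits per cycle
        "byte_clock": f_vco_hz / 10,     # one 10-bit code group per cycle
        "output_clock": f_vco_hz / 20,   # byte clock / 2; equals the InClk frequency
    }

for f_vco in (2.5e9, 3.2e9, 4.0e9):
    plan = clock_plan(f_vco)
    print(f"{f_vco / 1e9:.1f} Gbps: "
          + ", ".join(f"{name} = {f / 1e6:.0f} MHz" for name, f in plan.items()))
```

At 3.2 Gbps, for example, the reference clock that must be supplied on InClk is 160 MHz.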
C. Gigabit Serializer
The 10-bit words must be serialized and streamed out at the VCO clock rate. The serializer structure is explained in Fig. 3. It is composed of two 5-bit shift registers that are loaded simultaneously, one with the even bits and the other with the odd bits. The shift registers are clocked at the bit-clock rate (half the VCO clock rate). An additional latch delays one of the two outputs by half a clock cycle. The two outputs are finally multiplexed by a fast 2:1 multiplexer and streamed out at the VCO clock rate. The advantage of this dual-phase architecture is that the shift registers are clocked at half the transmission rate, and only the output multiplexer runs at full speed. This scheme is faster than a single full-speed shift register; however, it relies on a clock with a carefully controlled duty cycle.
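The data path of the dual-phase serializer can be checked with a bit-level model: splitting a word into even and odd registers and letting the 2:1 multiplexer alternate between them reproduces the original bit order at twice the register rate. This sketch assumes bit 0 is transmitted first and that the multiplexer selects the even register during the first half of each bit-clock period; the paper does not state either convention.

```python
def split_even_odd(word):
    """Load a 10-bit word (list of bits, bit 0 first) into the two 5-bit registers."""
    if len(word) != 10:
        raise ValueError("expected a 10-bit word")
    return word[0::2], word[1::2]

def dual_phase_serialize(word):
    """Both registers shift on the same bit-clock edge; the extra latch delays the
    odd output by half a cycle, so the 2:1 mux emits the even bit in the first half
    of each bit-clock period and the odd bit in the second half."""
    even_reg, odd_reg = split_even_odd(word)
    stream = []
    for e_bit, o_bit in zip(even_reg, odd_reg):   # one bit-clock cycle per pair
        stream.append(e_bit)                      # first half-period (even register)
        stream.append(o_bit)                      # second half-period (delayed odd register)
    return stream

word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(split_even_odd(word))
print(dual_phase_serialize(word) == word)
```

The model also makes the timing requirement visible: each output bit occupies one half of the bit-clock period, so any deviation from a 50% duty cycle directly shortens alternate bits.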
D. Line Driver
The line driver is a pseudodifferential 4-Gbps 50-Ω driver with adjustable gain, capable of interfacing with the most common optical transmitter modules or electrical receivers. The driver is back-terminated with 50 Ω, thus reducing back-reflections, and it supports ac coupling. The driver gain is sensitive to technology variations; however, the gain-adjustment range was chosen so that the logic levels can always be made compatible with external modules.
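The benefit of back-termination follows from the standard reflection coefficient at the source. With a source resistance R_s equal to the line impedance Z_0, any wave reflected from an imperfect far end is absorbed at the driver instead of bouncing back, at the cost of launching only half of the open-circuit swing V_oc into the line. The 55-Ω value below is a hypothetical mismatch chosen only for illustration:

```latex
\Gamma_s = \frac{R_s - Z_0}{R_s + Z_0} = 0 \quad (R_s = Z_0 = 50\ \Omega),
\qquad
V_{\mathrm{line}} = V_{\mathrm{oc}}\,\frac{Z_0}{R_s + Z_0} = \frac{V_{\mathrm{oc}}}{2},
\qquad
\Gamma_s\big|_{R_s = 55\ \Omega} = \frac{5}{105} \approx 0.048 .
```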
E. Data Generation and Encoding
Standard CCITT data patterns are generated inside the chip by the PRSG, which is based on a 16-bit shift register. Internal data generation is useful for debugging (also in a final application) and was considered sufficient for our circuit-evaluation purposes. No parallel data inputs are provided externally: the circuit acts only as a serializer demonstrator and cannot be used in a real data-transmission application. At this prototyping stage, parallel data inputs would have been costly because of the additional area required by the large number of corresponding pads.
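A pseudorandom generator of this kind is typically a linear-feedback shift register (LFSR). The paper does not give the feedback polynomial of the on-chip PRSG, so the sketch below uses x^16 + x^14 + x^13 + x^11 + 1, a well-known maximal-length 16-bit polynomial, purely as a stand-in; it is not claimed to be the CCITT pattern used on the chip. The helper names are hypothetical.

```python
def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11.

    Returns (new_state, output_bit). The state must never be zero."""
    out = state & 1
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15), out

def prbs_bytes(seed, n):
    """Pack successive LFSR output bits (first bit -> LSB) into n bytes."""
    state, data = seed, []
    for _ in range(n):
        byte = 0
        for i in range(8):
            state, bit = lfsr16_step(state)
            byte |= bit << i
        data.append(byte)
    return data, state

def period(seed=0xACE1):
    """Number of steps before the register returns to its seed state."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr16_step(state)
        steps += 1
        if state == seed:
            return steps

print(period())                      # maximal length: 2**16 - 1 states
print(prbs_bytes(0xACE1, 4)[0])      # first four pseudorandom bytes
```

A receiver that runs the same LFSR from a synchronized state can regenerate the expected bytes locally, which is the principle behind emulating the on-chip generator in the FPGA for BER testing.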
The encoder block translates 8-bit words into dc-balanced 10-bit words according to an 8B/10B encoding scheme, such as the one used, for example, by the Fibre Channel protocol [13]. Conventional idle characters are generated when random-data transmission is inactive.
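The dc balance of 8B/10B comes from choosing between two code groups according to the running disparity. The sketch below illustrates this on a stream of K28.5 comma characters; K28.5 is assumed here only because it is the character commonly used in 8B/10B idle sequences, and the paper does not state which idle characters the chip sends. The disparity update is simplified to whole characters, which gives the same result as the per-sub-block rule for K28.5.

```python
# K28.5 code groups, bits "abcdei fghj" in transmission order (bit a first).
K28_5 = {"RD-": "0011111010", "RD+": "1100000101"}

def disparity(code):
    """Number of ones minus number of zeros in a code group."""
    return code.count("1") - code.count("0")

def idle_stream(n_chars, rd="RD-"):
    """Send n_chars K28.5 characters, selecting each code group by running disparity."""
    bits = ""
    for _ in range(n_chars):
        code = K28_5[rd]
        bits += code
        if disparity(code) != 0:        # a non-neutral group flips the running disparity
            rd = "RD+" if rd == "RD-" else "RD-"
    return bits, rd

def max_run(bits):
    """Longest run of identical consecutive bits."""
    longest = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

bits, rd = idle_stream(4)
print(bits, rd)
print("ones - zeros:", disparity(bits), " longest run:", max_run(bits))
```

The alternating +2/-2 code groups keep the long-term balance at zero, and the run length never exceeds 5, which is what makes the stream compatible with ac coupling and with clock recovery in commercial receivers.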
III. LAYOUT AND TECHNOLOGY
The IC design has been submitted for prototyping in a 0.13-µm commercial CMOS technology featuring twin wells on a nonepitaxial p-substrate, shallow-trench isolation (STI), a 2.2-nm gate-oxide thickness, and a 1.5-V power supply. Six copper layers and aluminum pads are used for the interconnects. Special devices with nonstandard threshold voltages and oxide thicknesses, though attractive for high-speed applications, were not available to us at design time.
The circuit has been integrated using the following radiation-hardening design rules: all NMOS devices have been laid out with an enclosed (edgeless) geometry, and p-diffusion guard rings isolate all n-diffusions at different potentials, i.e., the source/drain diffusions of n-devices and the n-well diffusions of p-devices. These two simple practices prevent the formation of radiation-induced parasitic conductive channels in the thick oxides [1], [2]. No precautions against serializer malfunction due to single-event upsets (SEUs) have been included in the circuit; such protection will be added in the production version. Refer to [5] for a discussion of circuit practices that reduce the impact of SEUs. The final chip size is 1.9 mm × 1.2 mm, largely determined by the pad count (36).
The IC has been processed with three different sets of technology parameters, corresponding to the fast, typical, and slow corners. This practice (called striping) is useful at the prototyping stage because it allows evaluation of the performance spread caused by unavoidable process variations during production.
IV. PROTOTYPE EVALUATION
A. Measurement Setup
The prototype IC has been bonded directly onto a PCB transmitter board. The reference clock supplied to the IC had controllable jitter, which allowed us to measure the PLL jitter transfer function and to evaluate the robustness of the IC to input jitter. A large-bandwidth (20-GHz) digital oscilloscope was used to observe the quality of the PLL and serializer outputs (jitter, eye opening, and noise floor) and to quantify their performance.
A receiver board based on a commercial deserializer (TLK3101 from Texas Instruments) and an Altera field-programmable gate array (FPGA) has also been developed. The deserializer contains a Fibre Channel (8B/10B) decoder and supports data rates up to 3.2 Gbps. The FPGA controls the deserializer and analyzes the received data. The transmitter and receiver boards can be connected via coaxial cables or Small Form Factor (SFF) optical transceivers.
This setup allows the bit-error rate (BER) to be measured under various operating conditions. For this purpose, the FPGA is programmed to emulate the on-chip data generator and to synchronize with the received and decoded data stream. The comparison between expected and received data is performed inside the FPGA in parallel on four consecutive words, allowing the FPGA to run at a reduced frequency. A logic analyzer connected to a PC is used to download information about the synchronization state and the number of transmission errors.
B. Measurement Results
Fig. 4 shows the PLL jitter transfer function. The damping factor and bandwidth can be varied to some extent by adjusting the CP current and filter-resistor values. The loop bandwidth can be varied between 2 and 10 MHz and is therefore well centered on its design value of 5 MHz. The PLL output jitter is lowest when the bandwidth is smallest, showing that the intrinsic (VCO) jitter is not dominant.
Fig. 5 shows the PLL output jitter versus operating frequency at different supply-voltage levels. From this graph, the maximum lock frequency of the PLL can be inferred. At sufficiently low frequency, the intrinsic PLL jitter settles to a normal-operation value on the order of 2-3 ps rms. As the frequency is increased, the output jitter eventually starts rising; at the same time, activity increases on the charge-pump control signals, showing that the loop is no longer working efficiently. Beyond this limit, the PLL no longer locks to the input signal. The maximum lock frequency for a PLL operating at 1.5 V is 3.3 GHz (nominal process).
Fig. 6 shows the data-transmission eye diagram when random data is serialized at 3 Gbps over a 50-Ω coaxial cable. The eye opening and noise floor in this diagram reflect good dynamic and jitter performance. Some of the ICs have been "double" bonded (with two wires on every bonding pad) but show no further improvement in the output waveforms, a good indication that the circuit is not very sensitive to bonding-wire inductance.
Error-free transmission has been performed at 3 Gbps and 3.2 Gbps for more than 4 days at 1.5 V. This corresponds to a BER well below . Fig. 7 shows the BER measured as a function of the input jitter. This graph shows the limit of operation of the serializer, in particular its robustness to input jitter. The serializer was operated error-free at 3.2 Gbps with a "white" input jitter of 530-ps peak-to-peak amplitude.
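The BER bound that an error-free run supports depends only on the number of bits observed. The sketch below applies the standard zero-error confidence bound (Poisson statistics: BER < -ln(1 - CL)/N for confidence level CL); it is an illustrative calculation, not necessarily the method the authors used to quote their figure. Since the runs lasted more than 4 days, the bit count computed here is a lower limit.

```python
import math

def ber_upper_bound(bit_rate, seconds, confidence=0.95):
    """Upper BER bound after an error-free run of bit_rate * seconds bits."""
    n_bits = bit_rate * seconds
    return n_bits, -math.log(1.0 - confidence) / n_bits

for rate in (3.0e9, 3.2e9):
    n_bits, bound = ber_upper_bound(rate, 4 * 24 * 3600)   # 4 days
    print(f"{rate / 1e9:.1f} Gbps: {n_bits:.3e} bits, BER < {bound:.1e} (95% CL)")
```

At 3.2 Gbps, 4 days correspond to about 1.1 × 10^15 bits, so the 95%-confidence bound is roughly 3/N, or about 2.7 × 10^-15.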
With the PLL locked at 3.2 GHz, the serializer consumes 90 mW at 1.5 V.
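A common way to compare serializers across data rates is the energy per transmitted bit. This figure of merit is not quoted in the paper; dividing the stated power by the line rate gives:

```latex
E_b = \frac{P}{R_b} = \frac{90\ \mathrm{mW}}{3.2\ \mathrm{Gbit/s}} \approx 28\ \mathrm{pJ/bit}
```

Note that the 90 mW covers all on-chip blocks of this demonstrator (PLL, PRSG, encoder, serializer, and line driver), so this is an upper estimate for the serializer path alone.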
V. RADIATION HARDNESS
One serializer IC has been irradiated at 25 °C during normal operation at 3 Gbps with a 1.5-V power supply. The in-house X-ray generator SEIFERT RP149 was used for this purpose. X-rays peaking at 10 keV were produced by a tungsten target. The tube provided an ionizing dose rate of 240 Gy/min, and a total ionizing dose (TID) of 300 kGy was reached in 24 h. Annealing was then performed for 24 h at room temperature and for 1 week at 100 °C (accelerated annealing), while keeping the device operational.
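As a consistency check on the irradiation parameters, the exposure time needed to reach the target dose at the stated dose rate is:

```latex
t = \frac{D_{\mathrm{TID}}}{\dot{D}} = \frac{300\ \mathrm{kGy}}{240\ \mathrm{Gy/min}} = 1250\ \mathrm{min} \approx 20.8\ \mathrm{h}
```

This fits within the quoted 24-h irradiation.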
The supply current measured before and after irradiation (both immediately after irradiation and after annealing) did not show any significant change. However, the supply current monitored during irradiation showed a temporary increase of up to 10 mA. The output jitter was measured before irradiation, three times during irradiation (at 10, 100, and 300 kGy), after room-temperature annealing, and after accelerated annealing; it did not show any significant change in any of these measurements. Error-free data transmission at 3.2 Gbps was verified again for 24 h during the room-temperature annealing and for more than 4 days at the end of the accelerated annealing.
VI. CONCLUSION
A multigigabit serializer demonstrator containing all of the critical blocks required for a full transmitter chip (a clock-multiplying PLL, a gigabit serializer, and a gigabit line driver) has been implemented in 0.13-µm CMOS. The evaluation gave the following results: the PLL has an intrinsic jitter of 3 ps rms and 19 ps peak-to-peak; the maximum PLL lock frequency is 3.3 GHz for a nominal process operating at the nominal 1.5-V supply voltage; and the line driver shows good dynamic and noise characteristics, also with respect to bonding parasitics and intersymbol interference. The BER in point-to-point transmission has been measured to be less than , both before and after irradiation of the serializer, and with white jitter of up to 500 ps (p-p) on the reference clock.
