Abstract: SVX4 is the new silicon strip readout IC designed to meet the increased radiation tolerance requirements for Run IIb at the Tevatron collider. Devices have been fabricated, tested, and approved for production. The SVX4 design is a technology migration of the SVX3D design currently in use by CDF. Whereas SVX3D was fabricated in a 0.8-μm radiation-hard process, SVX4 was fabricated in a standard 0.25-μm mixed-signal CMOS technology using the "radiation tolerant by design" transistor topologies devised by the CERN RD49 collaboration. The specific cell layouts include digital cells developed by the ATLAS Pixel group and full-custom analog blocks. Unlike its predecessors, the new design also includes the features required for generic use by both the CDF and D0 experiments at Fermilab. Performance of the IC includes 20 MRad total dose tolerance and 2000 e⁻ rms equivalent input noise charge with 40-pF input capacitance, when sampled at a 132-ns period with an 80-ns preamp risetime. At the nominal digitize/readout rate of 106/53 MHz, the 9 mm × 6.3 mm die dissipates 2 mW/channel average at 2.5 V. A review of typical operation, details of the design conversion process, and performance measurements are covered.
of the SVX3D IC currently in use by CDF [2], but it includes enhancements and improvements to that design, as well as superior radiation tolerance. A photo of an SVX4 die is shown in Fig. 1.
SVX4 incorporates 128 charge preamplifiers, a 128 × 46-cell analog latency pipeline capable of buffering up to 4 samples, a 128 × 8-bit Wilkinson-type analog-to-digital converter (ADC), a readout sparsification circuit with 8-bit parallel data I/O via differential transceivers, and a 192-bit single-event upset (SEU)-tolerant configuration register. Typical operation is similar to that of SVX3D. For each channel, charge signals of up to 60 fC are stored in the 46-deep analog pipeline by periodic sampling of the charge preamplifier output with a 132-ns period. The resulting analog voltage stored in each cell represents a correlated double sample of the input charge, which has the effect of rejecting low-frequency noise. The preamplifier stage has a programmable risetime, which limits the high-frequency noise. The pipeline operates as a ring of storage cells per channel, with the capability to remove up to four cells from the circular chain to accommodate the indefinite storage of up to four 128-channel samples before a digitize-readout sequence is required. The design features "dead-timeless" operation, meaning that pipeline data collection continues during digitize and readout modes. The dead-timeless operating mode is made possible by concurrent use of the pipeline read and write amplifiers depicted in Fig. 2. Pipeline cells flagged for readout by an external trigger system are skipped by the pipeline write logic until they are read out. Various levels of readout sparsification can be programmed.
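The ring behavior described above can be illustrated with a small behavioral sketch. It is a toy model of one channel, not the chip's control logic: the class name `PipelineRing`, its methods, and the policy of refusing a fifth trigger are assumptions made for illustration. Only the depth of 46 cells and the limit of four held cells come from the text.

```python
from dataclasses import dataclass, field

N_CELLS = 46   # pipeline depth per channel (from the text)
MAX_HELD = 4   # cells that can be removed from the ring at once (from the text)


@dataclass
class PipelineRing:
    """Toy model of one channel's pipeline as a ring of storage cells."""
    cells: list = field(default_factory=lambda: [None] * N_CELLS)
    held: set = field(default_factory=set)  # cells flagged by the trigger
    write_ptr: int = 0

    def write(self, sample):
        """Store a sample in the next free cell, skipping held cells."""
        while self.write_ptr in self.held:
            self.write_ptr = (self.write_ptr + 1) % N_CELLS
        cell = self.write_ptr
        self.cells[cell] = sample
        self.write_ptr = (cell + 1) % N_CELLS
        return cell

    def trigger(self, cell):
        """Flag a cell for readout. Refusing a fifth cell is an assumed policy."""
        if len(self.held) >= MAX_HELD:
            return False
        self.held.add(cell)
        return True

    def read_out(self, cell):
        """Return the held sample and put the cell back into the ring."""
        self.held.remove(cell)
        return self.cells[cell]


ring = PipelineRing()
first = ring.write("s0")
ring.trigger(first)
for i in range(1, 100):       # more than two full turns of the ring
    ring.write(f"s{i}")
held_sample = ring.read_out(first)   # still the sample written before the trigger
```

Because the write pointer steps over held cells, acquisition continues while the flagged sample waits for digitization, which is the essence of the dead-timeless mode.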
II. DESIGN CONVERSION
SVX4 keeps the same floorplan and low-impedance backside-grounding scheme that was proven successful for dead-timeless operation in SVX3D [3]. While reuse of the floorplan expedited the layout somewhat, several major blocks required redesign for implementation in the 0.25-μm process. This came about for technical reasons related to the process design rules, transistor parameters, and availability of precise passive components, as well as a general desire to take advantage of the performance potential of the new process.
A. Backend Digital
The backend digital core, which includes the FIFO, Gray counter, and I/O control cells, was synthesized using the digital library developed by the ATLAS Pixel group [4]. This cell library features radiation-tolerant layouts based on the research of the RD49 collaboration [5]. The area of the new static logic design, using nonminimum transistors as required for radiation-tolerant layout, is about one-half that of the 0.8-μm radiation-hard design.
The FIFO module of SVX4 is the heart of the readout logic. It consists mainly of a 128 × 15 static register array that provides the memory for the ADC data and channel addresses. Additional readout control logic within the FIFO configures memory access to provide for parallel loading of the counter values during digitize mode, and serialization and sparsification of the memory contents during readout. One half of the FIFO register block is constructed as a 128 × 8 data register matrix. In digitize mode, each channel latches the 8-bit counter value present at the time the Wilkinson ADC ramp crosses its signal voltage and fires its comparator. It is concurrently decided, based on a digital threshold value and the selected sparsification mode, whether a given channel is flagged for readout. The other half of the FIFO register is a 128 × 7 matrix that is preset to each channel's address at the start of readout. During readout, the two register banks, address and data, are shifted out alternately on complementary 26-MHz clock phases, resulting in a 52-Mbyte/s data rate. Channels not flagged for readout are passed over by "skip logic," effectively creating an n-bit parallel shift register for both the address and data banks, where n is the number of hit channels according to the programmed parameters (readout mode, digital threshold).
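A minimal behavioral sketch of the sparsified readout follows. It is not the FIFO implementation. The function name, the strict `>` comparison against the threshold, the `read_all` flag standing in for a non-sparse mode, and the address-before-data byte order are all assumptions; the text fixes only the 7-bit address, the 8-bit data, and the two bytes per 26-MHz clock cycle.

```python
N_CHANNELS = 128
CLOCK_HZ = 26e6          # readout clock (from the text)
BYTES_PER_CYCLE = 2      # one address byte and one data byte per cycle


def sparse_readout(adc_counts, threshold, read_all=False):
    """Return the byte stream for all channels flagged for readout.

    adc_counts: list of 128 latched 8-bit counter values, indexed by channel.
    Channels at or below the threshold are skipped (assumed comparison rule).
    """
    stream = []
    for address, count in enumerate(adc_counts):
        flagged = read_all or count > threshold
        if not flagged:
            continue                      # passed over by the skip logic
        stream.append(address & 0x7F)     # 7-bit channel address
        stream.append(count & 0xFF)       # 8-bit ADC value
    return stream


def data_rate_bytes_per_s():
    """Two bytes per clock cycle at 26 MHz gives the 52-Mbyte/s figure."""
    return CLOCK_HZ * BYTES_PER_CYCLE


counts = [0] * N_CHANNELS
counts[5] = 40
counts[127] = 200
stream = sparse_readout(counts, threshold=10)   # two hit channels, four bytes
```

In this model the length of the stream scales with the number of hit channels rather than with the 128 physical channels, which is the point of the skip logic.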
The critical part of this block is the skip logic cell used to sparsify the data. Fig. 3 provides a block diagram of the skip logic cell and describes the circuit function. The delay of this cell limits the readout speed in the worst-case scenario, i.e., a hit on channel 127 only in sparse mode. The measured propagation delay of the cell (300 ps), deduced from the maximum readout speed, is within 10% of that simulated with layout capacitance parasitics. Depending on the process parameters, the skip logic can reach its performance limit at 26 MHz. At higher clock frequencies, the logic fails gracefully, resulting in repeated readout of upper channel addresses rather than data corruption. In order to completely eliminate the small probability of invalid data due to a metastable condition in this operating mode, a selectable feature was included to read out channel 63 regardless of the mode. This increases the occupancy by about 1%, but also halves the total propagation delay through the skip logic cells.
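The relation between the cell delay and the clock limit can be checked with a short calculation. This sketch assumes that the skip signal ripples through every skipped cell in series and that the ripple must settle within one clock period; the exact timing budget of the real circuit is not given in the text. The 300-ps cell delay is the measured value quoted above.

```python
CELL_DELAY_S = 300e-12   # measured skip-cell propagation delay (from the text)
N_CHANNELS = 128


def ripple_delay_s(cells_skipped):
    """Total delay when the skip signal passes through this many cells."""
    return cells_skipped * CELL_DELAY_S


def max_clock_hz(cells_skipped):
    """Highest clock rate for which the ripple settles within one period."""
    return 1.0 / ripple_delay_s(cells_skipped)


# Worst case: only channel 127 is hit, so channels 0..126 are all skipped.
worst_case_hz = max_clock_hz(127)

# With channel 63 always read out, the longest run of skipped cells is 63
# (channels 0..62, or channels 64..126).
forced_63_hz = max_clock_hz(63)

# Reading one extra channel out of 128 adds a little under 1% occupancy.
extra_occupancy = 1 / N_CHANNELS
```

Under these assumptions the worst-case limit comes out at about 26 MHz, consistent with the performance limit quoted above, and forcing channel 63 roughly doubles the available margin.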
B. Charge Preamp
The preamplifier was the first stage to be redesigned for the 0.25-μm process. The new design employs a fast reset switch with a charge-injection circuit to set the operating point. The fixed charge injection occurs coincident with the release of the reset signal, and its magnitude is determined by an on-chip reference that can be overridden. This technique allows the output quiescent bias point to be set arbitrarily, independent of the input transistor bias point. A programmable "black-hole" elimination feature was also added to the new design. This circuit has the capability to shunt up to 2 μA from the inputs of any shorted detector strips by holding their respective preamplifiers in reset mode.
In the initial design phase, a frontend test chip was produced and measured in order to optimize the input device sizing. The three factors considered were low noise slope, maximum transconductance (g_m), and minimal excess noise contribution. The size of the input transistor selected is 1290/0.8 μm. The noise data from the study are provided in Table I. The measurements were taken at the pipeline output with a CDS period of 100 ns, a 70-ns preamp risetime, and a 250-μA input device bias current. The area of the preamplifier is about one-half that of the SVX3D design.
C. Pipeline
The pipeline controller was redesigned to fix a previous bug, in which issuing more than four level-1 accepts would corrupt the buffered analog data. The new design implements the same functionality as SVX3D, with minimum enclosed-layout transistor sizes. The combined gain of the preamp and pipeline was maintained in SVX4, while dual-polarity operation was removed in order to enhance the dynamic range for electron-only charge collection in the lower-voltage process. Metal-insulator-metal (MiM) capacitors were used to improve the pipeline cell matching and reduce the layout area. The use of MiM capacitors for the storage cells also made it possible to integrate PMOS decoupling capacitors into the pipeline module for both the analog and digital power busses. The area of the new pipeline and pipeline controller is about one-third that of the original design.
D. ADC
The ADC is based on the same Wilkinson-type design used in SVX3D. Digitization is accomplished by starting a digital counter coincident with a linear analog voltage ramp. The ramp voltage is compared to the DC signal value to be digitized using an analog discriminator. The discriminator will then fire with a time delay proportional to the signal value. This discriminator signal is used to latch the digital counter value in a register for each channel. The SVX4 voltage ramp is generated by a high-linearity current source and capacitor bank, where the output is buffered by a unity-gain op-amp. The slope of the voltage ramp can be trimmed with configuration bits that select various capacitance values in the ramp generator.
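The Wilkinson principle described above can be sketched as an idealized, noise-free model. The function names and all numeric values (current, capacitance, clock rate, signal levels) are hypothetical and chosen only to make the arithmetic easy to follow; they are not SVX4 design values.

```python
def ramp_step_mv(current_ua, cap_pf, clock_mhz):
    """Ramp voltage gained per counter tick, in mV.

    dV/dt = I / C, and 1 uA into 1 pF gives 1000 mV per microsecond;
    dividing by the clock rate in MHz gives mV per count.
    """
    return 1000.0 * current_ua / cap_pf / clock_mhz


def wilkinson_adc(signal_mv, step_mv, n_bits=8):
    """Return the counter value latched when the ramp reaches the signal."""
    max_count = (1 << n_bits) - 1
    for count in range(max_count + 1):
        if count * step_mv >= signal_mv:   # discriminator fires
            return count                   # counter value is latched
    return max_count                       # ramp never crossed: saturate


step = ramp_step_mv(current_ua=1.0, cap_pf=2.5, clock_mhz=100.0)  # 4 mV/count
code = wilkinson_adc(signal_mv=100.0, step_mv=step)

# Selecting a larger ramp capacitance lowers the slope, so the same signal
# takes more counts to reach: this is the effect of the trim bits.
trimmed = wilkinson_adc(signal_mv=100.0,
                        step_mv=ramp_step_mv(1.0, 5.0, 100.0))
```

The latched count is proportional to the signal voltage and inversely proportional to the ramp slope, which is why changing the selected capacitance rescales the conversion gain.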
The pipeline data can be digitized in one of two modes. The standard mode of operation starts the ADC ramp and the counter at the same instant, as described above. The practical problem that arises in this mode, however, is that system noise picked up by the detectors and/or injected into the ICs by nonideal power supplies can cause significant baseline shifts (pedestal variation) from sample to sample in the pipeline. From the perspective of the experiment, these common-mode pedestal variations cannot be distinguished from noise, and therefore can cause false triggers. This is unfortunate, because 1) this is not IC noise, but affects the triggering capability just as if it were, and 2) it is unpredictable in magnitude at design time since there are far too many variables in the assembly of such a large detector system. For these reasons, a second mode of operating the ADC was introduced on SVX3, and is carried over to the SVX4 design. Called real-time pedestal subtraction (RTPS) mode, this design feature introduces a second comparator into each ADC channel, and one additional threshold discriminator per chip.
The operation of RTPS is described below, with reference to Fig. 4. First, the full-swing output of the channel discriminator will contribute a fixed amount of charge if sensed by a series capacitor. This contribution from each channel can be summed onto a single common capacitor in order to produce a voltage proportional to the number of channels that fire. The common capacitor voltage can then be compared to a reference voltage using an additional threshold discriminator in order to produce a timing signal that indicates when a selected number of channels have fired.
If the threshold discriminator output is used to delay the start of the counter, and if a relatively large fraction of the channels (40 channels are typical for the CDF experiment) must fire in order to fire the discriminator, then the common-mode value of a given sample (pipeline cell) will be subtracted from all channels. Note that it is necessary to also delay the channel discriminator signal when using this technique so that zero-count data values are not produced, i.e., the counter must be made to start before any of the channel discriminators fires. This condition requires that the second channel comparator, called the "delay" comparator in Fig. 4, be slower than the threshold discriminator in all cases. Note also that this technique should only be used where sparse data is present. If a majority of channels have a large signal value, this technique will yield a zero-count result for the channels without signal (because they will have fired relatively long before), and just a few counts for the channels with signal (the counter value difference between the channel comparator delay and the threshold discriminator delay), which would be an incorrect result.
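The RTPS behavior described above, including its failure mode for non-sparse data, can be reproduced with a simple timing model. This is a sketch under stated assumptions: times are expressed in counter ticks, the threshold discriminator is treated as having zero delay, and the pedestal, signal, and `latch_delay` values are hypothetical. Only the figure of 40 channels comes from the text.

```python
N_CHANNELS = 128


def make_event(common_mode, hit_channels, pedestal=20, signal=50):
    """Ramp-crossing time of each channel, in counter ticks."""
    return [pedestal + common_mode + (signal if ch in hit_channels else 0)
            for ch in range(N_CHANNELS)]


def standard_digitize(cross_times, max_count=255):
    """Standard mode: counter and ramp start together."""
    return [min(max(t, 0), max_count) for t in cross_times]


def rtps_digitize(cross_times, n_fire=40, latch_delay=8, max_count=255):
    """RTPS mode: the counter starts when n_fire channels have fired.

    latch_delay models the slower "delay" comparator, which keeps quiet
    channels from latching before the counter has started.
    """
    counter_start = sorted(cross_times)[n_fire - 1]
    return [min(max(t + latch_delay - counter_start, 0), max_count)
            for t in cross_times]


hits = {5, 70, 100}
quiet = make_event(common_mode=0, hit_channels=hits)
shifted = make_event(common_mode=15, hit_channels=hits)

std_quiet, std_shifted = standard_digitize(quiet), standard_digitize(shifted)
rtps_quiet, rtps_shifted = rtps_digitize(quiet), rtps_digitize(shifted)

# Non-sparse case: 100 of 128 channels carry signal.
busy = rtps_digitize(make_event(0, set(range(100))))
```

In this model the standard-mode pedestal follows the common-mode shift, while the RTPS output is identical for both events. In the non-sparse case the channels without signal read zero and the channels with signal read only the latch delay, matching the incorrect result described above.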
Initially, the SVX3 ADC layout was translated directly into 0.25-μm with the same topology and similar transistor sizing, but using enclosed-layout devices. Simulations of the extracted layout with parasitics yielded good performance expectations (1/2 LSB noise, 8 bits linearity). However, one of the few measured deficiencies of the prototype IC was a spatial variation of the pedestal value of about 10 counts peak-to-peak in the nominal operating condition. This effect was reproduced in simulation using Monte Carlo variation of the threshold voltage (V_T) of suspect transistors over the expected process range. The main contribution to the pedestal variation was confirmed to be the result of insufficient bias current in the PMOS active load transistors of the channel discriminator stage of the ADC. The measurement and simulation results are detailed in Fig. 5.
Two different solutions were implemented in two designs to increase the probability of a successful pre-production device. In the first version, the comparator bias current was simply increased, thereby raising the forward bias of the current sources and making them less susceptible to variation due to threshold voltage (V_T) variation. In the second version, the ADC was redesigned, which resulted in superior performance, including a lower absolute pedestal count (delay) and a lower spatial pedestal variation. For both versions, the performance predicted by the Monte Carlo simulations matched the measured results. The two new versions achieve 2-count/1-count peak-to-peak spatial pedestal variation across the chip, and 30/20 ADU absolute pedestal value, respectively.
E. SEU-Tolerant Register
The 192-bit SEU-tolerant configuration register module is implemented as a D-flip-flop shift register with an SEU-tolerant shadow register composed of latch cells designed by the ATLAS Pixel group. The topology of the special latch cells is that of the DICE cell [6].
F. Pad Transceivers
The differential output drivers are a custom design based on SVX3D. They produce up to a 1-V swing across a 75-Ω load. Three bits of the configuration register are used to set the driver strength. The receivers are the LVDS design of ATLAS.
III. INTEGRATION, SIMULATION, AND VERIFICATION
The ability to check the functionality of the entire design was a key ingredient in the success of the first prototype. This was possible due to improvements in associated design tools and simulation models over the past few years, which now make it possible to accurately simulate large numbers of transistors in an analog mode.
The simulation and verification process used for the modules is briefly described. The backend digital core was simulated as a module with 15 Verilog test vectors. This block was also simulated at process corners with HSPICE¹ using piecewise-linear sources, which were generated from the Verilog stimulus. This was done again on the extracted layout. The analog modules, including the preamp, pipeline, and ADC, were simulated individually with HSPICE at all process corners using the layout-extracted circuit for each module. These components were assembled into frontend and backend modules, which were simulated with Nanosim² individually, as HSPICE could no longer converge on an operating point with the number of elements included. At this stage, Calibre DRC³ was run on each individual module, as well as the completed top-level device layout, and divaLVS⁴ was run on the corresponding extracted views. The results were confirmed by streaming the GDS-format physical layout file back into the Cadence framework for design-rule-check (DRC) and layout-versus-schematic (LVS) verification.
Nanosim was used to simulate the operation of the top-level design with associated support circuitry. This simulation included two comprehensive test vectors that were converted from Verilog into piecewise-linear sources using a Perl script. Using Nanosim, the entire 250k-transistor design, from the input pads to the output pads, was simulated in about 8 h per vector.
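The conversion of a digital stimulus into a piecewise-linear source can be illustrated as follows. This is not the script used for SVX4, which is described only as a Perl script: the example is written in Python, starts from an already-sampled bit list rather than parsing Verilog, and the function name, node names, and timing values are hypothetical. It emits the standard SPICE form of an independent voltage source with a `PWL(t1 v1 t2 v2 ...)` list.

```python
def bits_to_pwl(name, node, bits, period_ns, vdd=2.5, edge_ns=1):
    """Build a SPICE PWL voltage source line from one bit per clock period.

    Each logic transition becomes a linear ramp of edge_ns, starting at the
    period boundary. Times carry the SPICE 'n' (nano) scale suffix.
    """
    level = vdd if bits[0] else 0.0
    points = [(0.0, level)]
    for i, bit in enumerate(bits[1:], start=1):
        new_level = vdd if bit else 0.0
        if new_level != level:
            t = i * period_ns
            points.append((t, level))                # hold until the edge
            points.append((t + edge_ns, new_level))  # finish the ramp
            level = new_level
    body = " ".join(f"{t:g}n {v:g}" for t, v in points)
    return f"V{name} {node} 0 PWL({body})"


line = bits_to_pwl("clk", "clk", [0, 1, 1, 0], period_ns=19)
# line == "Vclk clk 0 PWL(0n 0 19n 0 20n 2.5 57n 2.5 58n 0)"
```

Emitting points only at transitions keeps the source list short, which matters when a long test vector drives many input pads.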
IV. MEASUREMENTS
The SVX4 prototype and pre-production devices have undergone comprehensive testing at several facilities, including total dose tolerance and SEU rate evaluations. Operation on three different hybrid assemblies, a 10-chip design (D0), a 4-chip design (CDF), and a 2-chip design (CDF and D0), has been verified. Tens of each hybrid have been assembled and tested, and integrated into larger system modules called staves. The measurements performed on the pre-production devices have demonstrated that the shortcomings of the prototype have been completely mitigated in the new versions, which would allow for full production in the future using the same mask set. The yield is estimated to be greater than 90% for the pre-production run of 24 wafers in the TSMC 0.25-μm mixed-signal fabrication process.
¹ HSPICE is a Synopsys Corp. simulation tool.
² Nanosim is a Synopsys Corp. simulation tool.
³ Calibre is a Mentor Graphics Corp. IC verification tool.
⁴ DivaLVS is a Cadence Design Systems IC verification tool.
Irradiation in excess of 20 MRad from a ⁶⁰Co source resulted in no significant performance difference from that of the pre-irradiation device. The SEU cross-section limit of the configuration register cell was determined using a 63-MeV proton beam. The pre- and post-irradiation noise performance is plotted in Fig. 6, together with reference data from SVX3D.
Extensive testing of the dead-timeless operating mode was also performed by CDF personnel. Fig. 7 shows a plot of the data from one such test. The horizontal axis represents the pipeline cell number for a single channel on one IC. Since the pipeline is overwritten every 46 samples, the pipeline cell number is converted into a "bucket" number, which represents the total number of pipeline cell samples in the entire sequence. The vertical axis represents the pedestal value read from the pipeline cell. One data set was taken in the standard ADC digitization mode, while the other was taken in RTPS mode. The different mode labels along the "bucket" axis indicate which concurrent operation was being performed by the IC when the pedestal value was acquired, i.e., digitizing or reading out data. For the points taken during the "acquisition" mode, the IC was performing no concurrent task (non-dead-timeless). Each data point plotted represents the average of 200 events, and the error bar represents the rms value. The dead-timeless testing method outlined above is an excellent mixed-signal performance measurement for SVX4 devices, as it simulates many features of actual detector system operation. This test is performed for all channels concurrently on a single chip, resulting in 128 data sets such as those in Fig. 7, and can be extended to both hybrids and staves to evaluate higher levels of system integration. A summary of the pre-production and prototype IC performance is compiled in Table II, along with the SVX3D data for reference.
V. CONCLUSION
SVX4, a new readout IC for HEP experiments at the Tevatron collider, has been developed for use in the Run IIb silicon upgrade. The device was specifically designed to achieve the higher total-dose radiation tolerance requirement of 20 MRad. Other performance improvements include lower ENC and higher SEU tolerance compared to previous SVX generations. SVX4 meets all design specifications, was completed in just two iterations, and has been approved for production by the D0 and CDF experiments at Fermilab. Given the flexibility of the design and its ubiquitous feature set, it is anticipated that the IC could also be useful for other strip detector readout applications.
