A digital VLSI device for silicon-strip detector read-out has been developed which supports 128 channels and provides amplifier calibration signals. The device buffers the hit-pattern of the strips, as sensed by multi-channel amplifier/discriminator chips, for 256 machine cycles. A second-level buffer is provided. Testability and alignment features are also provided. The device has been manufactured in a radiation-hardened CMOS process. The power demand is 0.21 mW per channel at normal operating conditions. Maximum operating frequency is above 60 MHz.
I. General
The read-out chip functions logically as a storage pipeline (figure 1). The hit pattern from the amp chip is sampled every clock cycle, and passes through the pipeline at the clock rate. After the trigger latency period, a record of the hit-pattern emerges from the far end of the pipeline, and is loaded into the level-2 buffer if a trigger occurs. When detector read-out is initiated, the contents of the level-2 buffer are loaded into a result register which is then shifted onto the DAQ bus in a coordinated manner.
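The logical behaviour described above can be sketched in a few lines of Python. This is a minimal illustrative model and not the chip's implementation: the class name `HitPipeline`, its method names, and the choice to ignore a trigger when the level-2 buffer is full are assumptions made for the example. The default depth of 256 cycles and the 9-event level-2 capacity are the figures quoted elsewhere in this paper.

```python
from collections import deque


class HitPipeline:
    """Toy model of a clock-driven hit pipeline with a level-2 buffer."""

    def __init__(self, latency=256, l2_capacity=9):
        # Fixed-length delay line, pre-filled with empty hit patterns.
        self.stages = deque([0] * latency, maxlen=latency)
        self.l2 = deque()
        self.l2_capacity = l2_capacity

    def clock(self, hit_pattern, trigger=False):
        """Advance one clock: sample the input, emit the oldest record."""
        oldest = self.stages[0]
        self.stages.append(hit_pattern)  # maxlen discards the oldest stage
        # Assumption: a trigger is ignored when the level-2 buffer is full.
        if trigger and len(self.l2) < self.l2_capacity:
            self.l2.append(oldest)
        return oldest

    def read_out(self):
        """Move the oldest level-2 entry into the result register."""
        return self.l2.popleft() if self.l2 else None
```

In this model a pattern sampled at clock t emerges at clock t + latency, so a trigger issued exactly one latency period after the hit captures that hit in the level-2 buffer.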
Several additional functions have been added to the basic pipeline architecture. The accumulator register marks any channel that has become active since the last cycle in which the device was reset. This is useful for detector alignment using collimated X-rays. A data-path bypass is implemented which allows the pipeline to be filled through the DAQ bus rather than from the amplifier chip, for testing the bus and the chip. A circuit on the input of the pipeline can be programmed to filter all but the first one or two clocks of an active channel. This is used for 1-suppression and time-walk correction. Additionally, the chip provides a 'timestamp' of the pipeline stage being read out, which is useful as a debugging utility for system timing.
The pipeline depth is 256, and the width is 128 bits. At 60 MHz, this provides a storage time of 4 µs. The level-2 buffer can store up to 9 events. The physical architecture makes use of a RAM block and an address register. The address register points to the 128-bit word in the RAM block that can be written and read during a given clock cycle. During data taking, the chip is instructed to write into the RAM and increment the address pointer every clock. When the address register reaches a count of 256, it automatically resets itself on the following clock cycle. When the trigger arrives, the hit-pattern from the appropriate time-bin is moved into the level-2 buffer and eventually read out through the bus interface.
A digital read-out chip has been manufactured for use in silicon-strip detector experiments. The architecture implements a clock-driven pipeline (CDP128) in a CMOS VLSI chip, similar to the earlier CDP64 being used in NA50 [2] and the DTSC being used in the ZEUS LPS [1]. It is meant to be used with an amplifier/discriminator analog VLSI chip which provides a hit/no-hit signal for each strip every machine cycle. The hit-pattern is provided in parallel to the CDP chip via chip-to-chip bonds at the strip pitch. The silicon detector and many amplifier/CDP128 pairs reside on a hybrid near the interaction region, with the DAQ bus leading to a controller away from the beam-line (figure 2). The CDP128 provides the following primary functions:
• Storage of Hit Pattern for Trigger Latency Period
• Second-level buffering of Level-1 tagged events
• Serial DAQ Bus Interface
• Channel Activity Histogram for Alignment
• Device and System Testability
• Calibration Pulse Generation for Amplifier
• 1-Suppression for Edge Detection
Calibration pulses for the amplifier chip can be generated locally in response to a control command, along with a two-bit calibration code for the amplifier. The calibration code selects one of four possible calibration patterns for the amplifier chip. When the CDP128 receives a calibration command, it raises the differential calibration signal and lowers it after 128 clock cycles. Local generation is useful because the edge-shape of the calibration pulse affects the behavior of the amplifier, and calibration signals would be badly distorted by the 20-meter DAQ bus from the counting house.
Control and read-out of the chip occur serially, so that the DAQ bus can be implemented with optical fiber. The system requires four fibers to operate a large number of CDP128 chips, and future revisions will reduce this to three.
The CDP chip has 150 signal pads, grouped into various busses. These are itemized as follows:
The CDP128 receives a combined level-1 trigger and control signal. If the level-1 trigger remains true for more than three clock cycles, the next series of signals on this line is interpreted as a command word. The command word triggers special actions within the chip, such as reset and the calibration pulse. Values on the 128 input pads are sampled every clock cycle, and the command word determines how this data will be dealt with.
The test-mode signal causes the chip to replace the input data word with the value in the input test register. The contents of this register are set by the controller through the DAQ bus. This allows the chip to be operated without a companion amplifier chip, for either system testing or production testing. When TEN=1, the chip behaves as if
i[127:124] = not(TP[3:0]).
Data-taking mode causes the pipeline to be filled with an unfiltered version of the hit-pattern from the amplifier. The edge-mode signal EEN causes the chip to filter the input word through 128 parallel timer circuits. The period of the timer circuits is determined by the Itime reference current. If a channel is inactive, a zero is recorded in the RAM pipeline. If a channel is active and has been active for less than the timer period (acceptance window), a hit is recorded in the pipeline. If the channel remains active longer than this period, no more hits are recorded until that channel has deactivated and reactivated. The Itime pad self-biases to a current corresponding to the nominal 20 ns acceptance window. If a different period is needed, an off-chip Itime current can be supplied through the bond-pad. (Bug note: the polarity of EEN is reversed. EEN=0 causes edge-mode.)
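The edge-mode rule for a single channel can be illustrated with a short Python sketch. The function name `edge_filter` and the parameter `window_clocks` are hypothetical, and expressing the acceptance window as a whole number of clock cycles is a simplifying assumption: on the chip the window is set by an analog timer controlled by the Itime current.

```python
def edge_filter(samples, window_clocks=1):
    """Keep only the first `window_clocks` samples of each active run.

    `samples` is the per-clock hit/no-hit sequence of one channel.
    Assumption: the acceptance window is a whole number of clocks.
    """
    out = []
    active_for = 0  # consecutive clocks the channel has been active
    for s in samples:
        if s:
            active_for += 1
            # Record a hit only while still inside the acceptance window.
            out.append(1 if active_for <= window_clocks else 0)
        else:
            active_for = 0  # channel deactivated: re-arm the filter
            out.append(0)
    return out
```

For example, with a one-clock window the input `[0, 1, 1, 1, 0, 1, 1]` yields `[0, 1, 0, 0, 0, 1, 0]`: only the leading edge of each active period is recorded.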
The bus interface is a token-passing, data-push style with a differential output pair D[±]. It is synchronous, and runs at the same clock rate as the pipeline. The signals involved are the LAST and NEXT token-passing signals, the differential Read-Strobe signal RS[±], and the Data signal (D). When the chips receive a read-strobe pulse, they load their result registers from the level-2 buffer. A chip can then unload its result register through its data port when it gets a read-out flag. Chips on a single hybrid have a daisy-chained read-out flag, which enters a chip through a LAST pad on the lower side of the chip and exits through a NEXT pad on the upper edge, which is bonded to the LAST pad of the previous chip. When the controller sends a reset command, all chips assert false on their NEXT pads. The LAST pad on the first chip in the chain can be wired high or provided from the controller. If LAST=true and a chip has not yet read out, it shifts its result register out its D pad to the fiber LED driver. Once a chip is finished shifting out its data, it tristates the D signal and raises its NEXT signal. The next chip in the chain will then hear LAST=true, and assert its data onto the D signal.
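The token-passing sequence can be sketched as a small simulation. This is an illustrative model only: the function name `read_out_chain` is hypothetical, the first chip's LAST pad is assumed to be wired high, and the assumption that NEXT rises in the same clock as the final data bit is a simplification not taken from the text.

```python
def read_out_chain(result_registers):
    """Simulate daisy-chained read-out onto the shared D line.

    `result_registers` holds one bit list per chip, as loaded by the
    read-strobe. Returns one (chip index, bit) pair per clock cycle.
    """
    regs = [list(bits) for bits in result_registers]
    next_pad = [False] * len(regs)  # after reset every NEXT pad is false
    bus = []
    while not all(next_pad):
        for i, reg in enumerate(regs):
            # Assumption: LAST of the first chip is wired high.
            last = True if i == 0 else next_pad[i - 1]
            if last and not next_pad[i]:
                if reg:
                    bus.append((i, reg.pop(0)))  # one bit per clock
                if not reg:
                    next_pad[i] = True  # tristate D and pass the token
                break  # only one chip drives D in a given clock
    return bus
```

Because only the chip holding the token drives D, the controller sees the result registers concatenated in chain order without any addressing.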
The output drivers of the chip have an output impedance of about 500 Ω. This is intended to accommodate 50 Ω terminated cable with 0.4 volt signal levels, for low noise and high bandwidth at moderate power. When not addressed, the output drivers are in a high-impedance state.
All run-time control and bus signals are differential. Reduced voltage logic levels with as little as 0.125 volt swing can be used to minimize system noise. Testing and configuration signals (TEN, EEN, TP[3:0]) are single-ended CMOS signals in this implementation. LAST and NEXT are also single-ended CMOS, because they are chip-to-chip wires and have slew-rate-limiting drivers.
The CDP128 receives the 128-bit data word from the amplifier chip by means of current-mode signals (figure 3). The arrangement can provide up to 100 Mbit/second bandwidth through each chip-to-chip bonding wire. A current signal is received from the amp chip into a moderately low input impedance receiver (approximately 500 Ω). Current flows from the CDP supply voltage through a transistor in the amplifier chip, and back to the CDP through a ground-
Due to the use of dual-ported RAM to implement the pipeline, the chip has no dead time. In this architecture, there is no occupancy-dependent loss of efficiency as might be the case for a CAM-based pipeline. Low-noise bus architecture allows data acquisition to occur during data-taking. Digital noise coupling into the front-end amplifiers seems to be less than 5% of the typical amplifier input signal.
The architecture requires about 400K transistors. The drawn die size is (5.0 × 5.25) mm², and the physical size is (6.2 × 5.45) mm² × 400 microns. The device has been manufactured in Honeywell's RICMOS-IV radiation-hardened 0.8 micron 3-metal single-poly n-well CMOS process. Radiation tolerance of the CDP128 to gamma, proton, and neutron exposure is being measured. The design can also be manufactured through MOSIS, using HP's 0.8 micron 3-metal process. A combined amplifier/CDP chip has just entered production through MOSIS using the SVX CMOS amplifier developed at LBL. This chip will allow measurement of digital noise coupling through the chip substrate. If the noise level is not raised significantly, such a mixed-mode chip would allow the reduction of one wire-bond per channel.
Operation and Programming
When an output cycle is initiated (LAST=true, RS pulse has been received), the chip drives out its result register by putting a new bit on the bus every clock cycle. The result register is 136 bits wide (128-bit hit-pattern and 8 column-code bits).
The CDP128 responds to a small number of control commands. These are encoded as unique sequences of pulses on the T/C signal. If the T/C signal is sampled true for between one and three clocks, it is interpreted as a level-1 trigger. If T/C is sampled true for four consecutive clock cycles, the value of T/C over the next four clock cycles is decoded as a command and trigger activity is suppressed in the chip. Sending a command to the chip therefore requires 8 clock cycles. The command sequences are as follows:
... Note that because the T/C signal must be filtered by 4 clocks to detect a command, its effect as a trigger appears to be delayed by 4 clocks with respect to the data and other control signals. This is why the L1 trigger and RS pulse are separated by four clocks in the following program. Command sequences must be separated by at least one cycle with T/C=false. This means that two consecutive commands require a total of 17 clock cycles to be transmitted.
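The T/C framing rules stated above (one to three true samples for a trigger, four true samples followed by a four-bit command) can be sketched as a decoder. The function name `decode_tc`, the event tuples, and the most-significant-bit-first ordering of the command bits are assumptions made for illustration. The decoder returns the raw four-bit value and does not map it to any named command.

```python
def decode_tc(samples):
    """Decode a per-clock T/C sample list into trigger and command events.

    Returns ('trigger', start_clock) and ('command', start_clock, code)
    tuples. Assumption: command bits arrive most significant bit first.
    """
    events = []
    i, n = 0, len(samples)
    while i < n:
        if not samples[i]:
            i += 1
            continue
        # Measure the run of true samples, capped at the 4-clock header.
        run = 0
        while i + run < n and samples[i + run] and run < 4:
            run += 1
        if run < 4:
            events.append(("trigger", i))  # 1 to 3 clocks: level-1 trigger
            i += run
        else:
            bits = samples[i + 4:i + 8]
            if len(bits) < 4:
                break  # stream ended in the middle of a command
            code = int("".join(str(int(b)) for b in bits), 2)
            events.append(("command", i, code))
            i += 8  # 4 header clocks + 4 command clocks
    return events
```

As a usage example, a stream holding two back-to-back commands separated by a single false sample occupies 8 + 1 + 8 = 17 clocks, which matches the figure quoted above. The command values in such a test stream are arbitrary and are not actual CDP128 command codes.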
A short program to write 00 into the calibration-code register and then 11 into this register is given below. Three commands are sent: a reset followed by two register writes. Note that since the pipeline RAM is actually 257 columns long, the column-code for the last column in the RAM is 01010101. The column-code for the accumulator register is 10101010. These two column-codes are not unique, corresponding to two locations in the pipeline.
In this architecture, the operating current is influenced primarily by the length of the pipeline, the supply voltage, and the operating frequency. Figure 4 shows the chip's current draw vs. frequency at various supply voltages. At the nominal conditions of 4 V supply and 40 MHz, the chip will require 160 µW for zero occupancy and trigger rate. Each active channel will require an additional 50 µA. The output bus requires 8 mA when active. In a typical operating situation, with 0.1% occupancy and a chip reading out at a 10% duty-cycle, the per-channel power will be 210 µW. If the chip were built with a 128-stage pipeline for an experiment with shorter trigger latency, the power could be expected to decrease to 150 µW per channel.
