Abstract-We developed a field-programmable gate array (FPGA) TDC module for the tracking devices of the Fermilab SeaQuest (E906) experiment, including the drift chambers, proportional tubes and hodoscopes. The required time resolution is better than 4 ns. This 64-channel TDC is built as a 6U VMEbus unit equipped with a low-power, high-radiation-tolerance Microsemi ProASIC3 flash-based FPGA. The new FPGA firmware (Run2-TDC) is designed to reduce the data volume and the data acquisition (DAQ) dead time. The firmware digitizes multiple input hits of both polarities and lets users turn on a multiple-hit elimination logic that removes after-pulses in the wire chambers and proportional tubes. A scaler is fitted into the firmware of each channel to record the number of hits in that channel. The TDC resolution is determined by internal cell delays of 450 ps, and a measurement precision of 200 ps has been achieved. As a demonstration of constructing a Wave Union TDC from an existing multi-hit TDC without modifying its firmware, external wave union launchers are used in our test to improve the TDC's measurement precision. The measurement precision is nearly halved (108 ps) in data based on a four-edge wave union. An even larger improvement of the measurement precision (69 ps) has been reached by combining the Wave Union TDC approach with multiple-channel ganging.
I. INTRODUCTION
The TDC is packed on a 6U VMEbus module for drift chamber and proportional tube signal digitization, as shown in Fig. 1(a). The time measurement is based on the delay of logic elements (VersaTiles) inside the ProASIC3 FPGA (A3P1000). During the first run of the Fermilab SeaQuest experiment, we used a TDC built on the same hardware but with different FPGA firmware, which had been modified from a latch card firmware. That firmware had no zero suppression, and its time resolution was 2.5 ns. The Run2-TDC was developed to reduce the data volume and the DAQ dead time.
Not only does the Run2-TDC apply zero suppression naturally, it also has a finer resolution of 450 ps.
978-1-4799-0534-8/13/$31.00 ©2013 IEEE
A scaler is fitted in each channel to record the number of hits in that channel; a scaler is a very powerful tool for dead-time calculation and debugging. The Run2-TDC allows a user to record multiple hits, or to apply multiple-hit elimination within a user-selected time window. Multiple-hit elimination has an updating mode and a non-updating mode.
The block diagram of the Run2-TDC firmware is shown in Fig. 1(b). The input hit in each channel propagates in a delay chain of 9 taps with a nominal delay of 450 ps/tap, shown as Delay9ph in Fig. 1(b). The delay pattern is registered every clock cycle at 250 MHz and encoded into the fine time code. If a valid hit is detected, the fine time code (a hex value from 0 to 8) and the coarse time count (whose least significant bit equals 4 ns) are temporarily stored in the first layer buffer (pipe4), which holds up to 4 hits per channel. The stored hits are continuously read out at 62.5 MHz into a block of memory shared by a group of 4 channels. The memory is organized as 2, 4, or 8 pipelines (circular buffers) with user-selectable lengths of 2048, 1024 or 512 ns to store hits waiting for a trigger. When a trigger arrives, the write pointer is moved to the next circular buffer, and the current circular buffer is read out, copying valid hits into the event buffer. During this read-out, or copy in progress (CIP), time, users may select only hits within a predefined time window, to accommodate different detectors such as chambers of different sizes with different drift times. The hit data stored in the event buffer are read out by the DAQ system via the VMEbus interface.
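The trigger pipeline described above can be illustrated with a small behavioural model of one 4-channel group. This is a sketch under stated assumptions, not the firmware logic: the class name HitPipeline, the (channel, time) hit tuples, the time-based eviction used to mimic circular overwriting, and the drift-time window parameter are all hypothetical. The pairing of 2, 4 and 8 pipelines with 2048, 1024 and 512 ns is assumed from the order of the list; note that each option then spans the same 4096 ns of pipeline time.

```python
from collections import deque


class HitPipeline:
    """Behavioural sketch of one 4-channel group's trigger pipelines (hypothetical)."""

    # Number of pipelines -> depth in ns; pairing assumed from the order in the text.
    DEPTH_NS = {2: 2048, 4: 1024, 8: 512}

    def __init__(self, n_pipelines):
        self.depth_ns = self.DEPTH_NS[n_pipelines]
        self.buffers = [deque() for _ in range(n_pipelines)]
        self.write = 0  # index of the pipeline currently receiving hits

    def store(self, channel, time_ns):
        buf = self.buffers[self.write]
        buf.append((channel, time_ns))
        # Mimic circular overwriting: drop hits older than the pipeline depth.
        while time_ns - buf[0][1] >= self.depth_ns:
            buf.popleft()

    def trigger(self, trigger_ns, window_ns):
        """Move the write pointer, then copy hits inside the window into an event."""
        current = self.buffers[self.write]
        self.write = (self.write + 1) % len(self.buffers)  # new hits go elsewhere
        lo, hi = window_ns  # accepted range of (trigger time - hit time), hypothetical
        event = [(ch, t) for ch, t in current if lo <= trigger_ns - t <= hi]
        current.clear()  # this pipeline is free again once it has been read out
        return event


pipe = HitPipeline(4)
for ch, t in [(0, 100), (1, 900), (2, 1500)]:
    pipe.store(ch, t)            # the hit at 100 ns ages out of the 1024 ns pipeline
print(pipe.trigger(1600, (0, 500)))  # [(2, 1500)]
```

Switching the write pointer before copying is what lets new hits keep arriving during the CIP time, which is the dead-time benefit the text describes.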
The modular method is used to measure the measurement precision; the details of this method are given in Section II.
Benchmark tests show that the measurement precision of the time difference between two channels is better than 200 ps (RMS) after an appropriate bin-by-bin calibration.
External wave union launchers can improve the TDC measurement precision without modifying the FPGA firmware. The measurement precision is nearly halved just by using a four-edge wave union.
By using 16 channels together with external wave union launchers, an even larger improvement of the measurement precision has been reached (69 ps). Two different analysis methods for the external wave union launchers are introduced in Section II.
II. CONCEPTS AND METHODS

A. Multiple Hit and Multi-Hit Elimination
Chambers with a large tube size may generate multiple pulses, which can be studied using the multiple-hit detection capability of the TDC. In real operation, however, it is desirable to capture only the first transition edge of a hit and to eliminate pulses within a predefined period of time, reducing the total data volume. The multi-hit elimination function is implemented in the firmware, and its parameters are controlled by a register accessible via the VMEbus.
A 6-bit counter clocked at 250 MHz counts up to the user-selected elimination time window. The user may choose whether to turn on multi-hit elimination. If it is on, the user-selected time window is applied: the Run2-TDC records the first hit, and subsequent hits in that channel within the multi-hit elimination time window are not recorded. The window may be set to any value from 16 ns to 272 ns in 4 ns steps.
There are two modes of multi-hit elimination: a non-updating mode and an updating mode. In both, the user-selected time window begins after the first hit. In the non-updating mode, no matter how many hits occur in the channel within the window, the duration of the window does not change, and hits within the window are not recorded into the buffer. In the updating mode, every time the channel receives another hit within the window, the window starts over again; in other words, the window is extended whenever the TDC receives more hits within it.
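The two elimination modes can be compared with a short software sketch. It is an illustration under assumptions, not the firmware: the function name is hypothetical, times are continuous ns values rather than 4 ns clock counts, a hit exactly at the end of the window is treated as outside it, and once a window expires the next hit is recorded and opens a new window.

```python
def eliminate_multi_hits(hit_times_ns, window_ns, updating):
    """Keep the first hit and drop later hits inside the elimination window (sketch)."""
    if window_ns % 4 != 0 or not 16 <= window_ns <= 272:
        raise ValueError("window must be 16-272 ns in 4 ns steps")
    kept = []
    window_end = None
    for t in sorted(hit_times_ns):
        if window_end is not None and t < window_end:
            if updating:
                window_end = t + window_ns  # updating mode: restart the window
            continue                        # hit suppressed in either mode
        kept.append(t)                      # first hit, or first hit after a window
        window_end = t + window_ns
    return kept


hits = [0, 10, 20, 30, 60]  # e.g. a primary pulse followed by after-pulses
print(eliminate_multi_hits(hits, 24, updating=False))  # [0, 30, 60]
print(eliminate_multi_hits(hits, 24, updating=True))   # [0, 60]
```

With a closely spaced train of after-pulses, the updating mode keeps extending the window, so it suppresses the whole train, while the non-updating mode records a new hit every window length.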
B. Scaler
In almost all applications it is desirable to implement a scaler for each TDC channel as a tool for dead-time estimation or quick detector checking. The conventional way of implementing a scaler consumes a large amount of FPGA silicon resources, so it is difficult to pair a scaler with each TDC channel. We developed a new scheme with reduced resource consumption that allows us to fit 64 scalers for the 64 channels. The resolution of each scaler is 16 ns, and there is an 8-bit counter for each channel. These counters are synchronously read into the first layer buffer and then reset. The 64 first layer buffers then pass scaler data one at a time, taking 2048 ns to read out all channels. Each first layer buffer may record up to 128 hits. An adder in front of the 32-bit buffer of each channel totals the output from the 8-bit counters, so that the scaler can count up to 2^32. There are 8 buffers for the 64 channels, and the user may switch to a different scaler buffer through a register accessible via the VMEbus.
C. Coarse Time Counter
The ProASIC3 flash-based FPGA family is relatively slow. A frequency of 250 MHz for the delay pattern register clock is used to reduce the delay line length and encoder size, so that 64 channels can fit into a low-cost device. However, as many authors in this field have pointed out, it is very difficult to implement other logic blocks at the same 250 MHz, especially the coarse time counter.
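Returning briefly to the scaler of Section B, its accumulation scheme (narrow per-channel counters periodically added into wide totals) can be sketched as follows. The class and method names are hypothetical, and the sketch drains all channels at once, whereas the text describes the 64 channels being passed one at a time over 2048 ns.

```python
class ScalerBank:
    """Hypothetical model of 8-bit per-channel counters drained into 32-bit totals."""

    RESOLUTION_NS = 16   # at most one count per 16 ns, from the text
    READOUT_NS = 2048    # time to visit all 64 channels, from the text

    def __init__(self, n_channels=64):
        self.counters = [0] * n_channels   # narrow 8-bit counters
        self.totals = [0] * n_channels     # 32-bit accumulated scaler values

    def count(self, channel):
        self.counters[channel] = (self.counters[channel] + 1) & 0xFF

    def drain(self):
        """Add every 8-bit counter into its 32-bit total, then reset the counter."""
        for ch, n in enumerate(self.counters):
            self.totals[ch] = (self.totals[ch] + n) & 0xFFFFFFFF
            self.counters[ch] = 0


# Worst case between two drains: 2048 / 16 = 128 counts, which fits in 8 bits.
assert ScalerBank.READOUT_NS // ScalerBank.RESOLUTION_NS == 128

bank = ScalerBank()
for _ in range(10):          # ten read-out cycles of a channel firing every 16 ns
    for _ in range(128):
        bank.count(5)
    bank.drain()
print(bank.totals[5])        # 1280
```

The arithmetic in the assertion is why a narrow counter per channel suffices: at 16 ns resolution it can never overflow between two visits by the shared adder, so only one wide adder path is needed per channel buffer.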
To overcome this problem, we implemented the higher bits of the 11-bit coarse time counter, TC[2-10], at 62.5 MHz and only the lower bits, TC[0-1], at 250 MHz. As shown in Fig. 2, TC[0-1] are taken from the output of a 250 MHz 2-bit counter that is synchronized by a 4 ns pulse. The 4 ns pulser is made of an AND gate and three D flip-flops (DF) clocked at 250 MHz. The first flip-flop (DF0) catches the rising edge of TC[2], the bit corresponding to 16 ns. The second flip-flop (DF1) registers the output of DF0, so its output is two clocks later than TC[2]. The inputs of the AND gate are DF0 and !DF1, so its output is high for one 250 MHz clock period (4 ns) after each rising edge of TC[2]. The output of the AND gate goes to the third flip-flop, whose output is our 4 ns pulse. The 250 MHz 2-bit counter uses this 4 ns pulse as its synchronous clear input (SCLR), and its output provides TC[0-1], so that only two bits run with the 250 MHz clock.
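The pulser and 2-bit counter logic can be checked with a cycle-level Python sketch in which each loop iteration is one 250 MHz clock edge and all registers update together. The function name, the power-up value of the 2-bit counter (fast_start) and the phase of the 62.5 MHz counter are modeling assumptions. The sketch only verifies that DF0 AND !DF1 produces a pulse one clock (4 ns) wide once per rising edge of TC[2], i.e. every 32 ns, and that the synchronous clear locks the free-running 2-bit counter to a fixed phase relative to TC[2-10]. How the firmware aligns that phase when TC[0-1] and TC[2-10] are concatenated is not described in the text and is not modeled.

```python
def simulate_tc_low_bits(n_ticks, fast_start=0):
    """Cycle-level sketch of the 4 ns pulser and 2-bit counter (hypothetical model).

    Returns the SCLR pulse value and (tick, TC[2-10], TC[0-1]) after each edge.
    """
    df0 = df1 = df2 = 0      # the three D flip-flops
    fast = fast_start        # 2-bit counter driven by the 250 MHz clock
    pulses, history = [], []
    for t in range(n_ticks):
        slow = (t // 4) % 512    # TC[2-10]: advances once per 62.5 MHz period (4 ticks)
        tc2 = slow & 1           # TC[2], the 16 ns bit
        # All flip-flops sample their inputs on the same edge.
        next_df0 = tc2
        next_df1 = df0
        next_df2 = df0 & (1 - df1)                  # AND gate: DF0 and !DF1
        next_fast = 0 if df2 else (fast + 1) % 4    # SCLR from the third flip-flop
        df0, df1, df2, fast = next_df0, next_df1, next_df2, next_fast
        pulses.append(df2)
        history.append((t, slow, fast))
    return pulses, history


pulses, history = simulate_tc_low_bits(64)
print([t for t, p in enumerate(pulses) if p])  # [5, 13, 21, 29, 37, 45, 53, 61]
```

Because the 2-bit counter wraps every 16 ns on its own, the clear does not change its counting; it only forces its phase, so a glitch or a wrong power-up value is corrected at the next pulse.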
D. Measurement Precision
Usually the time resolution of a TDC is simply determined by the bin width of the fine time. However, to evaluate how well the bin-width calibration works and to measure the precision obtained with the external wave union launchers, the modular method is used.
To perform this measurement, we use a commercially available FPGA 6U VMEbus module as a pulser to generate signals for the TDC. There are two kinds of input signals to the TDC: the hit signal and the stop signal. The TDC time of a channel is the time difference between that channel's hit signal and the stop signal. In each event, a hit signal is followed by a stop signal, and the delay between them increases by a step of 7.2917 ns in each subsequent event.
Using the modular method, as shown by the equation at the top right of Fig. 4, t is calculated from the TDC time. Filling t into a histogram spanning the 7.2917 ns window shows its distribution, which is Gaussian and whose width gives the resolution. The chi-square of the Gaussian fit decreases by 30% after the bin-width calibration, and a time resolution of 191 ps is achieved.
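The modular method can be illustrated with a short NumPy simulation. Since the Fig. 4 equation is not reproduced in this text, the sketch assumes that t is the TDC time folded modulo the 7.2917 ns step and re-centred on its circular mean. It also assumes ideal, uniform 450 ps bins (so no bin-width calibration is needed), a pulser asynchronous to the TDC clock, and a pulser cycling through 64 delay steps; the function names are hypothetical. Under these assumptions the RMS of t comes out roughly at 450 ps/sqrt(6), about 184 ps, the quantization spread of a difference of two times measured with uniform bins.

```python
import numpy as np

BIN_PS = 450.0    # nominal fine-time bin width (tap delay), from the text
STEP_PS = 7291.7  # per-event increase of the hit-to-stop delay, from the text


def measure(t_ps):
    """Ideal TDC channel: uniform bins, time reported at the lower bin edge."""
    return np.floor(t_ps / BIN_PS) * BIN_PS


def simulated_tdc_times(n_events=20000, d0_ps=20000.0, n_steps=64, seed=1):
    rng = np.random.default_rng(seed)
    k = np.arange(n_events) % n_steps          # pulser cycles through n_steps delays
    delay = d0_ps + STEP_PS * k                # true hit-to-stop delay
    hit = rng.uniform(0.0, 1e6, n_events)      # pulser asynchronous to the TDC clock
    return measure(hit + delay) - measure(hit) # TDC time = stop - hit


def modular_residuals(tdc_ps):
    """Fold TDC times into one STEP_PS window and centre them on their circular mean."""
    t = np.mod(tdc_ps, STEP_PS)
    angle = 2.0 * np.pi * t / STEP_PS
    centre = STEP_PS * np.angle(np.mean(np.exp(1j * angle))) / (2.0 * np.pi)
    return np.mod(t - centre + STEP_PS / 2.0, STEP_PS) - STEP_PS / 2.0


res = modular_residuals(simulated_tdc_times())
print(f"RMS of t: {res.std():.0f} ps")
```

Choosing a step that is not a multiple of the bin width makes successive events land at different positions inside the bins, so the folded distribution samples the whole bin structure; with real, non-uniform bins this is where a bin-by-bin calibration changes the result.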
E. Test of the External Wave Union Launcher
A wave union launcher creates a pulse train so that the TDC can perform multiple measurements for better precision. Wave union launchers implemented inside the FPGA require careful redesign of the TDC firmware. As a demonstration of implementing a Wave Union TDC with an existing multi-hit TDC, a very simple external wave union launcher was designed and tested. The launcher is constructed from a LeCroy 429A logic fan-in/fan-out unit and an open-ended delay cable, which is connected to the main line via a T connector, as shown in Fig. 4(a). When a pulse is sent into the unit, a double pulse is generated by the cable reflection. For each input event, the TDC records four transition times: the rising and falling edges of the two pulses. Two wave union launchers generate two double pulses, which are fed into two sets of TDC channels (8 channels per set) for digitization.
Test results of the Wave Union TDC are shown in Fig. 4(b). Arbitrary offsets are added to these histograms so that they can be plotted in a single picture. All histograms come from the same data but are analyzed with different methods. The distributions of the time difference of the two leading edges (T1A-T1B) are plotted in the left two histograms; those marked "BeforeCali" and "AfterCali" are the results before and after a bin-by-bin calibration, respectively. This is the case of a regular TDC digitizing only a single transition edge. When the digitized times of more wave union edges are used, the measurement precision improves.
The histogram marked "NEdges" represents the difference of the average times of the two trailing edges, (T2A+T4A)/2 - (T2B+T4B)/2, and similarly "PEdges" represents (T1A+T3A)/2 - (T1B+T3B)/2 for the two leading edges. The leading edges propagate faster in the TDC delay line, and therefore their measurement precision is better than that of the trailing edges. The "AllEdges" histogram includes all four edges in the average, (T1A+T2A+T3A+T4A)/4 - (T1B+T2B+T3B+T4B)/4, and has even better measurement precision, as expected. This simple four-edge wave union TDC thus improves the measurement precision from about 200 ps to 108 ps without any change in the TDC firmware and without any significant increase in system cost.
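Why averaging more edges helps can be shown with a Monte Carlo sketch using ideal 450 ps bins. The assumptions are not from the paper: a pulse width of 2.9 ns and a reflection delay of 5.5 ns, identical launchers A and B, uniform bins, no difference between leading- and trailing-edge propagation, a random arrival phase, and an A-to-B delay varied from event to event. The sketch compares the spread of T1A-T1B with that of the four-edge average (T1A+T2A+T3A+T4A)/4 - (T1B+T2B+T3B+T4B)/4. It is a qualitative illustration, not a reproduction of the 108 ps measurement.

```python
import numpy as np

BIN_PS = 450.0                      # nominal fine-time bin width, from the text
W_PS, D_PS = 2900.0, 5500.0         # hypothetical pulse width and reflection delay
EDGE_OFFSETS = np.array([0.0, W_PS, D_PS, D_PS + W_PS])  # T1, T2, T3, T4


def measure(t_ps):
    """Ideal TDC channel: uniform bins, time reported at the lower bin edge."""
    return np.floor(t_ps / BIN_PS) * BIN_PS


def wave_union_spreads(n_events=50000, seed=7):
    rng = np.random.default_rng(seed)
    start = rng.uniform(0.0, 1e6, n_events)             # asynchronous to the TDC clock
    delta = rng.uniform(10000.0, 20000.0, n_events)     # true B-minus-A delay per event
    ta = measure(start[:, None] + EDGE_OFFSETS)         # four edges of launcher A
    tb = measure(start[:, None] + delta[:, None] + EDGE_OFFSETS)
    single = (ta[:, 0] - tb[:, 0]) + delta              # T1A - T1B minus its true value
    four = (ta.mean(axis=1) - tb.mean(axis=1)) + delta  # four-edge average, same truth
    return single.std(), four.std()


single_rms, four_rms = wave_union_spreads()
print(f"single edge: {single_rms:.0f} ps, four edges: {four_rms:.0f} ps")
```

The gain comes from the edges falling at different positions within a bin, so their quantization errors partly cancel in the average; if all four offsets were whole multiples of the bin width, averaging would not help.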
The histograms marked "WU_Ave" and "WU_LUT" are results of combining data from all 16 channels (8 channels per set). Both simply take the difference between the averages of the two sets, as in Table I. For each channel, WU_Ave takes the average of the 4 hits of both the signal and the trigger, and a time resolution of 68 ps is achieved. WU_LUT uses a 2048-entry look-up table addressed by 11 bits (Table II). The sum of the fine times of the 4 hits is recorded in the lowest five bits (0-4). The sum of the coarse times of the 4 hits, each taken relative to the coarse time of the first hit, is recorded in bits 5-10. The 2048-entry look-up table provides a calibration of each bin size. By taking the average of both signal and trigger hits, a time resolution of 67 ps is achieved.
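The 11-bit WU_LUT address layout can be sketched as a packing function. The function names are hypothetical, and the table contents are a placeholder because the text only says that the table provides a bin-size calibration. The sketch rejects sums that do not fit their fields; in particular a fine-time sum of 32 (all four codes equal to 8) would not fit in five bits, and how that case is handled is not stated.

```python
def wu_lut_address(fine, coarse):
    """Pack four hits into the 11-bit WU_LUT address (layout as described for Table II).

    fine:   four fine-time codes (0-8)
    coarse: four coarse-time counts (4 ns LSB), first hit first
    """
    if len(fine) != 4 or len(coarse) != 4:
        raise ValueError("expected four hits")
    fine_sum = sum(fine)                              # goes into bits 0-4
    coarse_rel = sum(c - coarse[0] for c in coarse)   # goes into bits 5-10
    if not (0 <= fine_sum < 32 and 0 <= coarse_rel < 64):
        raise ValueError("sums do not fit the 5-bit / 6-bit fields")
    return (coarse_rel << 5) | fine_sum


def unpack(address):
    """Recover (fine_sum, coarse_rel) from an 11-bit address."""
    return address & 0x1F, address >> 5


lut = [0.0] * 2048   # placeholder: calibrated times per address would be stored here
addr = wu_lut_address([1, 2, 3, 4], [10, 10, 11, 12])
print(addr, unpack(addr))   # 106 (10, 3)
calibrated = lut[addr]
```

Packing the two sums into one index turns the four-hit average into a single table lookup, so a per-address calibration can absorb unequal bin sizes.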
III. S UMMARY
The TDC firmware can digitize multiple hits of both polarities and provides a multiple-hit elimination function that can be turned on to remove after-pulses in the wire chambers and proportional tubes. A scaler module is implemented to record the number of hits for each channel. This new firmware can reduce the SeaQuest readout dead time by more than a factor of ten and will be crucial for the experiment to achieve its goals. Furthermore, we demonstrate how to construct a Wave Union TDC from the current multi-hit TDC with external wave union launchers. Based on a four-edge wave union, the measurement precision is improved by a factor of 2, to 108 ps. An even more significant improvement of the measurement precision, to 69 ps, is achieved by combining the Wave Union TDC approach with multiple-channel ganging.
ACKNOWLEDGMENT
Many thanks to Hsi-Hung Yao from the Institute of Physics, Academia Sinica, Taiwan, for assisting us with the production test. We also thank Da-Shung Su and Wen-Chen Chang from the Institute of Physics, Academia Sinica, Taiwan, for developing the TDC hardware. Many thanks to the SeaQuest collaboration, especially David Christian, for providing the TDC requirements and a lot of valuable input while we developed the TDC firmware.
