An experiment to measure the invariant mass of φ mesons in a nuclear medium is planned as the J-PARC E16 experiment. A trigger merging module (TRG-MRG) has been developed to detect leading edges in 256 channels of discriminator-output signals and to transmit the serialized hit data to a trigger decision module over four optical links. Test results show that the TRG-MRG performs adequately as a 1-ns time-to-digital converter (TDC) and as a data multiplexer with four 6.25-Gb/s transceivers.
Index Terms: Field-programmable gate array (FPGA), J-PARC E16, serial communication, time-to-digital converter (TDC), trigger, Xilinx Aurora protocol.
a duration of 2 s, to nuclear targets at the high-momentum beam line at J-PARC. We measure φ mesons in the electron-positron decay channel and reconstruct the invariant mass. Fig. 1 shows a top view of the spectrometer. Four types of detectors are used to measure the momenta of electrons and positrons. From the inner side, silicon-strip detectors (SSD) and three layers of GEM trackers (GTR 1, 2, 3) are located in the strong magnetic field for flight-path detection and momentum reconstruction [3]. Electron/positron identification is performed with hadron-blind detectors (HBD) and lead-glass (LG) calorimeters [4].
0018-9499 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
A. Setup of Spectrometer
The number of readout channels is 112 996, and waveform data from all detector types are recorded to resolve piled-up signals. The waveform data are buffered in modules that use APV25-S1 chips [5] and DRS4 chips [6], with buffering times of 4 and 2 μs, respectively [7]. Therefore, the latency of the trigger signal must be less than 2 μs.
TABLE I: REQUIREMENT TO THE TRG-MRG
B. Trigger System
For trigger generation, discriminator-output signals from GTR3, HBD, and LG are used. The number of trigger channels is 2620. The maximum single rate is expected to be about 1 MHz/ch, and the minimum width of the discriminator-output signals is 3 ns. Therefore, the sampling period must be shorter than 3 ns. An overview of the trigger system is shown in Fig. 2. The trigger primitive signals from GTR3 and HBD are picked up from the cathode plane of the induction gap of the GEM chamber by an Amplifier-Shaper-Discriminator (ASD) developed for the experiment [8], [9]. The signals from LG are discriminated by a DRS4 analog-to-digital converter/time-to-digital converter (TDC) [10]. In the trigger merging modules (TRG-MRGs), leading edges are detected and the serialized hit data are transmitted to a trigger decision module through optical transceivers. The Belle II Universal Trigger board 3, which has 16 QSFP+ ports, is used as the trigger decision module [11]. Finally, the trigger signal is distributed to the readout modules by the Belle II Frontend Timing SWitch [12].
Out of the allowed latency of 2 μs, the stage before the TRG-MRG is estimated to take 600 ns, mainly due to the drift time of GTR3. We therefore allocate the latency of the remaining components as follows: 500 ns for leading-edge detection and transmission, 500 ns for the trigger decision, and 300 ns for trigger distribution. With these design values, the total latency including the drift time of GTR3 is 1900 ns.
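The latency budget stated above can be checked with a few lines of arithmetic. The following is a minimal sketch; the dictionary keys and variable names are illustrative labels chosen here, while the numbers are the design values quoted in the text and the 2-μs buffer depth of the DRS4-based modules.

```python
# Latency budget in ns, using the design values quoted in the text.
# The labels are illustrative names, not identifiers from the experiment.
BUDGET_NS = {
    "before TRG-MRG (GTR3 drift time)": 600,
    "leading-edge detection and transmission": 500,
    "trigger decision": 500,
    "trigger distribution": 300,
}

# Shortest waveform buffer (DRS4-based modules): 2 us.
BUFFER_DEPTH_NS = 2000


def total_latency_ns(budget):
    """Sum all contributions of the trigger path."""
    return sum(budget.values())


total = total_latency_ns(BUDGET_NS)
margin = BUFFER_DEPTH_NS - total
print(f"total latency = {total} ns, margin to buffer depth = {margin} ns")
```

The design therefore leaves a margin of 100 ns with respect to the 2-μs buffer.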
The requirements for the TRG-MRG are summarized in Table I.
The mezzanine card has four 32-channel low-voltage differential signaling (LVDS) receivers and converters from LVDS to the 1.8-V low-voltage complementary metal-oxide-semiconductor (LVCMOS) format, and it is replaceable according to the format of the input connectors. On the main board, two field-programmable gate arrays (FPGAs) (Xilinx Kintex-7 160T-2 and Xilinx Spartan-3 50AN-4), two 125-MHz crystal oscillators, and eight SFP+ transceivers are installed [13], [14]. The 1-ns, 256-channel multi-hit TDC and the four-lane, 6.25-Gb/s GTX transceivers are implemented in the Kintex-7 using Vivado 2017.2 provided by Xilinx. The reduction from 2620 channels to at most 64 optical links is realized by using about 15 TRG-MRGs.
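The channel and link counts can be cross-checked with simple arithmetic. The sketch below uses the numbers given in the text (2620 trigger channels, 256 inputs and four optical links per TRG-MRG, 16 QSFP+ ports on the trigger decision module) together with the fact that a QSFP+ port carries four lanes. The function names are illustrative. The lower bound of 11 modules is pure arithmetic; why about 15 modules are used in practice is not derived here.

```python
import math

TRIGGER_CHANNELS = 2620      # total trigger channels (from the text)
CHANNELS_PER_MODULE = 256    # inputs per TRG-MRG
LINKS_PER_MODULE = 4         # optical links per TRG-MRG
QSFP_PORTS = 16              # QSFP+ ports on the trigger decision module
LANES_PER_QSFP = 4           # a QSFP+ port carries four lanes


def min_modules(channels, channels_per_module):
    """Smallest number of modules that can accept all channels."""
    return math.ceil(channels / channels_per_module)


def links_used(n_modules, links_per_module=LINKS_PER_MODULE):
    """Optical links occupied by n_modules TRG-MRGs."""
    return n_modules * links_per_module


available_links = QSFP_PORTS * LANES_PER_QSFP
print("minimum modules:", min_modules(TRIGGER_CHANNELS, CHANNELS_PER_MODULE))
print("links used by 15 modules:", links_used(15), "of", available_links)
```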
A. Firmware
The diagram of the firmware implemented in the FPGA is shown in Fig. 5 . The firmware of TDC and transceiver sections is explained in the following paragraphs.
1) TDC Section: The TDC section consists of deserializers, edge detectors, delay controllers, and hit buffers. The input signals are sampled every 1 ns by 500-MHz double-data-rate (DDR) deserializers implemented with the Vivado IP core ISERDESE2. Each deserializer converts a 1-Gb/s stream into four 250-Mb/s streams. In the edge detector, leading edges are detected in each 4-bit word. To calibrate the intrinsic time differences among channels, delays are added in the delay controller. This component is implemented with RAM-based shift registers to save flip-flops and can delay the data of each channel by up to 1024 ns in 4-ns steps. When leading edges are detected, the hit timing and channel number are stored in the hit buffer. At most eight hits are buffered per 64-ns window for every group of 64 channels. The efficiency of event transfer under this criterion is discussed in Section III-G. The data passed to the transceiver section are 32 bits wide and are output over 5 cycles for every 64 channels.
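The edge-detection step can be illustrated in software. The following is a behavioral sketch only, not the firmware itself: it assumes that each 4-ns clock cycle delivers one 4-bit word of 1-ns samples and that bit 0 is the earliest sample, which is a convention chosen here for illustration. The function name `detect_leading_edges` is hypothetical.

```python
def detect_leading_edges(words, prev_bit=0):
    """Return leading-edge times in ns from a stream of 4-bit words.

    words: iterable of ints in 0..15, one word per 4-ns clock cycle.
           Bit 0 is ASSUMED to be the earliest 1-ns sample.
    prev_bit: last sample seen before the first word, so that a pulse
              already high at the start is not reported as a new edge.
    """
    hits = []
    for cycle, word in enumerate(words):
        for i in range(4):
            bit = (word >> i) & 1
            # A leading edge is a 0 -> 1 transition between adjacent samples,
            # including the transition across a word boundary.
            if bit and not prev_bit:
                hits.append(cycle * 4 + i)
            prev_bit = bit
    return hits


# A pulse starting at 6 ns that continues into the next word, then a pulse at 13 ns.
example = [0b0000, 0b1100, 0b0001, 0b0110]
print(detect_leading_edges(example))
```

Carrying the last sample of the previous word into the next one is what prevents a pulse that spans a word boundary from being counted twice.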
2) Transceiver Section: The transceiver section consists of a FIFO and the Aurora 8B/10B protocol [15]. Aurora 8B/10B is a link-layer protocol for high-speed serial communication. Because the clock frequency of the protocol is determined by the line rate and lane width of the transceiver, the FIFO is used for clock-domain crossing. On the transmitting side, the 32-bit/lane data are encoded into 40-bit/lane data and serialized. On the receiving side, the data are deserialized and decoded.
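The relation between line rate, 8B/10B overhead, and the protocol clock can be made explicit. The sketch below assumes one 40-bit encoded word per lane per clock cycle, which follows from the 32-bit to 40-bit encoding described above; the function names are illustrative.

```python
LINE_RATE_BPS = 6_250_000_000   # 6.25 Gb/s per lane (from the text)
LANES = 4                       # four optical links per TRG-MRG
ENCODED_BITS_PER_CYCLE = 40     # 32-bit payload encoded to 40 bits per lane


def payload_rate_bps(line_rate_bps):
    """8B/10B carries 8 payload bits in every 10 line bits."""
    return line_rate_bps * 8 // 10


def protocol_clock_hz(line_rate_bps, encoded_bits=ENCODED_BITS_PER_CYCLE):
    """Clock at which one encoded word per lane is sent each cycle."""
    return line_rate_bps / encoded_bits


per_lane = payload_rate_bps(LINE_RATE_BPS)
print("payload per lane [Gb/s]:", per_lane / 1e9)
print("payload, all lanes [Gb/s]:", LANES * per_lane / 1e9)
print("protocol clock [MHz]:", protocol_clock_hz(LINE_RATE_BPS) / 1e6)
```

The resulting clock differs from the 250-MHz word clock of the TDC section, which is why a FIFO is needed between the two clock domains.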
III. PERFORMANCE EVALUATION
The following performance items were evaluated: 1) time resolution; 2) integral nonlinearity (INL); 3) differential nonlinearity (DNL); 4) minimum pulsewidth; 5) double pulse separation; 6) latency; 7) transfer efficiency.
A. Time Resolution
The time resolution was evaluated by feeding two signals with a fixed delay into two channels of the TRG-MRG, as illustrated in Fig. 6. LEMO cables and a fixed delay module (KN1651) were used to add the delay. The time difference of the outputs from the TRG-MRG was measured as shown in Fig. 7. The time resolution is defined from this distribution as $\sigma/\sqrt{2}$, where $\sigma$ is the standard deviation of the distribution. Even if the TDC has no clock jitter, the measured time differences are spread over two adjacent least-significant-bit (LSB) bins; this is called the quantization error. The quantization error depends on the fractional remainder of (time difference)/LSB, denoted $t_{\mathrm{in}}$ in this paper, as $\sqrt{t_{\mathrm{in}}(1 - t_{\mathrm{in}})}$ in units of the LSB. The measured distribution is in good agreement with the expected quantization error, as shown in Fig. 8. A time resolution better than 0.35 ns was obtained.
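The quantization-error formula can be checked numerically. The sketch below models an ideal, jitter-free TDC that truncates each edge time to a whole LSB and sweeps the arrival phase of the first edge uniformly over one LSB. This is an idealized model chosen here for illustration, not the analysis used for Fig. 8, and the function names are hypothetical.

```python
import math


def quantization_sigma(t_in):
    """Expected spread (in LSB) of the measured time difference for a
    jitter-free TDC, where t_in is the fractional part of delay/LSB."""
    return math.sqrt(t_in * (1.0 - t_in))


def simulated_sigma(delay_lsb, n_phase=10000):
    """Standard deviation (in LSB) of the quantized time difference when
    the phase of the first edge is swept uniformly over one LSB."""
    diffs = []
    for k in range(n_phase):
        phase = (k + 0.5) / n_phase
        diffs.append(math.floor(phase + delay_lsb) - math.floor(phase))
    mean = sum(diffs) / n_phase
    var = sum((d - mean) ** 2 for d in diffs) / n_phase
    return math.sqrt(var)


delay = 10.3  # illustrative delay in LSB, so t_in = 0.3
print("formula  :", quantization_sigma(0.3))
print("simulated:", simulated_sigma(delay))
# The largest spread occurs at t_in = 0.5; dividing by sqrt(2) gives the
# corresponding single-channel resolution in LSB.
print("worst-case resolution [LSB]:", quantization_sigma(0.5) / math.sqrt(2))
```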
B. Integral Nonlinearity
The INL was estimated from the same data as described in Section III-A. Fig. 9 shows the relation between the input and output time differences. The measured points were fitted with a line At + B, and the residuals between the measured points and the fitted line were calculated. The INL, taken as the range of these residuals, was within [−0.04 LSB, +0.04 LSB] (Fig. 10). The effect of the INL was found to be negligible.
C. Differential Nonlinearity
In the TRG-MRG, the DNL is expected to originate from the deserializer. Inaccuracy of the clock-generator output and skew in the interconnect lengths inside the deserializer degrade the DNL. As mentioned above, the input data are deserialized into 4 bits every 4 ns. Therefore, the DNL is expected to have a periodicity of 4 ns. The DNL was measured with a code density test using a clock with a period of 80.008 ns. The edges of the input clock are then expected to be distributed at intervals of 0.008 ns within the expected 4-ns period. The distribution of these edges is shown in Fig. 11. As expected, the 4-ns periodicity is seen. The DNL was estimated to be within [−0.022 LSB, +0.022 LSB], as shown in Fig. 12. The effect of the DNL was found to be negligible.
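The principle of the code density test can be sketched as follows. Because 80.008 ns exceeds a multiple of 4 ns by 0.008 ns, successive clock edges step through the 4-ns window in 8-ps increments. For an ideal TDC each of the four 1-ns bins then collects the same number of edges, and the DNL of a bin is the relative deviation of its count from the mean. The code uses integer picoseconds to avoid rounding; the function names and the example counts in the usage line are illustrative and are not measured data.

```python
CLOCK_PERIOD_PS = 80_008   # input clock period (from the text)
WINDOW_PS = 4_000          # expected DNL periodicity
BIN_PS = 1_000             # 1-ns LSB


def ideal_bin_counts(n_edges):
    """Histogram of edge phases within the 4-ns window for an ideal TDC."""
    counts = [0] * (WINDOW_PS // BIN_PS)
    for k in range(n_edges):
        phase_ps = (k * CLOCK_PERIOD_PS) % WINDOW_PS
        counts[phase_ps // BIN_PS] += 1
    return counts


def dnl_from_counts(counts):
    """DNL per bin in LSB: relative deviation of each count from the mean."""
    mean = sum(counts) / len(counts)
    return [c / mean - 1.0 for c in counts]


# 500 edges sweep the 4-ns window exactly once in 8-ps steps.
counts = ideal_bin_counts(500)
print(counts, dnl_from_counts(counts))

# Hypothetical non-ideal counts, for illustration only.
print(dnl_from_counts([102, 98, 100, 100]))
```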
D. Minimum Pulsewidth
As mentioned previously, the TRG-MRG must detect signals as narrow as 3 ns, which are expected in the experiment. The detection efficiency was measured by inputting such narrow signals. The TRG-MRG was found to detect signals of 1.0-ns width with 100% efficiency. Therefore, its minimum-pulsewidth performance satisfies the requirement of the experiment.
E. Double Pulse Separation
The double pulse separation was estimated by measuring the signal detection efficiency while varying the interval between the trailing edge of the first signal and the leading edge of the second signal, as shown in Fig. 13. The TRG-MRG can separate two signals with an interval of 2.5 ns with 100% efficiency. From this result, the inefficiency due to the TRG-MRG is estimated to be 0.27% in the worst case.
F. Latency
The latency was evaluated separately for the TDC and transceiver sections defined in Section II-A.
1) TDC Section: The latency of the TDC section was estimated using the logic simulator in Vivado. The latency before the TRG-MRG is 600 ns, which includes the delays added by the delay controller. Including the buffering time of 64 ns, the latency of the TDC section is at most 179 ns.
2) Transceiver Section: The latency of the transceiver section was measured by feeding the output of a 250-MHz counter into the FIFO and receiving the data after they passed through Aurora and a 1-m optical cable (the expected length). The latency is dominated by the time needed for data reception (deserializing and decoding). The result is shown in Fig. 15. A latency of 290 ns was obtained for 99.8% of the data, which is consistent with a measurement using the logic analyzer in Vivado. The remaining 0.2% of the data have a longer latency of 310 ns. This appears to be due to the busy signal asserted by the Aurora protocol for clock compensation. Clock compensation is a basic function of the Aurora protocol; it asserts busy for 3 cycles (19.2 ns) in every 2500 cycles of the 156.25-MHz Aurora clock [15]. Both the size and the frequency of the latency increase are consistent with the expectation from clock compensation. As a result, the latency of the transceiver section is at most 318 ns.
3) Total Latency: The latency of the TDC and transmission is estimated to be at most 179 + 318 = 497 ns. This satisfies the requirement of less than 500 ns.
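The numbers in this subsection can be reproduced with a short calculation. The sketch below converts the three busy cycles of clock compensation into nanoseconds at the 156.25-MHz Aurora clock, adds them to the nominal transceiver latency, and sums the worst-case latencies of the two sections. All inputs are values quoted in the text; the variable and function names are illustrative.

```python
AURORA_CLOCK_HZ = 156.25e6      # Aurora clock (from the text)
CC_BUSY_CYCLES = 3              # busy cycles per clock-compensation sequence
NOMINAL_LATENCY_NS = 290        # latency observed for most of the data
TDC_MAX_NS = 179                # worst-case TDC-section latency
TRANSCEIVER_MAX_NS = 318        # worst-case transceiver-section latency
REQUIREMENT_NS = 500            # budget for detection and transmission


def busy_duration_ns(cycles=CC_BUSY_CYCLES, clock_hz=AURORA_CLOCK_HZ):
    """Duration of the clock-compensation busy period in ns."""
    return cycles / clock_hz * 1e9


delayed_latency_ns = NOMINAL_LATENCY_NS + busy_duration_ns()
total_max_ns = TDC_MAX_NS + TRANSCEIVER_MAX_NS

print(f"busy duration    = {busy_duration_ns():.1f} ns")
print(f"delayed latency  = {delayed_latency_ns:.1f} ns")
print(f"total worst case = {total_max_ns} ns (requirement < {REQUIREMENT_NS} ns)")
```

The delayed latency obtained this way is close to the 310 ns observed for the small fraction of slower transfers.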
G. Transfer Efficiency
As mentioned in Section II-A1, the TRG-MRG transfers hit data under the criterion that at most eight hits are buffered per 64 ns for every 64 channels. A detector simulation based on Geant4, a toolkit for simulating the passage of particles through matter, was performed to estimate the transfer efficiency under the expected experimental conditions [16]-[18]. In the experiment, the proton-beam intensity is $1 \times 10^{10}$/pulse and the maximum single rate reaches 1 MHz/ch. Taking the microstructure of the beam into account, the instantaneous beam intensity ranges up to $2 \times 10^{10}$/pulse. Fig. 16 shows the hit multiplicities in 64-ns time windows at beam intensities of $1 \times 10^{10}$/pulse (solid line) and $2 \times 10^{10}$/pulse (dotted line). The fraction of discarded hits depends on the beam intensity. The beam-rate dependence of the transfer efficiency is shown in Fig. 17. At the expected beam intensity of $1 \times 10^{10}$/pulse (5 GHz), the transfer efficiency is expected to be 99.95%. Even at $2 \times 10^{10}$/pulse (10 GHz), the efficiency stays above 98%. In conclusion, it is confirmed that the developed module meets the required performance.
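The effect of the eight-hit limit can be illustrated with a toy model. The sketch below assumes that the number of hits in a 64-ns window of one 64-channel group follows a Poisson distribution with a given mean; this is a simplifying assumption made here and is not the Geant4-based method used for Figs. 16 and 17, so it is not expected to reproduce the quoted efficiencies. The mean occupancy is left as a free parameter and the function name is hypothetical.

```python
import math


def transfer_efficiency(mean_hits, max_hits=8, n_max=100):
    """Fraction of hits kept when at most max_hits per window are buffered.

    ASSUMPTION: hits per window are Poisson distributed with mean mean_hits.
    Efficiency = E[min(N, max_hits)] / E[N].
    """
    if mean_hits <= 0:
        return 1.0
    p = math.exp(-mean_hits)          # P(N = 0)
    kept = 0.0
    for n in range(1, n_max):
        p *= mean_hits / n            # P(N = n) from P(N = n - 1)
        kept += min(n, max_hits) * p
    return kept / mean_hits


# Illustrative mean occupancies (hits per 64-ns window per 64-channel group).
for mean in (0.5, 1.0, 2.0, 4.0):
    print(f"mean = {mean:3.1f} -> efficiency = {transfer_efficiency(mean):.6f}")
```

In this model the efficiency falls as the occupancy grows, which matches the qualitative trend of the beam-rate dependence described above.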
H. Summary of the Performance Evaluation
Finally, the results of the performance evaluation are summarized in Table II. All results satisfy the requirements of the experiment.
IV. CONCLUSION
The J-PARC E16 experiment is planned in order to investigate the partial restoration of the broken chiral symmetry at nuclear density. To handle the large number of trigger channels, 2620 in total, the TRG-MRG has been developed. The TRG-MRG consists of one main board and two mezzanine cards and will be installed between the discriminators and the trigger decision module. It works as a 1-ns TDC and as a data multiplexer with four 6.25-Gb/s transceivers. The performance tests, including those of time resolution, latency, and transfer efficiency, confirm that the TRG-MRG meets the requirements of the experiment, which is scheduled to start in Japanese Fiscal Year 2019.
