Abstract-The ARAGORN front-end offers high-performance readout capabilities for state-of-the-art high-energy physics experiments. The design goal is a cost-efficient Time-to-Digital Converter (TDC) platform with high channel density in the smallest possible form factor. Four Xilinx Artix-7 FPGAs implement 384 input channels on a single front-end board with an average time resolution of 165 ps, allowing for precise time-of-flight or drift-time measurements. A fifth Artix-7 FPGA acts as data concentrator and masters the communication with auxiliary board components. An optional multi-channel optical transceiver slot can be used to interconnect with up to seven further boards in a star topology. This design feature makes it possible to read out eight boards, yielding 3072 input channels, via a single optical fiber at a bandwidth of 6.6 Gb/s.
I. INTRODUCTION
NOWADAYS, applications in high-energy physics experiments challenge the existing readout systems with increased input rates, long trigger latencies or even triggerless data acquisition. To meet these requirements, the ARAGORN front-end has been developed for operation at the COMPASS experiment [1] at the CERN Super Proton Synchrotron (SPS).
Implementing precision time digitizer circuits with fully configurable, low-cost FPGAs has attracted growing interest from the high-energy physics community for some time. The advantage of this approach is that the same hardware can be adapted to a wide range of customized applications, so that facilities can be instrumented in a very cost-effective way.
The ARAGORN front-end (figure 1) provides a competitive layout comprising 4+1 Xilinx Artix-7 (xc7a200t) FPGAs. The TDC application combines four FPGAs processing 384 input channels on a single module. The input signals are linked to four high-speed SMD connectors on the solder side, providing an interface for extension boards. The timestamps acquired with the TDC-FPGAs are passed on to the fifth FPGA acting as data hub and generic board master.
This project aims to implement an optical readout scheme that uses both an SFP+ transceiver slot for data output and a multi-channel CXP transceiver socket to interconnect with seven ARAGORN boards as satellites via an optical breakout cable (figure 2). The optical transceiver modules connect to the high-speed transceiver tiles of the central FPGA. The star topology network permits eight boards, and thus 3072 TDC channels, to be concentrated and read out via a single optical fiber. Because the CXP module is pluggable, this costly component is required only on the master board. Setting up the multi-tiered front-end arrangement does not require any modification of the board layout.
II. TIME DIGITIZER
Our design processes 96 input channels with a single Artix-7 FPGA. The block diagram of a TDC channel is shown in figure 3 to guide the reader through the technical description. The time digitization of the incoming hits is accomplished by sampling the state of the input signal with a set of eight edge-triggered flip-flops. The associated register is driven by a multiphase clock, which activates the flip-flops in consecutive order. Because two MMCMs¹ inside the FPGA fabric provide frequency synthesis and phase alignment, no further calibration of the converter circuit is required. The sampling clock frequency is an integer multiple of the COMPASS reference clock. Accordingly, a quantization bin size LSB = 1/(311.04 MHz × 8) ≈ 402 ps is obtained.
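The sampling scheme and the resulting bin size can be illustrated with a small numerical model. The following is only a sketch under simplifying assumptions, not the firmware: it assumes ideal, equally spaced clock phases and a single rising edge per clock period, and the function names `sample_phases` and `fine_bin` are hypothetical.

```python
from typing import List, Optional

F_CLK_HZ = 311.04e6   # sampling clock frequency quoted in the text
N_PHASES = 8          # eight flip-flops, one per clock phase

# Quantization bin size in seconds: LSB = 1 / (f_clk * 8)
LSB_S = 1.0 / (F_CLK_HZ * N_PHASES)


def sample_phases(edge_time_s: float) -> List[int]:
    """Model the eight flip-flop states for a rising edge at edge_time_s.

    The time is measured from the start of the clock period. Flip-flop k
    is assumed to sample at k * LSB_S (ideal phase spacing, an assumption).
    """
    return [1 if k * LSB_S >= edge_time_s else 0 for k in range(N_PHASES)]


def fine_bin(samples: List[int]) -> Optional[int]:
    """Return the index of the first flip-flop that sees the signal high.

    Returns None if no flip-flop caught the edge in this clock period.
    """
    for k, state in enumerate(samples):
        if state:
            return k
    return None


if __name__ == "__main__":
    print(f"LSB = {LSB_S * 1e12:.1f} ps")
    # RMS error of an ideal uniform quantizer: LSB / sqrt(12)
    print(f"ideal quantization RMS = {LSB_S / 12 ** 0.5 * 1e12:.1f} ps")
    # Bin index for an edge arriving 1 ns after the start of the period
    print(fine_bin(sample_phases(1.0e-9)))
```

In this model an edge is assigned to the first phase that samples the input as high, so the bin index grows linearly with the arrival time inside the clock period.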
Prior to the encoding step, the register outputs are synchronized to a single clock domain. The encoded fine time is the binary value of the bin number in which the hit was detected. The user may select sensitivity to leading edges, trailing edges or both during operation. A counter of clock periods delivers the coarse time, which extends the dynamic range of the TDC. The timestamps, which combine the result of the fine time measurement with the coarse time tag, are stored in (2k × 18) dual-port hit buffers. The entire digitization and readout process is dead-time free. The benchmarks of the developed TDC firmware are summarized in table I.
In modern collider or fixed-target experiments, preprocessing of the raw data frames as early as on the front-end boards is mandatory. Therefore, our design implements a trigger matching feature that selects only those hits from the hit buffers that are time-correlated with the trigger primitives. The time of trigger arrival is measured with coarse counter precision, and a programmable latency time is subtracted to account for the trigger generation and distribution delay. The corrected trigger time defines the lower limit of the acceptance window. The upper limit is given by a configurable gate time that depends on the drift time or time of flight in the detector. The digitized parameters of the acceptance window are buffered in FIFO primitives together with an identifier tag.
The search mechanism copies the timestamps that fall within the acceptance window from the hit buffer to an output FIFO. The process is completed by relocating the memory read pointer to the first entry in the acceptance window, which discards earlier timestamps that are no longer relevant for future trigger events. If no trigger is received for a long period, the write pointer may catch up with the search start address pointer, asserting the buffer full flag. In that case, new hits would be lost.
In order to increase the processing performance and to prevent overflow conditions in the hit buffers, artificial triggers are generated at regular intervals. Unlike real triggers, they cause no data to be written to the output FIFOs. The sole purpose of these artificial triggers is to clear old hits from the hit buffers by updating the read address pointer at which the search for the next trigger event starts.
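The trigger matching and buffer clean-up described above can be sketched in software as follows. This is a toy model under stated assumptions, not the firmware: timestamps are plain integers in coarse clock ticks that increase monotonically (counter wrap-around is ignored), the acceptance window is taken as half-open, and the class and method names (`HitBuffer`, `write`, `trigger`) are hypothetical.

```python
from collections import deque
from typing import Deque, List


class HitBuffer:
    """Toy model of one hit buffer with trigger matching."""

    def __init__(self, depth: int = 2048) -> None:
        self.depth = depth                  # 2k entries, as in the text
        self.hits: Deque[int] = deque()     # timestamps, oldest first
        self.lost = 0                       # hits dropped while full

    def write(self, timestamp: int) -> None:
        """Store a new hit; it is lost if the buffer is full."""
        if len(self.hits) >= self.depth:
            self.lost += 1
            return
        self.hits.append(timestamp)

    def trigger(self, arrival: int, latency: int, gate: int,
                artificial: bool = False) -> List[int]:
        """Process one trigger and return the matched timestamps.

        lower = arrival - latency is the start of the acceptance window,
        upper = lower + gate its end (half-open window, an assumption).
        """
        lower = arrival - latency
        upper = lower + gate
        # Move the search start to the first entry inside the window,
        # which discards all older hits.
        while self.hits and self.hits[0] < lower:
            self.hits.popleft()
        if artificial:
            return []                       # clean-up only, no output data
        # Copy (do not remove) the hits inside the window, so that
        # overlapping windows of later triggers can still see them.
        return [t for t in self.hits if t < upper]


if __name__ == "__main__":
    buf = HitBuffer(depth=4)
    for t in (10, 20, 30, 40, 50):          # the fifth hit does not fit
        buf.write(t)
    print(buf.trigger(arrival=45, latency=20, gate=10))
    print(buf.trigger(arrival=100, latency=20, gate=10, artificial=True))
    print(len(buf.hits), buf.lost)
```

Matched hits are copied rather than removed, so only hits older than the lower window limit are discarded; an artificial trigger performs the same pointer update without producing output.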
¹ Mixed-Mode Clock Manager
The processed event data is merged into a single data stream and combined with the associated trigger labels for later event building. A high-speed serial link interface transfers the data packets to the central FPGA for output via the optical transceiver network.
III. CONSTANT-LATENCY LINK
The up-link of the star topology network distributes the trigger primitives and a reference clock signal. Measuring time intervals between different detector channels requires that the reference clock shows a predictable phase offset between all front-end boards after a power cycle, reset or a loss-of-lock. Similarly, deterministic data distribution delay is essential to synchronize system timing. In the standard configuration, the GTP transceivers [2] embedded in the Artix-7 FPGA fabric do not comply with these prerequisites.
Synchronous trigger and clock distribution systems based on FPGA transceiver links are of great interest to the high-energy physics community. A comprehensive description of the design obstacles is given in [3]. In addition, the star topology network requires synchronous retransmission of the up-link data from the master front-end to all connected slave nodes, as outlined in section I. The strict jitter specifications of the GTP transmitter clock source prevent the unfiltered CDR² clock from being employed directly as a reference. Thus, our design integrates an onboard jitter attenuator (Texas Instruments LMK04906), which provides the transceiver tiles and the TDC application with a cleaned version of the reference clock.
Latency variations arise at the receiver SIPO³ block between the serial CDR clock and the low-speed recovered clock domain. The frequency division that produces the parallel data clock can result in different clock phases and thus in variable latency in the data path. Implementing a constant-latency link therefore requires that the same phase of the recovered clock is always selected. Another source of latency variations is the alignment of the serial data stream to the correct word boundaries. Word alignment is achieved by adjusting the phase of the parallel recovered clock only for an even number of bit shifts. Otherwise, a bit shift in the parallel data is introduced. In fact, neither the integrated alignment circuit nor the manual bit-slip operation controlled by user logic guarantees deterministic latency. In order to take full control over the alignment operation, we implemented the comma detection logic and the 8b/10b decoder inside the FPGA fabric. The process then issues a reset to the GTP until the up-link is correctly aligned. This approach terminates quickly because the alignment sequence is transmitted frequently while the link is idle.
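The reset-until-aligned strategy can be illustrated with a short behavioural model. It is a sketch only and makes several assumptions that are not taken from the design: the idle stream is modelled as K28.5 comma symbols with alternating disparity, the receiver is assumed to come out of reset at an arbitrary bit offset, a 20-bit capture window is used, and the names `comma_offset`, `capture` and `align_by_reset` are hypothetical.

```python
from typing import Callable, Iterable, Optional

# 10-bit K28.5 code groups of the 8b/10b code (both running disparities)
K28_5_NEG = "0011111010"
K28_5_POS = "1100000101"
WORD_BITS = 10

# Assumed idle pattern: K28.5 symbols with alternating disparity
IDLE_STREAM = (K28_5_NEG + K28_5_POS) * 8


def comma_offset(bits: str) -> Optional[int]:
    """Return the bit offset of the first K28.5 symbol in bits, or None."""
    for i in range(len(bits) - WORD_BITS + 1):
        if bits[i:i + WORD_BITS] in (K28_5_NEG, K28_5_POS):
            return i
    return None


def capture(start_bit: int, width: int = 2 * WORD_BITS) -> str:
    """Model the parallel data seen when the receiver starts at start_bit."""
    return IDLE_STREAM[start_bit:start_bit + width]


def align_by_reset(receive: Callable[[], str], max_resets: int = 1000) -> int:
    """Reset the receiver until the comma sits on the word boundary.

    receive() models one reset followed by one capture. The return value
    is the number of attempts that were needed.
    """
    for attempt in range(1, max_resets + 1):
        if comma_offset(receive()) == 0:
            return attempt
    raise RuntimeError("link did not align within max_resets attempts")


def make_receiver(offsets: Iterable[int]) -> Callable[[], str]:
    """Build a receive() that uses the given start offsets in turn."""
    it = iter(offsets)
    return lambda: capture(next(it))


if __name__ == "__main__":
    # The receiver comes up misaligned twice before locking on offset 0.
    print(align_by_reset(make_receiver([3, 7, 0])))
```

Because the data path is never shifted after lock in this model, an aligned link always has the same word boundary; the cost is a variable number of reset attempts.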
