The NA62 experiment at the CERN SPS aims to measure the branching ratio of the ultra-rare decay of the charged kaon into a pion and a neutrino-antineutrino pair. The expected value has recently been estimated within the Standard Model to be of the order of 10⁻¹⁰, which requires a high-intensity kaon beam and leads to a detector output event rate of the order of 10 MHz.
The Ring Imaging Cherenkov (RICH) detector is a key element for particle identification in the NA62 experiment. As one of the subdetectors contributing to the first-level trigger (L0), it separates pions from muons in the momentum range from 15 GeV/c to 35 GeV/c and measures the particle arrival time with a resolution better than 100 ps. It then provides the reference time and a fast signal for the L0 trigger system through a multiplicity count.
Much more information is, however, available from the RICH. The features of the rings generated by charged particles crossing its volume can provide the velocity and the direction of the particle as measurements independent of those from the preceding detectors. This information should be evaluated at the online trigger stage so that more selective trigger algorithms can be implemented. However, information about the particle is available only after a first stage of ring identification. The latter must therefore also be performed online, which requires very high-speed processing units.
General-purpose computing on graphics processing units (GPGPU) is now widespread in scientific areas requiring large processing power, such as computational astrophysics, lattice QCD calculations, and medical imaging. GPUs offer a parallel architecture in which most of the chip resources are devoted to computation. They therefore provide large computing power on a single device, allowing complex decisions to be taken fast enough to match the rate of valid events.
This project aims to exploit the parallel computing power of a commercial GPU to perform fast pattern matching in the RICH detector for the L0 trigger of the NA62 experiment. In our approach, the ring-fitting algorithm is seedless, i.e., it is fed with raw RICH data, with no prior information on the ring position from other detectors. Moreover, since the L0 trigger is provided with more elaborate information than a simple multiplicity count, it achieves a higher selection power.
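To make the idea of a seedless fit concrete, the following is a minimal sketch of an algebraic least-squares circle fit (the Kåsa method) applied to a list of hit positions, with no initial guess for the ring centre. It is an illustration only: the text does not state which fitting algorithm is used, the function name `fit_ring` and the coordinates are hypothetical, and the sketch handles a single ring on the CPU rather than the parallel GPU processing discussed here.

```python
import math


def fit_ring(hits):
    """Algebraic (Kasa) least-squares circle fit.

    hits: list of (x, y) hit positions (hypothetical units).
    Returns (centre_x, centre_y, radius). No seed for the centre is needed.
    """
    n = len(hits)
    if n < 3:
        raise ValueError("at least three hits are needed to fit a ring")

    # Shift to the centroid so the normal equations reduce to a 2x2 system.
    mx = sum(x for x, _ in hits) / n
    my = sum(y for _, y in hits) / n

    suu = suv = svv = 0.0
    suuu = svvv = suvv = svuu = 0.0
    for x, y in hits:
        u, v = x - mx, y - my
        suu += u * u
        suv += u * v
        svv += v * v
        suuu += u * u * u
        svvv += v * v * v
        suvv += u * v * v
        svuu += v * u * u

    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("hits are (nearly) collinear; no ring can be fitted")

    # Solve [suu suv; suv svv] [uc; vc] = [bu; bv] by Cramer's rule.
    bu = 0.5 * (suuu + suvv)
    bv = 0.5 * (svvv + svuu)
    uc = (bu * svv - bv * suv) / det
    vc = (bv * suu - bu * suv) / det

    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mx + uc, my + vc, radius


if __name__ == "__main__":
    # Synthetic hits lying exactly on a ring of radius 5 centred at (3, -2).
    angles = (0.1, 0.9, 2.0, 3.5, 5.0)
    hits = [(3 + 5 * math.cos(t), -2 + 5 * math.sin(t)) for t in angles]
    print(fit_ring(hits))
```

Because the fit is a closed-form solution built from sums over the hits, it needs no iteration and no starting values, which is the property that makes this family of methods attractive when no information on the ring position is available from other detectors.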
In the applications for which GPUs were originally developed, a low total processing latency is not a primary concern. In this case, however, the latency of the data transfer from the Network Interface Card (NIC) to the GPU, and its stability in time, become very important issues.
Two methods to reduce the data transfer latency have been studied and tested: a dedicated NIC device driver with very low latency (PFRING), and a direct data transfer protocol from a custom FPGA-based NIC to the GPU (NaNet). Here, results obtained with the PFRING system are shown, analyzed, and compared to the NaNet approach.
