GPU-based Real-time Triggering in the NA62 Experiment
Over the last few years the GPGPU (General-Purpose computing on Graphics
Processing Units) paradigm represented a remarkable development in the world of
computing. Computing for High-Energy Physics is no exception: several works
have demonstrated the effectiveness of the integration of GPU-based systems in
the high-level triggers of different experiments. On the other hand, the use of GPUs
in low-level trigger systems, characterized by stringent real-time
constraints such as a tight time budget and high throughput, poses several
challenges. In this paper we focus on the low-level trigger in the CERN NA62
experiment, investigating the use of real-time computing on GPUs in this
synchronous system. Our approach aimed at harvesting the GPU computing power to
build in real-time refined physics-related trigger primitives for the RICH
detector, as knowledge of the Cerenkov ring parameters allows stringent
conditions to be set for data selection at trigger level. Latencies of all
components of the trigger chain have been analyzed, pointing out that
networking is the most critical one. To keep the latency of the data transfer task
under control, we devised NaNet, an FPGA-based PCIe Network Interface Card
(NIC) with GPUDirect capabilities. For the processing task, we developed
specific multiple ring trigger algorithms to leverage the parallel architecture
of GPUs and increase the processing throughput to keep up with the high event
rate. Results obtained during the first months of the 2016 NA62 run are presented
and discussed.
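As a rough illustration of what a ring-parameter trigger primitive involves, the sketch below fits a single circle to a set of hit coordinates with an algebraic least-squares method. It is not the multi-ring GPU algorithm of the paper, which the abstract does not detail; the function name `fit_ring` and the sample hits are hypothetical, and the code runs on a CPU in plain Python.

```python
import math

def fit_ring(hits):
    """Algebraic least-squares circle fit (assumed stand-in, single ring only).

    hits: list of (x, y) coordinates. Returns (center_x, center_y, radius).
    """
    n = len(hits)
    mean_x = sum(x for x, _ in hits) / n
    mean_y = sum(y for _, y in hits) / n

    # Moments of the hit coordinates relative to their centroid.
    suu = svv = suv = suuu = svvv = suvv = svuu = 0.0
    for x, y in hits:
        u, v = x - mean_x, y - mean_y
        suu += u * u
        svv += v * v
        suv += u * v
        suuu += u * u * u
        svvv += v * v * v
        suvv += u * v * v
        svuu += v * u * u

    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("hits are collinear or too few to define a ring")

    # Solve the 2x2 normal equations for the center (Cramer's rule).
    bu = 0.5 * (suuu + suvv)
    bv = 0.5 * (svvv + svuu)
    uc = (bu * svv - bv * suv) / det
    vc = (bv * suu - bu * suv) / det

    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mean_x + uc, mean_y + vc, radius

# Hypothetical hits lying exactly on a ring centered at (1, 2) with radius 3.
angles = (0.1, 0.9, 2.0, 3.5, 5.0)
hits = [(1.0 + 3.0 * math.cos(a), 2.0 + 3.0 * math.sin(a)) for a in angles]
cx, cy, r = fit_ring(hits)
```

A closed-form fit of this kind needs only sums over the hits, so it maps naturally onto a parallel reduction, with one event or one candidate ring handled per group of threads.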
Seeing Shapes in Clouds: On the Performance-Cost trade-off for Heterogeneous Infrastructure-as-a-Service
In the near future FPGAs will be available by the hour; however, this new
Infrastructure as a Service (IaaS) usage mode presents both an opportunity and
a challenge: The opportunity is that programmers can potentially trade
resources for performance on a much larger scale, for much shorter periods of
time than before. The challenge is in finding and traversing the trade-off for
heterogeneous IaaS that guarantees increased resources result in the greatest
possible increased performance. Such a trade-off is Pareto optimal. The Pareto
optimal trade-off for clusters of heterogeneous resources can be found by
solving multiple, multi-objective optimisation problems, resulting in an
optimal allocation of tasks to the available platforms. Solving these
optimisation programs can be done using simple heuristic approaches or formal
Mixed Integer Linear Programming (MILP) techniques. When pricing 128 financial
options using a Monte Carlo algorithm upon a heterogeneous cluster of Multicore
CPU, GPU and FPGA platforms, the MILP approach produces a trade-off that is up
to 110% faster than a heuristic approach, and over 50% cheaper. These results
suggest that high quality performance-resource trade-offs of heterogeneous IaaS
are best realised through a formal optimisation approach.
Comment: Presented at Second International Workshop on FPGAs for Software
Programmers (FSP 2015) (arXiv:1508.06320)
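The notion of a Pareto optimal trade-off can be made concrete with a small sketch: given candidate task allocations, each with a cost and a latency, keep only those that no other candidate beats on both axes at once. This shows the definition only, not the MILP or heuristic solvers from the paper; the allocation names and numbers are invented for illustration.

```python
def pareto_front(candidates):
    """Return the non-dominated (name, cost, latency) tuples, sorted by cost.

    A candidate is dominated if another one is no worse on both cost and
    latency and strictly better on at least one of them.
    """
    front = []
    for name, cost, latency in candidates:
        dominated = any(
            (c2 <= cost and l2 <= latency) and (c2 < cost or l2 < latency)
            for _, c2, l2 in candidates
        )
        if not dominated:
            front.append((name, cost, latency))
    return sorted(front, key=lambda item: item[1])

# Hypothetical allocations: (label, cost in arbitrary units, latency in seconds).
candidates = [
    ("cpu_only", 1.0, 100.0),
    ("gpu_only", 2.0, 40.0),
    ("cpu_gpu", 2.5, 45.0),   # costs more and is slower than gpu_only
    ("fpga_only", 3.0, 30.0),
    ("all_platforms", 5.0, 12.0),
]
front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
```

Here `cpu_gpu` is dropped because `gpu_only` is both cheaper and faster; every remaining allocation represents a different point on the cost-performance trade-off.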
Canadian Hydrogen Intensity Mapping Experiment (CHIME) Pathfinder
A pathfinder version of CHIME (the Canadian Hydrogen Intensity Mapping
Experiment) is currently being commissioned at the Dominion Radio Astrophysical
Observatory (DRAO) in Penticton, BC. The instrument is a hybrid cylindrical
interferometer designed to measure the large scale neutral hydrogen power
spectrum across the redshift range 0.8 to 2.5. The power spectrum will be used
to measure the baryon acoustic oscillation (BAO) scale across this poorly
probed redshift range where dark energy becomes a significant contributor to
the evolution of the Universe. The instrument revives the cylinder design in
radio astronomy with a wide field survey as a primary goal. Modern low-noise
amplifiers and digital processing remove the necessity for the analog
beamforming that characterized previous designs. The Pathfinder consists of two
cylinders 37 m long by 20 m wide oriented north-south for a total collecting
area of 1,500 square meters. The cylinders are stationary with no moving parts,
and form a transit instrument with an instantaneous field of view of
100 degrees by 1-2 degrees. Each CHIME Pathfinder cylinder has a
feedline with 64 dual polarization feeds placed every 30 cm which
Nyquist sample the north-south sky over much of the frequency band. The signals
from each dual-polarization feed are independently amplified, filtered to
400-800 MHz, and directly sampled at 800 MSps using 8 bits. The correlator is
an FX design, where the Fourier transform channelization is performed in FPGAs,
which are interfaced to a set of GPUs that compute the correlation matrix. The
CHIME Pathfinder is a 1/10th scale prototype version of CHIME and is designed
to detect the BAO feature and constrain the distance-redshift relation.
Comment: 20 pages, 12 figures. Submitted to Proc. SPIE, Astronomical
Telescopes + Instrumentation (2014)
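The FX correlator structure described above can be sketched in a few lines: an F stage that Fourier-transforms each input in blocks, followed by an X stage that multiplies every pair of inputs channel by channel and accumulates the result. The sketch below is a toy pure-Python version under assumed parameters (two inputs, 8-sample blocks, a naive DFT); it does not reflect the actual FPGA or GPU implementation, and the function names are hypothetical.

```python
import cmath
import math

def dft(block):
    """Naive discrete Fourier transform of one block of samples (F stage)."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fx_correlate(inputs, block_len):
    """Accumulate spectrum_i * conj(spectrum_j) per channel for all pairs i <= j."""
    n_in = len(inputs)
    n_blocks = len(inputs[0]) // block_len
    vis = {(i, j): [0j] * block_len for i in range(n_in) for j in range(i, n_in)}
    for b in range(n_blocks):
        spectra = [dft(x[b * block_len:(b + 1) * block_len]) for x in inputs]
        for (i, j), acc in vis.items():  # X stage: cross-multiply and accumulate
            for k in range(block_len):
                acc[k] += spectra[i][k] * spectra[j][k].conjugate()
    return vis

# Two hypothetical inputs: the same tone in channel 2, the second lagging by 90 degrees.
block_len, n_samples = 8, 16
x0 = [math.cos(2 * math.pi * 2 * t / block_len) for t in range(n_samples)]
x1 = [math.sin(2 * math.pi * 2 * t / block_len) for t in range(n_samples)]
vis = fx_correlate([x0, x1], block_len)
```

The number of accumulated products grows as N(N+1)/2 with the number of inputs N. Taking the figures in the abstract (two cylinders, 64 dual-polarization feeds each, so 256 inputs), that is 256 × 257 / 2 = 32,896 products per frequency channel, which is the kind of regular, highly parallel workload that suits GPUs.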
Firmware and gateway for the ACE1 reconfigurable accelerator card
This thesis describes the continued work on the in-house designed FPGA-based co-processor daughtercard referred to as ACE1. The aim is to create a seamless ecosystem incorporating firmware, bootstrapping code, drivers and a development environment. Challenges in setting up and debugging the interface that connects the co-processor daughtercard to the host server include problems with the power network, the edge connectors, and timing problems with the primary protocol, which prevented host-based communications. The options include allowing the daughtercard to function in a stand-alone fashion, and we present a gateware solution that allows users to select from a number of alternatives for each of the layers in the Open Systems Interconnection networking model.
Virtualized FPGA accelerators for efficient cloud computing
Hardware accelerators implement custom architectures to significantly speed up computations in a wide range of domains. As performance scaling in server-class CPUs slows, we propose the integration of hardware accelerators in the cloud as a way to maintain a positive performance trend. Field programmable gate arrays (FPGAs) represent the ideal way to integrate accelerators in the cloud, since they can be reprogrammed as needs change and allow multiple accelerators to share optimised communication infrastructure. We discuss a framework that integrates reconfigurable accelerators in a standard server with virtualised resource management and communication. We then present a case study that quantifies the efficiency benefits and break-even point for integrating FPGAs in the cloud.