1,988 research outputs found
APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters
We describe herein the APElink+ board, a PCIe interconnect adapter featuring
the latest advances in wire speed and interface technology plus hardware
support for a RDMA programming model and experimental acceleration of GPU
networking; this design allows us to build a low latency, high bandwidth PC
cluster, the APEnet+ network, the new generation of our cost-effective,
tens-of-thousands-scalable cluster network architecture. Some test results and
characterization of data transmission of a complete testbench, based on a
commercial development card mounting an Altera FPGA, are provided.Comment: 6 pages, 7 figures, proceeding of CHEP 2010, Taiwan, October 18-2
NaNet:a low-latency NIC enabling GPU-based, real-time low level trigger systems
We implemented the NaNet FPGA-based PCI2 Gen2 GbE/APElink NIC, featuring
GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is
able to receive a UDP input data stream from its GbE interface and redirect it,
without any intermediate buffering or CPU intervention, to the memory of a
Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices
share the same upstream root complex. Synthetic benchmarks for latency and
bandwidth are presented. We describe how NaNet can be employed in the prototype
of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment,
to implement the data link between the TEL62 readout boards and the low level
trigger processor. Results for the throughput and latency of the integrated
system are presented and discussed.Comment: Proceedings for the 20th International Conference on Computing in
High Energy and Nuclear Physics (CHEP
APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters
Many scientific computations need multi-node parallelism for matching up both
space (memory) and time (speed) ever-increasing requirements. The use of GPUs
as accelerators introduces yet another level of complexity for the programmer
and may potentially result in large overheads due to the complex memory
hierarchy. Additionally, top-notch problems may easily employ more than a
Petaflops of sustained computing power, requiring thousands of GPUs
orchestrated with some parallel programming model. Here we describe APEnet+,
the new generation of our interconnect, which scales up to tens of thousands of
nodes with linear cost, thus improving the price/performance ratio on large
clusters. The project target is the development of the Apelink+ host adapter
featuring a low latency, high bandwidth direct network, state-of-the-art wire
speeds on the links and a PCIe X8 gen2 host interface. It features hardware
support for the RDMA programming model and experimental acceleration of GPU
networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI
library driver are available, allowing for painless porting of standard
applications. Finally, we give an insight of future work and intended
developments
Component-Level Electronic-Assembly Repair (CLEAR) Synthetic Instrument Capabilities Assessment and Test Report
The role of synthetic instruments (SIs) for Component-Level Electronic-Assembly Repair (CLEAR) is to provide an external lower-level diagnostic and functional test capability beyond the built-in-test capabilities of spacecraft electronics. Built-in diagnostics can report faults and symptoms, but isolating the root cause and performing corrective action requires specialized instruments. Often a fault can be revealed by emulating the operation of external hardware. This implies complex hardware that is too massive to be accommodated in spacecraft. The SI strategy is aimed at minimizing complexity and mass by employing highly reconfigurable instruments that perform diagnostics and emulate external functions. In effect, SI can synthesize an instrument on demand. The SI architecture section of this document summarizes the result of a recent program diagnostic and test needs assessment based on the International Space Station. The SI architecture addresses operational issues such as minimizing crew time and crew skill level, and the SI data transactions between the crew and supporting ground engineering searching for the root cause and formulating corrective actions. SI technology is described within a teleoperations framework. The remaining sections describe a lab demonstration intended to show that a single SI circuit could synthesize an instrument in hardware and subsequently clear the hardware and synthesize a completely different instrument on demand. An analysis of the capabilities and limitations of commercially available SI hardware and programming tools is included. Future work in SI technology is also described
FPGA Based Diagnostics for the Mega-Amp Spherical Tokamak Upgrade
Terrestrial fusion power is a low carbon alternative to conventional power sources with reduced waste and proliferation concerns relative to fission power.
The complexity of fusion research devices means that many high performance diagnostics are necessary to investigate the underlying physics of the environment.
Field Programmable Gate Array technology provides a powerful and flexible option when designing bespoke instrumentation
The S2 VLBI Correlator: A Correlator for Space VLBI and Geodetic Signal Processing
We describe the design of a correlator system for ground and space-based
VLBI. The correlator contains unique signal processing functions: flexible LO
frequency switching for bandwidth synthesis; 1 ms dump intervals, multi-rate
digital signal-processing techniques to allow correlation of signals at
different sample rates; and a digital filter for very high resolution
cross-power spectra. It also includes autocorrelation, tone extraction, pulsar
gating, signal-statistics accumulation.Comment: 44 pages, 13 figure
A modified model for the Lobula Giant Movement Detector and its FPGA implementation
The Lobula Giant Movement Detector (LGMD) is a wide-field visual neuron located in the Lobula layer of the Locust nervous system. The LGMD increases its firing rate in response to both the velocity of an approaching object and the proximity of this object. It has been found that it can respond to looming stimuli very quickly and trigger avoidance reactions. It has been successfully applied in
visual collision avoidance systems for vehicles and robots. This paper introduces a modified neural model for LGMD that provides additional depth direction information for the movement. The proposed model retains the simplicity of the previous model by adding only a few new cells. It has been
simplified and implemented on a Field Programmable Gate Array (FPGA), taking advantage of the inherent parallelism exhibited by the LGMD, and tested on real-time video streams. Experimental results demonstrate the effectiveness as a fast motion detector
Development of FPGA controlled diagnostics on the MAST fusion reactor
Field Programmable Gate Array technology (FPGA) is very useful for implementing high performance digital signal processing algorithms, data acquisition and real-time control on nuclear fusion devices. This thesis presents the work done using FPGAs to develop powerful diagnostics. This has been achieved by developing embedded Linux and running it on the FPGA to enhance diagnostic capabilities such as remote management, PLC communications over the ModBus protocol and UDP based ethernet streaming. A closed loop real-time feedback prototype has been developed for combining laser beams onto a single beam path, for improving overall repetition rates of Thomson Scattering systems used for plasma electron temperature and density radial profile measurements. A controllable frequency sweep generator is used to drive the Toroidal Alfven Eigenmode (TAE) antenna system and results are presented indicating successful TAE resonance detection. A fast data acquisition system has been developed for the Electron Bernstein Wave (EBW) Synthetic Aperture Microwave Imaging system and an active probing microwave source where the FPGA clock rate has been pushed to the maximum. Propagation delays on the order of 2 nanoseconds in the FPGA have been finely tuned with careful placement of FPGA logic using a custom logic placement tool. Intensity interferometry results are presented on the EBW system with a suggestion for phase insensitive pitch angle measurement
- …