1,988 research outputs found

    APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

    Full text link
    We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for a RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective, tens-of-thousands-scalable cluster network architecture. Some test results and characterization of data transmission of a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided.Comment: 6 pages, 7 figures, proceeding of CHEP 2010, Taiwan, October 18-2

    NaNet:a low-latency NIC enabling GPU-based, real-time low level trigger systems

    Full text link
    We implemented the NaNet FPGA-based PCI2 Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upstream root complex. Synthetic benchmarks for latency and bandwidth are presented. We describe how NaNet can be employed in the prototype of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment, to implement the data link between the TEL62 readout boards and the low level trigger processor. Results for the throughput and latency of the integrated system are presented and discussed.Comment: Proceedings for the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP

    APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

    Full text link
    Many scientific computations need multi-node parallelism for matching up both space (memory) and time (speed) ever-increasing requirements. The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may potentially result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily employ more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the Apelink+ host adapter featuring a low latency, high bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe X8 gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight of future work and intended developments

    Component-Level Electronic-Assembly Repair (CLEAR) Synthetic Instrument Capabilities Assessment and Test Report

    Get PDF
    The role of synthetic instruments (SIs) for Component-Level Electronic-Assembly Repair (CLEAR) is to provide an external lower-level diagnostic and functional test capability beyond the built-in-test capabilities of spacecraft electronics. Built-in diagnostics can report faults and symptoms, but isolating the root cause and performing corrective action requires specialized instruments. Often a fault can be revealed by emulating the operation of external hardware. This implies complex hardware that is too massive to be accommodated in spacecraft. The SI strategy is aimed at minimizing complexity and mass by employing highly reconfigurable instruments that perform diagnostics and emulate external functions. In effect, SI can synthesize an instrument on demand. The SI architecture section of this document summarizes the result of a recent program diagnostic and test needs assessment based on the International Space Station. The SI architecture addresses operational issues such as minimizing crew time and crew skill level, and the SI data transactions between the crew and supporting ground engineering searching for the root cause and formulating corrective actions. SI technology is described within a teleoperations framework. The remaining sections describe a lab demonstration intended to show that a single SI circuit could synthesize an instrument in hardware and subsequently clear the hardware and synthesize a completely different instrument on demand. An analysis of the capabilities and limitations of commercially available SI hardware and programming tools is included. Future work in SI technology is also described

    FPGA Based Diagnostics for the Mega-Amp Spherical Tokamak Upgrade

    Get PDF
    Terrestrial fusion power is a low carbon alternative to conventional power sources with reduced waste and proliferation concerns relative to fission power. The complexity of fusion research devices means that many high performance diagnostics are necessary to investigate the underlying physics of the environment. Field Programmable Gate Array technology provides a powerful and flexible option when designing bespoke instrumentation

    The S2 VLBI Correlator: A Correlator for Space VLBI and Geodetic Signal Processing

    Get PDF
    We describe the design of a correlator system for ground and space-based VLBI. The correlator contains unique signal processing functions: flexible LO frequency switching for bandwidth synthesis; 1 ms dump intervals, multi-rate digital signal-processing techniques to allow correlation of signals at different sample rates; and a digital filter for very high resolution cross-power spectra. It also includes autocorrelation, tone extraction, pulsar gating, signal-statistics accumulation.Comment: 44 pages, 13 figure

    A modified model for the Lobula Giant Movement Detector and its FPGA implementation

    Get PDF
    The Lobula Giant Movement Detector (LGMD) is a wide-field visual neuron located in the Lobula layer of the Locust nervous system. The LGMD increases its firing rate in response to both the velocity of an approaching object and the proximity of this object. It has been found that it can respond to looming stimuli very quickly and trigger avoidance reactions. It has been successfully applied in visual collision avoidance systems for vehicles and robots. This paper introduces a modified neural model for LGMD that provides additional depth direction information for the movement. The proposed model retains the simplicity of the previous model by adding only a few new cells. It has been simplified and implemented on a Field Programmable Gate Array (FPGA), taking advantage of the inherent parallelism exhibited by the LGMD, and tested on real-time video streams. Experimental results demonstrate the effectiveness as a fast motion detector

    Development of FPGA controlled diagnostics on the MAST fusion reactor

    Get PDF
    Field Programmable Gate Array technology (FPGA) is very useful for implementing high performance digital signal processing algorithms, data acquisition and real-time control on nuclear fusion devices. This thesis presents the work done using FPGAs to develop powerful diagnostics. This has been achieved by developing embedded Linux and running it on the FPGA to enhance diagnostic capabilities such as remote management, PLC communications over the ModBus protocol and UDP based ethernet streaming. A closed loop real-time feedback prototype has been developed for combining laser beams onto a single beam path, for improving overall repetition rates of Thomson Scattering systems used for plasma electron temperature and density radial profile measurements. A controllable frequency sweep generator is used to drive the Toroidal Alfven Eigenmode (TAE) antenna system and results are presented indicating successful TAE resonance detection. A fast data acquisition system has been developed for the Electron Bernstein Wave (EBW) Synthetic Aperture Microwave Imaging system and an active probing microwave source where the FPGA clock rate has been pushed to the maximum. Propagation delays on the order of 2 nanoseconds in the FPGA have been finely tuned with careful placement of FPGA logic using a custom logic placement tool. Intensity interferometry results are presented on the EBW system with a suggestion for phase insensitive pitch angle measurement
    corecore