1,988 research outputs found

APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

Author: A Biagioni
A Lonardo
A Salamon
Ammendola R
Bodin F
Chalasani Suresh
D Rossetti
F Lo Cicero
F Simula
G Salina
L Tosoratto
NVIDIA Corporation
O Frezza
P S Paolucci
P Vicini
Paolucci P S
R Ammendola
Publication venue: 'IOP Publishing'
Publication date: 18/02/2011

We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for an RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective, tens-of-thousands-scalable cluster network architecture. Some test results and characterization of data transmission of a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided. Comment: 6 pages, 7 figures, proceedings of CHEP 2010, Taiwan, October 18-2

arXiv.org e-Print Archive

Crossref

NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems

Author: Ammendola Roberto
Biagioni Andrea
Cicero Francesca Lo
Fantechi Riccardo
Frezza Ottorino
Lamanna Gianluca
Lonardo Alessandro
Pantaleo Felice
Paolucci Pier Stanislao
Piandani Roberto
Pontisso Luca
Rossetti Davide
Simula Francesco
Sozzi Marco
Tosoratto Laura
Vicini Piero
Publication venue: 'IOP Publishing'
Publication date: 05/11/2013

We implemented the NaNet FPGA-based PCIe Gen2 GbE/APElink NIC, featuring GPUDirect RDMA capabilities and UDP protocol management offloading. NaNet is able to receive a UDP input data stream from its GbE interface and redirect it, without any intermediate buffering or CPU intervention, to the memory of a Fermi/Kepler GPU hosted on the same PCIe bus, provided that the two devices share the same upstream root complex. Synthetic benchmarks for latency and bandwidth are presented. We describe how NaNet can be employed in the prototype of the GPU-based RICH low-level trigger processor of the NA62 CERN experiment, to implement the data link between the TEL62 readout boards and the low level trigger processor. Results for the throughput and latency of the integrated system are presented and discussed. Comment: Proceedings for the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP

arXiv.org e-Print Archive

CERN Document Server

APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

Author: Ammendola Roberto
Biagioni Andrea
Cicero Francesca Lo
Frezza Ottorino
Lonardo Alessandro
Paolucci Pier
Petronzio Roberto
Rossetti Davide
Salamon Andrea
Salina Gaetano
Simula Francesco
Tantalo Nazario
Tosoratto Laura
Vicini Piero
Publication date: 01/01/2010

Many scientific computations need multi-node parallelism to meet ever-increasing requirements in both space (memory) and time (speed). The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily employ more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the APElink+ host adapter featuring a low latency, high bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe x8 Gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight into future work and intended developments.

arXiv.org e-Print Archive

ART

Component-Level Electronic-Assembly Repair (CLEAR) Synthetic Instrument Capabilities Assessment and Test Report

Author: Bradish Martin A.
Oeftering Richard C.

The role of synthetic instruments (SIs) for Component-Level Electronic-Assembly Repair (CLEAR) is to provide an external lower-level diagnostic and functional test capability beyond the built-in-test capabilities of spacecraft electronics. Built-in diagnostics can report faults and symptoms, but isolating the root cause and performing corrective action requires specialized instruments. Often a fault can be revealed by emulating the operation of external hardware. This implies complex hardware that is too massive to be accommodated in spacecraft. The SI strategy is aimed at minimizing complexity and mass by employing highly reconfigurable instruments that perform diagnostics and emulate external functions. In effect, SI can synthesize an instrument on demand. The SI architecture section of this document summarizes the result of a recent program diagnostic and test needs assessment based on the International Space Station. The SI architecture addresses operational issues such as minimizing crew time and crew skill level, and the SI data transactions between the crew and supporting ground engineering searching for the root cause and formulating corrective actions. SI technology is described within a teleoperations framework. The remaining sections describe a lab demonstration intended to show that a single SI circuit could synthesize an instrument in hardware and subsequently clear the hardware and synthesize a completely different instrument on demand. An analysis of the capabilities and limitations of commercially available SI hardware and programming tools is included. Future work in SI technology is also described.

NASA Technical Reports Server

FPGA Based Diagnostics for the Mega-Amp Spherical Tokamak Upgrade

Author: VINCENT CHARLES,HOWARD
Publication date: 01/01/2021

Terrestrial fusion power is a low carbon alternative to conventional power sources with reduced waste and proliferation concerns relative to fission power. The complexity of fusion research devices means that many high performance diagnostics are necessary to investigate the underlying physics of the environment. Field Programmable Gate Array technology provides a powerful and flexible option when designing bespoke instrumentation.

Durham e-Theses

The S2 VLBI Correlator: A Correlator for Space VLBI and Geodetic Signal Processing

Author: B. R. Carlson
Lynch J. M.
Narayan R.
P. E. Dewdney
R. V. Casorso
T. A. Burgess
W. H. Cannon
W. T. Petrachenko
Publication venue: 'University of Chicago Press'
Publication date: 01/01/1999

We describe the design of a correlator system for ground and space-based VLBI. The correlator contains unique signal processing functions: flexible LO frequency switching for bandwidth synthesis; 1 ms dump intervals; multi-rate digital signal-processing techniques to allow correlation of signals at different sample rates; and a digital filter for very high resolution cross-power spectra. It also includes autocorrelation, tone extraction, pulsar gating, and signal-statistics accumulation. Comment: 44 pages, 13 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

A modified model for the Lobula Giant Movement Detector and its FPGA implementation

Author: Andrew Hunter
Barron
Barth
Bermudez i Badia
Bertozzi
Blanchard
Colombo
Connolly
Coombs
Cuadri
Cy Pettit
Galbraith
Hongying Meng
Judge
Kofi Appiah
Lazaros
Longuet Higgins
Mervyn Hobden
Meyer
Nedevschi
Nelson
Nigel Priestley
Okuno
Peter Hobden
Polychronopoulos
Rind
Sandini
Santer
Shigang Yue
Yue
Publication venue: 'Elsevier BV'
Publication date: 22/04/2010

The Lobula Giant Movement Detector (LGMD) is a wide-field visual neuron located in the lobula layer of the locust nervous system. The LGMD increases its firing rate in response to both the velocity of an approaching object and the proximity of this object. It has been found that it can respond to looming stimuli very quickly and trigger avoidance reactions. It has been successfully applied in visual collision avoidance systems for vehicles and robots. This paper introduces a modified neural model for LGMD that provides additional depth direction information for the movement. The proposed model retains the simplicity of the previous model by adding only a few new cells. It has been simplified and implemented on a Field Programmable Gate Array (FPGA), taking advantage of the inherent parallelism exhibited by the LGMD, and tested on real-time video streams. Experimental results demonstrate its effectiveness as a fast motion detector.

University of Lincoln Institutional Repository

Crossref

Nottingham Trent Institutional Repository (IRep)

Sheffield Hallam University Research Archive

Brunel University Research Archive

Development of FPGA controlled diagnostics on the MAST fusion reactor

Author: HUANG BILLY,KIAT
Publication date: 01/01/2013

Field Programmable Gate Array (FPGA) technology is very useful for implementing high performance digital signal processing algorithms, data acquisition and real-time control on nuclear fusion devices. This thesis presents the work done using FPGAs to develop powerful diagnostics. This has been achieved by developing embedded Linux and running it on the FPGA to enhance diagnostic capabilities such as remote management, PLC communications over the Modbus protocol and UDP-based Ethernet streaming. A closed loop real-time feedback prototype has been developed for combining laser beams onto a single beam path, for improving overall repetition rates of Thomson Scattering systems used for plasma electron temperature and density radial profile measurements. A controllable frequency sweep generator is used to drive the Toroidal Alfven Eigenmode (TAE) antenna system and results are presented indicating successful TAE resonance detection. A fast data acquisition system has been developed for the Electron Bernstein Wave (EBW) Synthetic Aperture Microwave Imaging system and an active probing microwave source where the FPGA clock rate has been pushed to the maximum. Propagation delays on the order of 2 nanoseconds in the FPGA have been finely tuned with careful placement of FPGA logic using a custom logic placement tool. Intensity interferometry results are presented on the EBW system with a suggestion for phase insensitive pitch angle measurement.

Durham e-Theses