APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

Author: Ammendola Roberto
Biagioni Andrea
Lo Cicero Francesca
Frezza Ottorino
Lonardo Alessandro
Paolucci Pier
Petronzio Roberto
Rossetti Davide
Salamon Andrea
Salina Gaetano
Simula Francesco
Tantalo Nazario
Tosoratto Laura
Vicini Piero
Publication date: 01/01/2010

Many scientific computations need multi-node parallelism to meet ever-increasing requirements in both space (memory) and time (speed). The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may easily employ more than a Petaflops of sustained computing power, requiring thousands of GPUs orchestrated with some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes with linear cost, thus improving the price/performance ratio on large clusters. The project target is the development of the Apelink+ host adapter featuring a low latency, high bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe X8 gen2 host interface. It features hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing for painless porting of standard applications. Finally, we give an insight into future work and intended developments.
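To illustrate what a 3D toroidal direct network means in practice, the following minimal Python sketch computes the six nearest neighbours of a node and the shortest hop count between two nodes when every axis wraps around. It is a generic illustration of torus addressing, not the APEnet+ routing logic; the function names `torus_neighbors` and `torus_hops` and the 4x4x4 lattice size are assumptions made for the example.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbours of `node` in a 3D torus of size `dims`.

    Hypothetical helper: each axis wraps around, so the node at index 0
    is adjacent to the node at index dims[axis] - 1.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # Modulo arithmetic implements the wrap-around link.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal number of hops between `a` and `b`, moving one axis at a time."""
    total = 0
    for ai, bi, di in zip(a, b, dims):
        delta = abs(ai - bi)
        # Go whichever way around the ring is shorter.
        total += min(delta, di - delta)
    return total


if __name__ == "__main__":
    dims = (4, 4, 4)  # assumed lattice size, for illustration only
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 2, 1), dims))  # 1 + 2 + 1 = 4
```

Because every node has the same fixed number of links regardless of lattice size, the number of links grows in proportion to the number of nodes, which is consistent with the linear cost scaling the abstract mentions.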

arXiv.org e-Print Archive

ART

APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

Author: A Biagioni
A Lonardo
A Salamon
Ammendola R
Bodin F
Chalasani Suresh
D Rossetti
F Lo Cicero
F Simula
G Salina
L Tosoratto
NVIDIA Corporation
O Frezza
P S Paolucci
P Vicini
Paolucci P S
R Ammendola
Publication venue: 'IOP Publishing'
Publication date: 18/02/2011

We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for an RDMA programming model and experimental acceleration of GPU networking. This design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective cluster network architecture, scalable to tens of thousands of nodes. Some test results and a characterization of data transmission on a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided.
Comment: 6 pages, 7 figures, proceedings of CHEP 2010, Taiwan, October 18-2

arXiv.org e-Print Archive

Crossref

Using an FPGA for Fast Bit Accurate SoC Simulation

Author: Hölzenspies P.K.F.
Smit G.J.M.
Wolkotte P.T.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2007

In this paper we describe a sequential simulation method to simulate large parallel homogeneous and heterogeneous systems on a single FPGA. The method is applicable to parallel systems where lengthy cycle and bit accurate simulations are required. It is particularly designed for systems that do not fit completely on the simulation platform (i.e. FPGA). As a case study, we use a Network-on-Chip (NoC) that is simulated in SystemC and on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation we achieved a speed improvement of a factor of 80-300, without compromising the cycle and bit level accuracy.
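The general idea of evaluating a parallel system one unit at a time while staying cycle accurate can be sketched with a two-phase update: every unit computes its next state from the old state only, and all results are committed together at the end of the cycle. This is a generic software illustration under that assumption, not the authors' FPGA method; `simulate_sequentially` and `ring_shift` are hypothetical names, and the ring of registers stands in for a real NoC.

```python
from typing import Callable, List

StepFn = Callable[[int, List[int]], int]


def simulate_sequentially(state: List[int], step_fn: StepFn, cycles: int) -> List[int]:
    """Simulate `cycles` clock cycles of a parallel system, one unit at a time.

    `step_fn(i, old_state)` returns the next value of unit `i`. It only ever
    sees the state from the previous cycle, so the order in which units are
    evaluated cannot change the result.
    """
    for _ in range(cycles):
        # Phase 1: evaluate each unit in turn against the frozen old state.
        next_state = [step_fn(i, state) for i in range(len(state))]
        # Phase 2: commit all units at once, like a clock edge.
        state = next_state
    return state


def ring_shift(i: int, state: List[int]) -> int:
    """Toy unit: each register takes the value of its left neighbour in a ring."""
    return state[(i - 1) % len(state)]


if __name__ == "__main__":
    print(simulate_sequentially([1, 2, 3, 4], ring_shift, 1))  # [4, 1, 2, 3]
    print(simulate_sequentially([1, 2, 3, 4], ring_shift, 4))  # [1, 2, 3, 4]
```

Updating the state in place instead of using a separate `next_state` buffer would let later units see values from the current cycle, which is exactly the loss of cycle accuracy that a sequential simulator has to avoid.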

University of Twente Research Information

Fast, Accurate and Detailed NoC Simulations

Author: Hölzenspies P.K.F.
Smit G.J.M.
Wolkotte P.T.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2007

Network-on-Chip (NoC) architectures have a wide variety of parameters that can be adapted to the designer's requirements. Fast exploration of this parameter space is only possible at a high level, and several methods have been proposed. Cycle and bit accurate simulation is necessary when the actual router's RTL description needs to be evaluated and verified. However, extensive simulation of the NoC architecture with cycle and bit accuracy is prohibitively time consuming. In this paper we describe a simulation method to simulate large parallel homogeneous and heterogeneous networks-on-chip on a single FPGA. The method is especially suitable for parallel systems where lengthy cycle and bit accurate simulations are required. As a case study, we use a NoC that was modelled and simulated in SystemC. We simulate the same NoC on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation we achieved a speed-up of 80-300, without compromising the cycle and bit level accuracy.

University of Twente Research Information

High performance IPC hardware accelerator and communication network for MPSoCs

Author: Chae Soo-Ik
Koo Moonmo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2008

In this paper, we explain a configurable IPC module for multimedia MPSoCs, which was implemented in an MPW chip that includes three ARM7 CPU cores. According to the test results for an M-JPEG and an H.264 decoder, its IPC synchronization overheads are not more than 1% when the synchronization period is about 5000 cycles. This work was supported by the IC Design Education Center (IDEC) in KAIST, and the Seoul R&BD Program.

SNU Open Repository and Archive

A Two Channel Analog Front end Design AFE Design with Continuous Time Σ-Δ Modulator for ECG Signal

Author: Manjunathachari K
Raheem Mohammed Abdul
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/12/2018

This paper describes a 2-channel analog front end (AFE) with high impedance for low-power recording of biomedical electrical activity. Obtaining accurate recordings of biomedical signals such as EEG/ECG to study the human body is a challenge in research work. The paper proposes a multi-Vt AFE circuit design cascaded with a continuous-time (CT) modulator. In the new architecture, two dissimilar input signals from the 2 channels are filtered and fed to one modulator. The amplifier is a low-power multi-Vt analog front end that reduces power consumption by applying dual threshold voltages. Type-I category 2-channel signals of the first mode (50 and 150 Hz), amplified by the AFE, are given to the 2nd CT sigma-delta ADC. The SNR and SNDR are 63 dB and 60 dB respectively, with a power consumption of 11 mW. The design was simulated in a 0.18 μm standard UMC CMOS process at a 1.8 V supply. The AFE has a measured frequency response from 50 Hz to 360 Hz, programmable gains from 52.6 dB to 72 dB, input referred noise of 3.5 μV in the amplifier bandwidth, NEF of 3
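For readers who want to relate the reported SNDR to converter resolution, the standard ideal-quantizer relation ENOB = (SNDR - 1.76 dB) / 6.02 dB can be applied. The abstract does not report an ENOB figure; the sketch below only applies the textbook formula, which assumes a full-scale sine-wave input, to the 60 dB value quoted above. The function name `enob_from_sndr` is hypothetical.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB.

    Uses the textbook relation SNDR = 6.02 * N + 1.76 dB for an ideal
    N-bit quantizer driven by a full-scale sine wave (an assumption here).
    """
    return (sndr_db - 1.76) / 6.02


if __name__ == "__main__":
    # SNDR value taken from the abstract above.
    print(f"{enob_from_sndr(60.0):.2f} bits")  # about 9.67 bits
```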

Crossref

Institute of Advanced Engineering and Science