298 research outputs found

Implementation of MPICH on top of MP_Lite

Author: Selvarajan Shoba
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2002

The goal of this thesis is to develop a new Channel Interface device for the MPICH implementation of the MPI (Message Passing Interface) standard using MP_Lite. MP_Lite is a lightweight message-passing library that is not a full MPI implementation, but offers high performance. MPICH (Message Passing Interface CHameleon) is a full implementation of the MPI standard that has the p4 library as the underlying communication device for TCP/IP networks. By integrating MP_Lite as a Channel Interface device in MPICH, a parallel programmer can utilize the full MPI implementation of MPICH as well as the high bandwidth offered by MP_Lite. There are several layers in the MPICH library where one can tie a new device. The Channel Interface is the lowest layer that requires very few functions to add a new device. By attaching MP_Lite to MPICH at the lowest level, the Channel Interface, almost all of the performance of the MP_Lite library can be delivered to the applications using MPICH. MP_Lite can be implemented either as a blocking or a non-blocking Channel Interface device. The performance was measured on two separate test clusters, the PC and the Alpha mini-clusters, having Gigabit Ethernet connections. The PC cluster has two 1.8 GHz Pentium 4 PCs and the Alpha cluster has two 500 MHz Compaq DS20 workstations. Different network interface cards like Netgear, TrendNet and SysKonnect Gigabit Ethernet cards were used for the measurements. Both the blocking and non-blocking MPICH-MP_Lite Channel Interface devices perform close to raw TCP, whereas a performance loss of 25-30% is seen in the MPICH-p4 Channel Interface device for larger messages. The superior performance offered by the MPICH-MP_Lite device compared to the MPICH-p4 device can be easily seen on the SysKonnect cards using jumbo frames. The throughput curve also improves considerably by increasing the Eager/Rendezvous threshold.
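The Eager/Rendezvous threshold mentioned at the end of this abstract can be illustrated with a minimal sketch. The abstract does not say how MPICH or MP_Lite choose a protocol, so everything here is an assumption: the function name `choose_protocol`, the threshold values, the message sizes, and the choice of `<=` for the comparison are all hypothetical. The sketch only shows the general idea that small messages are sent eagerly while larger ones first handshake with the receiver.

```python
# Hypothetical illustration of an eager/rendezvous switch.
# This is not MPICH or MP_Lite code; names and values are assumptions.
EAGER = "eager"
RENDEZVOUS = "rendezvous"


def choose_protocol(message_bytes: int, threshold_bytes: int) -> str:
    """Return the protocol a sender would use for a message of this size."""
    if message_bytes < 0 or threshold_bytes < 0:
        raise ValueError("sizes must be non-negative")
    # Messages up to the threshold are sent immediately (eager);
    # larger ones wait for the receiver to be ready (rendezvous).
    return EAGER if message_bytes <= threshold_bytes else RENDEZVOUS


if __name__ == "__main__":
    sizes = (1024, 64 * 1024, 1024 * 1024)
    for threshold in (16 * 1024, 128 * 1024):
        picks = [choose_protocol(size, threshold) for size in sizes]
        print(threshold, picks)
```

Raising the threshold in this sketch moves the 64 KiB message from the rendezvous path to the eager path, which is the kind of change the abstract associates with an improved throughput curve.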

Digital Repository @ Iowa State University (ISU)

A low-cost parallel implementation of direct numerical simulation of wall turbulence

Author: Bertolotti
del Álamo
Dmitruk
Günther
Iovieno
Jiménez
Kim
Kim
Kwok
Lele
Mahesh
Maurizio Quadrio
Moin
Moser
Na
Paolo Luchini
Pelz
Pozzi
Quadrio
Quadrio
Spotz
Thomas
Publication venue: 'Elsevier BV'
Publication date: 18/06/2005

A numerical method for the direct numerical simulation of incompressible wall turbulence in rectangular and cylindrical geometries is presented. Its distinctive feature is a design targeted at efficient distributed-memory parallel computing on commodity hardware. The adopted discretization is spectral in the two homogeneous directions; fourth-order accurate, compact finite-difference schemes over a variable-spacing mesh in the wall-normal direction are key to our parallel implementation. The parallel algorithm is designed in such a way as to minimize data exchange among the computing machines, and in particular to avoid taking a global transpose of the data during the pseudo-spectral evaluation of the non-linear terms. The computing machines can then be connected to each other through low-cost network devices. The code is optimized for memory requirements, which can moreover be subdivided among the computing nodes. The layout of a simple, dedicated and optimized computing system based on commodity hardware is described. The performance of the numerical method on this computing system is evaluated and compared with that of other codes described in the literature, as well as with that of the same code implementing a commonly employed strategy for the pseudo-spectral calculation. Comment: To be published in J. Comp. Physics.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Archivio della Ricerca - Università di Salerno

CERN Document Server

Performance evaluation of an open distributed platform for realistic traffic generation

Author: AVALLONE STEFANO
D. Emma
PESCAPE' ANTONIO
VENTRE GIORGIO
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

Network researchers have dedicated a notable part of their efforts to modeling traffic and to implementing efficient traffic generators. We feel that there is a strong demand for traffic generators capable of reproducing realistic traffic patterns according to theoretical models while also delivering high performance. This work presents an open distributed platform for traffic generation that we call Distributed Internet Traffic Generator (D-ITG), capable of producing traffic (network, transport and application layer) at packet level and of accurately replicating appropriate stochastic processes for both the inter-departure time (IDT) and packet size (PS) random variables. We implemented two different versions of our distributed generator. In the first one, a log server is in charge of recording the information transmitted by senders and receivers, and these communications are based on either TCP or UDP. In the other one, senders and receivers make use of the MPI library. In this work a complete performance comparison among the centralized version and the two distributed versions of D-ITG is presented.
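The idea of driving a generator from two random variables, inter-departure time (IDT) and packet size (PS), can be sketched as follows. This is not D-ITG code and does not reflect its interface. The function name `generate_schedule`, the exponential IDT, the uniform PS, and all parameter values are assumptions chosen only to make the sketch concrete; the abstract does not state which distributions are used.

```python
import random
from typing import List, Tuple


def generate_schedule(
    n_packets: int,
    mean_idt_s: float,
    min_ps: int,
    max_ps: int,
    seed: int = 0,
) -> List[Tuple[float, int]]:
    """Return (departure_time_s, packet_size_bytes) pairs.

    Assumed models: exponential IDT with the given mean,
    uniform integer PS in [min_ps, max_ps].
    """
    if mean_idt_s <= 0 or min_ps > max_ps:
        raise ValueError("invalid parameters")
    rng = random.Random(seed)  # seeded so runs are reproducible
    now = 0.0
    schedule = []
    for _ in range(n_packets):
        # expovariate takes the rate, which is 1 / mean
        now += rng.expovariate(1.0 / mean_idt_s)
        schedule.append((now, rng.randint(min_ps, max_ps)))
    return schedule


if __name__ == "__main__":
    for departure, size in generate_schedule(5, 0.01, 64, 1500):
        print(f"{departure:.6f} s  {size} bytes")
```

A real sender would sleep until each departure time and emit a packet of the sampled size; swapping in other distributions only changes the two sampling lines.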

Archivio della ricerca - Università degli studi di Napoli Federico II

NIC-assisted cache-efficient receive stack for message passing over Ethernet

Author: Bailey
Browne
Frigo
Goglin
Goglin
Goglin
Huggahalli
Passas
Publication venue: 'Wiley'
Publication date: 01/01/2011

High-speed networking in clusters usually relies on advanced hardware features in the NICs, such as zero-copy capability. Open-MX is a high-performance message passing stack tailored for regular Ethernet hardware without such capabilities. We present the addition of multiqueue support in the Open-MX receive stack so that all incoming packets for the same process are handled on the same core. We then introduce the idea of binding the target end process near its dedicated receive queue. This model leads to a more cache-efficient receive stack for Open-MX. It also proves that very simple and stateless hardware features may have a significant impact on message passing performance over Ethernet. The implementation of this model in firmware reveals that it may not be as efficient as some manually tuned micro-benchmarks. But our multiqueue receive stack generally performs better than the original single-queue stack, especially on large communication patterns where multiple processes are involved and manual binding is difficult.
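The multiqueue idea, steering every packet for a given process to the same receive queue, can be sketched with a stateless mapping. This is an illustration only, not the Open-MX or NIC firmware implementation. The names `receive_queue_for` and `dispatch`, the integer endpoint identifier, and the modulo mapping are all assumptions.

```python
from typing import List, Sequence, Tuple

Packet = Tuple[int, bytes]  # (destination endpoint id, payload); hypothetical layout


def receive_queue_for(endpoint_id: int, n_queues: int) -> int:
    """Stateless mapping: the same endpoint always yields the same queue."""
    if n_queues <= 0:
        raise ValueError("n_queues must be positive")
    return endpoint_id % n_queues


def dispatch(packets: Sequence[Packet], n_queues: int) -> List[List[Packet]]:
    """Place each packet in the queue chosen for its destination endpoint."""
    queues: List[List[Packet]] = [[] for _ in range(n_queues)]
    for endpoint_id, payload in packets:
        queues[receive_queue_for(endpoint_id, n_queues)].append((endpoint_id, payload))
    return queues


if __name__ == "__main__":
    incoming = [(0, b"a"), (5, b"b"), (0, b"c"), (2, b"d"), (5, b"e")]
    for index, queue in enumerate(dispatch(incoming, 4)):
        print(index, queue)
```

If each queue is serviced by one core and the receiving process is bound near that core, packets for that process stay in one cache domain, which is the effect the abstract describes.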

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Oskar Bordeaux

Optimized framegrabber for the Cherenkov telescope array

Author: Berge David
Díaz Alonso Antonio Javier
Giavitto Gianluca
Jiménez López Miguel
Machado Cano Jorge Manuel
Rodríguez Álvarez Manuel
Stephan Maurice
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2019

Our contribution presents a high-bandwidth platform that implements traffic aggregation and switching capabilities for the Cherenkov telescope array (CTA) cameras. Our proposed system integrates two different data flows: a unidirectional one from the cameras to an external server, and a second, fully configurable one dedicated to configuration and control traffic for camera management. The former requires high-bandwidth mechanisms to aggregate several 1 gigabit Ethernet links into one high-speed 10 gigabit Ethernet port. The latter is responsible for providing routing components to allow a control and management path for all the elements of the cameras. Hence, a simple, efficient, and flexible routing mechanism has been implemented, avoiding complex circuitry that would degrade system performance. As a consequence, an asymmetric network topology allows high-bandwidth communication and, at the same time, a flexible and cost-effective implementation. In our contribution, we analyze the camera requirements and present the proposed architecture. Moreover, we have designed several evaluation tests to demonstrate that our solution fulfills the CTA project needs. Finally, we illustrate the general possibilities of the proposed solution for other data acquisition applications and discuss the most promising future lines of research. This work has been partially funded by the Horizon 2020 (H2020) ASTERICS (Grant No. 653477) and AYA2015-65973-C3-2-R AMIGA6.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Granada

DESY

Using XDAQ in Application Scenarios of the CMS Experiment

Author: Berti L.
Brigljevic V.
Bruno G.
Cano E.
Cittolin S.
Csilling A.
O'Dell V.
Drouhin F.
Erhan S.
Gigi D.
Glege F.
Gulmini M.
Gutleber J.
Jacobs C.
Kozlowski M.
Larsen H.
Magrans I.
Maron G.
Meijers F.
Meschi E.
Mirabito L.
Murray S.
Oh A.
Orsini L.
Pollet L.
Racz A.
Samyn D.
Scharff-Hansen P.
Schwick C.
Sphicas P.
Suzuki I.
Toniolo N.
Ventura S.
Zangrando L.
Publication date: 24/03/2003

XDAQ is a generic data acquisition software environment that emerged from a rich set of use-cases encountered in the CMS experiment. They cover not only the deployment for multiple sub-detectors and the operation of different processing and networking equipment, but also a distributed collaboration of users with different needs. The use of the software in various application scenarios demonstrated the viability of the approach. We discuss two applications, the tracker local DAQ system for front-end commissioning and the muon chamber validation system. The description is completed by a brief overview of XDAQ. Comment: Conference CHEP 2003 (Computing in High Energy and Nuclear Physics, La Jolla, CA

arXiv.org e-Print Archive

HAL-IN2P3

CERN Document Server