1,508 research outputs found

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

Author: D Richie
J Ross
JA Ross
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/08/2016

This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC). While fully capable of MPMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM. Comment: 14 pages, 9 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologies

arXiv.org e-Print Archive

Crossref

Using XDAQ in Application Scenarios of the CMS Experiment

Author: Berti L.
Brigljevic V.
Bruno G.
Cano E.
Cittolin S.
Csilling A.
O'Dell V.
Drouhin F.
Erhan S.
Gigi D.
Glege F.
Gulmini M.
Gutleber J.
Jacobs C.
Kozlowski M.
Larsen H.
Magrans I.
Maron G.
Meijers F.
Meschi E.
Mirabito L.
Murray S.
Oh A.
Orsini L.
Pollet L.
Racz A.
Samyn D.
Scharff-Hansen P.
Schwick C.
Sphicas P.
Suzuki I.
Toniolo N.
Ventura S.
Zangrando L.
Publication date: 24/03/2003

XDAQ is a generic data acquisition software environment that emerged from a rich set of use cases encountered in the CMS experiment. These cover not only the deployment for multiple sub-detectors and the operation of different processing and networking equipment, but also a distributed collaboration of users with different needs. The use of the software in various application scenarios demonstrated the viability of the approach. We discuss two applications: the tracker local DAQ system for front-end commissioning and the muon chamber validation system. The description is completed by a brief overview of XDAQ. Comment: Conference CHEP 2003 (Computing in High Energy and Nuclear Physics, La Jolla, CA)

arXiv.org e-Print Archive

HAL-IN2P3

CERN Document Server

APENet: LQCD clusters a la APE

Author: A. Salamon
D. Rossetti
F. Palombi
Fodor
G. Mazza
Lippert
Luscher
M. Guagnelli
P. Vicini
R. Ammendola
R. Petronzio
Publication venue: 'Elsevier BV'
Publication date: 14/09/2004

Developed by the APE group, APENet is a new high-speed, low-latency, 3-dimensional interconnect architecture optimized for PC clusters running LQCD-like numerical applications. The hardware implementation is based on a single PCI-X 133 MHz network interface card hosting six independent bi-directional channels with a peak bandwidth of 676 MB/s in each direction. We discuss preliminary benchmark results showing exciting performance, similar to or better than that found in high-end commercial network systems. Comment: Lattice2004(machines), 3 pages, 4 figures

arXiv.org e-Print Archive

Crossref

CERN Document Server

Open Access Repository

Learning from the Success of MPI

Author: A. Geist
A. Skjellum
C.H. Koelbel
J. Boyle
J. Cownie
J. Dongarra
J.L. Traeff
K. Krechmer
Message Passing Interface Forum
Message Passing Interface Forum MPI2
N. Carriero
O. Zaki
P.B. Hansen
R. Hempel
R.C. Whaley
R.W. Numrich
W. Gropp
W.W. Carlson
Publication date: 01/01/2001

The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other approaches, including automatic parallelization and directive-based parallelism, are easier to use. This paper argues that MPI has succeeded because it addresses all of the important issues in providing a parallel programming model. Comment: 12 pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

UNT Digital Library

Lattice QCD Production on Commodity Clusters at Fermilab

Author: Gottlieb Steven
Holmgren D.
Mackenzie P.
Simone J.
Singh A.
Publication date: 08/07/2003

We describe the construction and results to date of Fermilab's three Myrinet-networked lattice QCD production clusters (an 80-node dual Pentium III cluster, a 48-node dual Xeon cluster, and a 128-node dual Xeon cluster). We examine a number of aspects of performance of the MILC lattice QCD code running on these clusters. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 6 pages, LaTeX, 8 eps figures. PSN TUIT00

arXiv.org e-Print Archive

UNT Digital Library

On the acceleration of wavefront applications using distributed many-core architectures

Author: Hammond Simon D.
Jarvis Stephen A.
Mudalige Gihan R.
Pennycook Simon J.
Wright Steven A.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2012

In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications, a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C, and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.

CiteSeerX

University of Birmingham Research Portal

Warwick Research Archives Portal Repository

White Rose Research Online