44 research outputs found

Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks

Author: Adamidis Panagiotis
Ciotti Robert
Dossa Don
Fatoohi Rod
Gunney Brian T. N.
Koniges Alice
Mueller Matthias
Rabenseifner Rolf
Saini Subhash
Spelce Thomas E.
Tiyyagura Sunil R.
Publication date: 01/01/2006

The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers: SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.

CiteSeerX

Crossref

NASA Technical Reports Server

Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

Author: Balaji P.
Bova S. W.
Cappello F.
Cwire
Dong S.
Elsen E.
Goglin B.
Griebel M.
Gropp W.
Guermond J.L.L.
Göddeke D.
Hager G.
Hempel R.
Henty D. S.
Kindratenko V.
Luong P.
Lusk E.
Nakajima K.
Owens J.D.
Rabenseifner R.
Schive H.
Showerman M.
Simon H.
Thibault J. C.
Wan D.C.
Publication venue: 'IUScholarWorks'
Publication date: 04/01/2011

High-performance computing using graphics processing units (GPUs) is gaining popularity in the scientific computing field, with many large compute clusters being augmented with multiple GPUs in each node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters of up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism using OpenMP for intra-node and MPI for inter-node communication. Comparisons between the tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are shown using large computational fluid dynamics (CFD) simulations. Our results demonstrate that a tri-level parallel implementation does not provide a significant performance advantage over the dual-level implementation; however, further research is needed to justify our conclusion for a cluster with a high GPU-per-node density or when using software that can utilize OpenMP's fine-grain parallelism more effectively.

Crossref

Boise State University - ScholarWorks

CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data

Author: A Honkela
AA Margolin
AC Haury
AL Barabasi
AN Brooks
B Usadel
D Angeli
D Braha
D Marbach
F Liu
Guangyong Zheng
I Cantone
J Nickolls
J Zhao
JJ Faith
L Chen
LE Brown
Luonan Chen
M Chevalier
M Grieb
M Zou
N Friedman
N Kramer
PE Meyer
R Bonneau
R Liu
R Ming
R Rabenseifner
S Kauffman
TS Gardner
W Ma
X Yu
X Zhang
Xin-Guang Zhu
Xiujun Zhang
Y Artzy-Randrup
Y Wang
Yaochen Xu
Zhi-Ping Liu
Zhuo Wang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Die geregelte logische Uhr, eine globale Uhr fuer tracebasierte Ueberwachung paralleler Anwendungen [The controlled logical clock, a global clock for trace-based monitoring of parallel applications]

Author: Rabenseifner R.
Publication date: 01/01/2000

Available from TIB Hannover: RO 7298(44) / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek. SIGLE. DE. German.

OpenGrey Repository

股関節全置換術におけるRefobacin濃度に関する研究 [A study of Refobacin concentration in total hip replacement]

Author: HIRASAWA Y.
LEIMBECK R.
RABENSEIFNER L.
Publication venue: 京都大学医学部外科整形外科学教室内日本外科宝函編集室
Publication date: 01/11/1986

Kyoto University Research Information Repository

GPU-Aware Intranode MPI_Allreduce

Author: Rabenseifner R.
Thakur R.
Publication venue: 'Association for Computing Machinery (ACM)'

Modern multi-core clusters are increasingly using GPUs to achieve higher performance and power efficiency. In such clusters, efficient communication among processes with data residing in GPU memory is of paramount importance to the performance of MPI applications. This paper investigates the efficient design of the intranode MPI_Allreduce operation in GPU clusters. We propose two design alternatives that exploit the in-GPU reduction and fast intranode communication capabilities of modern GPUs. Our GPU shared-buffer aware design and GPU-aware Binomial reduce-broadcast algorithmic approach provide significant speedups over MVAPICH2 of up to 22 and 16 times, respectively.

CiteSeerX

Crossref

Effective File-I/O Bandwidth Benchmark

Author: Koniges A.E.
Rabenseifner R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2000

The effective I/O bandwidth benchmark (b_eff_io) has two goals: (1) to obtain a characteristic average number for the I/O bandwidth achievable with parallel MPI-I/O applications, and (2) to get detailed information about several access patterns and buffer lengths. The benchmark examines "first write", "rewrite" and "read" access, strided (individual and shared pointers) and segmented collective patterns on one file per application, and non-collective access to one file per process. The number of parallel accessing processes is also varied, and well-formed I/O is compared with non-well-formed I/O. On systems meeting the rule that the total memory can be written to disk in 10 minutes, the benchmark should not need more than 15 minutes for a first pass of all patterns. The benchmark is designed analogously to the effective bandwidth benchmark for message passing (b_eff), which characterizes the message passing capabilities of a system in a few minutes. First results of the b_eff_io benchmark are given for IBM SP and Cray T3E systems and compared with existing benchmarks based on parallel POSIX I/O.

CiteSeerX

Crossref

UNT Digital Library

Communication and Optimization Aspects on Hybrid Architectures

Author: G. Wellein
R. Rabenseifner
Publication venue: 'Springer Science and Business Media LLC'

Crossref

SPARSE APPROXIMATE INVERSE PRECONDITIONER FOR CONTACT PROBLEMS ON THE EARTH SIMULATOR USING OPENMP

Author: KENGO NAKAJIMA
Rabenseifner R.
Publication venue: 'World Scientific Pub Co Pte Lt'

Crossref

Implications of non-constant clock drifts for the timestamps of concurrent events

Author: Becker D.
Rabenseifner R.
Wolf F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Crossref

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources