
    Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

    High performance computing using graphics processing units (GPUs) is gaining popularity in scientific computing, with many large compute clusters being augmented with multiple GPUs per node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters of up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism using OpenMP for intra-node and MPI for inter-node communication. Comparisons between the tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are shown for large-scale computational fluid dynamics (CFD) simulations. Our results demonstrate that the tri-level parallel implementation does not provide a significant performance advantage over the dual-level implementation; however, further research is needed to confirm this conclusion for clusters with a high GPU-per-node density or for software that can exploit OpenMP's fine-grain parallelism more effectively.
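
    As a rough illustration only (not the authors' code), the skeleton below sketches how such a tri-level MPI-OpenMP-CUDA decomposition is typically structured: MPI ranks own subdomains across nodes, one OpenMP thread per node manages each GPU, and CUDA kernels perform the fine-grain work. The kernel name jacobi_step, the subdomain size, and the iteration count are hypothetical.

        // Hypothetical sketch of tri-level parallelism: MPI across nodes,
        // one OpenMP thread per GPU within a node, CUDA for fine-grain work.
        #include <mpi.h>
        #include <omp.h>
        #include <cuda_runtime.h>
        #include <cstddef>

        __global__ void jacobi_step(const double* in, double* out, int nx, int ny) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i > 0 && i < nx - 1 && j > 0 && j < ny - 1)
                out[j * nx + i] = 0.25 * (in[j * nx + i - 1] + in[j * nx + i + 1] +
                                          in[(j - 1) * nx + i] + in[(j + 1) * nx + i]);
        }

        int main(int argc, char** argv) {
            int provided = 0;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            int rank = 0;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // rank selects this process's subdomain

            int ngpus = 0;
            cudaGetDeviceCount(&ngpus);
            const int nx = 512, ny = 512;          // per-GPU subdomain size (illustrative)
            const size_t bytes = size_t(nx) * ny * sizeof(double);

            // Coarse grain (intra-node): one OpenMP thread manages each GPU.
            #pragma omp parallel num_threads(ngpus)
            {
                cudaSetDevice(omp_get_thread_num());

                double *d_in = nullptr, *d_out = nullptr;
                cudaMalloc(&d_in, bytes);
                cudaMalloc(&d_out, bytes);
                cudaMemset(d_in, 0, bytes);
                cudaMemset(d_out, 0, bytes);

                dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
                for (int iter = 0; iter < 100; ++iter) {
                    // Fine grain: CUDA kernel updates the interior of this GPU's block.
                    jacobi_step<<<grid, block>>>(d_in, d_out, nx, ny);
                    cudaDeviceSynchronize();
                    double* tmp = d_in; d_in = d_out; d_out = tmp;
                    // Inter-node halo exchange between MPI ranks would go here,
                    // funneled through the master thread (MPI_THREAD_FUNNELED).
                }
                cudaFree(d_in);
                cudaFree(d_out);
            }

            MPI_Finalize();
            return 0;
        }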

    Programming Abstractions for Data Locality

    The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computation is the most expensive component, but we are rapidly moving to an era in which computation is cheap and massively parallel while data movement dominates energy and performance costs. To respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from each other. To take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and allow developers to describe how to decompose and how to lay out data in memory.

    Fortunately, there are many emerging concepts for managing data locality, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries. There is an opportunity to identify commonalities in strategy that allow us to combine the best of these concepts into a comprehensive approach to expressing and managing data locality in exascale programming systems. These programming-model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction; candidate techniques range from template libraries all the way to completely new languages.
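
    As a minimal, hypothetical sketch (not drawn from the report or any particular library), two of the concepts named above can be illustrated in host-side C++: an array view whose data layout is a pluggable template parameter, and an explicitly tiled traversal. The names View2D, LayoutRight, LayoutLeft, and for_each_tile are illustrative only.

        // Hypothetical illustration of two locality abstractions:
        // a layout-parameterized 2-D array view and a tiled traversal.
        #include <cstddef>
        #include <vector>

        struct LayoutRight {   // row-major: rightmost index is contiguous
            static size_t index(size_t i, size_t j, size_t /*ni*/, size_t nj) { return i * nj + j; }
        };
        struct LayoutLeft {    // column-major: leftmost index is contiguous
            static size_t index(size_t i, size_t j, size_t ni, size_t /*nj*/) { return j * ni + i; }
        };

        template <class Layout>
        struct View2D {        // non-owning view; the layout policy decides the memory mapping
            double* data; size_t ni, nj;
            double& operator()(size_t i, size_t j) const {
                return data[Layout::index(i, j, ni, nj)];
            }
        };

        // Tiled traversal: iteration is expressed in blocks so a compiler or
        // runtime can reason about which data each block of work touches.
        template <class Layout, class F>
        void for_each_tile(View2D<Layout> v, size_t tile, F body) {
            for (size_t ib = 0; ib < v.ni; ib += tile)
                for (size_t jb = 0; jb < v.nj; jb += tile)
                    for (size_t i = ib; i < ib + tile && i < v.ni; ++i)
                        for (size_t j = jb; j < jb + tile && j < v.nj; ++j)
                            body(i, j);
        }

        int main() {
            const size_t n = 1024;
            std::vector<double> storage(n * n, 0.0);
            View2D<LayoutRight> a{storage.data(), n, n};

            // The same loop body works unchanged if LayoutLeft is chosen instead;
            // only the data layout, and hence the locality behaviour, changes.
            for_each_tile(a, 64, [&](size_t i, size_t j) { a(i, j) = double(i + j); });
            return 0;
        }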

    Initial genome sequencing of the sugarcane CP 96-1252 complex hybrid [version 1; referees: 2 approved]

    The CP 96-1252 cultivar of sugarcane is a complex hybrid of commercial importance. DNA was extracted from lab-grown leaf tissue and sequenced. The raw Illumina DNA sequencing run yielded 101 Gbp of genome sequence reads. The dataset is available from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA345486/

    Towards EXtreme scale technologies and accelerators for euROhpc hw/Sw supercomputing applications for exascale: The TEXTAROSSA approach

    In the near future, exascale systems will need to bridge three technology gaps to achieve high performance while remaining under tight power constraints: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetic; and methods and tools for the seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA addresses these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models, and tools derived from European research.