
    On Modern Offloading Parallelization Methods: A Critical Analysis of OpenMP

    Offloading computationally intensive routines to a graphics processing unit for general-purpose computing remains an open problem for the academic community, in both application and implementation, and several popular interfaces for it have emerged over the last twenty years. The OpenMP standard is among the most established of these, a parallelization interface that has stood the test of time. The inquiry presented here has two goals: first, to assess the performance of common sorting algorithms parallelized with OpenMP and offloaded to NVIDIA GPU hardware, and second, to critically analyze the programmer experience of implementing these algorithms with an implementation of the OpenMP standard that supports such offloading. For completeness, the empirical analysis includes a comparison against the unparallelized algorithms. From this data and the programming experience, strengths and weaknesses of using OpenMP to parallelize and offload sorting algorithms are derived. After discussing each benchmark in depth, along with the data from each parallelized implementation, we find that OpenMP's position as one of the foremost parallel programming standards is well justified, with few, but notable, pitfalls for the average programmer. In terms of performance, however, OpenMP failed to deliver implementations of the sorting algorithms that outperform their single-threaded counterparts; this was found to be not a fault of OpenMP itself, but of the inherent nature of offloading to NVIDIA GPU hardware.
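    As a rough illustration of the pattern the paper evaluates (not the paper's own code), the sketch below offloads an odd-even transposition sort to a GPU with OpenMP target directives; the algorithm choice, array size, and compiler flags are assumptions made for the example.

    /* Sketch only: odd-even transposition sort offloaded with OpenMP target
     * directives. Build with an offloading-capable compiler, e.g.
     * clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda or nvc -mp=gpu. */
    #include <stdio.h>
    #include <stdlib.h>

    static void odd_even_sort_offload(int *a, int n)
    {
        /* Keep the array resident on the device for all n phases to avoid
         * a host<->device transfer per phase. */
        #pragma omp target data map(tofrom: a[0:n])
        for (int phase = 0; phase < n; phase++) {
            int start = phase % 2;  /* even phase: pairs (0,1),(2,3),...; odd phase: (1,2),(3,4),... */
            #pragma omp target teams distribute parallel for
            for (int i = start; i < n - 1; i += 2) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
    }

    int main(void)
    {
        enum { N = 1 << 15 };
        int *a = malloc(N * sizeof *a);
        for (int i = 0; i < N; i++) a[i] = rand();

        odd_even_sort_offload(a, N);

        for (int i = 1; i < N; i++)
            if (a[i - 1] > a[i]) { puts("not sorted"); return 1; }
        puts("sorted");
        free(a);
        return 0;
    }

    Each phase launches a separate kernel, which illustrates the kind of per-phase launch and data-movement overhead that can keep offloaded sorts from beating their single-threaded counterparts on modest inputs.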

    High Performance with Prescriptive Optimization and Debugging


    Infrastructure Plan for ASC Petascale Environments


    Implementation of MPICH on top of MP_Lite

    The goal of this thesis is to develop a new Channel Interface device for the MPICH implementation of the MPI (Message Passing Interface) standard using MP_Lite. MP_Lite is a lightweight message-passing library that is not a full MPI implementation but offers high performance. MPICH (Message Passing Interface CHameleon) is a full implementation of the MPI standard that uses the p4 library as its underlying communication device for TCP/IP networks. By integrating MP_Lite as a Channel Interface device in MPICH, a parallel programmer can use the full MPI implementation of MPICH while benefiting from the high bandwidth offered by MP_Lite. There are several layers in the MPICH library at which a new device can be tied in. The Channel Interface is the lowest layer and requires only a few functions to add a new device. By attaching MP_Lite to MPICH at this lowest level, the Channel Interface, almost all of the performance of the MP_Lite library can be delivered to applications using MPICH. MP_Lite can be implemented as either a blocking or a non-blocking Channel Interface device. Performance was measured on two separate test clusters, the PC and the Alpha mini-clusters, each with Gigabit Ethernet connections. The PC cluster has two 1.8 GHz Pentium 4 PCs and the Alpha cluster has two 500 MHz Compaq DS20 workstations. Netgear, TrendNet, and SysKonnect Gigabit Ethernet network interface cards were used for the measurements. Both the blocking and non-blocking MPICH-MP_Lite Channel Interface devices perform close to raw TCP, whereas the MPICH-p4 Channel Interface device shows a 25-30% performance loss for larger messages. The advantage of the MPICH-MP_Lite device over the MPICH-p4 device is most easily seen on the SysKonnect cards using jumbo frames. The throughput curve also improves considerably when the Eager/Rendezvous threshold is increased.
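    The sketch below is not the thesis code, but a minimal MPI ping-pong benchmark of the kind used to compare point-to-point throughput of devices such as MPICH-MP_Lite and MPICH-p4; the repetition count and message-size range are assumptions, and message sizes that cross the Eager/Rendezvous threshold are where the protocol switch shows up in the throughput curve.

    /* Sketch only: MPI ping-pong bandwidth test (not the thesis code).
     * Build: mpicc pingpong.c -o pingpong
     * Run with exactly two ranks: mpirun -np 2 ./pingpong */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int reps = 100;
        for (long n = 1; n <= (1L << 22); n <<= 1) {   /* 1 B .. 4 MB */
            char *buf = calloc(n, 1);
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int r = 0; r < reps; r++) {
                if (rank == 0) {
                    MPI_Send(buf, (int)n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, (int)n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, (int)n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(buf, (int)n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t = (MPI_Wtime() - t0) / (2.0 * reps);  /* one-way time per message */
            if (rank == 0)
                printf("%8ld bytes  %10.2f MB/s\n", n, n / t / 1e6);
            free(buf);
        }

        MPI_Finalize();
        return 0;
    }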

    Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

    Over the last decade, most of the increase in computing power has been gained through advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance on various computing tasks, exploiting them requires code adaptations and transformations. Thus OpenMP, the most common standard for multi-threading in scientific computing applications, has provided offloading capabilities between hosts (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and latest v5.2 versions. Recently, two state-of-the-art GPUs, the Intel Ponte Vecchio Max 1100 and the NVIDIA A100, were released to the market, with oneAPI and GNU LLVM-backed compilation for offloading, respectively. In this work, we present early performance results of OpenMP offloading to these devices, specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware on a representative scientific mini-app (the LULESH benchmark). Our results show that the vast majority of the offloading directives in v4.5 and v5.0 are supported in the latest oneAPI and GNU compilers; however, support for v5.1 and v5.2 is still lacking. From the performance perspective, we found that the PVC is up to 37% faster than the A100 on the LULESH benchmark, showing better performance in both computation and data movement.
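    For reference, a minimal example of the kind of v4.5-era offloading construct that the OMPVV suite exercises is sketched below; the kernel (a dot product), array size, and compiler invocations are assumptions, not taken from the paper.

    /* Sketch only: a combined target offloading construct with a reduction,
     * representative of OpenMP 4.5/5.0 offloading features. Build e.g. with
     * icx -fiopenmp -fopenmp-targets=spir64 (oneAPI) or
     * clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda (LLVM). */
    #include <stdio.h>

    #define N (1 << 20)
    static double x[N], y[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        double dot = 0.0;
        /* Map the inputs to the device, compute the dot product there, and
         * reduce the partial sums back into the host scalar. */
        #pragma omp target teams distribute parallel for \
                map(to: x[0:N], y[0:N]) map(tofrom: dot) reduction(+: dot)
        for (int i = 0; i < N; i++)
            dot += x[i] * y[i];

        printf("dot = %.1f (expected %.1f)\n", dot, 2.0 * N);
        return 0;
    }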