16 research outputs found

On the use of many-core machines for the acceleration of a mesh truncation technique for FEM

Author: Amor-Martin Adrian
Belloch Rodríguez José Antonio
Garcia-Castillo Luis E.
García-Donoro Daniel
Martínez Zaldívar Francisco José
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2019

The finite element method (FEM) has been used for years for radiation problems in the field of electromagnetism. To tackle problems of this kind, mesh truncation techniques are required, which may demand substantial computational resources. In fact, electrically large radiation problems can only be tackled using massively parallel computational resources. Different types of multi-core machines are commonly employed in diverse fields of science for accelerating a number of applications. However, properly managing their computational resources is a very challenging task. On the one hand, we present a hybrid Message Passing Interface (MPI) + OpenMP acceleration of a mesh truncation technique included in a FEM code for electromagnetism, run on a high-performance computing cluster equipped with 140 compute nodes. Results show that we obtain about 85% of the theoretical maximum speedup of the machine. On the other hand, a graphics processing unit has been used to accelerate one of the parts that presents high fine-grain parallelism. This work has been financially supported by the TEC2016-80386-P, TIN2017-82972-R, and CAM S2013/ICE-3004 projects and “Ayudas para contratos predoctorales de Formación del Profesorado Universitario FPU”.

RiuNet

Universidad Carlos III de Madrid e-Archivo

High-throughput Protein Sequence Alignment on Multi-core Systems

Author: Ali Syed Asad
Hasan Laiq
Yahya Muhammad
Publication venue: 'Penerbit UTHM'
Publication date: 03/08/2020

Rapid evolution in sequencing technologies results in data being generated on an enormous scale. A central challenge in analyzing data at such a scale is the alignment of DNA/protein sequences, whereby reads are compared to reference sequences. To find similar sequences, alignment algorithms are used to align a query sequence with the database. Alignment algorithms can be used to classify the source of a sequence, to discover similarities among organisms, or to deduce a progenitor connection. A wide range of alignment algorithms has been developed in recent years. In this paper, an accurate method of accelerating such algorithms using GPUs is investigated. A Swiss-Prot database has been processed using a GPU implementation of the Smith-Waterman sequence alignment algorithm. The first step in the process generates the alignment scores but not the actual alignment. Various available alignment tools such as ssearch2 are then used to align the output file generated during the first step. The GPU-accelerated implementation is then compared with other techniques in terms of performance/throughput improvement. The Swiss-Prot database was aligned using various alignment tools. An NVIDIA Tesla K40 GPU was used to generate the results for this research. This implementation achieves a performance of 44.3 giga cell updates per second (GCUPS), which is 22.9 times better than its implementation on a GTX 275. Performance improves because the workload of sequences of equal length is distributed equally among all the threads on the multiprocessors of the GPU.
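The score-only first step described in this abstract can be illustrated with a minimal CPU sketch. This is not the paper's GPU implementation: the function name `sw_score`, the simple match/mismatch scoring, and the linear gap penalty are assumptions chosen for brevity, whereas protein searches normally use a substitution matrix and affine gap penalties.

```python
def sw_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score (no traceback).

    Hypothetical helper for illustration: linear gap penalty and a flat
    match/mismatch score are simplifying assumptions.
    """
    # Only the previous row is kept, since the score alone is needed.
    prev = [0] * (len(target) + 1)
    best = 0
    for qc in query:
        curr = [0]  # column 0 is always 0 in local alignment
        for j, tc in enumerate(target, start=1):
            diag = prev[j - 1] + (match if qc == tc else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


if __name__ == "__main__":
    print(sw_score("ACGT", "TTACGTTT"))
```

Keeping a single row is what makes a score-only pass cheap in memory; recovering the actual alignment requires a separate traceback step, which the abstract delegates to other tools.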

Journals of Universiti Tun Hussein Onn Malaysia (UTHM)

International Journal of Integrated Engineering

Fine-Grained Parallel Genomic Sequence Comparison

Author: Dominique Lavenier
Publication venue: 'IntechOpen'
Publication date: 01/01/2010

IntechOpen

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

Author: Liu Yongchao
Maskell Douglas L
Schmidt Bertil
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Findings: Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K and for query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. Conclusion: CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Accelerating large-scale protein structure alignments with graphics processing units

Author: Becchi Michela
Korkin Dmitry
Pang Bin
Shyu Chi-Ren
Zhao Nan
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Large-scale protein structure alignment, an indispensable tool for structural bioinformatics, poses a tremendous challenge to computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues arising from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massively parallel computing power of GPUs.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions

Author: Bertil Schmidt
Douglas L Maskell
Yongchao Liu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases makes the situation worse. To accelerate this algorithm, many efforts have been made to develop techniques for high-performance architectures, especially the recently emerging many-core architectures and their associated programming models. Findings: This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves a performance improvement over CUDASW++ 1.0 of as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cell updates per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. Conclusions: CUDASW++ 2.0 is publicly available open-source software, written in the CUDA and C++ programming languages. It obtains a significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
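The GCUPS figures quoted in these abstracts follow from the quadratic cost of the algorithm: a search fills one dynamic programming cell per pair of query and database residues. A minimal sketch of the arithmetic, assuming the usual definition of the metric; the function name `gcups` and the example numbers are hypothetical and are not taken from any of the papers.

```python
def gcups(query_length, database_residues, seconds):
    """Billions of dynamic programming cell updates per second.

    Assumes one cell update per (query residue, database residue) pair,
    which is the conventional way of counting for Smith-Waterman searches.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = query_length * database_residues
    return cells / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical run: a 1,000-residue query against 1e9 database residues
    # finishing in 100 seconds.
    print(gcups(1_000, 1_000_000_000, 100.0))
```

Because the metric normalizes by the number of cells, it allows runs with different query lengths and database sizes to be compared directly.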

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Accelerating pairwise statistical significance estimation for local alignment by harvesting GPU's power

Author: Alok Choudhary
Ankit Agrawal
Md Mostofa Ali Patwary
Sanchit Misra
Wei-keng Liao
Yuhong Zhang
Zhiguang Qin
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Coupling SIMD and SIMT Architectures to Boost Performance of a Phylogeny-aware Alignment Kernel

Author: Alachiotis N.
Berger S.
Stamatakis A.
Publication venue: BioMed Central
Publication date: 13/02/2013

Background: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. The algorithm needs to perform an immense number of pairwise alignments. Therefore, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel. Results: We optimized and parallelized PaPaRa on CPUs and GPUs. Via SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures and multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores on an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. An NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS. Conclusions: This accelerated version of PaPaRa (available at www.exelixis-lab.org/software.html) provides a significant performance improvement that allows for analyzing larger datasets in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the “competing programmer approach” is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.

KITopen

Lock-free Parallel Dynamic Programming

Author: García de la Banda M.
Hermenegildo Manuel V.
Stivala A.
Stuckey P.J.
Wirth A.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2010

We show a method for parallelizing top-down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the maximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

University of Melbourne Institutional Repository

Archivo Digital UPM

CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions

Author: Adrianto Wirawan
Bertil Schmidt
Yongchao Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref