SkelCL: enhancing OpenCL for high-level programming of multi-GPU systems
Application development for modern high-performance systems with Graphics Processing Units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs.
In this paper, we present SkelCL, a high-level programming approach for systems with multiple GPUs, and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel algorithmic patterns (skeletons); 2) memory management is simplified using parallel container data types (vectors and matrices); 3) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability when using multiple GPUs. We demonstrate how SkelCL is used to implement parallel applications on one- and two-dimensional data. We report experimental results to evaluate our approach in terms of programming effort and performance.
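The skeleton and container ideas above can be illustrated with a minimal Python sketch. It is not SkelCL's API (SkelCL is a C++ library on top of OpenCL) and it runs on the CPU only: the hypothetical helpers `block_distribute`, `map_skeleton`, `reduce_skeleton` and `gather` simulate a vector split into contiguous blocks, one per device, with each skeleton applied block by block and per-device partial results combined at the end.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def block_distribute(data: Sequence[T], num_devices: int) -> List[List[T]]:
    """Hypothetical helper: split data into contiguous, nearly equal blocks,
    one per simulated device (a stand-in for a block data distribution)."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    base, extra = divmod(len(data), num_devices)
    blocks: List[List[T]] = []
    start = 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)  # the first `extra` blocks get one more element
        blocks.append(list(data[start:start + size]))
        start += size
    return blocks


def map_skeleton(f: Callable[[T], U], blocks: List[List[T]]) -> List[List[U]]:
    """Map skeleton: apply f element-wise; each block could run on its own device."""
    return [[f(x) for x in block] for block in blocks]


def reduce_skeleton(op: Callable[[T, T], T], blocks: List[List[T]], identity: T) -> T:
    """Reduce skeleton: reduce each block locally, then combine the partial results.
    Assumes op is associative and identity is its neutral element."""
    partials = [reduce(op, block, identity) for block in blocks]
    return reduce(op, partials, identity)


def gather(blocks: List[List[T]]) -> List[T]:
    """Concatenate the per-device blocks back into one host-side list."""
    return [x for block in blocks for x in block]


blocks = block_distribute(list(range(10)), num_devices=3)
squared = map_skeleton(lambda x: x * x, blocks)
total = reduce_skeleton(lambda a, b: a + b, squared, 0)
print([len(b) for b in blocks], total)  # [4, 3, 3] 285
```

The reduce step relies on an associative operator with a neutral element; that is what allows per-device partial results to be combined in any grouping.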
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large-scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high-end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high-performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing from those lists.
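The two motifs singled out above, hashing and sorting, both appear in k-mer counting, a building block of assembly and profiling. The following Python sketch is a single-process illustration under our own assumptions. The function names and the CRC-32 ownership rule in `owner_rank` are hypothetical, meant only to suggest how a distributed hash table could route asynchronous updates to the process that owns each k-mer.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple


def count_kmers_hash(seq: str, k: int) -> Dict[str, int]:
    """Hashing motif: accumulate k-mer counts in a hash table (a dict)."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return dict(counts)


def count_kmers_sort(seq: str, k: int) -> List[Tuple[str, int]]:
    """Sorting motif: sort all k-mers, then count runs of equal neighbours."""
    kmers = sorted(seq[i:i + k] for i in range(len(seq) - k + 1))
    runs: List[Tuple[str, int]] = []
    for kmer in kmers:
        if runs and runs[-1][0] == kmer:
            runs[-1] = (kmer, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((kmer, 1))              # start a new run
    return runs


def owner_rank(kmer: str, num_ranks: int) -> int:
    """Hypothetical partitioning rule: a deterministic hash (CRC-32) decides
    which process owns a k-mer, so every rank routes updates the same way."""
    return zlib.crc32(kmer.encode("ascii")) % num_ranks


seq = "ACGTACGA"
print(count_kmers_hash(seq, 3)["ACG"])  # 2
print(count_kmers_sort(seq, 3))  # [('ACG', 2), ('CGA', 1), ('CGT', 1), ('GTA', 1), ('TAC', 1)]
```

CRC-32 is used instead of Python's built-in `hash` because string hashing in Python is randomized per process, which would break a shared ownership rule across ranks.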
State-of-the-art in Smith-Waterman Protein Database Search on HPC Platforms
Searching biological sequence databases is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, this algorithm is computationally demanding, and the problem is compounded by the exponential growth of biological data in recent years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches on a wide variety of hardware platforms. We give a survey of the state of the art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse the temporal evolution, contributions, limitations, experimental work and results of each implementation. Additionally, as energy efficiency becomes more important every day, we also survey works on performance/power consumption. Finally, we give our view on the future of Smith–Waterman protein searches, considering the next generations of hardware architectures and their upcoming technologies.
Instituto de Investigación en Informática, Universidad Complutense de Madrid
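For reference, the recurrence at the core of these searches can be sketched in a few lines of Python. This is a simplified, unaccelerated version under stated assumptions: a linear gap penalty and hypothetical match/mismatch scores, whereas protein database search normally uses a substitution matrix such as BLOSUM62 and affine gaps. The inner loop performs one cell update per pair of residues, which is the quantity that GCUPS figures count.

```python
from typing import List


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (hypothetical scores).

    Only two rows of the dynamic-programming matrix are kept, so memory is
    O(len(b)); the number of cell updates is len(a) * len(b).
    """
    prev: List[int] = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap in a
            curr[j] = max(0, diag, up, left)  # 0 lets a new local alignment start here
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "TTACGTAA"))  # 8: four matches at +2 each
```

The clamp to zero is what makes the alignment local; dropping it turns the recurrence into global (Needleman–Wunsch style) scoring.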
A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations
We discuss the scalable parallel solution of the Poisson equation within a
Particle-In-Cell (PIC) code for the simulation of electron beams in particle
accelerators of irregular shape. The problem is discretized by Finite
Differences. Depending on the treatment of the Dirichlet boundary the resulting
system of equations is symmetric or 'mildly' nonsymmetric positive definite. In
all cases, the system is solved by the preconditioned conjugate gradient
algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG)
preconditioning. We investigate variants of the implementation of SA-AMG that
lead to considerable improvements in the execution times. We demonstrate good
scalability of the solver on a distributed-memory parallel processor with up to
2048 processors. We also compare our SAAMG-PCG solver with an FFT-based solver
that is more commonly used for applications in beam dynamics.
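The PCG iteration can be sketched in Python on a much simpler model problem: the 1D Poisson equation with homogeneous Dirichlet boundaries, discretized by the standard 3-point finite-difference stencil. Two substitutions are deliberate assumptions: the domain is a regular interval rather than an irregular 3D accelerator geometry, and a Jacobi (diagonal) preconditioner stands in for SA-AMG. The `precond` argument marks where an AMG V-cycle would plug in.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def poisson_1d_matvec(u: Vector, h: float) -> Vector:
    """Apply the 3-point finite-difference operator for -u'' with
    homogeneous Dirichlet boundaries (u = 0 outside the grid)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def pcg(matvec: Callable[[Vector], Vector], b: Vector,
        precond: Callable[[Vector], Vector],
        tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Preconditioned conjugate gradient for a symmetric positive definite system.
    precond(r) should approximate A^{-1} r."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:  # relative residual test
            break
        z = precond(r)
        rz_new = dot(r, z)
        p = [z_i + (rz_new / rz) * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x


# Solve -u'' = 2 on (0, 1) with u(0) = u(1) = 0; the exact solution is u = x(1 - x).
n = 31
h = 1.0 / (n + 1)
xs = [(i + 1) * h for i in range(n)]


def jacobi(r: Vector) -> Vector:
    """Jacobi preconditioner: multiply by the inverse diagonal h^2 / 2."""
    return [r_i * h * h / 2.0 for r_i in r]


u = pcg(lambda v: poisson_1d_matvec(v, h), [2.0] * n, jacobi)
print(max(abs(u_i - x * (1 - x)) for u_i, x in zip(u, xs)))  # max error vs. exact solution
```

Because this stencil has a constant diagonal, Jacobi preconditioning only rescales the residual, so the solver behaves like plain CG here; a multigrid preconditioner is what keeps the iteration count nearly independent of the grid size on large problems.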
Efficiency of linked cell algorithms
The linked cell list algorithm is an essential part of molecular simulation
software, both molecular dynamics and Monte Carlo. Though it scales linearly
with the number of particles, there has been a constant interest in increasing
its efficiency, because a large part of CPU time is spent identifying the
interacting particles. Several recent publications proposed improvements to the
algorithm and investigated their efficiency by applying them to particular
setups. In this publication we develop a general method to evaluate the
efficiency of these algorithms, which is mostly independent of the parameters
of the simulation, and test it for a number of linked cell list algorithms. We
also propose a combination of linked cell reordering and interaction sorting
that shows good efficiency for a broad range of simulation setups.
Comment: Submitted to Computer Physics Communications on 22 December 2009, still awaiting a referee report.
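The baseline linked cell idea can be sketched in Python for a 2D, non-periodic box: particles are binned into cells at least as wide as the cutoff `rc`, so every interacting pair lies in the same or an adjacent cell, and only those 9 cells need to be searched. The function names are hypothetical, and the cell reordering and interaction sorting proposed in the abstract are not shown; a brute-force O(N²) search is included to check the result.

```python
import itertools
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def build_cells(points: List[Point], box: float, rc: float) -> Dict[Cell, List[int]]:
    """Bin particle indices into square cells whose side is at least rc."""
    n_cells = max(1, int(box // rc))  # cells per side
    cell_size = box / n_cells         # >= rc, so interacting pairs share or touch cells
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cx = min(int(x / cell_size), n_cells - 1)  # clamp points lying exactly on the far wall
        cy = min(int(y / cell_size), n_cells - 1)
        cells[(cx, cy)].append(idx)
    return cells


def pairs_linked_cell(points: List[Point], box: float, rc: float) -> Set[Tuple[int, int]]:
    """All pairs (i, j), i < j, within distance rc, searching only neighbouring cells."""
    cells = build_cells(points, box, rc)
    rc2 = rc * rc
    pairs: Set[Tuple[int, int]] = set()
    for (cx, cy), members in list(cells.items()):
        for dx, dy in itertools.product((-1, 0, 1), repeat=2):
            # .get() returns [] for empty or out-of-box cells without inserting keys
            neighbours = cells.get((cx + dx, cy + dy), [])
            for i in members:
                for j in neighbours:
                    if i < j:  # count each pair once
                        (xi, yi), (xj, yj) = points[i], points[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 <= rc2:
                            pairs.add((i, j))
    return pairs


def pairs_brute_force(points: List[Point], rc: float) -> Set[Tuple[int, int]]:
    """O(N^2) reference: test every pair directly."""
    rc2 = rc * rc
    pairs: Set[Tuple[int, int]] = set()
    for i, j in itertools.combinations(range(len(points)), 2):
        (xi, yi), (xj, yj) = points[i], points[j]
        if (xi - xj) ** 2 + (yi - yj) ** 2 <= rc2:
            pairs.add((i, j))
    return pairs


random.seed(1)
pts = [(random.uniform(0.0, 10.0), random.uniform(0.0, 10.0)) for _ in range(300)]
print(pairs_linked_cell(pts, box=10.0, rc=1.0) == pairs_brute_force(pts, rc=1.0))  # True
```

At fixed density the number of particles per cell is constant, so the cell search grows linearly with N, matching the linear scaling mentioned in the abstract.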
Comparative Analysis of Computationally Accelerated NGS Alignment
The Smith-Waterman (S-W) algorithm is the basis of most current sequence alignment technology. Alignment identifies similarities between sequences, which supports cancer detection and treatment by providing researchers with potential targets for early diagnosis and personalized treatment. The growing number of DNA and RNA sequences available for analysis demands faster alignment than current implementations of the S-W algorithm can deliver. This project aimed to identify the most effective and efficient methods for accelerating the S-W algorithm by investigating recent advances in sequence alignment. Of the 22 articles considered, 17 had to be excluded because their data reporting was not standardized. Only one study, by Chen et al., contained enough information to compare both accuracy and alignment speed. When accuracy was excluded from the criteria, five studies contained enough information to rank their efficiency. The study by Rucci et al. was the fastest at 268.83 Giga Cell Updates Per Second (GCUPS), and the method by Pérez-Serrano et al. came close at 229.93 GCUPS while testing larger sequences. The project concluded that reporting standards in this field are insufficient and that the study by Chen et al. should set a benchmark for future reporting.
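GCUPS, the metric used to rank the studies above, is commonly computed as the number of dynamic-programming matrix cells filled (query length times database residues) divided by the runtime, in units of 10⁹ cells per second. A minimal Python sketch follows; the run parameters are invented for illustration, and whether each reviewed study counted cells exactly this way is not stated in the abstract, which is part of the reporting problem it describes.

```python
from typing import Iterable


def gcups(query_lengths: Iterable[int], database_residues: int, seconds: float) -> float:
    """Giga cell updates per second for searching each query against a database.

    Every query residue is compared with every database residue, so one query of
    length m against a database of N residues fills m * N matrix cells."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = sum(query_lengths) * database_residues
    return cells / (seconds * 1e9)


# Hypothetical run: one 500-residue query against 200 million residues in 2 s.
print(gcups([500], 200_000_000, 2.0))  # 50.0
```

Note that GCUPS measures throughput only; it says nothing about alignment accuracy, which is why the abstract treats speed and accuracy as separate criteria.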
- …