19 research outputs found

    Smith-Waterman Acceleration in Multi-GPUs: A Performance per Watt Analysis

    Article published in the conference proceedings. We present a performance-per-watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal alignment of huge DNA sequences on multi-GPU platforms using the exact Smith-Waterman method. Speed-up factors and energy consumption are monitored at different stages of the algorithm with the goal of identifying advantageous scenarios that maximize acceleration and minimize power consumption. Experimental results using CUDA on a set of GeForce GTX 980 GPUs illustrate their capabilities as high-performance and low-power devices, with an energy cost that becomes more attractive as the number of GPUs increases. Overall, our results demonstrate a good correlation between the performance attained and the extra energy required, even in scenarios where multi-GPUs do not show great scalability. Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.

    SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

    Sequence alignment forms an important backbone in many sequencing applications. A commonly used strategy for sequence alignment is approximate string matching with a two-dimensional dynamic programming approach. Although some prior work has been conducted on GPU acceleration of sequence alignment, we identify several shortcomings that prevent it from exploiting the full computational capability of modern GPUs. This paper presents SaLoBa, a GPU-accelerated sequence alignment library focused on seed extension. Based on an analysis of previous work with real-world sequencing data, we propose techniques to exploit data locality and improve workload balancing. The experimental results reveal that SaLoBa significantly improves the seed extension kernel compared to state-of-the-art GPU-based methods. Comment: Published at IPDPS'2

    Fast reconstruction of 3D volumes from 2D CT projection data with GPUs

    Meso-F.E. modelling of 3D textile composites is a powerful tool, which can help determine mechanical properties and permeability of the reinforcements or composites. The quality of the meso-F.E. analyses depends on the quality of the initial model. A direct method based on X-ray tomography imaging is introduced to determine finite element models based on the real geometry of 3D composite reinforcements. The method is particularly suitable for 3D textile reinforcements, whose internal geometries are numerous and complex. An analysis of the image's texture is performed. A hyperelastic model developed for fibre bundles is used for the simulation of the deformation of the 3D reinforcement. © EDP Sciences, 2016

    GPU acceleration of Levenshtein distance computation between long strings

    Computing edit distance for very long strings has been hampered by quadratic time complexity with respect to string length. The WFA algorithm reduces the time complexity to a quadratic factor with respect to the edit distance between the strings. This work presents a GPU implementation of the WFA algorithm and a new optimization that can halve the number of elements to be computed, providing additional performance gains. The implementation makes it possible to compute the edit distance between strings with hundreds of millions of characters. The performance of the algorithm depends on the similarity between the strings. For strings longer than a million characters, the performance is the best ever reported: above the TCUPS level for strings with similarities greater than 70%, and above one hundred TCUPS for 99.9% similarity. This research was supported by the European Union Regional Development Fund (ERDF) within the framework of the ERDF Operational Program of Catalonia 2014–2020 with a grant of 50% of the total cost eligible under the Designing RISC-V based Accelerators for next generation computers project (DRAC) [001-P-001723], in part by the Catalan Government under grant 2017-SGR-1624, and in part by the Spanish Ministry of Science, Innovation and Universities under grant RTI2018-095209-B-C22. Peer Reviewed. Postprint (published version)
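    The wavefront idea summarized in this abstract can be illustrated with a minimal sequential sketch. It assumes unit-cost edits (insertion, deletion, substitution) and runs on the CPU; it is not the GPU implementation described in the paper, and the function name `wfa_edit_distance` is hypothetical. For each score s it stores, per diagonal k = j - i, the furthest offset j reachable with s edits, so the work grows with the edit distance rather than with the full DP matrix.

```python
def wfa_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance computed with furthest-reaching wavefronts."""
    m, n = len(a), len(b)

    def extend(k: int, j: int) -> int:
        # Slide along diagonal k (where k = j - i) while characters match.
        i = j - k
        while i < m and j < n and a[i] == b[j]:
            i += 1
            j += 1
        return j

    target_k = n - m                 # diagonal that contains the cell (m, n)
    wave = {0: extend(0, 0)}         # wavefront for score 0
    score = 0
    while wave.get(target_k, -1) < n:
        score += 1
        nxt = {}
        for k in range(max(-score, -m), min(score, n) + 1):
            cands = []
            if k - 1 in wave:
                cands.append(wave[k - 1] + 1)   # insertion: consume b[j]
            if k in wave:
                cands.append(wave[k] + 1)       # substitution
            if k + 1 in wave:
                cands.append(wave[k + 1])       # deletion: consume a[i]
            # Clip to the end of diagonal k so offsets stay inside the matrix.
            j = min(max(cands), n, m + k)
            nxt[k] = extend(k, j)
        wave = nxt
    return score
```

    For example, `wfa_edit_distance("kitten", "sitting")` returns 3 after computing only four small wavefronts instead of the full 6 x 7 matrix.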

    GPU acceleration of Levenshtein distance computation between long strings

    Other funding: UAB transformative agreements. Computing edit distance for very long strings has been hampered by quadratic time complexity with respect to string length. The WFA algorithm reduces the time complexity to a quadratic factor with respect to the edit distance between the strings. This work presents a GPU implementation of the WFA algorithm and a new optimization that can halve the number of elements to be computed, providing additional performance gains. The implementation makes it possible to compute the edit distance between strings with hundreds of millions of characters. The performance of the algorithm depends on the similarity between the strings. For strings longer than a million characters, the performance is the best ever reported: above the TCUPS level for strings with similarities greater than 70%, and above one hundred TCUPS for 99.9% similarity.

    Formalization of block pruning: reducing the number of cells computed in exact biological sequence comparison algorithms

    This is a pre-copyedited, author-produced version of an article accepted for publication in The Computer Journal following peer review. The version of record, Edans F O Sandes, George L M Teodoro, Maria Emilia M T Walter, Xavier Martorell, Eduard Ayguade, Alba C M A Melo, "Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms", The Computer Journal, Volume 61, Issue 5, 1 May 2018, Pages 687–713, is available online at https://academic.oup.com/comjnl/article-abstract/61/5/687/4539903 and https://doi.org/10.1093/comjnl/bxx090.
    Biological sequence comparison algorithms that compute the optimal local and global alignments calculate a dynamic programming (DP) matrix with quadratic time complexity. The DP matrix H is calculated with a recurrence relation in which the value of each cell $H_{i,j}$ is the result of a maximum operation on the values of cells $H_{i-1,j-1}$, $H_{i-1,j}$ and $H_{i,j-1}$, each increased or decreased by a constant. Consequently, the difference between the value of the cell $H_{i,j}$ being calculated and the values of its previously computed direct neighbors respects well-defined upper and lower bounds. Using these bounds, we show that it is possible to determine the maximum and the minimum value of every cell in H for a given reference cell. We use this result to define a generic pruning method that determines which cells can be pruned (i.e. need not be computed, since they will not contribute to the final solution), accelerating the computation while keeping the guarantee that the optimal result will be produced. The goal of this paper is thus to investigate and formalize properties of the DP matrix in order to estimate and increase the efficiency of the pruning method. We also show that the pruning efficiency depends mainly on three characteristics: (a) the order in which the cells of H are calculated, (b) the values of the parameters used in the recurrence relation and (c) the contents of the sequences compared. Peer Reviewed. Postprint (author's final draft)
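    The bound-based pruning described in this abstract can be sketched at the level of single cells. The sketch below is an illustration under stated assumptions, not the block-level method formalized in the paper: it assumes Smith-Waterman local alignment with a linear gap penalty, a positive `match` score, `mismatch <= match` and a negative `gap`, and the function name `sw_score_pruned` is hypothetical. A cell is skipped when an upper bound on its value, plus the most it could still gain before reaching the matrix border, cannot exceed the best score already found.

```python
def sw_score_pruned(a, b, match=1, mismatch=-1, gap=-2, prune=True):
    """Return (best local alignment score, number of cells skipped)."""
    m, n = len(a), len(b)
    prev = [0] * (n + 1)   # row i-1 of H
    best = 0
    skipped = 0
    for i in range(1, m + 1):
        curr = [0] * (n + 1)   # row i of H; skipped cells keep the value 0
        for j in range(1, n + 1):
            if prune:
                # H[i][j] cannot exceed its largest neighbor plus one match.
                upper = max(prev[j - 1], prev[j], curr[j - 1]) + match
                # At most min(m - i, n - j) further matches fit after (i, j).
                remaining = match * min(m - i, n - j)
                if upper + remaining <= best:
                    skipped += 1
                    continue
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best, skipped
```

    Calling the function with `prune=False` gives a plain reference computation, so the two results can be compared to check that skipping cells leaves the optimal score unchanged. As the abstract notes, how many cells are skipped depends on the processing order, the scoring parameters and the sequences themselves.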

    Avaliação de estratégias Alignment-free para determinar o fator de pruning em comparação paralela de sequências

    Undergraduate final project (Trabalho de Conclusão de Curso), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2019. Sequence comparison is one of the most basic operations in Bioinformatics. Alignment-based exact methods produce optimal results, but they require huge execution times when the sequences are long. Block pruning is an optimization that may considerably reduce execution times if the sequences involved have high similarity. To predict the execution time of a comparison with pruning, a multiple linear regression formula was proposed. However, this formula requires that the block pruning rate be known before the comparison, which is not the case, since the pruned area is computed during the alignment process. This graduation project evaluates different alignment-free algorithms that execute much faster than the exact methods, with the goal of generating block pruning values before the sequences are aligned. The evaluation showed that the ALF-N2 tool (ALignment-Free framework), through its N2 algorithm, provides satisfactory results for establishing a correlation between block pruning percentages and sequence similarity. Using the proposed strategy, we were able to obtain block pruning rates very close to the real ones, with maximum errors of 7.0% for small sequences and 3.7% for longer ones.

    ALFALFA : fast and accurate mapping of long next generation sequencing reads


    DynaProg for Scala

    Dynamic programming is an algorithmic technique to solve problems that follow Bellman's principle: optimal solutions depend on optimal sub-problem solutions. The core idea behind dynamic programming is to memoize intermediate results in matrices to avoid repeated computations. Solving a dynamic programming problem consists of two phases: filling one or more matrices with intermediate solutions for sub-problems, and recomposing how the final result was constructed (backtracking). In textbooks, problems are usually described in terms of recurrence relations between matrix elements. Expressing dynamic programming problems as recursive formulae involving matrix indices can be difficult and is often error prone, and the notation does not capture the essence of the underlying problem (for example, aligning two sequences). Moreover, writing a correct and efficient parallel implementation requires different competencies and often a significant amount of time. In this project, we present DynaProg, a language embedded in Scala (DSL) to address dynamic programming problems on heterogeneous platforms. DynaProg allows the programmer to write concise programs based on ADP [1], using a pair of parsing grammar and algebra; these programs can then be executed either on CPU or on GPU. We evaluate the performance of our implementation against existing work and our own hand-optimized baseline implementations for both the CPU and GPU versions. Experimental results show that plain Scala has a large overhead and is recommended only for small sequences (≤1024), whereas the generated GPU version is comparable with existing implementations: matrix chain multiplication has the same performance as our hand-optimized version (142% of the execution time of [2]) for a sequence of 4096 matrices, Smith-Waterman is twice as slow as [3] on a pair of sequences of 6144 elements, and RNA folding is on par with [4] (95% running time) for sequences of 4096 elements.
    [1] Robert Giegerich and Carsten Meyer. Algebraic Dynamic Programming.
    [2] Chao-Chin Wu, Jenn-Yang Ke, Heshan Lin and Wu Chun Feng. Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism.
    [3] Edans Flavius de O. Sandes, Alba Cristina M. A. de Melo. Smith-Waterman alignment of huge sequences with GPU in linear space.
    [4] Guillaume Rizk and Dominique Lavenier. GPU accelerated RNA folding algorithm.
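    The two phases named in this abstract (filling matrices, then backtracking) can be illustrated with matrix chain multiplication, one of the benchmarks it mentions. The sketch below is plain Python written for illustration only; it does not use DynaProg or ADP, and the function name `matrix_chain` and the `A0, A1, ...` labels are hypothetical. Matrix `Ai` is assumed to have shape `dims[i] x dims[i + 1]`.

```python
def matrix_chain(dims):
    """Return (minimal scalar multiplications, optimal parenthesization)."""
    n = len(dims) - 1   # number of matrices in the chain
    if n < 1:
        raise ValueError("dims must describe at least one matrix")

    # Phase 1: fill the cost matrix and record where each sub-chain splits.
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k

    # Phase 2: backtrack through the recorded splits to rebuild the solution.
    def backtrack(i, j):
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({backtrack(i, k)} x {backtrack(k + 1, j)})"

    return cost[0][n - 1], backtrack(0, n - 1)
```

    For three matrices of shapes 10x30, 30x5 and 5x60, `matrix_chain([10, 30, 5, 60])` returns `(4500, "((A0 x A1) x A2)")`. The index arithmetic in the inner loop is the kind of detail the abstract describes as error prone when recurrences are written by hand.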

    MASA-SSE : comparação de sequências biológicas utilizando instruções vetoriais

    Undergraduate monograph, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2015. Biological sequence comparison is one of the most basic and important operations in Bioinformatics. The exact methods that compare two biological sequences have quadratic time complexity and, for this reason, parallel solutions are often used to accelerate the execution. The MASA framework [3] is a flexible and customizable parallel solution for biological sequence comparison on different hardware and software platforms. It was initially designed for GPU (Graphics Processing Unit) execution but now integrates two CPU solutions: MASA-CPU and MASA-OpenMP. These CPU solutions do not use vector instructions and thus miss a large potential for parallelism. This graduation project proposes and evaluates MASA-SSE, a CPU solution that uses Intel's SSE vector instructions and implements the Farrar algorithm [6], which is considered the state of the art for biological sequence comparison with vector instructions. Experimental results obtained by comparing several real DNA sequences on two different machines show that MASA-SSE, executing with one thread and vector instructions, outperforms MASA-OpenMP executing with four threads.