258 research outputs found

    SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

    Sequence alignment forms an important backbone in many sequencing applications. A commonly used strategy for sequence alignment is approximate string matching with a two-dimensional dynamic programming approach. Although some prior work has been conducted on GPU acceleration of sequence alignment, we identify several shortcomings that limit exploiting the full computational capability of modern GPUs. This paper presents SaLoBa, a GPU-accelerated sequence alignment library focused on seed extension. Based on the analysis of previous work with real-world sequencing data, we propose techniques to exploit data locality and improve workload balancing. The experimental results reveal that SaLoBa significantly improves the seed extension kernel compared to state-of-the-art GPU-based methods. Comment: Published at IPDPS'2

    MASA-StarPU: Parallel Sequence Comparison with Multiple Scheduling Policies and Pruning

    Sequence comparison tools based on the Smith-Waterman (SW) algorithm provide the optimal result but have high execution times when the sequences compared are long, since a huge dynamic programming (DP) matrix is computed. Block pruning is an optimization that skips some parts of the DP matrix and can considerably reduce the execution time when the sequences compared are similar. However, block pruning's resulting task graph is dynamic and irregular. Since different pruning scenarios lead to different pruning shapes, we argue that no single scheduling policy will behave best in all scenarios. This paper proposes MASA-StarPU, a sequence aligner that integrates the domain-specific framework MASA into the generic programming environment StarPU, creating a tool that has the benefits of StarPU (i.e., multiple task scheduling policies) and MASA (i.e., fast sequence alignment). MASA-StarPU was executed on two different multicore platforms, and the results show that a bad choice of scheduling policy may have a great impact on performance. For instance, using 24 cores, the 5M x 5M comparison took 1484s with the dmdas policy, whereas the same comparison took 3601s with lws. We also show that no scheduling policy behaves best in all scenarios.
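
    As a rough illustration of the DP matrix mentioned in the abstract above, the following minimal sketch computes a Smith-Waterman local alignment score. It is not the MASA or StarPU code: the function name, the scoring values, and the linear gap penalty are assumptions chosen for brevity, and it performs no block pruning or parallel scheduling.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b.

    Assumptions for illustration only: linear gap penalty and
    hypothetical scoring values.
    """
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # current DP row; column 0 stays 0
        for j in range(1, len(b) + 1):
            # Score of aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

    Every one of the len(a) x len(b) cells is visited, which is why the cost grows quadratically with sequence length. Only two rows are kept in memory because the sketch returns the score alone, not the alignment itself.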

    Smith-Waterman Acceleration in Multi-GPUs: A Performance per Watt Analysis

    Article published in the conference proceedings. We present a performance per watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal alignment of huge DNA sequences in multi-GPU platforms using the exact Smith-Waterman method. Speed-up factors and energy consumption are monitored on different stages of the algorithm with the goal of identifying advantageous scenarios to maximize acceleration and minimize power consumption. Experimental results using CUDA on a set of GeForce GTX 980 GPUs illustrate their capabilities as high-performance and low-power devices, with an energy cost that becomes more attractive as the number of GPUs increases. Overall, our results demonstrate a good correlation between the performance attained and the extra energy required, even in scenarios where multi-GPUs do not show great scalability. Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech

    Inference of Many-Taxon Phylogenies

    Phylogenetic trees are tree topologies that represent the evolutionary history of a set of organisms. In this thesis, we address computational challenges related to the analysis of large-scale datasets with Maximum Likelihood based phylogenetic inference. We have approached this using different strategies: reduction of memory requirements, reduction of running time, and reduction of man-hours

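    Masa-StarPU: estratégia com múltiplas políticas de escalonamento de tarefas para alinhamento de sequências com pruning [MASA-StarPU: a strategy with multiple task scheduling policies for sequence alignment with pruning]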

    Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2020. The comparison of biological sequences is an important task performed frequently in the genetic analysis of organisms. Algorithms that perform this procedure using an exact method have quadratic time complexity, demanding high computational power and, consequently, parallelization techniques. Many solutions have been proposed to address this problem using accelerators such as GPUs and FPGAs, but few solutions use only CPUs. MASA is a multi-platform, domain-specific tool for biological sequence comparison. One of its greatest strengths is the block pruning optimization, which prunes the dynamic programming matrix at run time, speeding up processing but introducing load imbalance. StarPU is a generic parallel programming tool that provides several dynamic task scheduling policies. In this work, we propose and evaluate MASA-StarPU, a tool that uses the MASA structure to carry out the comparison of biological sequences and uses the StarPU policies suited to block pruning in order to eliminate the load imbalance problem. MASA-StarPU was tested in two environments, evaluating pairs of DNA sequences whose sizes vary between 10 KBP (thousands of base pairs) and 47 MBP (millions of base pairs), and multiple task scheduling policies were evaluated in different cases. When compared to other solutions in the literature that use only CPUs, MASA-StarPU obtained the best result for all comparisons and reached a maximum of 18.4 GCUPS (billions of cells updated per second).
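
    The GCUPS metric quoted above is simple arithmetic: the number of DP cells divided by the run time, in billions. The sketch below is illustrative only. The function name is hypothetical, and it assumes the usual convention of counting every cell of the full matrix, whether or not pruning skipped it. The example inputs reuse the 5M x 5M timings quoted for the dmdas and lws policies in the MASA-StarPU paper abstract earlier in this listing, reading "5M" as exactly 5,000,000 bases, which is also an assumption.

```python
def gcups(len_a, len_b, seconds):
    """Billions of DP cells updated per second for one pairwise comparison.

    Assumption: all len_a * len_b cells are counted, pruned or not.
    """
    cells = len_a * len_b
    return cells / seconds / 1e9


if __name__ == "__main__":
    # Timings from the listing above; sequence lengths are nominal.
    for policy, seconds in (("dmdas", 1484), ("lws", 3601)):
        print(policy, round(gcups(5_000_000, 5_000_000, seconds), 2))
```

    Under these assumptions the two runs come out at about 16.8 and 6.9 GCUPS respectively, which makes the effect of the scheduling policy easy to compare with the 18.4 GCUPS maximum reported above.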

    Inexact Mapping of Short Biological Sequences in High Performance Computational Environments

    Bioinformatics is the application of computer science to the management and analysis of biological data. From 2005 onward, the arrival of new-generation DNA sequencers gave rise to what is known as Next Generation Sequencing, or NGS. A single biological experiment run on an NGS sequencing machine can easily produce hundreds of gigabytes or even terabytes of data. Depending on the technique chosen, this process can take a few hours or days. The availability of affordable local resources, such as multicore processors or the new graphics cards designed for general-purpose computing, GPGPU (General Purpose Graphic Processing Unit), is a great opportunity to tackle these problems. A frequently addressed topic today is DNA sequence alignment. In bioinformatics, alignment makes it possible to compare two or more DNA, RNA, or primary protein structure sequences, highlighting their regions of similarity. Such similarities may indicate functional or evolutionary relationships between the genes or proteins queried. Moreover, similarities between the sequences of a patient and those of another individual with a detected genetic disease could be used effectively in diagnostic medicine. The problem at the core of this doctoral thesis is locating short sequence fragments within DNA, which is known as sequence mapping. The mapping must tolerate errors, so that sequences can be mapped even in the presence of genetic variability or read errors. Several techniques exist for mapping, but since the advent of NGS the prominent one is search over prefixes indexed and grouped by means of the Burrows-Wheeler transform [28] (BWT hereafter). This transform was originally used in data compression techniques, such as the bzip2 algorithm. Its use as a tool for indexing and subsequently searching information is more recent [22]. Its advantage is that its computational complexity depends only on the length of the sequence to be mapped. On the other hand, many alignment techniques are based on dynamic programming algorithms, whether Smith-Waterman or hidden Markov models (HMMs). These provide greater sensitivity, allowing more errors, but their computational cost is higher and depends on the size of the sequence multiplied by that of the reference string. Many tools combine a first phase that uses the BWT to search for candidate alignment regions with a second, local alignment phase in which strings are mapped with Smith-Waterman or HMMs. When mapping with few errors allowed, a second phase based on a dynamic programming algorithm is too costly, so an inexact search based on the BWT can be more efficient. The main motivation of this doctoral thesis is the implementation of an inexact search algorithm based solely on the BWT, adapted to modern parallel architectures, both CPU and GPGPU. The algorithm will constitute a new branch-and-bound method adapted to genomic information. During the research stay, hidden Markov models will be studied and an implementation will be carried out on GTA (Aggregate ∘ Test ∘ Generate) functional computation models, together with the shared-memory and distributed-memory parallelization of that functional programming platform. Salavert Torres, J. (2014). Inexact Mapping of Short Biological Sequences in High Performance Computational Environments [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/43721
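
    To make the BWT-based search described in this abstract more concrete, here is a minimal sketch of exact backward search over a Burrows-Wheeler index. It is not the thesis's algorithm, which targets inexact search with a branch-and-bound strategy on CPUs and GPUs. The function names are hypothetical, the index is built naively by sorting suffixes, and the "$" sentinel is assumed not to occur in the text.

```python
from collections import Counter


def build_index(text):
    """Build a toy BWT index: C table, occurrence counts, and text length."""
    text += "$"  # sentinel: assumed absent from text, sorts before A, C, G, T
    # Naive suffix array; fine for a demo, far too slow for genomes
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c]: number of symbols in text strictly smaller than c
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][k]: number of occurrences of c in bwt[:k]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][k + 1] = occ[sym][k] + (1 if sym == ch else 0)
    return c_table, occ, len(bwt)


def count_occurrences(pattern, index):
    """Count exact occurrences of pattern using backward search."""
    c_table, occ, n = index
    lo, hi = 0, n  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):  # one step per pattern character
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = build_index("ACGTACGT")
    print(count_occurrences("ACG", index))
```

    The search loop runs once per pattern character regardless of the reference length, which is the property the abstract highlights. An inexact search would additionally branch over substitutions, insertions and deletions at each step and prune branches that exceed the error budget.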

    IMPROVING BWA-MEM WITH GPU PARALLEL COMPUTING

    Due to the many advances made in designing algorithms, especially the ones used in bioinformatics, it is becoming harder and harder to improve their efficiency. Therefore, hardware acceleration using General-Purpose computing on Graphics Processing Units (GPGPU) has become a popular choice. BWA-MEM is an important part of the BWA software package for sequence mapping. Because of its high speed and accuracy, we choose to parallelize this popular short DNA sequence mapper. BWA has been a prevalent single-node tool in genome alignment, and its acceleration has been widely studied since the first version of the BWA package came out. This thesis presents the Big Data GPGPU distributed BWA-MEM, a tool that combines GPGPU acceleration and distributed computing. The four hardware parallelization techniques used are CPU multi-threading, GPU parallel, CPU distributed, and GPU distributed. The GPGPU distributed software typically outperforms the other parallel versions. The alignment is performed on a distributed network, and each node in the network executes a separate GPGPU parallel version of the software. We parallelize the chain2aln function in three levels. In Level 1, the function ksw_extend2, an algorithm based on Smith-Waterman, is parallelized to handle extension on one side of the seed. In Level 2, the function chain2aln is parallelized to handle chain extension, where all seeds within the same chain are extended. In Level 3, part of the function mem_align1_core is parallelized for extending multiple chains. Due to the program's complexity, the parallelization work was limited to the GPU version of ksw_extend2 at parallelization Level 3. However, we have successfully combined Spark with BWA-MEM and ksw_extend2 at parallelization Level 1, which has shown that the proposed framework is possible. The parallel Level 3 GPU version of ksw_extend2 demonstrated noticeable speed improvement with the test data set.

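    Avaliação de estratégias Alignment-free para determinar o fator de pruning em comparação paralela de sequências [Evaluation of alignment-free strategies to determine the pruning factor in parallel sequence comparison]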

    Undergraduate final project, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2019. Sequence comparison is one of the most basic operations in Bioinformatics. Alignment-based exact methods yield optimal results, but they require huge execution times when the sequences are long. Block pruning is an optimization that may considerably reduce the execution times if the sequences involved have high similarity. In order to predict execution times of comparisons with pruning, a multiple regression formula was proposed. However, this formula requires that the block pruning rate be known before the comparison, which is not the case, since the pruned area is computed during the alignment process. This graduation project evaluates different alignment-free algorithms that execute much faster than the exact methods, with the goal of generating block pruning values before the sequence alignment itself. The evaluation of various algorithms showed that the ALF-N2 tool (ALignment-Free framework) provided very good results using its N2 algorithm. With the ALF-N2 tool, we were able to generate a correlation between block pruning percentages and sequence similarity. Using the proposed strategy, we were able to obtain block pruning rates that are very close to the real ones, with maximum error rates of 7.0% for small sequences and 3.7% for longer ones.

    SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences

    Background: The Smith-Waterman (SW) algorithm is the best choice for searching similar regions between two DNA or protein sequences. However, it may become impracticable in some contexts due to its high computational demands. Consequently, the computer science community has focused on the use of modern parallel architectures such as Graphics Processing Units (GPUs), Xeon Phi accelerators and Field Programmable Gate Arrays (FPGAs) to speed up large-scale workloads. Results: This paper presents and evaluates SWIFOLD: a Smith-Waterman parallel Implementation on FPGA with OpenCL for Long DNA sequences. First, we evaluate its performance and resource usage for different kernel configurations. Next, we carry out a performance comparison between our tool and other state-of-the-art implementations considering three different datasets. SWIFOLD offers the best average performance for small and medium test sets, achieving a performance that is independent of input size and sequence similarity. In addition, SWIFOLD provides competitive performance rates in comparison with GPU-based implementations on the latest GPU generation for the large dataset. Conclusions: The results suggest that SWIFOLD can be a serious contender for accelerating the SW alignment of DNA sequences of unrestricted size in an affordable way, reaching 125 GCUPS on average and a peak of almost 270 GCUPS. Instituto de Investigación en Informátic