    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.

    Algorithm Engineering for fundamental Sorting and Graph Problems

    Fundamental algorithms form the basic knowledge of every computer science undergraduate and professional programmer. They are a set of basic techniques found in any (good) textbook on algorithms and data structures. In this thesis we try to close the gap between theoretically worst-case optimal classical algorithms and the real-world circumstances one faces under the constraints imposed by data size, limited main memory, or available parallelism.

    PASQUAL: Parallel Techniques for Next Generation Genome Sequence Assembly

    Fast, Parallel, and Cache-Friendly Suffix Array Construction

    String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few scalable parallel algorithms for constructing these are known, and the existing algorithms can be highly non-trivial to implement and parallelize. In this paper we present CaPS-SA, a simple and scalable parallel algorithm for constructing these string indexes inspired by samplesort. Due to its design, CaPS-SA has excellent memory-locality and thus incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache hierarchies. We show that despite its simple design, CaPS-SA outperforms existing state-of-the-art parallel SA and LCP-array construction algorithms on modern hardware. Finally, motivated by applications in modern aligners where the query strings have bounded lengths, we introduce the notion of a bounded-context SA and show that CaPS-SA can easily be extended to exploit this structure to obtain further speedups
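
    The abstract above says the algorithm is inspired by samplesort. As a rough illustration of that general idea only, the following Python sketch applies a plain sequential samplesort to suffixes: sample some suffixes, pick splitters, route every suffix to a bucket, and sort each bucket independently. It is not CaPS-SA itself; the function name `samplesort_suffixes` and the parameters `num_buckets` and `seed` are assumptions made for this example, and it compares whole suffixes naively rather than using any cache-aware or parallel machinery.

```python
import bisect
import random


def samplesort_suffixes(text: str, num_buckets: int = 4, seed: int = 0) -> list[int]:
    """Return the suffix array of `text` using a sequential samplesort sketch.

    Illustrative only: suffixes are compared as Python string slices,
    which is simple but far from memory-efficient.
    """
    n = len(text)
    if n == 0:
        return []

    # 1. Draw a random sample of suffixes and sort the sample.
    rng = random.Random(seed)
    sample_size = min(n, num_buckets * 8)
    sample = sorted(text[i:] for i in rng.sample(range(n), sample_size))

    # 2. Pick evenly spaced splitters from the sorted sample.
    step = len(sample) / num_buckets
    splitters = [sample[int(step * b)] for b in range(1, num_buckets)]

    # 3. Route each suffix to the bucket delimited by the splitters.
    buckets: list[list[int]] = [[] for _ in range(num_buckets)]
    for i in range(n):
        buckets[bisect.bisect_right(splitters, text[i:])].append(i)

    # 4. Sort each bucket on its own (the step a parallel version
    #    would hand to separate workers) and concatenate the results.
    suffix_array: list[int] = []
    for bucket in buckets:
        bucket.sort(key=lambda i: text[i:])
        suffix_array.extend(bucket)
    return suffix_array
```

    Because every suffix in one bucket is smaller than every suffix in the next, the buckets can be sorted independently and concatenated without a merge step, which is what makes the scheme attractive for parallel execution.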

    Lightweight Massively Parallel Suffix Array Construction

    The suffix array lists the suffixes of a string in lexicographic order, where each suffix is represented by its starting position in the input string. It is a fundamental data structure with applications in areas such as string processing, text indexing, data compression, computational biology, and many more. Over the last three decades, researchers have proposed a broad spectrum of suffix array construction algorithms (SACAs). However, the majority of SACAs were implemented using sequential and parallel programming models. The maturity of GPU programming opened the door to massively parallel GPU SACAs that outperform the fastest suffix sorting algorithms optimized for parallel computing on the CPU. Over the last five years, several GPU SACA approaches were proposed and implemented. They prioritized running time over lightweight design. In this thesis, we design and implement a lightweight massively parallel SACA on the GPU using the prefix-doubling technique. Our prefix-doubling implementation is memory-efficient and can successfully construct the suffix array for input strings as large as 640 megabytes (MB) on a Tesla P100 GPU. On large datasets, our implementation achieves a speedup of 7-16x over the fastest, highly optimized, OpenMP-accelerated suffix array constructor, libdivsufsort, which leverages CPU shared-memory parallelism. The performance of our algorithm relies on several high-performance parallel primitives such as radix sort, conditional filtering, inclusive prefix sum, random memory scattering, and segmented sort. We evaluate the performance of our implementation over a variety of real-world datasets with respect to its runtime, throughput, memory usage, and scalability. We compare our results against libdivsufsort, which we run on a Haswell compute node equipped with 24 cores. Our GPU SACA is simple and compact, consisting of fewer than 300 lines of readable and effective source code. Additionally, we design and implement a fast and lightweight algorithm for checking the correctness of the suffix array.
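
    To make the prefix-doubling technique named in this abstract concrete, here is a minimal sequential Python sketch. It is not the thesis's GPU implementation: it uses Python's built-in sort where the abstract lists radix sort and segmented sort, and the function names `suffix_array_prefix_doubling` and `is_suffix_array` are assumptions made for this example. The checker is a naive one for small inputs, not the lightweight checking algorithm the abstract mentions.

```python
def suffix_array_prefix_doubling(text: str) -> list[int]:
    """Build the suffix array of `text` by prefix doubling.

    Each round sorts suffixes by their first 2k characters, using the
    ranks computed for the first k characters in the previous round.
    """
    n = len(text)
    if n == 0:
        return []

    # Initial ranks order the suffixes by their first character (k = 1).
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Key for suffix i: rank of its first k characters, then rank of
        # the following k characters (-1 if that part runs off the end).
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)

        # Re-rank: suffixes with equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            differs = key(sa[j]) != key(sa[j - 1])
            new_rank[sa[j]] = new_rank[sa[j - 1]] + int(differs)
        rank = new_rank

        # Once all ranks are distinct, the order is final.
        if rank[sa[-1]] == n - 1:
            return sa
        k *= 2


def is_suffix_array(text: str, sa: list[int]) -> bool:
    """Naive check: `sa` is a permutation and its suffixes strictly increase."""
    n = len(text)
    if sorted(sa) != list(range(n)):
        return False
    return all(text[sa[j - 1]:] < text[sa[j]:] for j in range(1, n))
```

    For example, `suffix_array_prefix_doubling("banana")` returns `[5, 3, 1, 0, 4, 2]`, corresponding to the suffixes `a`, `ana`, `anana`, `banana`, `na`, `nana`. The number of rounds is logarithmic in the string length because the compared prefix length doubles each time.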

    A framework for genomic sequencing on clusters of multicore and manycore processors

    The advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high performance architectures. Most of these efforts target multicore processors, only a few can also exploit graphics processing units, and a much smaller set will run in clusters equipped with any of these multi-threaded architecture technologies. Furthermore, the examples that can be used on clusters today are all strongly coupled with a particular aligner. In this paper we introduce an alignment framework that can be leveraged to run any single-node aligner in a coordinated manner, taking advantage of the resources of a cluster without having to modify any portion of the original software. The key to our transparent migration lies in hiding the complexity associated with the multi-node execution (such as coordinating the processes running in the cluster nodes) inside the generic-aligner framework. Moreover, following the design and operation in our Message Passing Interface (MPI) version of HPG Aligner RNA BWT, we organize the framework into two stages in order to be able to execute different aligners in each one of them. With this configuration, for example, the first stage can ideally apply a fast aligner to accelerate the process, while the second one can be tuned to act as a refinement stage that further improves the global alignment process with little cost.
    The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The researchers from the University Jaume I were supported by the MINECO/CICYT (grant numbers TIN2011-23283 and TIN2014-53495-R) and FEDER.
    Martínez, H.; Barrachina, S.; Castillo, M.; Tárraga, J.; Medina, I.; Dopazo, J.; Quintana Ortí, ES. (2018). A framework for genomic sequencing on clusters of multicore and manycore processors. International Journal of High Performance Computing Applications. 32(3):393-406.
    https://doi.org/10.1177/1094342016653243