Accelerating Short Read Mapping Using A DSP Based Coprocessor
Advances in next generation sequencing technologies have allowed short reads to be generated at an increasing rate, shifting the bottleneck of the sequencing process to the short read mapping computations. High costs and extended processing times drive researchers to pursue more efficient solutions, with the overall goal of a short read mapping architecture capable of processing short reads as they are generated. Digital signal processors have shown high performance while maintaining low power consumption in a wide field of applications. This thesis explores the use of a DSP-accelerated exact-match short read mapping algorithm, using the number of mapped bases per watt-second as the performance metric to improve. The design is implemented and tested against CPU and alternate coprocessor implementations to analyze the potential benefit of accelerating a memory-bound application.
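To make "exact match short read mapping" concrete, here is a minimal CPU-only sketch. It is not the thesis's DSP design: the hash-table index, the function names, and the toy reference string are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(reference: str, read_len: int) -> dict:
    """Map every length-read_len substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - read_len + 1):
        index[reference[pos:pos + read_len]].append(pos)
    return index


def map_reads(index: dict, reads: list) -> dict:
    """Look up each read; a read with no exact occurrence maps to an empty list."""
    return {read: index.get(read, []) for read in reads}


# Toy data (hypothetical): a 12-base reference and three 4-base reads.
reference = "ACGTACGTTAGC"
index = build_index(reference, 4)
hits = map_reads(index, ["ACGT", "TTAG", "GGGG"])
print(hits)
```

Each lookup is a single random memory access with almost no arithmetic, which is one way to see why the abstract calls the application memory bound.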
Accelerating pairwise sequence alignment on GPUs using the Wavefront Algorithm
Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio and Nanopore technologies. The recently proposed Wavefront Alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods. Notwithstanding the advantages of the WFA algorithm, modern high performance computing (HPC) platforms rely on accelerator-based architectures that exploit parallel computing resources to improve over classical CPUs. Hence, a GPU-enabled implementation of the WFA could exploit the hardware resources of modern GPUs and further accelerate sequence alignment in current genome analysis pipelines. This thesis presents two GPU-accelerated implementations based on the WFA for fast pairwise DNA sequence alignment: eWFA-GPU and WFA-GPU. Our first proposal, eWFA-GPU, computes the exact edit-distance alignment between two short sequences (up to a few thousand bases), taking full advantage of the massively parallel capabilities of modern GPUs. We propose a succinct representation of the alignment data that successfully reduces the overall amount of memory required, allowing the exploitation of the fast on-chip memory of a GPU. Our results show that eWFA-GPU outperforms the edit-distance WFA implementation running on a 20-core machine by 3-9X. Compared to other state-of-the-art tools computing the edit distance, eWFA-GPU is up to 265X faster than CPU tools and up to 56X faster than other GPU-enabled implementations. Our second contribution, the WFA-GPU tool, extends the work of eWFA-GPU to compute the exact gap-affine distance (i.e., a more general alignment problem) between arbitrarily long sequences.
In this work, we propose a CPU-GPU co-design capable of performing inter- and intra-sequence parallel alignment of multiple sequences, combining a succinct WFA-data representation with an efficient GPU implementation. As a result, we demonstrate that our implementation outperforms the original WFA implementation by 1.5-7.7X when computing the alignment path, and by 2.6-16X when computing only the alignment score. Moreover, compared to other state-of-the-art tools, WFA-GPU is up to 26.7X faster than other GPU implementations and up to four orders of magnitude faster than other CPU implementations.
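The wavefront idea behind the edit-distance case can be sketched in a few lines of sequential Python. This is only an illustration of the general technique under unit costs for mismatch, insertion and deletion; it is not the eWFA-GPU or WFA-GPU code, and the function name and the dictionary-based wavefront are assumptions.

```python
def wfa_edit_distance(pattern: str, text: str) -> int:
    """Edit distance via furthest-reaching offsets per diagonal (wavefronts)."""
    n, m = len(pattern), len(text)

    def extend(k: int, h: int) -> int:
        # Slide along diagonal k (k = h - v) while characters match, at no cost.
        v = h - k
        while v < n and h < m and pattern[v] == text[h]:
            v += 1
            h += 1
        return h

    final_k = m - n                    # diagonal that contains the cell (n, m)
    wavefront = {0: extend(0, 0)}      # diagonal -> furthest text offset h
    score = 0
    while wavefront.get(final_k, -1) < m:
        score += 1
        nxt = {}
        for k in range(max(-n, -score), min(m, score) + 1):
            candidates = []
            if k in wavefront:         # mismatch: advance both sequences
                candidates.append(wavefront[k] + 1)
            if k - 1 in wavefront:     # insertion: consume one text character
                candidates.append(wavefront[k - 1] + 1)
            if k + 1 in wavefront:     # deletion: consume one pattern character
                candidates.append(wavefront[k + 1])
            if not candidates:
                continue
            # Clamp so the offset stays inside the alignment matrix.
            h = min(max(candidates), m, n + k)
            nxt[k] = extend(k, h)
        wavefront = nxt
    return score


print(wfa_edit_distance("kitten", "sitting"))
```

Only one offset per diagonal is kept for each score, so the work grows with the distance between the sequences rather than with the full DP matrix. The diagonals of one wavefront are independent of each other, which is the kind of parallelism a GPU implementation can exploit.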
High-Throughput SNP Genotyping by SBE/SBH
Despite much progress over the past decade, current Single Nucleotide
Polymorphism (SNP) genotyping technologies still offer an insufficient degree
of multiplexing when required to handle user-selected sets of SNPs. In this
paper we propose a new genotyping assay architecture combining multiplexed
solution-phase single-base extension (SBE) reactions with sequencing by
hybridization (SBH) using universal DNA arrays such as all $k$-mer arrays. In
addition to PCR amplification of genomic DNA, SNP genotyping using SBE/SBH
assays involves the following steps: (1) Synthesizing primers complementing the
genomic sequence immediately preceding SNPs of interest; (2) Hybridizing these
primers with the genomic DNA; (3) Extending each primer by a single base using
polymerase enzyme and dideoxynucleotides labeled with 4 different fluorescent
dyes; and finally (4) Hybridizing extended primers to a universal DNA array and
determining the identity of the bases that extend each primer by hybridization
pattern analysis. Our contributions include a study of multiplexing algorithms
for SBE/SBH genotyping assays and preliminary experimental results showing the
achievable tradeoffs between the number of array probes and primer length on
one hand and the number of SNPs that can be assayed simultaneously on the
other. Simulation results on datasets both randomly generated and extracted
from the NCBI dbSNP database suggest that the SBE/SBH architecture provides a
flexible and cost-effective alternative to genotyping assays currently used in
the industry, enabling genotyping of up to hundreds of thousands of
user-specified SNPs per assay. Comment: 19 pages
ClaPIM: Scalable Sequence CLAssification using Processing-In-Memory
DNA sequence classification is a fundamental task in computational biology
with vast implications for applications such as disease prevention and drug
design. Therefore, fast, high-quality sequence classifiers are of significant
importance. This paper introduces ClaPIM, a scalable DNA sequence classification
architecture based on the emerging concept of hybrid in-crossbar and
near-crossbar memristive processing-in-memory (PIM). We enable efficient and
high-quality classification by uniting the filter and search stages within a
single algorithm. Specifically, we propose a custom filtering technique that
drastically narrows the search space and a search approach that facilitates
approximate string matching through a distance function. ClaPIM is the first
PIM architecture for scalable approximate string matching that benefits from
the high density of memristive crossbar arrays and the massive computational
parallelism of PIM. Compared with Kraken2, a state-of-the-art software
classifier, ClaPIM provides significantly higher classification quality (up to
20x improvement in F1 score) and also demonstrates a 1.8x throughput
improvement. Compared with EDAM, a recently proposed SRAM-based accelerator
that is restricted to small datasets, we observe both a 30.4x improvement in
normalized throughput per area and a 7% increase in classification precision.
Efficient Computation of Sequence Mappability
Sequence mappability is an important task in genome re-sequencing. In the
$(k,m)$-mappability problem, for a given sequence $T$ of length $n$, our goal
is to compute a table whose $i$th entry is the number of indices $j \ne i$ such
that length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at
most $k$ mismatches. Previous works on this problem focused on heuristic
approaches to compute a rough approximation of the result or on the case of
$k=1$. We present several efficient algorithms for the general case of the
problem. Our main result is an algorithm that works in
$\mathcal{O}(n \min\{m^k, \log^{k+1} n\})$ time and $\mathcal{O}(n)$ space for
$k=\mathcal{O}(1)$. It requires a careful adaptation of the technique of Cole
et al. [STOC 2004] to avoid multiple counting of pairs of substrings. We also
show $\mathcal{O}(n^2)$-time algorithms to compute all results for a fixed $m$
and all $k=0,\ldots,m$ or a fixed $k$ and all $m=k,\ldots,n-1$. Finally we show
that the $(k,m)$-mappability problem cannot be solved in strongly subquadratic
time for $k,m=\Theta(\log n)$ unless the Strong Exponential Time Hypothesis
fails. Comment: Accepted to SPIRE 201
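As a reference point for the problem statement, the mappability table can be computed by brute force. The sketch below assumes the usual reading of the definition (0-based positions, entry i counts the other start positions whose length-m substring is within Hamming distance k of the one at i). It is a naive baseline for checking small inputs, not one of the paper's algorithms, and the function name is hypothetical.

```python
def mappability(text: str, m: int, k: int) -> list:
    """Naive (k, m)-mappability table: one entry per length-m substring of text."""
    starts = range(len(text) - m + 1)

    def within_k(i: int, j: int) -> bool:
        # Hamming distance between the two substrings, with early exit.
        mismatches = 0
        for a, b in zip(text[i:i + m], text[j:j + m]):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    return False
        return True

    return [sum(1 for j in starts if j != i and within_k(i, j)) for i in starts]


print(mappability("abab", 2, 0))
print(mappability("abab", 2, 2))
```

This baseline compares every pair of positions character by character, so it takes on the order of n^2 * m steps, which is the cost the paper's algorithms are designed to avoid.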
Read alignment using deep neural networks
2019 Spring. Includes bibliographical references. Read alignment is the process of mapping short DNA sequences to the reference genome. With the advent of continually evolving "next generation" sequencing technologies, the need for sequence alignment tools appeared. Many scientific communities and the companies marketing the sequencing technologies developed a whole spectrum of read aligners/mappers for different error profiles and read length characteristics. Among the most recent successfully marketed sequencing technologies are Oxford Nanopore and PacBio SMRT sequencing, which are considered top players because of their extremely long reads and low cost. However, the reads may contain up to 20% errors, which are generally not uniformly distributed. To deal with that error rate and read length, proximity-preserving hashing techniques, such as Minhash and Minimizers, were utilized to quickly map a read to the target region of the reference sequence. Subsequently, a variant of global or local alignment dynamic programming is used to give the final alignment. In this research work, we train a Deep Neural Network (DNN) to yield a hashing scheme for the highly erroneous long reads, which is deemed superior to Minhash for mapping the reads. We implemented that idea to build a read alignment tool: DNNAligner. We evaluated the performance of our aligner against the read aligners currently popular in the bioinformatics community: minimap2, bwa-mem and graphmap. Our results show that the performance of DNNAligner is comparable to other tools without any code optimization or integration of other advanced features. Moreover, the DNN exhibits superior performance in comparison with Minhash on neighborhood classification.
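For readers unfamiliar with the Minimizers technique mentioned above, the following sketch selects window minimizers from a sequence. It assumes plain lexicographic ordering of k-mers for readability (real mappers typically order k-mers by a hash value instead), and the function name and parameters are hypothetical; it says nothing about how DNNAligner itself works.

```python
def minimizers(seq: str, k: int, w: int) -> list:
    """Return sorted (position, k-mer) pairs: the smallest k-mer of every window
    of w consecutive k-mers, with ties broken by the leftmost position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)


print(minimizers("GATTACA", 3, 2))
```

Two sequences that share a long enough exact stretch select the same minimizers inside it, so matching minimizers between a read and the reference gives candidate regions for the later dynamic-programming alignment step.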