227 research outputs found

Hardware acceleration of the pair HMM algorithm for DNA variant calling

Author: Huang Sitao
Publication date: 01/05/2017

With the advent of several accurate and sophisticated statistical algorithms and pipelines for DNA sequence analysis, it is becoming increasingly possible to translate raw sequencing data into biologically meaningful information for further clinical analysis and processing. However, given the large volume of the data involved, even modestly complex algorithms would require a prohibitively long time to complete. Hence it is urgent to explore non-conventional implementation platforms to accelerate genomics research. In this thesis, we present a Field-Programmable Gate Array (FPGA) accelerated implementation of the Pair Hidden Markov Model (Pair HMM) forward algorithm, the performance bottleneck in the HaplotypeCaller, a critical function in the popular Genome Analysis Toolkit (GATK) variant calling tool. We introduce the PE ring structure which, thanks to the fine-grained parallelism allowed by the FPGA, can be built into various configurations striking a trade-off between Instruction-Level Parallelism (ILP) and data parallelism. We investigate the resource utilization and performance of different configurations. Our solution can achieve a speed-up of up to 487x compared to the C++ baseline implementation on CPU and 1.56x compared to the previous best hardware implementation

Illinois Digital Environment for Access to Learning and Scholarship Repository

Exploration of GPU acceleration for pair-HMM algorithm and its application in the DNA alignment problem

Author: Li Enliang
Publication date: 01/05/2019

The hidden Markov model (HMM) is an important type of statistical model with extensive application in estimating hidden parameters and decoding observed Markov chains. Building on the HMM, the Pair-HMM algorithm with HaplotypeCaller was developed as a popular solution to the DNA alignment problem. For two aligned sequences of DNA observations, one called the reference and the other called the read, there are only three possible hidden states: match (A, A), insertion (-, A), and deletion (A, -). However, what we can observe by DNA sequencing in real life is the summation of the possibilities for match, insertion, and deletion as macrostates. To determine the alignment with maximum probability, we need to score each possible pairwise alignment, which leads to a computationally intensive problem that usually contributes the most latency in variant calling with the GATK HaplotypeCaller. In the CPU implementation of a proper Pair-HMM forward algorithm, there are 7 multiply-accumulate operations for each (i, j) location on the read-reference matrix. Moreover, since the transition and emission matrices are fixed throughout a single alignment process, a CUDA implementation with single-precision floating-point is proposed to accelerate the Pair-HMM forward algorithm. The CUDA implementation with minibatch and states-parallelization, along with the use of float32, gives a speedup of around 22.6x compared to the CPU implementation. This comes at a price: using single-precision instead of double-precision floating-point introduces a more serious underflow problem at the beginning of the alignment scoring process. A normalization technique is used to help fix this problem.
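
The forward recurrence summarized in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's CUDA code: the function name `pair_hmm_forward_log10`, the parameter names, and the default values for `base_error`, `gap_open`, and `gap_ext` are all hypothetical, and the per-row rescaling is just one common way to guard against underflow, not necessarily the normalization the author used. Python floats are double precision, so the sketch only mimics the idea rather than the float32 behavior.

```python
import math

def pair_hmm_forward_log10(read, hap, base_error=0.01, gap_open=0.001, gap_ext=0.1):
    """Return log10 P(read | hap), summed over all alignments (forward algorithm).

    All parameter values are illustrative assumptions, not taken from the thesis.
    """
    R, H = len(read), len(hap)
    if R == 0 or H == 0:
        raise ValueError("read and hap must be non-empty")

    # Transition probabilities, held fixed for the whole alignment.
    t_mm = 1.0 - 2.0 * gap_open   # match -> match
    t_mi = t_md = gap_open        # match -> insertion / deletion
    t_ii = t_dd = gap_ext         # gap extension
    t_im = t_dm = 1.0 - gap_ext   # gap -> match

    # Row 0: the read may start at any reference position with equal weight.
    m_prev = [0.0] * (H + 1)
    i_prev = [0.0] * (H + 1)
    d_prev = [1.0 / H] * (H + 1)
    log_scale = 0.0  # accumulated log10 of the per-row scaling factors

    for i in range(1, R + 1):
        m_cur = [0.0] * (H + 1)
        i_cur = [0.0] * (H + 1)
        d_cur = [0.0] * (H + 1)
        for j in range(1, H + 1):
            emit = 1.0 - base_error if read[i - 1] == hap[j - 1] else base_error / 3.0
            # Seven transition terms per cell: three feed M, two feed I, two feed D.
            m_cur[j] = emit * (t_mm * m_prev[j - 1]
                               + t_im * i_prev[j - 1]
                               + t_dm * d_prev[j - 1])
            i_cur[j] = t_mi * m_prev[j] + t_ii * i_prev[j]
            d_cur[j] = t_md * m_cur[j - 1] + t_dd * d_cur[j - 1]

        # Normalization (assumed technique): rescale the row to sum to 1
        # and keep track of the removed factor in log space.
        s = sum(m_cur) + sum(i_cur) + sum(d_cur)
        if s == 0.0:
            return float("-inf")
        log_scale += math.log10(s)
        m_prev = [x / s for x in m_cur]
        i_prev = [x / s for x in i_cur]
        d_prev = [x / s for x in d_cur]

    # The read must be fully consumed; it may end at any reference position.
    total = sum(m_prev[j] + i_prev[j] for j in range(1, H + 1))
    if total == 0.0:
        return float("-inf")
    return math.log10(total) + log_scale
```

Calling `pair_hmm_forward_log10("ACGT", "ACGT")` should score higher than `pair_hmm_forward_log10("ACGA", "ACGT")`, since the second read carries a mismatch. The recurrence is linear in the previous row, so rescaling a finished row does not change the final likelihood once the accumulated factor is added back.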

Illinois Digital Environment for Access to Learning and Scholarship Repository

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

Author: Alser Mohammed
Cali Damla Senol
Cavlak Meryem Banu
Firtina Can
Kalsi Gurpreet S.
Kim Jeremie
Lindegger Joel
Luna Juan Gómez
Mutlu Onur
Pillai Kamlesh
Shahroodi Taha
Subramoney Sreenivas
Suresh Bharathwaj
Publication date: 21/10/2023

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, and 1.96x. Comment: Accepted to ACM TAC

arXiv.org e-Print Archive

Spatial intratumoral heterogeneity and temporal clonal evolution in esophageal squamous cell carcinoma.

Author: Berman Benjamin P
Cai Yan
Chang Chen
Dinh Huy Q
Hao Jia-Jie
Jiang Yan-Yi
Jiang Ye
Koeffler H Phillip
Lin De-Chen
Lu Chen-Chen
Mayakonda Anand
Shi Zhi-Zhou
Wang Jin-Wu
Wang Ming-Rong
Wei Wen-Qiang
Xu Xin
Zhan Qi-Min
Zhang Yu
Publication venue: eScholarship, University of California
Publication date: 01/12/2016

Esophageal squamous cell carcinoma (ESCC) is among the most common malignancies, but little is known about its spatial intratumoral heterogeneity (ITH) and temporal clonal evolutionary processes. To address this, we performed multiregion whole-exome sequencing on 51 tumor regions from 13 ESCC cases and multiregion global methylation profiling for 3 of these 13 cases. We found an average of 35.8% heterogeneous somatic mutations with strong evidence of ITH. Half of the driver mutations located on the branches of tumor phylogenetic trees targeted oncogenes, including PIK3CA, NFE2L2 and MTOR, among others. By contrast, the majority of truncal and clonal driver mutations occurred in tumor-suppressor genes, including TP53, KMT2D and ZNF750, among others. Interestingly, phyloepigenetic trees robustly recapitulated the topological structures of the phylogenetic trees, indicating a possible relationship between genetic and epigenetic alterations. Our integrated investigations of spatial ITH and clonal evolution provide an important molecular foundation for enhanced understanding of tumorigenesis and progression in ESCC

eScholarship - University of California

Genome variation over multiple timescales and dimensions

Author: Keough Kathleen Coll
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Genomic variation includes not only nucleotide changes but also changes in DNA shape, structure, epigenetic marks, and expression, all of which can occur over generations, during cellular differentiation, or over spans ranging from a few hours to a few millennia. This doctoral thesis explores the implications and opportunities presented by these multiple forms of genomic variation for genome editing, cellular differentiation, genome regulation and comparative genomics, all towards improving our understanding of genome evolution and development and benefiting human health

eScholarship - University of California

Decomposing Genomics Algorithms: Core Computations for Accelerating Genomics

Author: Athreya Arjun P.
Banerjee Subho S.
Iyer Ravishankar K.
Jongeneel C. Victor
Kalbarczyk Zbigniew T.
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/02/2014

Technological advances in genomic analyses and computing sciences have led to a burst in genomics data. With those advances, there has also been parallel growth in dedicated accelerators for specific genomic analyses. However, biologists need a reconfigurable machine that allows them to perform multiple analyses without resorting to a dedicated compute platform for each analysis. This work addresses the first steps in the design of such a reconfigurable machine. We hypothesize that this machine design can consist of accelerators for computations common across various genomic analyses. This work studies a subset of genomic analyses and identifies such core computations. We further investigate the possibility of additional acceleration through a deeper analysis of the computation primitives. National Science Foundation (NSF CNS 13-37732); Infosys; IBM Faculty Award; Office of the Vice Chancellor for Research, University of Illinois at Urbana-Champaign

Illinois Digital Environment for Access to Learning and Scholarship Repository

Hidden Markov Models and their Applications in Biological Sequence Analysis

Author: Yoon Byung-Jun
Publication venue: Bentham Science Publishers Ltd.

Hidden Markov models (HMMs) have been extensively used in biological sequence analysis. In this paper, we give a tutorial review of HMMs and their applications in a variety of problems in molecular biology. We especially focus on three types of HMMs: the profile-HMMs, pair-HMMs, and context-sensitive HMMs. We show how these HMMs can be used to solve various sequence analysis problems, such as pairwise and multiple sequence alignments, gene annotation, classification, similarity search, and many others

Crossref

PubMed Central

Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions

Author: Alkan Can
Cali Damla Senol
Ghose Saugata
Kim Jeremie S.
Mutlu Onur
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages, and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we 1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and 2) provide guidelines for determining the appropriate tools for each step. We analyze various combinations of different tools and expose the tradeoffs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of the bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, in order to overcome the high error rates of the nanopore sequencing technology. Comment: To appear in Briefings in Bioinformatics (BIB), 201

arXiv.org e-Print Archive

Crossref

Bilkent University Institutional Repository