1,670 research outputs found

    ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

    Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures whose states and transitions carry probabilities. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, and 1.96x, respectively. Comment: Accepted to ACM TAC
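
For readers unfamiliar with how HMM probabilities turn into a similarity score, the following is a minimal sketch of the forward recursion, one building block of Baum-Welch. It assumes a generic discrete HMM rather than a pHMM graph, and it is not ApHMM's design; the function name `forward_likelihood` and the two-state model are hypothetical and chosen only for illustration.

```python
def forward_likelihood(obs, init, trans, emit):
    """Return P(obs) under a discrete HMM using the forward recursion."""
    n_states = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    # Summing over the final state gives the likelihood of the whole sequence.
    return sum(alpha)


# Hypothetical two-state model over the alphabet {0, 1} (illustrative values).
init = [0.6, 0.4]                  # initial state probabilities
trans = [[0.7, 0.3], [0.4, 0.6]]   # trans[r][s] = P(next state s | state r)
emit = [[0.9, 0.1], [0.2, 0.8]]    # emit[s][x] = P(symbol x | state s)

likelihood = forward_likelihood([0, 1], init, trans, emit)
```

Baum-Welch pairs this forward pass with a matching backward pass and then re-estimates `trans` and `emit` from the resulting posteriors.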

    Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements

    Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements - important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization (EM) algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply more broadly to multi-type branching processes where rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of the IS6110 transposable element, a frequently used element during estimation of epidemiological clusters of Mycobacterium tuberculosis infections. Comment: 31 pages, 7 figures, 1 tabl

    Quantitative PET image reconstruction employing nested expectation-maximization deconvolution for motion compensation

    Bulk body motion may randomly occur during PET acquisitions introducing blurring, attenuation-emission mismatches and, in dynamic PET, discontinuities in the measured time activity curves between consecutive frames. Meanwhile, dynamic PET scans are longer, thus increasing the probability of bulk motion. In this study, we propose a streamlined 3D PET motion-compensated image reconstruction (3D-MCIR) framework, capable of robustly deconvolving intra-frame motion from a static or dynamic 3D sinogram. The presented 3D-MCIR methods need not partition the data into multiple gates, as 4D MCIR algorithms do, or access list-mode (LM) data, as LM MCIR methods do, both of which require increased computation or memory resources. The proposed algorithms can support compensation for any periodic and non-periodic motion, such as cardio-respiratory or bulk motion, the latter including rolling, twisting or drifting. Inspired by the widely adopted point-spread function (PSF) deconvolution 3D PET reconstruction techniques, here we introduce an image-based 3D generalized motion deconvolution method within the standard 3D maximum-likelihood expectation-maximization (ML-EM) reconstruction framework. In particular, we initially integrate a motion blurring kernel, accounting for every tracked motion within a frame, as an additional MLEM modeling component in the image space (integrated 3D-MCIR). Subsequently, we replace the integrated model component with a nested iterative Richardson-Lucy (RL) image-based deconvolution method to accelerate the MLEM algorithm convergence rate (RL-3D-MCIR). The final method was evaluated with realistic simulations of whole-body dynamic PET data employing the XCAT phantom and real human bulk motion profiles, the latter estimated from volunteer dynamic MRI scans. In addition, metabolic uptake rate Ki parametric images were generated with the standard Patlak method. Our results demonstrate significant improvement in contrast-to-noise ratio (CNR) and noise-bias performance in both dynamic and parametric images. The proposed nested RL-3D-MCIR method is implemented on the Software for Tomographic Image Reconstruction (STIR) open-source platform and is scheduled for public release.
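
To illustrate the Richardson-Lucy update mentioned in this abstract, here is a minimal 1D sketch in plain Python. It shows only standalone RL deconvolution with a known, odd-length blurring kernel and zero padding; it is not the nested MLEM scheme or the STIR implementation, and the helper names (`convolve_same`, `richardson_lucy`) and the toy signal are assumptions made for illustration.

```python
def convolve_same(signal, kernel):
    """Convolve with zero padding; output has the same length as signal."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                total += weight * signal[j]
        out.append(total)
    return out


def richardson_lucy(blurred, kernel, iterations=50, eps=1e-12):
    """Multiplicative RL updates for a known, odd-length blurring kernel."""
    flipped = kernel[::-1]            # adjoint of the blur is a correlation
    estimate = [1.0] * len(blurred)   # flat, strictly positive starting image
    for _ in range(iterations):
        reblurred = convolve_same(estimate, kernel)
        # Ratio of measured to predicted data; eps guards against division by zero.
        ratio = [b / max(r, eps) for b, r in zip(blurred, reblurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Toy example: a point source blurred by a hypothetical 3-tap motion kernel.
truth = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernel = [0.25, 0.5, 0.25]
blurred = convolve_same(truth, kernel)
restored = richardson_lucy(blurred, kernel)
```

Each iteration re-blurs the current estimate, compares it with the measured data, and back-projects the ratio, so the restored peak sharpens toward the original point source.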

    Fitting and Interpreting Continuous-Time Latent Markov Models for Panel Data

    Multistate models are used to characterize disease processes within an individual. Clinical studies often observe the disease status of individuals at discrete time points, making exact times of transitions between disease states unknown. Such panel data pose considerable modeling challenges. Assuming the disease process progresses according to a standard continuous-time Markov chain (CTMC) yields tractable likelihoods, but the assumption of exponential sojourn time distributions is typically unrealistic. More flexible semi-Markov models permit generic sojourn distributions yet yield intractable likelihoods for panel data in the presence of reversible transitions. One attractive alternative is to assume that the disease process is characterized by an underlying latent CTMC, with multiple latent states mapping to each disease state. These models retain analytic tractability due to the CTMC framework but allow for flexible, duration-dependent disease state sojourn distributions. We have developed a robust and efficient expectation-maximization (EM) algorithm in this context. Our complete data state space consists of the observed data and the underlying latent trajectory, yielding computationally efficient expectation and maximization steps. Our algorithm outperforms alternative methods measured in terms of time to convergence and robustness. We also examine the frequentist performance of latent CTMC point and interval estimates of disease process functionals based on simulated data. The performance of estimates depends on time, functional, and data-generating scenario. Finally, we illustrate the interpretive power of latent CTMC models for describing disease processes on a dataset of lung transplant patients. We hope our work will encourage wider use of these models in the biomedical setting.
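
As a worked illustration of why latent states give duration-dependent sojourn times, consider a simple case that is an assumption here and not taken from the paper: one disease state represented by two latent states visited in series, each left at the same hypothetical rate $\lambda$. With a single state the sojourn time $T$ is exponential and its hazard is constant; with two latent states, $T$ is the sum of two exponentials and the hazard grows with time already spent in the disease state.

```latex
\[
\text{one latent state:}\quad
P(T > t) = e^{-\lambda t}, \qquad
h(t) = \lambda
\]
\[
\text{two latent states in series:}\quad
P(T > t) = (1 + \lambda t)\, e^{-\lambda t}, \qquad
h(t) = \frac{\lambda^{2} t\, e^{-\lambda t}}{(1 + \lambda t)\, e^{-\lambda t}}
     = \frac{\lambda^{2} t}{1 + \lambda t}
\]
```

The second hazard rises from $0$ at $t = 0$ toward $\lambda$ as $t \to \infty$, while the latent process itself remains a CTMC.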

    Estimation and Control of Traffic Relying on Vehicular Connectivity

    Vehicular traffic flow is essential, yet complicated to analyze. It describes the interplay among vehicles and with the infrastructure. A better understanding of traffic would benefit both individuals and the whole society by improving safety and energy efficiency and reducing environmental impacts. A large body of research exists on estimation and control of vehicular traffic in which, however, vehicles were assumed not to be able to share information due to the limits of technology. With the development of wireless communication and various sensor devices, Connected Vehicles (CVs) are emerging, which are able to detect, access, and share information with each other and with the infrastructure in real time. Connected Vehicle Technology (CVT) has been attracting more and more attention from different fields. The goal of this dissertation is to develop approaches to estimate and control vehicular traffic as well as individual vehicles relying on CVT. On one hand, CVT significantly enriches the data from individuals and the traffic, which contributes to the accuracy of traffic estimation algorithms. On the other hand, CVT enables communication and information sharing between vehicles and infrastructure, and therefore allows vehicles to achieve better control and/or coordination among themselves and with smart infrastructure. The first part of this dissertation focuses on estimation of traffic on freeways and city streets. We use data available from on-road sensors and also from probe ... One of the most important traffic performance measures is travel time. However, it is affected by various factors, and freeways and arterials have different travel time characteristics. In this dissertation we first propose a stochastic model-based approach to freeway travel-time prediction. The approach uses the Link-Node Cell Transmission Model (LN-CTM) to model traffic and provides a probability distribution for travel time. The probability distribution is generated using a Monte Carlo simulation and an Online Expectation Maximization clustering algorithm. Results show that the approach is able to generate a reasonable multimodal distribution for travel time. For arterials, this dissertation presents methods for estimating statistics of travel time by utilizing sparse vehicular probe data. A public data feed from transit buses in the City of San Francisco is used. We divide each link into shorter segments, and propose iterative methods for allocating travel time statistics to each segment. Inspired by K-means and Expectation Maximization (EM) algorithms, we iteratively update the mean and variance of travel time for each segment based on historical probe data until convergence. Based on segment travel time statistics, we then propose a method to estimate the maximum likelihood trajectory (MLT) of a probe vehicle in between two data updates on arterial roads. The results are compared to high-frequency ground-truth data in multiple scenarios, demonstrating the effectiveness of the proposed approach. The second part of this dissertation emphasizes control approaches enabled by vehicular connectivity. Estimation and prediction of surrounding vehicle behaviors and upcoming traffic make it possible to improve driving performance. We first propose a Speed Advisory System for arterial roads, which utilizes upcoming traffic ...