
    Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements

    Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements, which are important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization (EM) algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply more broadly to multi-type branching processes where rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of the IS6110 transposable element, a frequently used element during estimation of epidemiological clusters of Mycobacterium tuberculosis infections. Comment: 31 pages, 7 figures, 1 table

    Detection of recombination in DNA multiple alignments with hidden Markov models

    Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
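
    The HMM structure described in this abstract can be illustrated with a minimal sketch of the scaled forward algorithm, which computes the alignment log-likelihood that an EM procedure would increase. Everything here is an assumption made for illustration rather than the paper's implementation: hidden states stand for tree topologies, `rho` is a hypothetical per-site probability of switching topology (spread uniformly over the other states), the initial state distribution is uniform, and the per-column emission probabilities are supplied as plain numbers instead of being computed from a phylogenetic tree.

```python
import math


def transition_matrix(n_states, rho):
    """Stay in the same topology with prob 1 - rho, else switch uniformly."""
    off_diag = rho / (n_states - 1)
    return [
        [1.0 - rho if i == j else off_diag for j in range(n_states)]
        for i in range(n_states)
    ]


def forward_log_likelihood(emissions, rho):
    """Log-likelihood of an alignment under the topology-switching HMM.

    emissions[t][k] is P(alignment column t | topology k); in a real
    analysis these values would come from a tree likelihood computation.
    """
    n_states = len(emissions[0])
    trans = transition_matrix(n_states, rho)
    # Uniform initial distribution over topologies (an assumption).
    alpha = [e / n_states for e in emissions[0]]
    log_lik = 0.0
    for t in range(len(emissions)):
        if t > 0:
            alpha = [
                emissions[t][j]
                * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Rescale at every site to avoid numerical underflow.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

    With `rho = 0` the model reduces to a mixture over fixed topologies, which gives a simple sanity check on the recursion.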

    Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing

    Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring a large integration step to impute over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multitype branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for hematopoiesis and transposable element evolution. Comment: 18 pages, 4 figures, 2 tables
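
    The generating function technique mentioned in this abstract can be sketched on the simplest case. The example below is an illustration under stated assumptions, not the paper's method: it uses a single-type linear birth-death process started from one individual, for which the probability generating function (PGF) has a well-known closed form when the birth rate `lam` differs from the death rate `mu`, and it recovers transition probabilities by evaluating the PGF at roots of unity and applying a plain discrete Fourier transform. It does not implement the compressed sensing step, and multitype models would generally need the PGF to be obtained numerically.

```python
import cmath
import math


def bd_pgf(s, t, lam, mu):
    """PGF E[s^X_t | X_0 = 1] of a linear birth-death process (lam != mu)."""
    decay = math.exp(-(lam - mu) * t)
    num = mu * (1 - s) - (mu - lam * s) * decay
    den = lam * (1 - s) - (mu - lam * s) * decay
    return num / den


def transition_probs(t, lam, mu, n_points=64):
    """Approximate P(X_t = k | X_0 = 1) for k = 0, ..., n_points - 1.

    The PGF is sampled at the n_points-th roots of unity and inverted with
    a direct O(n^2) discrete Fourier transform. Probability mass at counts
    >= n_points aliases onto lower counts, so n_points should comfortably
    exceed the plausible population size.
    """
    values = [
        bd_pgf(cmath.exp(2j * math.pi * j / n_points), t, lam, mu)
        for j in range(n_points)
    ]
    probs = []
    for k in range(n_points):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * j * k / n_points)
            for j, v in enumerate(values)
        ) / n_points
        probs.append(coeff.real)
    return probs
```

    The cost of this direct inversion grows with the number of PGF evaluations, which is the quantity a sparsity assumption on the transition probabilities aims to reduce.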

    Hybrid Iterative Multiuser Detection for Channel Coded Space Division Multiple Access OFDM Systems

    Space division multiple access (SDMA) aided orthogonal frequency division multiplexing (OFDM) systems assisted by efficient multiuser detection (MUD) techniques have recently attracted intensive research interest. The maximum likelihood detection (MLD) arrangement was found to attain the best performance, although this was achieved at the cost of a computational complexity that increases exponentially both with the number of users and with the number of bits per symbol transmitted by higher order modulation schemes. By contrast, the minimum mean-square error (MMSE) SDMA-MUD exhibits a lower complexity at the cost of a performance loss. Forward error correction (FEC) schemes such as, for example, turbo trellis coded modulation (TTCM), may be efficiently combined with SDMA-OFDM systems for the sake of improving the achievable performance. Genetic algorithm (GA) based multiuser detection techniques have been shown to provide a good performance in MUD-aided code division multiple access (CDMA) systems. In this contribution, a GA-aided MMSE MUD is proposed for employment in a TTCM assisted SDMA-OFDM system, which is capable of achieving a similar performance to that attained by its optimum MLD-aided counterpart at a significantly lower complexity, especially at high user loads. Moreover, when the proposed biased Q-function based mutation (BQM) assisted iterative GA (IGA) MUD is employed, the GA-aided system’s performance can be further improved, for example, by reducing the bit error ratio (BER) measured at 3 dB by about five orders of magnitude in comparison to the TTCM assisted MMSE-SDMA-OFDM benchmarker system, while still maintaining modest complexity.

    A Mutagenetic Tree Hidden Markov Model for Longitudinal Clonal HIV Sequence Data

    RNA viruses provide prominent examples of measurably evolving populations. In HIV infection, the development of drug resistance is of particular interest, because precise predictions of the outcome of this evolutionary process are a prerequisite for the rational design of antiretroviral treatment protocols. We present a mutagenetic tree hidden Markov model for the analysis of longitudinal clonal sequence data. Using HIV mutation data from clinical trials, we estimate the order and rate of occurrence of seven amino acid changes that are associated with resistance to the reverse transcriptase inhibitor efavirenz. Comment: 20 pages, 6 figures

    The impact of mutation and gene conversion on the local diversification of antigen genes in African trypanosomes

    Patterns of genetic diversity in parasite antigen gene families hold important information about their potential to generate antigenic variation within and between hosts. The evolution of such gene families is typically driven by gene duplication, followed by point mutation and gene conversion. There is great interest in estimating the rates of these processes from molecular sequences for understanding the evolution of the pathogen and its significance for infection processes. In this study, a series of models is constructed to investigate hypotheses about the nucleotide diversity patterns between closely related gene sequences from the antigen gene archive of the African trypanosome, the protozoan parasite that causes human sleeping sickness in Equatorial Africa. We use a hidden Markov model approach to identify two scales of diversification: clustering of sequence mismatches, a putative indicator of gene conversion events with other lower-identity donor genes in the archive, and at a sparser scale, isolated mismatches, likely arising from independent point mutations. In addition to quantifying the respective probabilities of occurrence of these two processes, our approach yields estimates for the gene conversion tract length distribution and the average diversity contributed locally by conversion events. Model fitting is conducted using a Bayesian framework. We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.