Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements
Continuous-time birth-death-shift (BDS) processes are frequently used in
stochastic modeling, with many applications in ecology and epidemiology. In
particular, such processes can model evolutionary dynamics of transposable
elements - important genetic markers in molecular epidemiology. Estimation of
the effects of individual covariates on the birth, death, and shift rates of
the process can be accomplished by analyzing patient data, but inferring these
rates in a discretely and unevenly observed setting presents computational
challenges. We propose a multi-type branching process approximation to BDS
processes and develop a corresponding expectation maximization (EM) algorithm,
where we use spectral techniques to reduce calculation of expected sufficient
statistics to low dimensional integration. These techniques yield an efficient
and robust optimization routine for inferring the rates of the BDS process, and
apply more broadly to multi-type branching processes where rates can depend on
many covariates. After rigorously testing our methodology in simulation
studies, we apply our method to study intrapatient time evolution of the IS6110
transposable element, an element frequently used in estimating
epidemiological clusters of Mycobacterium tuberculosis infections.
Comment: 31 pages, 7 figures, 1 table
Detection of recombination in DNA multiple alignments with hidden Markov models
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
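To make the HMM structure in this abstract concrete, here is a minimal sketch of decoding the most probable topology at each alignment site. It is not the paper's implementation. It assumes that every hidden state is one tree topology, that the chain stays in its current topology with probability `stay_prob` and otherwise switches uniformly to one of the others, and that the per-site likelihoods `site_liks` have already been computed elsewhere. In the paper the emissions come from a phylogenetic model and the parameters are fitted by EM; both steps are omitted here, and the function name, the parameter names, and the toy numbers are all hypothetical.

```python
import math


def viterbi_topologies(site_liks, stay_prob):
    """Return the most probable topology index for every alignment site.

    site_liks[t][k] is an assumed, precomputed likelihood of site t under
    topology k. stay_prob is the assumed probability of keeping the same
    topology between adjacent sites (one minus a recombination probability).
    """
    n_states = len(site_liks[0])
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    # Uniform prior over topologies at the first site.
    score = [math.log(1.0 / n_states) + math.log(l) for l in site_liks[0]]
    backpointers = []

    for liks in site_liks[1:]:
        new_score, pointers = [], []
        for k in range(n_states):
            # Best predecessor of state k, in log space.
            cands = [score[j] + (log_stay if j == k else log_switch)
                     for j in range(n_states)]
            best = max(range(n_states), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + math.log(liks[k]))
        backpointers.append(pointers)
        score = new_score

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy input: three sites favour topology 0, then three favour topology 1.
toy_liks = [[0.9, 0.1, 0.1]] * 3 + [[0.1, 0.9, 0.1]] * 3
print(viterbi_topologies(toy_liks, stay_prob=0.9))
```

A change of state along the decoded path marks a candidate recombination breakpoint. Working in log space avoids numerical underflow on long alignments.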
Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing
Branching processes are a class of continuous-time Markov chains (CTMCs) with
ubiquitous applications. A general difficulty in statistical inference under
partially observed CTMC models arises in computing transition probabilities
when the discrete state space is large or countably infinite. Classical methods such
as matrix exponentiation are infeasible for large or countably infinite state
spaces, and sampling-based alternatives are computationally intensive,
requiring a large integration step to impute over all possible hidden events.
Recent work has successfully applied generating function techniques to
computing transition probabilities for linear multitype branching processes.
While these techniques often require significantly fewer computations than
matrix exponentiation, they also become prohibitive in applications with large
populations. We propose a compressed sensing framework that significantly
accelerates the generating function method, decreasing computational cost up to
a logarithmic factor by only assuming the probability mass of transitions is
sparse. We demonstrate accurate and efficient transition probability
computations in branching process models for hematopoiesis and transposable
element evolution.
Comment: 18 pages, 4 figures, 2 tables
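The generating function technique mentioned in this abstract can be illustrated on the simplest case, a single-type linear birth-death process, whose probability generating function (PGF) is known in closed form. The sketch below evaluates the PGF at roots of unity and inverts it with a discrete Fourier transform to recover transition probabilities. It is only an illustration under those assumptions. The paper treats multitype processes and accelerates the inversion with compressed sensing, and neither is attempted here. The names `bd_pgf` and `transition_probs` and the rate values are hypothetical, and the formula assumes the birth rate `lam` differs from the death rate `mu`.

```python
import cmath
import math


def bd_pgf(s, t, lam, mu):
    """PGF E[s^X_t | X_0 = 1] of a linear birth-death process (lam != mu)."""
    e = math.exp(-(lam - mu) * t)
    num = mu * (1 - s) - (mu - lam * s) * e
    den = lam * (1 - s) - (mu - lam * s) * e
    return num / den


def transition_probs(i, t, lam, mu, n_max):
    """Approximate P(X_t = j | X_0 = i) for j = 0, ..., n_max - 1.

    Individuals evolve independently, so the PGF from i individuals is
    the one-individual PGF raised to the power i. Probability mass at
    counts >= n_max is aliased back onto smaller counts, so n_max must be
    large enough for that tail to be negligible.
    """
    # Evaluate the PGF at the n_max-th roots of unity.
    vals = [bd_pgf(cmath.exp(2j * math.pi * k / n_max), t, lam, mu) ** i
            for k in range(n_max)]
    probs = []
    for j in range(n_max):
        # Plain O(n_max^2) inverse DFT; an FFT would do the same in O(n log n).
        acc = sum(v * cmath.exp(-2j * math.pi * j * k / n_max)
                  for k, v in enumerate(vals))
        probs.append((acc / n_max).real)
    return probs


probs = transition_probs(i=2, t=1.0, lam=0.5, mu=0.3, n_max=64)
print(sum(probs))  # should be very close to 1
```

The cost grows with `n_max`, the number of PGF evaluations, which is why the approach becomes expensive for large populations and why exploiting sparsity of the transition probabilities is attractive.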
Hybrid Iterative Multiuser Detection for Channel Coded Space Division Multiple Access OFDM Systems
Space division multiple access (SDMA) aided orthogonal frequency division multiplexing (OFDM) systems assisted by efficient multiuser detection (MUD) techniques have recently attracted intensive research interest. The maximum likelihood detection (MLD) arrangement was found to attain the best performance, although this was achieved at the cost of a computational complexity that increases exponentially both with the number of users and with the number of bits per symbol transmitted by higher order modulation schemes. By contrast, the minimum mean-square error (MMSE) SDMA-MUD exhibits a lower complexity at the cost of a performance loss. Forward error correction (FEC) schemes such as turbo trellis coded modulation (TTCM) may be efficiently combined with SDMA-OFDM systems to improve the achievable performance. Genetic algorithm (GA) based multiuser detection techniques have been shown to provide good performance in MUD-aided code division multiple access (CDMA) systems. In this contribution, a GA-aided MMSE MUD is proposed for employment in a TTCM assisted SDMA-OFDM system, which is capable of achieving a similar performance to that attained by its optimum MLD-aided counterpart at a significantly lower complexity, especially at high user loads. Moreover, when the proposed biased Q-function based mutation (BQM) assisted iterative GA (IGA) MUD is employed, the GA-aided system’s performance can be further improved, for example, by reducing the bit error ratio (BER) measured at 3 dB by about five orders of magnitude in comparison to the TTCM assisted MMSE-SDMA-OFDM benchmark system, while still maintaining modest complexity.
A Mutagenetic Tree Hidden Markov Model for Longitudinal Clonal HIV Sequence Data
RNA viruses provide prominent examples of measurably evolving populations. In
HIV infection, the development of drug resistance is of particular interest,
because precise predictions of the outcome of this evolutionary process are a
prerequisite for the rational design of antiretroviral treatment protocols. We
present a mutagenetic tree hidden Markov model for the analysis of longitudinal
clonal sequence data. Using HIV mutation data from clinical trials, we estimate
the order and rate of occurrence of seven amino acid changes that are
associated with resistance to the reverse transcriptase inhibitor efavirenz.
Comment: 20 pages, 6 figures
The impact of mutation and gene conversion on the local diversification of antigen genes in African trypanosomes
Patterns of genetic diversity in parasite antigen gene families hold important information about their potential to generate antigenic variation within and between hosts. The evolution of such gene families is typically driven by gene duplication, followed by point mutation and gene conversion. There is great interest in estimating the rates of these processes from molecular sequences for understanding the evolution of the pathogen and its significance for infection processes. In this study, a series of models is constructed to investigate hypotheses about the nucleotide diversity patterns between closely related gene sequences from the antigen gene archive of the African trypanosome, the protozoan parasite that causes human sleeping sickness in Equatorial Africa. We use a hidden Markov model approach to identify two scales of diversification: clustering of sequence mismatches, a putative indicator of gene conversion events with other lower-identity donor genes in the archive, and at a sparser scale, isolated mismatches, likely arising from independent point mutations. In addition to quantifying the respective probabilities of occurrence of these two processes, our approach yields estimates for the gene conversion tract length distribution and the average diversity contributed locally by conversion events. Model fitting is conducted using a Bayesian framework. We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair.