74 research outputs found

    Identifying dramatic selection shifts in phylogenetic trees

    BACKGROUND: The rate of evolution varies along genomes and over time. The presence of evolutionary rate variation is an informative signal that often marks functional regions of genomes and historical selection events. There exist many tests for temporal rate variation, or heterotachy, that start by partitioning sampled sequences into two or more groups and testing rate homogeneity among the groups. I develop a Bayesian method to infer phylogenetic trees with a divergence point, that is, a dramatic temporal shift in selection pressure that affects many nucleotide sites simultaneously, located at an unknown position in the tree. RESULTS: Simulations demonstrate that the method detects divergence points best when the rate variation and the number of affected sites are high, though still within biologically relevant values. The method is applied to two viral data sets. A divergence point is identified separating the B and C subtypes, two genetically distinct variants of HIV that have spread into different human populations with the AIDS epidemic. In contrast, no strong signal of temporal rate variation is found in a sample of F and H genotypes, two genetic variants of HBV that have likely evolved with humans during their immigration and expansion into the Americas. CONCLUSION: Temporal shifts in evolutionary rate of sufficient magnitude are detectable in the history of sampled sequences. The ability to detect such divergence points without the need to specify a prior hypothesis about the location or timing of the divergence point should help scientists identify historically important selection events and decipher mechanisms of evolution.

    PREMIER - PRobabilistic Error-correction using Markov Inference in Errored Reads

    In this work we present a flexible, probabilistic and reference-free method of error correction for high throughput DNA sequencing data. The key is to exploit the high coverage of sequencing data and model short sequence outputs as independent realizations of a Hidden Markov Model (HMM). We pose the problem of error correction of reads as one of maximum likelihood sequence detection over this HMM. While time and memory considerations rule out an implementation of the optimal Baum-Welch algorithm (for parameter estimation) and the optimal Viterbi algorithm (for error correction), we propose low-complexity approximate versions of both. Specifically, we propose an approximate Viterbi and a sequential decoding based algorithm for the error correction. Our results show that when compared with Reptile, a state-of-the-art error correction method, our methods consistently achieve superior performance on both simulated and real data sets. Comment: Submitted to ISIT 201
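
For readers unfamiliar with maximum likelihood sequence detection over an HMM, the following is a minimal sketch of the exact, textbook Viterbi algorithm in log space. It is not the approximate decoder proposed in the paper. The toy model (`toy_model`, its cyclic transition structure, and the probabilities 0.97 and 0.9) and the helper `correct_read` are hypothetical and exist only to make the example runnable.

```python
import math

BASES = "ACGT"

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return the most likely hidden state path for `obs` (exact Viterbi)."""
    # score[s] = best log-probability of any path ending in state s
    score = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict per step: best predecessor of each state
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

def toy_model(p_cycle=0.97, p_correct=0.9):
    """Hypothetical toy HMM: true bases tend to follow A->C->G->T->A."""
    nxt = {b: BASES[(i + 1) % 4] for i, b in enumerate(BASES)}
    log_init = {b: math.log(0.25) for b in BASES}
    log_trans = {r: {s: math.log(p_cycle if s == nxt[r] else (1 - p_cycle) / 3)
                     for s in BASES} for r in BASES}
    log_emit = {s: {o: math.log(p_correct if o == s else (1 - p_correct) / 3)
                    for o in BASES} for s in BASES}
    return log_init, log_trans, log_emit

def correct_read(read):
    """Decode a read under the toy model; the decoded path is the 'corrected' read."""
    return viterbi(read, BASES, *toy_model())

print(correct_read("ACGAACGT"))
```

Under this toy model the fourth base of the read, observed as `A`, is decoded as `T` because the transition structure outweighs a single emission mismatch. A realistic model would have a far larger state space, which is the cost the abstract says makes the exact algorithm impractical.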

    An Efficient k-modes Algorithm for Clustering Categorical Datasets

    Mining clusters from data is an important endeavor in many applications. The k-means method is a popular, efficient, and distribution-free approach for clustering numerical-valued data, but does not apply to categorical-valued observations. The k-modes method addresses this lacuna by replacing the Euclidean with the Hamming distance and the means with the modes in the k-means objective function. We provide a novel, computationally efficient implementation of k-modes, called OTQT. We prove that OTQT finds updates to improve the objective function that are undetectable to existing k-modes algorithms. Although slightly slower per iteration due to algorithmic complexity, OTQT is always more accurate per iteration and almost always faster (and only barely slower on some datasets) to the final optimum. Thus, we recommend OTQT as the preferred, default algorithm for k-modes optimization. Comment: 16 pages, 10 figures, 5 tables
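
The k-modes objective described above (Hamming distance in place of Euclidean distance, per-column modes in place of means) can be sketched with a plain alternating scheme. This is the standard Lloyd-style iteration, not OTQT, whose update rules are not given in the abstract. The function names and the tiny dataset are illustrative assumptions.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent category in each column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(data, init_modes, max_iter=100):
    """Alternate assignment and mode updates until labels stop changing."""
    modes = list(init_modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under Hamming distance
        new_labels = [min(range(len(modes)), key=lambda j: hamming(x, modes[j]))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each cluster's mode (empty clusters keep theirs)
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    cost = sum(hamming(x, modes[lab]) for x, lab in zip(data, labels))
    return labels, modes, cost

# Hypothetical toy dataset with two obvious groups
data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r")]
labels, modes, cost = k_modes(data, init_modes=[data[0], data[3]])
print(labels, modes, cost)
```

The returned `cost` is the k-modes objective: the total Hamming distance from each record to its cluster mode.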

    Rev Variation during Persistent Lentivirus Infection

    The ability of lentiviruses to continually evolve and escape immune control is the central impediment in developing an effective vaccine for HIV-1 and other lentiviruses. Equine infectious anemia virus (EIAV) is considered a useful model for immune control of lentivirus infection. Virus-specific cytotoxic T lymphocytes (CTL) and broadly neutralizing antibody effectively control EIAV replication during inapparent stages of disease, but after years of low-level replication, the virus is still able to produce evasion genotypes that lead to late re-emergence of disease. There is a high rate of genetic variation in the EIAV surface envelope glycoprotein (SU) and in the region of the transmembrane protein (TM) overlapped by the major exon of Rev. This review examines genetic and phenotypic variation in Rev during EIAV disease and a possible role for Rev in immune evasion and virus persistence

    In the Garden of Branching Processes

    The current paper surveys and develops numerical methods for Markovian multitype branching processes in continuous time. Particular attention is paid to the calculation of means, variances, extinction probabilities, and marginal distributions in the presence of a Poisson stream of immigrant particles. The Poisson process assumption allows for temporally complex patterns of immigration and facilitates application of marked Poisson processes and Campbell’s formulas. The methods and formulas derived are applied to four models: two population genetics models, a model for vaccination against an infectious disease in a community of households, and a model for the growth of resistant HIV in patients undergoing drug therapy.

    Quantitative exposure assessment for confinement of maize biogenic systems

    The development of transgenic crops as production platforms for biogenic agents will largely depend on the success of efforts to confine the genes and their expressed proteins in field environments. We have used quantitative exposure assessment to evaluate how management practices affect materials escape due to outcrossing by pollen flow or grain loss during harvest operations. Specifically, we study the use of maize to produce biogenic agents within field-confined systems. Decision trees representing simplified schemes of fully conforming (designed to comply with current regulatory standards for field confined trials), partially conforming, and non-conforming management practices were developed. Exemplifying assumptions and published data for conformance and material fate probabilities were used in Monte Carlo simulations to forecast materials escape by pollen outcrossing and harvest operations from a 1 ha source field. Deterministic analyses showed fully conforming confinement management restricted materials loss to low levels (for this example, outcrossing produced <1 in 10^6 kernels in receptor fields). The corresponding high-end (90th percentile) probabilistic result was 16- and 4333-fold higher (relative to deterministic outcrossing = 1) for outcrossing and harvest loss, respectively. For partially conforming practice, high-end outcrossing ranged from 100- to >15 000-fold over the base result in receptor fields, and harvest loss was >10 000-fold over the base result. For non-conforming practice, high-end outcrossing produced >15 000-fold greater kernels in receptor fields and high-end harvest loss was at least 19 000-fold greater. Deterministic estimates of off-field loss by machine transfer are as much as 30 000-fold higher for non-conforming operations relative to the base case of pollen outcrossing. Better knowledge of failure frequencies for confinement management practices, improved physical models of materials flows, refined analysis of confinement loss probabilities using quantitative tools, and decision analysis to improve and audit management system performance are all needed to extend understanding of confinement integrity beyond the exemplifying case used here.

    Universal primers for the amplification and sequence analysis of actin-1 from diverse mosquito species

    We report the development of universal primers for the reverse-transcription polymerase chain reaction (RT-PCR) amplification and nucleotide sequence analysis of actin cDNAs from taxonomically diverse mosquito species. Primers specific to conserved regions of the invertebrate actin-1 gene were designed after actin cDNA sequences of Anopheles gambiae, Bombyx mori, Drosophila melanogaster, and Caenorhabditis elegans. The efficacy of these primers was determined by RT-PCR with the use of total RNA from mosquitoes belonging to 30 species and 8 genera (Aedes, Anopheles, Culex, Deinocerites, Mansonia, Psorophora, Toxorhynchites, and Wyeomyia). The RT-PCR products were sequenced, and sequence data were used to design additional primers. One primer pair, denoted as Act-2F (5′-ATGGTCGGYATGGGNCAGAAGGACTC-3′) and Act-8R (5′-GATTCCATACCCAGGAAGGADGG-3′), successfully amplified an RT-PCR product of the expected size (683-nt) in all mosquito spp. tested. We propose that this primer pair can be used as an internal control to test the quality of RNA from mosquitoes collected in vector surveillance studies. These primers can also be used in molecular experiments in which the detection, amplification or silencing of a ubiquitously expressed mosquito housekeeping gene is necessary. Sequence and phylogenetic data are also presented in this report.
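
The primers above contain IUPAC degenerate bases (for example Y and N in Act-2F), so each one stands for a small pool of concrete oligonucleotides. The following sketch, which is an illustration and not part of the reported work, enumerates that pool for Act-2F using the standard IUPAC nucleotide codes. The names `IUPAC` and `expand_primer` are assumptions made for this example.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

ACT_2F = "ATGGTCGGYATGGGNCAGAAGGACTC"  # forward primer as quoted in the abstract
variants = expand_primer(ACT_2F)
print(len(variants))  # one Y (2 options) times one N (4 options) = 8
```

The pool size grows multiplicatively with each degenerate position, which is one reason primer designs keep degeneracy low.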