532 research outputs found

    Distinguishing regional from within-codon rate heterogeneity in DNA sequence alignments

    Get PDF
    We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The focus of the present work is on improving the modelling of the latter aspect. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments

    Fully Bayesian tests of neutrality using genealogical summary statistics

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequentially, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome.</p> <p>Results</p> <p>Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures of neutrality in human influenza A virus, even after controlling for variation in population size.</p> <p>Conclusion</p> <p>Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.</p

    Determinants of dengue virus dispersal in the Americas

    Get PDF
    Dengue viruses (DENVs) are classified into four serotypes, each of which contains multiple genotypes. DENV genotypes introduced into the Americas over the past five decades have exhibited different rates and patterns of spatial dispersal. In order to understand factors underlying these patterns, we utilized a statistical framework that allows for the integration of ecological, socioeconomic, and air transport mobility data as predictors of viral diffusion while inferring the phylogeographic history. Predictors describing spatial diffusion based on several covariates were compared using a generalized linear model approach, where the support for each scenario and its contribution is estimated simultaneously from the data set. Although different predictors were identified for different serotypes, our analysis suggests that overall diffusion of DENV-1, -2, and -3 in the Americas was associated with airline traffic. The other significant predictors included human population size, the geographical distance between countries and between urban centers and the density of people living in urban environments

    Unifying the spatial epidemiology and molecular evolution of emerging epidemics

    Get PDF
    We introduce a conceptual bridge between the previously unlinked fields of phylogenetics and mathematical spatial ecology, which enables the spatial parameters of an emerging epidemic to be directly estimated from sampled pathogen genome sequences. By using phylogenetic history to correct for spatial autocorrelation, we illustrate how a fundamental spatial variable, the diffusion coefficient, can be estimated using robust nonparametric statistics, and how heterogeneity in dispersal can be readily quantified. We apply this framework to the spread of the West Nile virus across North America, an important recent instance of spatial invasion by an emerging infectious disease. We demonstrate that the dispersal of West Nile virus is greater and far more variable than previously measured, such that its dissemination was critically determined by rare, long-range movements that are unlikely to be discerned during field observations. Our results indicate that, by ignoring this heterogeneity, previous models of the epidemic have substantially overestimated its basic reproductive number. More generally, our approach demonstrates that easily obtainable genetic data can be used to measure the spatial dynamics of natural populations that are otherwise difficult or costly to quantify

    Leptospira interrogans Endostatin-Like Outer Membrane Proteins Bind Host Fibronectin, Laminin and Regulators of Complement

    Get PDF
    The pathogenic spirochete Leptospira interrogans disseminates throughout its hosts via the bloodstream, then invades and colonizes a variety of host tissues. Infectious leptospires are resistant to killing by their hosts' alternative pathway of complement-mediated killing, and interact with various host extracellular matrix (ECM) components. The LenA outer surface protein (formerly called LfhA and Lsa24) was previously shown to bind the host ECM component laminin and the complement regulators factor H and factor H-related protein-1. We now demonstrate that infectious L. interrogans contain five additional paralogs of lenA, which we designated lenB, lenC, lenD, lenE and lenF. All six genes encode domains predicted to bear structural and functional similarities with mammalian endostatins. Sequence analyses of genes from seven infectious L. interrogans serovars indicated development of sequence diversity through recombination and intragenic duplication. LenB was found to bind human factor H, and all of the newly-described Len proteins bound laminin. In addition, LenB, LenC, LenD, LenE and LenF all exhibited affinities for fibronectin, a distinct host extracellular matrix protein. These characteristics suggest that Len proteins together facilitate invasion and colonization of host tissues, and protect against host immune responses during mammalian infection

    An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration

    Full text link
    While statisticians are well-accustomed to performing exploratory analysis in the modeling stage of an analysis, the notion of conducting preliminary general-purpose exploratory analysis in the Monte Carlo stage (or more generally, the model-fitting stage) of an analysis is an area which we feel deserves much further attention. Towards this aim, this paper proposes a general-purpose algorithm for automatic density exploration. The proposed exploration algorithm combines and expands upon components from various adaptive Markov chain Monte Carlo methods, with the Wang-Landau algorithm at its heart. Additionally, the algorithm is run on interacting parallel chains -- a feature which both decreases computational cost as well as stabilizes the algorithm, improving its ability to explore the density. Performance is studied in several applications. Through a Bayesian variable selection example, the authors demonstrate the convergence gains obtained with interacting chains. The ability of the algorithm's adaptive proposal to induce mode-jumping is illustrated through a trimodal density and a Bayesian mixture modeling application. Lastly, through a 2D Ising model, the authors demonstrate the ability of the algorithm to overcome the high correlations encountered in spatial models.Comment: 33 pages, 20 figures (the supplementary materials are included as appendices

    Sequence-based prediction for vaccine strain selection and identification of antigenic variability in foot-and-mouth disease virus

    Get PDF
    Identifying when past exposure to an infectious disease will protect against newly emerging strains is central to understanding the spread and the severity of epidemics, but the prediction of viral cross-protection remains an important unsolved problem. For foot-and-mouth disease virus (FMDV) research in particular, improved methods for predicting this cross-protection are critical for predicting the severity of outbreaks within endemic settings where multiple serotypes and subtypes commonly co-circulate, as well as for deciding whether appropriate vaccine(s) exist and how much they could mitigate the effects of any outbreak. To identify antigenic relationships and their predictors, we used linear mixed effects models to account for variation in pairwise cross-neutralization titres using only viral sequences and structural data. We identified those substitutions in surface-exposed structural proteins that are correlates of loss of cross-reactivity. These allowed prediction of both the best vaccine match for any single virus and the breadth of coverage of new vaccine candidates from their capsid sequences as effectively as or better than serology. Sub-sequences chosen by the model-building process all contained sites that are known epitopes on other serotypes. Furthermore, for the SAT1 serotype, for which epitopes have never previously been identified, we provide strong evidence - by controlling for phylogenetic structure - for the presence of three epitopes across a panel of viruses and quantify the relative significance of some individual residues in determining cross-neutralization. Identifying and quantifying the importance of sites that predict viral strain cross-reactivity not just for single viruses but across entire serotypes can help in the design of vaccines with better targeting and broader coverage. These techniques can be generalized to any infectious agents where cross-reactivity assays have been carried out. As the parameterization uses pre-existing datasets, this approach quickly and cheaply increases both our understanding of antigenic relationships and our power to control disease

    Assessing the role of live poultry trade in community-structured transmission of avian influenza in China

    Get PDF
    The live poultry trade is thought to play an important role in the spread and maintenance of highly pathogenic avian influenza A viruses (HP AIVs) in Asia. Despite an abundance of small-scale observational studies, the role of the poultry trade in disseminating AIV over large geographic areas is still unclear, especially for developing countries with complex poultry production systems. Here we combine virus genomes and reconstructed poultry transportation data to measure and compare the spatial spread in China of three key subtypes of AIV: H5N1, H7N9, and H5N6. Although it is difficult to disentangle the contribution of confounding factors, such as bird migration and spatial distance, we find evidence that the dissemination of these subtypes among domestic poultry is geographically continuous and likely associated with the intensity of the live poultry trade in China. Using two independent data sources and network analysis methods, we report a regional-scale community structure in China that might explain the spread of AIV subtypes in the country. The identification of this structure has the potential to inform more targeted strategies for the prevention and control of AIV in China

    Evolutionary distances in the twilight zone -- a rational kernel approach

    Get PDF
    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.Comment: to appear in PLoS ON
    corecore