    A nonparametric HMM for genetic imputation and coalescent inference

    Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than the more general nonparametric models previously applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project, and on data simulated from a population bottleneck, we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.
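
The two properties named in this abstract, nonhomogeneity and heavy self-transition mass, can be illustrated with a finite transition matrix of the kind used by fastPHASE-style models. The sketch below is not the authors' hierarchical Dirichlet process model; it assumes a fixed number of states, and the names `jump_rate`, `distance`, and `weights` are hypothetical. At each site the chain keeps its state with probability exp(-jump_rate * distance) and otherwise redraws a state from `weights`.

```python
import math


def transition_matrix(jump_rate, distance, weights):
    """Sticky, site-specific HMM transition matrix (illustrative sketch).

    Assumed form: P(j | i) = stay * [i == j] + (1 - stay) * weights[j],
    with stay = exp(-jump_rate * distance). `weights` must sum to 1.
    """
    stay = math.exp(-jump_rate * distance)
    k = len(weights)
    return [
        [(stay if i == j else 0.0) + (1.0 - stay) * weights[j] for j in range(k)]
        for i in range(k)
    ]


# Hypothetical inputs: three clusters and two inter-marker distances.
weights = [0.5, 0.3, 0.2]
near = transition_matrix(jump_rate=1.0, distance=0.01, weights=weights)
far = transition_matrix(jump_rate=1.0, distance=2.0, weights=weights)
```

Because `distance` changes from site to site, so does the matrix, which is the sense in which the chain is nonhomogeneous: closely spaced markers give a matrix dominated by its diagonal, while distant markers give rows close to `weights`.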

    The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2

    Understanding the circumstances that lead to pandemics is important for their prevention. Here, we analyze the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted A and B. Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October–8 December), while the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans prior to November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events.

    Inferring the evolutionary history of divergence despite gene flow in a lizard species, Scincella lateralis (Scincidae), composed of cryptic lineages

    Although recent radiations are fruitful for studying the process of speciation, they are difficult to characterize and require the use of multiple loci and analytical methods that account for processes such as gene flow and genetic drift. Using multilocus sequence data, we combine hierarchical cluster analysis, coalescent species tree inference, and isolation-with-migration analysis to investigate evolutionary relationships among cryptic lineages of North American ground skinks. We also estimate the extent to which gene flow has accompanied or followed diversification, and attempt to account for and minimize the influence of gene flow when reconstructing relationships. The data best support seven largely parapatric populations that are broadly concordant with mitochondrial (mt)DNA phylogeography throughout most of the species range, although they fail to fully represent extensive mtDNA divergence along the Gulf Coast. Relationships within and among three broad geographical groups are well supported, despite evidence of gene flow among them. Rejection of an allopatric divergence model partially depends on the inclusion of samples from near parapatric boundaries in the analyses, suggesting that allopatric divergence followed by recent migration may best explain migration rate estimates. Accounting for geographical variation in patterns of gene flow can improve estimates of migration-divergence parameters and minimize the influence of contemporary gene flow on phylogenetic inference. © 2012 The Linnean Society of London

    The date of interbreeding between Neandertals and modern humans

    Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000–86,000 years before the present (BP), and most likely 47,000–65,000 years ago. This supports the recent interbreeding hypothesis, and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.
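
The reasoning behind dating gene flow from LD can be sketched with the standard expectation that admixture-induced LD between two sites decays roughly as exp(-t * d), where d is their genetic distance in Morgans and t is the number of generations since admixture. The code below is a generic illustration under that assumption, not the statistic used in the paper, which the abstract does not describe. The function names and the `years_per_generation` parameter are hypothetical, and the input values are synthetic.

```python
import math


def generations_since_admixture(distances_morgans, ld_values):
    """Estimate t from LD ~ A * exp(-t * d) by a log-linear least-squares fit.

    Assumes all ld_values are positive so the logarithm is defined.
    """
    ys = [math.log(v) for v in ld_values]
    n = len(distances_morgans)
    mean_x = sum(distances_morgans) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_morgans, ys))
    var = sum((x - mean_x) ** 2 for x in distances_morgans)
    return -cov / var  # the slope of log(LD) against distance is -t


def to_years(generations, years_per_generation):
    """Convert generations to years; the generation time is an assumption."""
    return generations * years_per_generation


# Synthetic example: LD values generated with t = 2000 generations.
distances = [0.0005, 0.001, 0.002, 0.004]
ld = [0.1 * math.exp(-2000 * d) for d in distances]
t_hat = generations_since_admixture(distances, ld)
years = to_years(t_hat, years_per_generation=29)  # 29 is an illustrative value
```

In real data the LD curve is noisy and also shaped by demography and errors in the genetic map, which is one reason a date estimated this way is reported as a range rather than a single number.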