105 research outputs found

    Lie Markov models with purine/pyrimidine symmetry

    Continuous-time Markov chains are a standard tool in phylogenetic inference. If homogeneity is assumed, the chain is formulated by specifying time-independent rates of substitutions between states in the chain. In applications, there are usually extra constraints on the rates, depending on the situation. If a model is formulated in this way, it is possible to generalise it and allow for an inhomogeneous process, with time-dependent rates satisfying the same constraints. It is then useful to require that there exists a homogeneous average of this inhomogeneous process within the same model. This leads to the definition of "Lie Markov models", which are precisely the class of models where such an average exists. These models form Lie algebras and hence concepts from Lie group theory are central to their derivation. In this paper, we concentrate on applications to phylogenetics and nucleotide evolution, and derive the complete hierarchy of Lie Markov models that respect the grouping of nucleotides into purines and pyrimidines -- that is, models with purine/pyrimidine symmetry. We also discuss how to handle the subtleties of applying Lie group methods, most naturally defined over the complex field, to the stochastic case of a Markov process, where parameter values are restricted to be real and positive. In particular, we explore the geometric embedding of the cone of stochastic rate matrices within the ambient space of the associated complex Lie algebra. The complete list of Lie Markov models with purine/pyrimidine symmetry is available at http://www.pagines.ma1.upc.edu/~jfernandez/LMNR.pdf.
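
    The statement that these models form Lie algebras can be illustrated with a small closure check: a linear space of rate matrices is closed in this sense when the commutator [A, B] = AB - BA of any two basis matrices stays inside the span of the basis. The following is a minimal sketch and is not taken from the paper. It assumes the row convention (rows sum to zero) and the state order A, G, C, T, and the helper names (`is_lie_closed`, `rank`, `K80`, `F81`, `TOY`) are hypothetical. It checks the familiar K80 and F81 bases, plus a two-generator toy space chosen only because it fails the test.

```python
from fractions import Fraction
from itertools import combinations

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    # Lie bracket [a, b] = ab - ba
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def flatten(m):
    return [x for row in m for x in row]

def rank(vectors):
    # Exact rank by Gauss-Jordan elimination over the rationals.
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_lie_closed(basis):
    # Closed iff appending any pairwise commutator leaves the rank unchanged.
    vecs = [flatten(m) for m in basis]
    base = rank(vecs)
    return all(rank(vecs + [flatten(commutator(a, b))]) == base
               for a, b in combinations(basis, 2))

# K80 (assumed state order A, G, C, T): one transition and one transversion generator.
K80 = [
    [[-1, 1, 0, 0], [1, -1, 0, 0], [0, 0, -1, 1], [0, 0, 1, -1]],
    [[-2, 0, 1, 1], [0, -2, 1, 1], [1, 1, -2, 0], [1, 1, 0, -2]],
]

# F81: generator j sends every other state to state j at unit rate.
F81 = [[[(1 if c == j else 0) - (1 if r == c else 0) for c in range(4)]
        for r in range(4)] for j in range(4)]

# Toy space spanned by "A -> G only" and "G -> C only"; illustrative, not a named model.
TOY = [
    [[-1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, -1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
]

if __name__ == "__main__":
    for name, basis in (("K80", K80), ("F81", F81), ("TOY", TOY)):
        print(name, is_lie_closed(basis))
```

    Exact rational arithmetic is used so that span membership is decided without a floating-point tolerance. The check covers only the algebraic closure condition; it says nothing about the real, positive parameter restrictions discussed in the abstract.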

    The strength and timing of the mitochondrial bottleneck in salmon suggests a conserved mechanism in vertebrates

    In most species mitochondrial DNA (mtDNA) is inherited maternally in an apparently clonal fashion, although how this is achieved remains uncertain. Population genetic studies show not only that individuals can harbor more than one type of mtDNA (heteroplasmy) but that heteroplasmy is common and widespread across a diversity of taxa. Females harboring a mixture of mtDNAs may transmit varying proportions of each mtDNA type (haplotype) to their offspring. However, mtDNA variants are also observed to segregate rapidly between generations despite the high mtDNA copy number in the oocyte, which suggests a genetic bottleneck acts during mtDNA transmission. Understanding the size and timing of this bottleneck is important for interpreting population genetic relationships and for predicting the inheritance of mtDNA-based disease, but despite its importance the underlying mechanisms remain unclear. Empirical studies, restricted to mice, have shown that the mtDNA bottleneck could act either at embryogenesis, oogenesis or both. To investigate whether the size and timing of the mitochondrial bottleneck are conserved between distant vertebrates, we measured the genetic variance in mtDNA heteroplasmy at three developmental stages (female, ova and fry) in chinook salmon and applied a new mathematical model to estimate the number of segregating units (Ne) of the mitochondrial bottleneck between each stage. Using these data we estimate values for mtDNA Ne of 88.3 for oogenesis, and 80.3 for embryogenesis. Our results confirm the presence of a mitochondrial bottleneck in fish, and show that segregation of mtDNA variation is effectively complete by the end of oogenesis. Considering the extensive differences in reproductive physiology between fish and mammals, our results suggest the mechanism underlying the mtDNA bottleneck is conserved in these distant vertebrates both in terms of its magnitude and timing. This finding may lead to improvements in our understanding of mitochondrial disorders and population interpretations using mtDNA data.

    Novel Distances for Dollo Data

    We investigate distances on binary (presence/absence) data in the context of a Dollo process, where a trait can only arise once on a phylogenetic tree but may be lost many times. We introduce a novel distance, the Additive Dollo Distance (ADD), which is consistent for data generated under a Dollo model, and show that it has some useful theoretical properties including an intriguing link to the LogDet distance. Simulations of Dollo data are used to compare a number of binary distances including ADD, LogDet, Nei Li and some simple, but to our knowledge previously unstudied, variations on common binary distances. The simulations suggest that ADD outperforms other distances on Dollo data. Interestingly, we found that the LogDet distance performs poorly in the context of a Dollo process, which may have implications for its use in connection with conditioned genome reconstruction. We apply the ADD to two Diversity Arrays Technology (DArT) datasets, one that broadly covers Eucalyptus species and one that focuses on the Eucalyptus series Adnataria. We also reanalyse gene family presence/absence data on bacteria from the COG database and compare the results to previous phylogenies estimated using the conditioned genome reconstruction approach

    Modelling mitochondrial site polymorphisms to infer the number of segregating units and mutation rate

    We present a mathematical model of mitochondrial inheritance evolving under neutral evolution to interpret the heteroplasmies observed at some sites. A comparison of the levels of heteroplasmies transmitted from a mother to her offspring allows us to estimate the number Nx of inherited mitochondrial genomes (segregating units). The model demonstrates the necessity of accounting for both the multiplicity of an unknown number Nx, and the threshold θ, below which heteroplasmy cannot be detected reliably, in order to estimate the mitochondrial mutation rate μm in the maternal line of descent. Our model is applicable to pedigree studies of any eukaryotic species where site heteroplasmies are observed in regions of the mitochondria, provided neutrality can be assumed. The model is illustrated with an analysis of site heteroplasmies in the first hypervariable region of mitochondrial sequence data sampled from Adélie penguin families, providing estimates of Nx and μm. This estimate of μm was found to be consistent with earlier estimates from ancient DNA analysis.

    RNase MRP and the RNA processing cascade in the eukaryotic ancestor

    BACKGROUND: Within eukaryotes there is a complex cascade of RNA-based macromolecules that process other RNA molecules, especially mRNA, tRNA and rRNA. An example is RNase MRP processing ribosomal RNA (rRNA) in ribosome biogenesis. One hypothesis is that this complexity was present early in eukaryotic evolution; an alternative is that an initial simpler network later gained complexity by gene duplication in lineages that led to animals, fungi and plants. Recently there has been a rapid increase in support for the complexity-early theory because the vast majority of these RNA-processing reactions are found throughout eukaryotes, and thus were likely to be present in the last common ancestor of living eukaryotes, herein called the Eukaryotic Ancestor. RESULTS: We present an overview of the RNA processing cascade in the Eukaryotic Ancestor and investigate in particular, RNase MRP which was previously thought to have evolved later in eukaryotes due to its apparent limited distribution in fungi and animals and plants. Recent publications, as well as our own genomic searches, find previously unknown RNase MRP RNAs, indicating that RNase MRP has a wide distribution in eukaryotes. Combining secondary structure and promoter region analysis of RNAs for RNase MRP, along with analysis of the target substrate (rRNA), allows us to discuss this distribution in the light of eukaryotic evolution. CONCLUSION: We conclude that RNase MRP can now be placed in the RNA-processing cascade of the Eukaryotic Ancestor, highlighting the complexity of RNA-processing in early eukaryotes. Promoter analyses of MRP-RNA suggest that regulation of the critical processes of rRNA cleavage can vary, showing that even these key cellular processes (for which we expect high conservation) show some species-specific variability. We present our consensus MRP-RNA secondary structure as a useful model for further searches

    RNase MRP and the RNA processing cascade in the eukaryotic ancestor

    Background: Within eukaryotes there is a complex cascade of RNA-based macromolecules that process other RNA molecules, especially mRNA, tRNA and rRNA. A simple example is the RNase MRP processing of ribosomal RNA (rRNA) in ribosome biogenesis. One hypothesis is that this complexity was present early in eukaryotic evolution; an alternative is that an initial simplified network later gained complexity by gene duplication in lineages that led to animals, fungi and plants. Recently there has been a rapid increase in support for the complexity-early theory because the vast majority of these RNA-processing reactions are found throughout eukaryotes, and thus were likely to be present in the last common ancestor of living eukaryotes, named here as the Eukaryotic Ancestor. Results: We present an overview of the RNA processing cascade in the Eukaryotic Ancestor and investigate, in particular, RNase MRP, which was previously thought to have evolved later in eukaryotes due to its apparent limited distribution in fungi, animals and plants. Recent publications, as well as our own genomic searches, have uncovered previously unknown RNase MRP RNAs, indicating that RNase MRP has a wide distribution in eukaryotes. Combining secondary structure and promoter region analysis of new and previously discovered RNase MRP RNAs, along with analysis of the primary substrate (rRNA), allows us to discuss this distribution in the light of eukaryotic evolution. Conclusions: We conclude that RNase MRP can now be placed in the RNA-processing cascade present in the Eukaryotic Ancestor. This highlights the complexity of RNA-processing in early eukaryotes.

    IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era

    IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software. This work was supported by the Austrian Science Fund (Grant No. I-2805-B29) to A.v.H. and by the Australian National University Futures Scheme grant to R.L

    Riding the Wave: Reconciling the Roles of Disease and Climate Change in Amphibian Declines

    We review the evidence for the role of climate change in triggering disease outbreaks of chytridiomycosis, an emerging infectious disease of amphibians. Both climatic anomalies and disease-related extirpations are recent phenomena, and effects of both are especially noticeable at high elevations in tropical areas, making it difficult to determine whether they are operating separately or synergistically. We compiled reports of amphibian declines from Lower Central America and Andean South America to create maps and statistical models to test our hypothesis of spatiotemporal spread of the pathogen Batrachochytrium dendrobatidis (Bd), and to update the elevational patterns of decline in frogs belonging to the genus Atelopus. We evaluated claims of climate change influencing the spread of Bd by including error into estimates of the relationship between air temperature and last year observed. Available data support the hypothesis of multiple introductions of this invasive pathogen into South America and subsequent spread along the primary Andean cordilleras. Additional analyses found no evidence to support the hypothesis that climate change has been driving outbreaks of amphibian chytridiomycosis, as has been posited in the climate-linked epidemic hypothesis. Future studies should increase retrospective surveys of museum specimens from throughout the Andes and should study the landscape genetics of Bd to map fine-scale patterns of geographic spread to identify transmission routes and processes