138 research outputs found

    A simulation study comparing supertree and combined analysis methods using SMIDGen

    Background: Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix. Results: In this paper, we describe an extensive simulation study we performed comparing two supertree methods, MRP and weighted MRP, to combined analysis methods on large model trees. A key contribution of this study is our novel simulation methodology (Super-Method Input Data Generator, or SMIDGen) that better reflects biological processes and the practices of systematists than earlier simulations. We show that combined analysis based upon maximum likelihood outperforms MRP and weighted MRP, giving especially big improvements when the largest subtree does not contain most of the taxa. Conclusions: This study demonstrates that MRP and weighted MRP produce distinctly less accurate trees than combined analyses for a given base method (maximum parsimony or maximum likelihood). Since there are situations in which combined analyses are not feasible, there is a clear need for better supertree methods. The source tree and combined datasets used in this study can be used to test other supertree and combined analysis methods.

    Reconstructing pedigrees: some identifiability questions for a recombination-mutation model

    Pedigrees are directed acyclic graphs that represent ancestral relationships between individuals in a population. Based on a schematic recombination process, we describe two simple Markov models for sequences evolving on pedigrees: Model R (recombinations without mutations) and Model RM (recombinations with mutations). For these models, we ask an identifiability question: is it possible to construct a pedigree from the joint probability distribution of extant sequences? We present partial identifiability results for general pedigrees: we show that when the crossover probabilities are sufficiently small, certain spanning subgraph sequences can be counted from the joint distribution of extant sequences. We demonstrate how pedigrees that earlier seemed difficult to distinguish are distinguished by counting their spanning subgraph sequences. Comment: 40 pages, 9 figures

    Quarnet Inference Rules for Level-1 Networks

    An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set X of species from a collection of trees, each having leaf-set some subset of X. In the 1980s, Colonius and Schulze gave certain inference rules for deciding when a collection of 4-leaved trees, one for each 4-element subset of X, can be simultaneously displayed by a single supertree with leaf-set X. Recently, it has become of interest to extend this and related results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has recently been shown that a certain type of phylogenetic network, called an (unrooted) level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here, we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of X, can all be simultaneously displayed by a level-1 network with leaf-set X. The rules are an intriguing mixture of tree inference rules, and an inference rule for building up a cyclic ordering of X from orderings on subsets of X of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.

    Investigation of the Origin and Spread of a Mammalian Transposable Element Based on Current Sequence Diversity

    Almost half the human genome consists of mobile DNA elements, and their analysis is a vital part of understanding the human genome as a whole. Many of these elements are ancient and have persisted in the genome for tens or hundreds of millions of years, providing a window into the evolution of modern mammals. The Golem family has been used as a model transposon family to highlight computational analyses which can be used to investigate these elements, particularly the use of molecular dating with large transposon families. Whole-genome searches found Golem sequences in 20 mammalian species. Golem A and B subsequences were only found in primates and squirrel. Interestingly, the full-length Golem, found as a few copies in many mammalian genomes, was found abundantly in horse. A phylogenetic profile suggested that Golem originated after the eutherian–metatherian divergence and that the A and B subfamilies originated at a much later date. Molecular dating based on sequence diversity suggests an early age, of 175 Mya, for the origin of the family and that the A and B lineages originated much earlier than expected from their current taxonomic distribution and have subsequently been lost in some lineages. Using publicly available data, it is possible to investigate the evolutionary history of transposon families. Determining in which organisms a transposon can be found is often used to date the origin and expansion of the families. However, in this analysis, molecular dating, commonly used for determining the age of gene sequences, has been used, reducing the likelihood of errors from deleted lineages.

    Evidence, Content and Corroboration and the Tree of Life

    We examine three critical aspects of Popper’s formulation of the ‘Logic of Scientific Discovery’ (evidence, content and degree of corroboration) and place these concepts in the context of the Tree of Life (ToL) problem with particular reference to molecular systematics. Content, in the sense discussed by Popper, refers to the breadth and scope of existence that a hypothesis purports to explain. Content, in conjunction with the amount of available and relevant evidence, determines the testability, or potential degree of corroboration, of a statement; content distinguishes scientific hypotheses from metaphysical assertions. Degree of corroboration refers to the relative and tentative confidence assigned to one hypothesis over another, based upon the performance of each under critical tests. Here we suggest that systematists attempt to maximize content and evidence to increase the potential degree of corroboration in all phylogenetic endeavors. Discussion of this “total evidence” approach leads to several interesting conclusions about generating ToL hypotheses.

    Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    Background: Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results: We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions: Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates, both in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation.

    Threatened reef corals of the world

    DOI: 10.1371/journal.pone.0034459. PLoS ONE 7(3).

    Evolution records a Mx tape for anti-viral immunity

    Viruses impose diverse and dynamic challenges on host defenses. Diversifying selection of codons and gene copy number variation are two hallmarks of genetic innovation in antiviral genes engaged in host-virus genetic conflicts. The myxovirus resistance (Mx) genes encode interferon-inducible GTPases that constitute a major arm of the cell-autonomous defense against viral infection. Unlike the broad antiviral activity of MxA, primate MxB was recently shown to specifically inhibit lentiviruses including HIV-1. We carried out detailed evolutionary analyses to investigate whether genetic conflict with lentiviruses has shaped MxB evolution in primates. We found strong evidence for diversifying selection in the MxB N-terminal tail, which contains molecular determinants of MxB anti-lentivirus specificity. However, we found no overlap between previously mapped residues that dictate lentiviral restriction and those that have evolved under diversifying selection. Instead, our findings are consistent with MxB having a long-standing and important role in the interferon response to viral infection against a broader range of pathogens than is currently appreciated. Despite its critical role in host innate immunity, we also uncovered multiple functional losses of MxB during mammalian evolution, either by pseudogenization or by gene conversion from MxA genes. Thus, although the majority of mammalian genomes encode two Mx genes, this apparent stasis masks the dramatic effects that recombination and diversifying selection have had in shaping the evolutionary history of Mx genes. Discrepancies between our study and previous publications highlight the need to account for recombination in analyses of positive selection, as well as the importance of using sequence datasets with appropriate depth of divergence. Our study also illustrates that evolutionary analyses of antiviral gene families are critical to understanding molecular principles that govern host-virus interactions and species-specific susceptibility to viral infection.