
    Species delimitation and phylogeny of a New Zealand plant species radiation

    Background: Delimiting species boundaries and reconstructing the evolutionary relationships of late Tertiary and Quaternary species radiations is difficult. One recent approach emphasizes the use of genome-wide molecular markers, such as amplified fragment length polymorphisms (AFLPs) and single nucleotide polymorphisms (SNPs), to identify distinct metapopulation lineages as taxonomic species. Here we investigate the properties of AFLP data, and the usefulness of tree-based and non-tree-based clustering methods to delimit species and reconstruct evolutionary relationships among high-elevation Ourisia species (Plantaginaceae) in the New Zealand archipelago.
    Results: New Zealand Ourisia are shown to comprise a geologically recent species radiation based on molecular dating analyses of ITS sequences (0.4–1.3 MY). Supernetwork analyses indicate that separate tree-based clustering analyses of four independent AFLP primer combinations and 193 individuals of Ourisia produced similar trees. When combined and analysed using tree building methods, 15 distinct metapopulations could be identified. These clusters corresponded very closely to species and subspecies identified on the basis of diagnostic morphological characters. In contrast, Structure and PCO-MC analyses of the same data identified a maximum of 12 and 8 metapopulations, respectively. All approaches resolved a large-leaved group and a small-leaved group, as well as a lineage of three alpine species within the small-leaved group. We were unable to further resolve relationships within these groups as corrected and uncorrected distances derived from AFLP profiles had limited tree-like properties.
    Conclusion: Ourisia radiated into a range of alpine and subalpine habitats in New Zealand during the Pleistocene, resulting in 13 morphologically and ecologically distinct species, including one reinstated from subspecies rank. Analyses of AFLP identified distinct metapopulations consistent with morphological characters allowing species boundaries to be delimited in Ourisia. Importantly, Structure analyses suggest some degree of admixture with most species, which may also explain why the AFLP data do not exhibit sufficient tree-like properties necessary for reconstructing some species relationships. We discuss this feature and highlight the importance of improving models for phylogenetic analyses of species radiations using AFLP and SNP data.

    A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction

    Supplementary tables S1, sections 2.1–2.3, and figures S1–S11 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We have implemented and applied this method on two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes as well as simulated phylogenies and find dramatic improvements in reconstruction accuracy as compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies of traditional phylogenetic methods overestimate the number of DL events by as much as 2–3-fold, whereas our method achieves significantly higher accuracy. We feel that the results and methods presented here will have many important implications for future investigations of gene evolution. National Science Foundation (U.S.) (CAREER award NSF 0644282)

    Minimal Conflicting Sets for the Consecutive Ones Property in ancestral genome reconstruction

    A binary matrix has the Consecutive Ones Property (C1P) if its columns can be ordered in such a way that all 1's on each row are consecutive. A Minimal Conflicting Set is a set of rows that does not have the C1P, but every proper subset has the C1P. Such submatrices have been considered in comparative genomics applications, but very little is known about their combinatorial structure and efficient algorithms to compute them. We first describe an algorithm that detects rows that belong to Minimal Conflicting Sets. This algorithm has a polynomial time complexity when the number of 1's in each row of the considered matrix is bounded by a constant. Next, we show that the problem of computing all Minimal Conflicting Sets can be reduced to the joint generation of all minimal true clauses and maximal false clauses for some monotone boolean function. We use these methods on simulated data related to ancestral genome reconstruction to show that computing Minimal Conflicting Sets is useful in discriminating between true positive and false positive ancestral syntenies. We also study a dataset of yeast genomes and address the reliability of an ancestral genome proposal of the Saccharomycetaceae yeasts. Comment: 20 pages, 3 figures
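The two definitions in this abstract can be made concrete with a small brute-force sketch. It is not the algorithm the abstract describes: it simply tries every column order, so it is exponential and only suitable for tiny matrices. The function names and the example matrix are illustrative assumptions.

```python
from itertools import combinations, permutations


def row_is_consecutive(row):
    """True if all 1s in the row form a single contiguous run."""
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def has_c1p(rows):
    """Brute-force C1P test: try every column order (tiny inputs only)."""
    if not rows:
        return True
    n_cols = len(rows[0])
    for order in permutations(range(n_cols)):
        if all(row_is_consecutive([row[c] for c in order]) for row in rows):
            return True
    return False


def minimal_conflicting_sets(rows):
    """Index tuples of row sets lacking the C1P whose proper subsets all have it."""
    found = []
    # Enumerate by increasing size so smaller conflicting sets are found first
    for size in range(1, len(rows) + 1):
        for subset in combinations(range(len(rows)), size):
            # A set containing an already-found conflicting set cannot be minimal
            if any(set(f) <= set(subset) for f in found):
                continue
            if not has_c1p([rows[i] for i in subset]):
                found.append(subset)
    return found


# Hypothetical example: rows 0-2 pairwise overlap on three columns,
# row 3 is all ones and is consecutive under any column order.
matrix = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
```

On this matrix, any two rows can be made consecutive, but rows 0, 1 and 2 together cannot, so they form the only Minimal Conflicting Set and row 3 belongs to none.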

    A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes

    The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segments reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results ultimately come very close to those of cytogenetic studies. This suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data acquisition.

    The evolutionary dynamics of protein-protein interaction networks inferred from the reconstruction of ancient networks

    Cellular functions are based on the complex interplay of proteins; therefore, the structure and dynamics of these protein-protein interaction (PPI) networks are the key to the functional understanding of cells. In recent years, large-scale PPI networks of several model organisms were investigated. Methodological improvements now allow the analysis of PPI networks of multiple organisms simultaneously as well as the direct modeling of ancestral networks. This provides the opportunity to challenge existing assumptions on network evolution. We utilized present-day PPI networks from integrated datasets of seven model organisms and developed a theoretical and bioinformatic framework for studying the evolutionary dynamics of PPI networks. A novel filtering approach using percolation analysis was developed to remove low confidence interactions based on topological constraints. We then reconstructed the ancient PPI networks of different ancestors, for which the ancestral proteomes, as well as the ancestral interactions, were inferred. Ancestral proteins were reconstructed using orthologous groups on different evolutionary levels. A stochastic approach, using the duplication-divergence model, was developed for estimating the probabilities of ancient interactions from today's PPI networks. The growth rates for nodes, edges, sizes and modularities of the networks indicate multiplicative growth and are consistent with the results from independent static analysis. Our results support the duplication-divergence model of evolution and indicate fractality and multiplicative growth as general properties of the PPI network structure and dynamics.
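As a rough illustration of the duplication-divergence model named in this abstract, the sketch below grows a network forward in time. It is not the authors' estimator for ancient interactions. It shows only one common generic form of the model, in which a randomly chosen protein is duplicated and the copy keeps each inherited interaction with a fixed probability. The function name, the parameter `retain_prob`, and the two-protein seed network are assumptions.

```python
import random


def duplication_divergence(n_nodes, retain_prob, seed=0):
    """Grow an undirected graph by node duplication followed by edge loss."""
    rng = random.Random(seed)
    # Seed network (assumption): two interacting proteins
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        # Duplication: pick an existing protein to copy
        parent = rng.choice(sorted(adj))
        child = len(adj)
        # Divergence: the copy keeps each parental interaction with retain_prob
        kept = {nbr for nbr in sorted(adj[parent]) if rng.random() < retain_prob}
        adj[child] = kept
        for nbr in kept:
            adj[nbr].add(child)
    return adj


def edge_count(adj):
    """Number of undirected edges in an adjacency-set dictionary."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


network = duplication_divergence(n_nodes=200, retain_prob=0.4, seed=1)
```

In this variant a copy that loses all its interactions stays in the network as an isolated node. Other formulations discard such copies or add an interaction between parent and copy.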

    Reliable ABC model choice via random forests

    Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least a factor of fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf available on CRAN. Comment: 39 pages, 15 figures, 6 tables
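For context, the "standard ABC techniques" that this abstract contrasts with can be sketched as rejection-based model choice. This is the baseline, not the random forest method and not the abcrf package. The sketch simulates from each model, keeps the simulations whose summary statistic is closest to the observed one, and reports model frequencies among them. The two toy Gaussian models, the single summary statistic, and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_summary(model, n_obs, rng):
    """Simulate a dataset from one of two toy models and return its summary."""
    # Toy models (assumptions): zero-mean Gaussians with different scale priors
    low, high = (0.5, 1.5) if model == 0 else (2.5, 3.5)
    sigma = rng.uniform(low, high)
    data = [rng.gauss(0.0, sigma) for _ in range(n_obs)]
    # Single summary statistic: the sample standard deviation
    return statistics.stdev(data)


def abc_model_choice(observed_summary, n_obs, n_sims=2000, n_keep=100, seed=1):
    """Rejection ABC: model frequencies among the closest simulations."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        model = rng.randrange(2)  # uniform prior over the two models
        distance = abs(simulate_summary(model, n_obs, rng) - observed_summary)
        sims.append((distance, model))
    sims.sort()
    kept = [model for _, model in sims[:n_keep]]
    # Approximate posterior probability of each model
    return [kept.count(m) / n_keep for m in (0, 1)]


# Hypothetical observed data: 50 draws from a unit-variance Gaussian
obs_rng = random.Random(42)
observed = statistics.stdev([obs_rng.gauss(0.0, 1.0) for _ in range(50)])
posterior = abc_model_choice(observed, n_obs=50)
```

The estimate depends on the choice of summary statistic and on how many simulations are kept, which are the sensitivities the abstract's points (i) and (ii) refer to.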