3,299 research outputs found
Species delimitation and phylogeny of a New Zealand plant species radiation
Background: Delimiting species boundaries and reconstructing the evolutionary relationships of late Tertiary and Quaternary species radiations is difficult. One recent approach emphasizes the use of genome-wide molecular markers, such as amplified fragment length polymorphisms (AFLPs) and single nucleotide polymorphisms (SNPs), to identify distinct metapopulation lineages as taxonomic species. Here we investigate the properties of AFLP data, and the usefulness of tree-based and non-tree-based clustering methods to delimit species and reconstruct evolutionary relationships among high-elevation Ourisia species (Plantaginaceae) in the New Zealand archipelago. Results: New Zealand Ourisia are shown to comprise a geologically recent species radiation based on molecular dating analyses of ITS sequences (0.4–1.3 MY). Supernetwork analyses indicate that separate tree-based clustering analyses of four independent AFLP primer combinations and 193 individuals of Ourisia produced similar trees. When combined and analysed using tree building methods, 15 distinct metapopulations could be identified. These clusters corresponded very closely to species and subspecies identified on the basis of diagnostic morphological characters. In contrast, Structure and PCO-MC analyses of the same data identified a maximum of 12 and 8 metapopulations, respectively. All approaches resolved a large-leaved group and a small-leaved group, as well as a lineage of three alpine species within the small-leaved group. We were unable to further resolve relationships within these groups as corrected and uncorrected distances derived from AFLP profiles had limited tree-like properties. Conclusion: Ourisia radiated into a range of alpine and subalpine habitats in New Zealand during the Pleistocene, resulting in 13 morphologically and ecologically distinct species, including one reinstated from subspecies rank. Analyses of AFLP identified distinct metapopulations consistent with morphological characters allowing species boundaries to be delimited in Ourisia. Importantly, Structure analyses suggest some degree of admixture with most species, which may also explain why the AFLP data do not exhibit sufficient tree-like properties necessary for reconstructing some species relationships. We discuss this feature and highlight the importance of improving models for phylogenetic analyses of species radiations using AFLP and SNP data.
A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction
Supplementary tables S1, sections 2.1–2.3, and figures S1–S11 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We have implemented this method and applied it to two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies, and find dramatic improvements in reconstruction accuracy compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies of traditional phylogenetic methods overestimate the number of DL events by as much as 2–3-fold, whereas our method achieves significantly higher accuracy. We feel that the results and methods presented here will have many important implications for future investigations of gene evolution. National Science Foundation (U.S.) (CAREER award NSF 0644282
Minimal Conflicting Sets for the Consecutive Ones Property in ancestral genome reconstruction
A binary matrix has the Consecutive Ones Property (C1P) if its columns can be
ordered in such a way that all 1's on each row are consecutive. A Minimal
Conflicting Set is a set of rows that does not have the C1P, but every proper
subset has the C1P. Such submatrices have been considered in comparative
genomics applications, but very little is known about their combinatorial
structure and efficient algorithms to compute them. We first describe an
algorithm that detects rows that belong to Minimal Conflicting Sets. This
algorithm has a polynomial time complexity when the number of 1's in each row
of the considered matrix is bounded by a constant. Next, we show that the
problem of computing all Minimal Conflicting Sets can be reduced to the joint
generation of all minimal true clauses and maximal false clauses for some
monotone boolean function. We use these methods on simulated data related to
ancestral genome reconstruction to show that computing Minimal Conflicting Sets
is useful in discriminating between true positive and false positive ancestral
syntenies. We also study a dataset of yeast genomes and address the reliability
of an ancestral genome proposal of the Saccharomycetaceae yeasts. Comment: 20 pages, 3 figures
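The two definitions in this abstract (the Consecutive Ones Property and a Minimal Conflicting Set) can be illustrated with a small brute-force sketch. This is not the algorithm of the paper, which targets efficiency; it simply tries every column order and is therefore only usable on tiny matrices. The function names are hypothetical and chosen for illustration.

```python
from itertools import permutations

def row_is_consecutive(row):
    """True if all 1's in the row form a single contiguous block."""
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def has_c1p(rows):
    """Brute-force C1P test: try every column order (exponential time)."""
    if not rows:
        return True
    n_cols = len(rows[0])
    for order in permutations(range(n_cols)):
        if all(row_is_consecutive([r[j] for j in order]) for r in rows):
            return True
    return False

def is_minimal_conflicting_set(rows):
    """True if rows lack the C1P but every proper subset of rows has it.

    Removing rows cannot destroy the C1P, so it is enough to check
    the subsets obtained by deleting exactly one row.
    """
    if has_c1p(rows):
        return False
    return all(has_c1p(rows[:i] + rows[i + 1:]) for i in range(len(rows)))

# Three rows that pairwise share one column: no column order works
# for all three, but any two of them can be made consecutive.
triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(has_c1p(triangle), is_minimal_conflicting_set(triangle))
```

Adding a further row to `triangle` keeps the matrix without the C1P but breaks minimality, since the original three rows remain a conflicting proper subset.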
A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes
The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segment reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results ultimately come very close to those of cytogenetic studies. This suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data acquisition.
The evolutionary dynamics of protein-protein interaction networks inferred from the reconstruction of ancient networks
Cellular functions are based on the complex interplay of proteins, therefore
the structure and dynamics of these protein-protein interaction (PPI) networks
are the key to the functional understanding of cells. In recent years,
large-scale PPI networks of several model organisms were investigated.
Methodological improvements now allow the analysis of PPI networks of multiple
organisms simultaneously as well as the direct modeling of ancestral networks.
This provides the opportunity to challenge existing assumptions on network
evolution. We utilized present-day PPI networks from integrated datasets of
seven model organisms and developed a theoretical and bioinformatic framework
for studying the evolutionary dynamics of PPI networks. A novel filtering
approach using percolation analysis was developed to remove low confidence
interactions based on topological constraints. We then reconstructed the
ancient PPI networks of different ancestors, for which the ancestral proteomes,
as well as the ancestral interactions, were inferred. Ancestral proteins were
reconstructed using orthologous groups on different evolutionary levels. A
stochastic approach, using the duplication-divergence model, was developed for
estimating the probabilities of ancient interactions from today's PPI networks.
The growth rates for nodes, edges, sizes and modularities of the networks
indicate multiplicative growth and are consistent with the results from
independent static analysis. Our results support the duplication-divergence
model of evolution and indicate fractality and multiplicative growth as general
properties of the PPI network structure and dynamics.
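For readers unfamiliar with the duplication-divergence model named in this abstract, a minimal forward simulation may help. This sketch assumes the common textbook variant (copy a random protein with all its interactions, then keep each copied interaction with a fixed probability); it does not reproduce the paper's stochastic method for estimating ancient interaction probabilities. The function name, the two-node seed network, and the parameter `retain_prob` are illustrative assumptions.

```python
import random

def duplication_divergence(n_nodes, retain_prob, seed=0):
    """Grow an undirected network by repeated duplication and divergence.

    Starts from a single interacting pair (an assumption). At each step a
    random existing node is duplicated, and each interaction inherited by
    the copy is kept independently with probability retain_prob.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # adjacency sets of the seed network
    while len(adj) < n_nodes:
        parent = rng.choice(sorted(adj))
        child = len(adj)
        # Divergence: the copy loses each inherited edge with prob. 1 - retain_prob.
        kept = {nb for nb in sorted(adj[parent]) if rng.random() < retain_prob}
        adj[child] = kept
        for nb in kept:
            adj[nb].add(child)
    return adj

network = duplication_divergence(n_nodes=50, retain_prob=0.4, seed=1)
n_edges = sum(len(nbrs) for nbrs in network.values()) // 2
print(len(network), "nodes,", n_edges, "edges")
```

In this variant a copy that retains no interactions stays in the network as an isolated node; other formulations discard such copies or add a link between parent and copy.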
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP
for a second stage also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computation efficiency of at least a factor of fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. The call to random forests will undoubtedly extend the range of
size of datasets and complexity of models that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf available on the CRAN. Comment: 39 pages, 15 figures, 6 tables