85 research outputs found

    Polynomial algorithms for the Maximal Pairing Problem: efficient phylogenetic targeting on arbitrary trees

    Get PDF
    Background: The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights ωxy for the paths between any two pairs of leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results: We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity (n^4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions: The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations.Human Evolutionary Biolog

    An experimental study of Quartets MaxCut and other supertree methods

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method.</p> <p>Results</p> <p>We evaluated the performance of several supertree methods based upon the Quartets MaxCut (QMC) method of Snir and Rao and showed that two of these methods usually outperform MRP and five other supertree methods that we studied, under many realistic model conditions. However, the QMC-based methods have scalability issues that may limit their utility on large datasets. We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Finally, we showed that the popular optimality criterion of minimizing the total topological distance of the supertree to the source trees is only weakly correlated with supertree topological accuracy. Therefore evaluating supertree methods on biological datasets is problematic.</p> <p>Conclusions</p> <p>Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Also, because topological accuracy depends upon taxon sampling strategies, attempts to construct very large phylogenetic trees using supertree methods should consider the selection of source tree datasets, as well as supertree methods. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, development and testing of supertree methods presents methodological challenges.</p

    A Functional Phylogenomic View of the Seed Plants

    Get PDF
    A novel result of the current research is the development and implementation of a unique functional phylogenomic approach that explores the genomic origins of seed plant diversification. We first use 22,833 sets of orthologs from the nuclear genomes of 101 genera across land plants to reconstruct their phylogenetic relationships. One of the more salient results is the resolution of some enigmatic relationships in seed plant phylogeny, such as the placement of Gnetales as sister to the rest of the gymnosperms. In using this novel phylogenomic approach, we were also able to identify overrepresented functional gene ontology categories in genes that provide positive branch support for major nodes prompting new hypotheses for genes associated with the diversification of angiosperms. For example, RNA interference (RNAi) has played a significant role in the divergence of monocots from other angiosperms, which has experimental support in Arabidopsis and rice. This analysis also implied that the second largest subunit of RNA polymerase IV and V (NRPD2) played a prominent role in the divergence of gymnosperms. This hypothesis is supported by the lack of 24nt siRNA in conifers, the maternal control of small RNA in the seeds of flowering plants, and the emergence of double fertilization in angiosperms. Our approach takes advantage of genomic data to define orthologs, reconstruct relationships, and narrow down candidate genes involved in plant evolution within a phylogenomic view of species' diversification

    Inferring Phylogenies from RAD Sequence Data

    Get PDF
    Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD) – the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct “known” phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for “total evidence” phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species

    A chemical survey of exoplanets with ARIEL

    Get PDF
    Thousands of exoplanets have now been discovered with a huge range of masses, sizes and orbits: from rocky Earth-like planets to large gas giants grazing the surface of their host star. However, the essential nature of these exoplanets remains largely mysterious: there is no known, discernible pattern linking the presence, size, or orbital parameters of a planet to the nature of its parent star. We have little idea whether the chemistry of a planet is linked to its formation environment, or whether the type of host star drives the physics and chemistry of the planet’s birth, and evolution. ARIEL was conceived to observe a large number (~1000) of transiting planets for statistical understanding, including gas giants, Neptunes, super-Earths and Earth-size planets around a range of host star types using transit spectroscopy in the 1.25–7.8 μm spectral range and multiple narrow-band photometry in the optical. ARIEL will focus on warm and hot planets to take advantage of their well-mixed atmospheres which should show minimal condensation and sequestration of high-Z materials compared to their colder Solar System siblings. Said warm and hot atmospheres are expected to be more representative of the planetary bulk composition. Observations of these warm/hot exoplanets, and in particular of their elemental composition (especially C, O, N, S, Si), will allow the understanding of the early stages of planetary and atmospheric formation during the nebular phase and the following few million years. ARIEL will thus provide a representative picture of the chemical nature of the exoplanets and relate this directly to the type and chemical environment of the host star. ARIEL is designed as a dedicated survey mission for combined-light spectroscopy, capable of observing a large and well-defined planet sample within its 4-year mission lifetime. Transit, eclipse and phase-curve spectroscopy methods, whereby the signal from the star and planet are differentiated using knowledge of the planetary ephemerides, allow us to measure atmospheric signals from the planet at levels of 10–100 part per million (ppm) relative to the star and, given the bright nature of targets, also allows more sophisticated techniques, such as eclipse mapping, to give a deeper insight into the nature of the atmosphere. These types of observations require a stable payload and satellite platform with broad, instantaneous wavelength coverage to detect many molecular species, probe the thermal structure, identify clouds and monitor the stellar activity. The wavelength range proposed covers all the expected major atmospheric gases from e.g. H2O, CO2, CH4 NH3, HCN, H2S through to the more exotic metallic compounds, such as TiO, VO, and condensed species. Simulations of ARIEL performance in conducting exoplanet surveys have been performed – using conservative estimates of mission performance and a full model of all significant noise sources in the measurement – using a list of potential ARIEL targets that incorporates the latest available exoplanet statistics. The conclusion at the end of the Phase A study, is that ARIEL – in line with the stated mission objectives – will be able to observe about 1000 exoplanets depending on the details of the adopted survey strategy, thus confirming the feasibility of the main science objectives.Peer reviewedFinal Published versio

    A simulation study comparing supertree and combined analysis methods using SMIDGen

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix.</p> <p>Results</p> <p>In this paper, we describe an extensive simulation study we performed comparing two supertree methods, MRP and weighted MRP, to combined analysis methods on large model trees. A key contribution of this study is our novel simulation methodology (Super-Method Input Data Generator, or <it>SMIDGen</it>) that better reflects biological processes and the practices of systematists than earlier simulations. We show that combined analysis based upon maximum likelihood outperforms MRP and weighted MRP, giving especially big improvements when the largest subtree does not contain most of the taxa.</p> <p>Conclusions</p> <p>This study demonstrates that MRP and weighted MRP produce distinctly less accurate trees than combined analyses for a given base method (maximum parsimony or maximum likelihood). Since there are situations in which combined analyses are not feasible, there is a clear need for better supertree methods. The source tree and combined datasets used in this study can be used to test other supertree and combined analysis methods.</p

    Gene-rich UV sex chromosomes harbor conserved regulators of sexual development

    Get PDF
    Nonrecombining sex chromosomes, like the mammalian Y, often lose genes and accumulate transposable elements, a process termed degeneration. The correlation between suppressed recombination and degeneration is clear in animal XY systems, but the absence of recombination is confounded with other asymmetries between the X and Y. In contrast, UV sex chromosomes, like those found in bryophytes, experience symmetrical population genetic conditions. Here, we generate nearly gapless female and male chromosome-scale reference genomes of the moss Ceratodon purpureus to test for degeneration in the bryophyte UV sex chromosomes. We show that the moss sex chromosomes evolved over 300 million years ago and expanded via two chromosomal fusions. Although the sex chromosomes exhibit weaker purifying selection than autosomes, we find that suppressed recombination alone is insufficient to drive degeneration. Instead, the U and V sex chromosomes harbor thousands of broadly expressed genes, including numerous key regulators of sexual development across land plants

    Distribution of the anther-smut pathogen Microbotryum on species of the Caryophyllaceae

    Get PDF
    Artículo de publicación ISIUnderstanding disease distributions is of fundamental and applied importance, yet few studies benefit from integrating broad sampling with ecological and phylogenetic data. Here, anther-smut disease, caused by the fungus Microbotryum, was assessed using herbarium specimens of Silene and allied genera of the Caryophyllaceae. • A total of 42 000 herbarium specimens were examined, and plant geographical distributions and morphological and life history characteristics were tested as correlates of disease occurrence. Phylogenetic comparative methods were used to determine the association between disease and plant life-span. • Disease was found on 391 herbarium specimens from 114 species and all continents with native Silene. Anther smut occurred exclusively on perennial plants, consistent with the pathogen requiring living hosts to overwinter. The disease was estimated to occur in 80% of perennial species of Silene and allied genera. The correlation between plant life-span and disease was highly significant while controlling for the plant phylogeny, but the disease was not correlated with differences in floral morphology. • Using resources available in natural history collections, this study illustrates how disease distribution can be determined, not by restriction to a clade of susceptible hosts or to a limited geographical region, but by association with host life-span, a trait that has undergone frequent evolutionary transitions.We acknowledge grant support from the John Simon Guggenheim Memorial Foundation and the National Science Foundation (DEB-0747222) to MEH, the National Science Foundation Minority Postdoctoral Fellowship (DBI-0706721) to JIMA, University of Chile awards PFB-23 and ICM P05-002 to MTKA, and The Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS) support to BO, and Royal Society Incoming Fellowship and Center for Infection, Immunity, and Evolution Advanced Fellowship to ABP

    Increased gene sampling strengthens support for higher-level groups within leaf-mining moths and relatives (Lepidoptera: Gracillariidae)

    Get PDF
    Background: Researchers conducting molecular phylogenetic studies are frequently faced with the decision of what to do when weak branch support is obtained for key nodes of importance. As one solution, the researcher may choose to sequence additional orthologous genes of appropriate evolutionary rate for the taxa in the study. However, generating large, complete data matrices can become increasingly difficult as the number of characters increases. A few empirical studies have shown that augmenting genes even for a subset of taxa can improve branch support. However, because each study differs in the number of characters and taxa, there is still a need for additional studies that examine whether incomplete sampling designs are likely to aid at increasing deep node resolution. We target Gracillariidae, a Cretaceous-age (similar to 100 Ma) group of leaf-mining moths to test whether the strategy of adding genes for a subset of taxa can improve branch support for deep nodes. We initially sequenced ten genes (8,418 bp) for 57 taxa that represent the major lineages of Gracillariidae plus outgroups. After finding that many deep divergences remained weakly supported, we sequenced eleven additional genes (6,375 bp) for a 27-taxon subset. We then compared results from different data sets to assess whether one sampling design can be favored over another. The concatenated data set comprising all genes and all taxa and three other data sets of different taxon and gene sub-sampling design were analyzed with maximum likelihood. Each data set was subject to five different models and partitioning schemes of non-synonymous and synonymous changes. Statistical significance of non-monophyly was examined with the Approximately Unbiased (AU) test. Results: Partial augmentation of genes led to high support for deep divergences, especially when non-synonymous changes were analyzed alone. Increasing the number of taxa without an increase in number of characters led to lower bootstrap support; increasing the number of characters without increasing the number of taxa generally increased bootstrap support. More than three-quarters of nodes were supported with bootstrap values greater than 80% when all taxa and genes were combined. Gracillariidae, Lithocolletinae + Leucanthiza, and Acrocercops and Parectopa groups were strongly supported in nearly every analysis. Gracillaria group was well supported in some analyses, but less so in others. We find strong evidence for the exclusion of Douglasiidae from Gracillarioidea sensu Davis and Robinson (1998). Our results strongly support the monophyly of a G.B.R.Y. clade, a group comprised of Gracillariidae + Bucculatricidae + Roeslerstammiidae + Yponomeutidae, when analyzed with non-synonymous changes only, but this group was frequently split when synonymous and non-synonymous substitutions were analyzed together. Conclusions: 1) Partially or fully augmenting a data set with more characters increased bootstrap support for particular deep nodes, and this increase was dramatic when non-synonymous changes were analyzed alone. Thus, the addition of sites that have low levels of saturation and compositional heterogeneity can greatly improve results. 2) Gracillarioidea, as defined by Davis and Robinson (1998), clearly do not include Douglasiidae, and changes to current classification will be required. 3) Gracillariidae were monophyletic in all analyses conducted, and nearly all species can be placed into one of six strongly supported clades though relationships among these remain unclear. 4) The difficulty in determining the phylogenetic placement of Bucculatricidae is probably attributable to compositional heterogeneity at the third codon position. From our tests for compositional heterogeneity and strong bootstrap values obtained when synonymous changes are excluded, we tentatively conclude that Bucculatricidae is closely related to Gracillariidae + Roeslerstammiidae + Yponomeutidae
    corecore