17 research outputs found
A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering
Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
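The representative-selection idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the uncorrected pairwise distance, the simple alternating (assign, then update) k-medoids loop, the 0.3 distance threshold, and the names p_distance, k_medoids and pick_representatives are all assumptions made for illustration. The sketch increases k until every aligned sequence lies within the threshold of its nearest medoid, then returns the medoid indices.

```python
def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ("-") are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Starts from the first k items (a deterministic, hypothetical choice)
    and returns a sorted list of medoid indices.
    """
    n = len(dist)
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assignment step: each medoid stays in its own cluster,
        # every other item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            if i in clusters:
                nearest = i
            else:
                nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid of a cluster is the member with
        # the smallest total distance to the other members.
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def pick_representatives(seqs, max_dist=0.3):
    """Smallest medoid set (found by this heuristic) that covers all sequences.

    max_dist is an assumed threshold, not a value taken from the abstract.
    """
    n = len(seqs)
    dist = [[p_distance(a, b) for b in seqs] for a in seqs]
    for k in range(1, n + 1):
        medoids = k_medoids(dist, k)
        if all(min(dist[i][m] for m in medoids) <= max_dist for i in range(n)):
            return medoids
    return list(range(n))


if __name__ == "__main__":
    # Toy alignment with two clearly separated groups (made-up data).
    alignment = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT",
                 "CCCCCCCCCC", "CCCCCCCCCG"]
    print(pick_representatives(alignment, max_dist=0.3))
```

Because a medoid is always one of the input sequences, each representative is a real sequence that probes could be designed from, which is the property that distinguishes k-medoids from k-means in this setting.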
A nuclear phylogenomic study of the angiosperm order Myrtales, exploring the potential and limitations of the universal Angiosperms353 probe set
Premise: To further advance the understanding of the species-rich, economically and ecologically important angiosperm order Myrtales in the rosid clade, comprising nine families, approximately 400 genera and almost 14,000 species occurring on all continents (except Antarctica), we tested the Angiosperms353 probe kit. Methods: We combined high-throughput sequencing and target enrichment with the Angiosperms353 probe kit to evaluate a sample of 485 species across 305 genera (76% of all genera in the order). Results: Results provide the most comprehensive phylogenetic hypothesis for the order to date. Relationships at all ranks, such as the relationship of the early-diverging families, often reflect previous studies, but gene conflict is evident, and relationships previously found to be uncertain often remain so. Technical considerations for processing high-throughput sequencing (HTS) data are also discussed. Conclusions: High-throughput sequencing and the Angiosperms353 probe kit are powerful tools for phylogenomic analysis, but better understanding of the genetic data available is required to identify genes and gene trees that account for likely incomplete lineage sorting and/or hybridization events.
Phylogenomics and the rise of the angiosperms
Angiosperms are the cornerstone of most terrestrial ecosystems and human livelihoods [1,2]. A robust understanding of angiosperm evolution is required to explain their rise to ecological dominance. So far, the angiosperm tree of life has been determined primarily by means of analyses of the plastid genome [3,4]. Many studies have drawn on this foundational work, such as classification and first insights into angiosperm diversification since their Mesozoic origins [5,6,7]. However, the limited and biased sampling of both taxa and genomes undermines confidence in the tree and its implications. Here, we build the tree of life for almost 8,000 (about 60%) angiosperm genera using a standardized set of 353 nuclear genes [8]. This 15-fold increase in genus-level sampling relative to comparable nuclear studies [9] provides a critical test of earlier results and brings notable change to key groups, especially in rosids, while substantiating many previously predicted relationships. Scaling this tree to time using 200 fossils, we discovered that early angiosperm evolution was characterized by high gene tree conflict and explosive diversification, giving rise to more than 80% of extant angiosperm orders. Steady diversification ensued through the remaining Mesozoic Era until rates resurged in the Cenozoic Era, concurrent with decreasing global temperatures and tightly linked with gene tree conflict. Taken together, our extensive sampling combined with advanced phylogenomic methods shows the deep history and full complexity in the evolution of a megadiverse clade.
Tackling Rapid Radiations With Targeted Sequencing
In phylogenetic studies across angiosperms, at various taxonomic levels, polytomies have persisted despite efforts to resolve them by increasing sampling of taxa and loci. The large amount of genomic data now available and statistical tools to analyze them provide unprecedented power for phylogenetic inference. Targeted sequencing has emerged as a strong tool for estimating species trees in the face of rapid radiations, lineage sorting, and introgression. Evolutionary relationships in Cyperaceae have been studied mostly using Sanger sequencing until recently. Despite ample taxon sampling, relationships in many genera remain poorly understood, hampered by diversification rates that outpace mutation rates in the loci used. The C4 Cyperus clade of the genus Cyperus has been particularly difficult to resolve. Previous studies based on a limited set of markers resolved relationships among Cyperus species using the C3 photosynthetic pathway, but not among C4 Cyperus clade taxa. We test the ability of two targeted sequencing kits to resolve relationships in the C4 Cyperus clade, the universal Angiosperms-353 kit and a Cyperaceae-specific kit. Sequences of the targeted loci were recovered from data generated with both kits and used to investigate overlap in data between kits and relative efficiency of the general and custom approaches. The power to resolve shallow-level relationships was tested using a summary species tree method and a concatenated maximum likelihood approach. High resolution and support are obtained using both approaches, but high levels of missing data disproportionately impact the latter. Targeted sequencing provides new insights into the evolution of morphology in the C4 Cyperus clade, demonstrating for example that the former segregate genus Alinula is polyphyletic despite its seeming morphological integrity. 
An unexpected result is that the Cyperus margaritaceus-Cyperus niveus complex comprises a clade separate from and sister to the core C4 Cyperus clade. Our results demonstrate that data generated with a family-specific kit do not necessarily have more power than those obtained with a universal kit, but that data generated with different targeted sequencing kits can often be merged for downstream analyses. Moreover, our study contributes to the growing consensus that targeted sequencing data are a powerful tool in resolving rapid radiations.
Continental-scale patterns of nutrient and fish effects on shallow lakes: introduction to a pan-European mesocosm experiment
1. Shallow lake ecosystems are normally dominated by submerged and emergent plants. Biological stabilising mechanisms help preserve this dominance. The systems may switch to dominance by phytoplankton, however, with loss of submerged plants. This process usually takes place against a background of increasing nutrient loadings but also requires additional switch mechanisms, which damage the plants or interfere with their stabilising mechanisms. 2. The extent to which the details or even major features of this general model may change with geographical location is not clear. Manipulation of the fish community (biomanipulation) has often been used to clear the water of algae and restore the aquatic plants in northerly locations, but it is again not clear whether this is equally appropriate at lower latitudes. 3. Eleven parallel experiments (collectively the International Mesocosm Experiment, IME) were carried out in six lakes in Finland, Sweden, England, the Netherlands and Spain in 1998 and 1999 to investigate the between-year and large-scale spatial variation in relationships between nutrient loading and zooplanktivorous fish on submerged plant and plankton communities in shallow lakes. 4. Comparability of experiments in different locations was achieved to a high degree. Cross-laboratory comparisons of chemical analyses revealed some systematic differences between laboratories. These are unlikely to lead to major misinterpretations. 5. Nutrient addition, overall, had its greatest effect on water chemistry, followed by substantial effects on phytoplankton and zooplankton. Fish addition had its major effect on zooplankton and did not systematically change the water chemistry. There was no trend in the relative importance of fish effects with latitude, but nutrient addition affected more variables with decreasing latitude. 6. The relative importance of top-down and bottom-up influences on the plankton differed in different locations and between years at the same location. 
The outcome of the experiments in different years was more predictable with decreasing latitude and this was attributed to more variable weather at higher latitudes that created more variable starting conditions for the experiments. [KEYWORDS: alternative stable states ; community structure ; eutrophication ; fish ; large-scale variation ; nutrients]