69 research outputs found
Research applications of primary biodiversity databases in the digital age
Our world is in the midst of unprecedented change: climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. Stakeholders have invested considerable resources to contribute to online databases of species occurrences. However, estimates suggest that only 10% of biocollections are available in digital form. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Our overarching goal is therefore to determine trends in the use of mobilized species occurrence data since 2010, as online systems have grown and now provide over one billion records. To do this, we characterized 501 papers that use openly accessible biodiversity databases. Our standardized tagging protocol was based on key topics of interest, including: database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed.
A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering
Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
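The k-medoids idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the distance measure (fraction of differing aligned positions), the greedy initialization, the toy sequences, and all function names (`hamming_fraction`, `distance_matrix`, `k_medoids`) are assumptions made for illustration only. The point it shows is that each cluster is represented by one of the actual input sequences (the medoid) rather than by an averaged centroid, which is what makes the method suitable for picking representative sequences.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Symmetric pairwise distance matrix for a list of aligned sequences."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = hamming_fraction(seqs[i], seqs[j])
    return dist


def assign(dist, medoids):
    """Label each item with the index (into medoids) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Returns (medoids, labels): medoids are indices of the representative
    items, labels[i] is the cluster of item i.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    # Assumed initialization: most central item first, then farthest-first.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = (i for i in range(n) if i not in medoids)
        medoids.append(max(rest, key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Toy aligned sequences (hypothetical data, two obvious groups).
toy_seqs = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT",
            "GGGGCCCC", "GGGGCCCA", "GGGGCCAA"]
medoids, labels = k_medoids(distance_matrix(toy_seqs), k=2)
print("representatives:", [toy_seqs[m] for m in medoids])
print("cluster labels:", labels)
```

In a probe-design setting, the medoid sequences would be the candidates kept as representatives of each cluster. How the number of clusters per locus is chosen is not specified here and would need to follow the original paper.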
Phylogenomics and the rise of the angiosperms
Angiosperms are the cornerstone of most terrestrial ecosystems and human livelihoods [1,2]. A robust understanding of angiosperm evolution is required to explain their rise to ecological dominance. So far, the angiosperm tree of life has been determined primarily by means of analyses of the plastid genome [3,4]. Many studies have drawn on this foundational work, such as classification and first insights into angiosperm diversification since their Mesozoic origins [5,6,7]. However, the limited and biased sampling of both taxa and genomes undermines confidence in the tree and its implications. Here, we build the tree of life for almost 8,000 (about 60%) angiosperm genera using a standardized set of 353 nuclear genes [8]. This 15-fold increase in genus-level sampling relative to comparable nuclear studies [9] provides a critical test of earlier results and brings notable change to key groups, especially in rosids, while substantiating many previously predicted relationships. Scaling this tree to time using 200 fossils, we discovered that early angiosperm evolution was characterized by high gene tree conflict and explosive diversification, giving rise to more than 80% of extant angiosperm orders. Steady diversification ensued through the remaining Mesozoic Era until rates resurged in the Cenozoic Era, concurrent with decreasing global temperatures and tightly linked with gene tree conflict. Taken together, our extensive sampling combined with advanced phylogenomic methods shows the deep history and full complexity in the evolution of a megadiverse clade.
- …