12 research outputs found

    Digitization Workflows for Flat Sheets and Packets of Plants, Algae, and Fungi

    Effective workflows are essential components in the digitization of biodiversity specimen collections. To date, no comprehensive, community-vetted workflows have been published for digitizing flat sheets and packets of plants, algae, and fungi, even though the latest estimates suggest that only 33% of herbarium specimens have been digitally transcribed, 54% of herbaria use a specimen database, and 24% are imaging specimens. In 2012, iDigBio, the U.S. National Science Foundation’s (NSF) coordinating center and national resource for the digitization of public, nonfederal U.S. collections, launched several working groups to address this deficiency. Here, we report the development of 14 workflow modules with 7–36 tasks each. These workflows represent the combined work of approximately 35 curators, directors, and collections managers representing more than 30 herbaria, including 15 NSF-supported plant-related Thematic Collections Networks and collaboratives. The workflows are provided for download as Portable Document Format (PDF) and Microsoft Word files. Customization of these workflows for specific institutional implementation is encouraged.

    Spatial Phylogenetics of Florida Vascular Plants: The Effects of Calibration and Uncertainty on Diversity Estimates

    Summary: Recent availability of biodiversity data resources has enabled an unprecedented ability to estimate phylogenetically based biodiversity metrics over broad scales. Such approaches elucidate ecological and evolutionary processes yielding a biota and help guide conservation efforts. However, the choice of appropriate phylogenetic resources and underlying input data uncertainties may affect interpretation. Here, we address how differences among phylogenetic source trees and levels of phylogenetic uncertainty affect these metrics and test existing hypotheses regarding geographic biodiversity patterns across the diverse vascular plant flora of Florida, US. Ecological niche models for 1,490 Florida species were combined with a “purpose-built” phylogenetic tree (phylogram and chronogram), as well as with trees derived from community resources (Phylomatic and Open Tree of Life). There were only modest differences in phylodiversity metrics given the phylogenetic source tree and taking into account the level of phylogenetic uncertainty; we identify similar areas of conservation interest across Florida regardless of the method used. Subject Areas: Spatial Phylogenetics, Plant Biology, Biogeography

    Data from: A new resource for the development of SSR markers: millions of loci from a thousand plant transcriptomes

    No full text
    Premise of the study: The One Thousand Plant Transcriptomes Project (1KP, 1000+ assembled plant transcriptomes) provides an enormous resource for developing microsatellite loci across the plant tree of life. We developed loci from these transcriptomes and tested their utility. Methods and Results: Using software packages and custom scripts, we identified microsatellite loci in 1KP transcriptomes. We assessed the potential for cross-amplification and whether loci were biased toward exons, as compared to markers derived from genomic DNA. We characterized over 5.7 million simple sequence repeat (SSR) loci from 1334 plant transcriptomes. Eighteen percent of loci substantially overlapped with open reading frames (ORFs), and electronic PCR revealed that over half the loci would amplify successfully in conspecific taxa. Transcriptomic SSRs were approximately three times more likely to map to translated regions than genomic SSRs. Conclusions: We believe microsatellites still have a place in the genomic age; they remain effective and cost-efficient markers. The loci presented here are a valuable resource for researchers.
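
For readers unfamiliar with how SSR loci are located in sequence data, the following is a minimal sketch of regex-based microsatellite detection. It is not the authors' pipeline (the abstract only says "software packages and custom scripts"); the function name `find_ssrs` and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for illustration.

```python
import re

# Assumed thresholds (motif length -> minimum number of tandem copies).
# These values are illustrative and do not come from the study.
MIN_REPEATS = {2: 6, 3: 4, 4: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats.

    Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of fixed length followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT"),
            # so each locus is reported once under its shortest motif.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            copies = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCGT" + "AAG" * 5 + "TT"
    for hit in find_ssrs(example):
        print(hit)
```

Only perfect repeats are reported here; dedicated SSR tools typically also handle imperfect and compound repeats and design flanking primers.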

    GCF_000004515.3_V1.1_genomic

    No full text
    The script "BLAST_to_Coding_SSR.R" (https://github.com/soltislab/transcriptome_microsats/blob/master/BLAST_to_Coding_SSR.R) uses a .gff file (annotated Glycine max genome from NCBI) and a BLAST report for SSR loci blasted against the Glycine max genome to prepare two files, which will be used in a subsequent script (Coding_SSR.py -- https://github.com/soltislab/transcriptome_microsats/blob/master/Coding_SSR.py) to determine which loci are in translated regions of the genome (i.e., regions that are annotated as "CDS"). The output of this script is two files: one contains the SSR loci identified from the BLAST search, with some unnecessary columns and duplicates removed, and the other contains the regions of the Glycine max genome that are annotated as "CDS". This file is the gff file needed for the script.
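
The two-step idea described above (keep the "CDS" rows of the annotation, deduplicate the BLAST hits, then test each hit for overlap with a CDS interval) can be sketched as follows. This is an illustration under stated assumptions, not the contents of BLAST_to_Coding_SSR.R or Coding_SSR.py: the function names are hypothetical, the annotation is assumed to be tab-delimited GFF with the feature type in column 3 and coordinates in columns 4 and 5, and the BLAST report is assumed to use the standard 12-column tabular layout (subject start and end in columns 9 and 10).

```python
from collections import defaultdict


def cds_intervals(gff_lines):
    """Collect (start, end) intervals of CDS features, keyed by sequence ID."""
    cds = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5 and cols[2] == "CDS":
            cds[cols[0]].append((int(cols[3]), int(cols[4])))
    return cds


def loci_in_cds(blast_lines, cds):
    """Return query IDs whose first listed hit overlaps a CDS interval."""
    seen = set()
    coding = []
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue
        query, subject = cols[0], cols[1]
        if query in seen:
            continue  # duplicate removal: keep only the first hit per locus
        seen.add(query)
        # On the minus strand the subject start exceeds the subject end.
        lo, hi = sorted((int(cols[8]), int(cols[9])))
        if any(lo <= end and start <= hi for start, end in cds.get(subject, [])):
            coding.append(query)
    return coding


if __name__ == "__main__":
    gff = [
        "##gff-version 3",
        "chr1\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=g1",
        "chr1\tRefSeq\tCDS\t200\t300\t.\t+\t0\tID=c1",
    ]
    blast = [
        "locusA\tchr1\t99.0\t50\t0\t0\t1\t50\t250\t299\t1e-20\t90",
        "locusA\tchr1\t95.0\t50\t2\t0\t1\t50\t700\t749\t1e-10\t60",
        "locusB\tchr1\t98.0\t40\t0\t0\t1\t40\t540\t501\t1e-15\t70",
    ]
    print(loci_in_cds(blast, cds_intervals(gff)))
```

The linear scan over CDS intervals keeps the sketch short; for a whole-genome annotation, sorted intervals with binary search or an interval tree would scale better.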

    G_R_0803-0253-HitTable

    No full text
    The script "BLAST_to_Coding_SSR.R" (https://github.com/soltislab/transcriptome_microsats/blob/master/BLAST_to_Coding_SSR.R) uses a .gff file (annotated Glycine max genome from NCBI) and a BLAST report for SSR loci blasted against the Glycine max genome to prepare two files, which will be used in a subsequent script (Coding_SSR.py -- https://github.com/soltislab/transcriptome_microsats/blob/master/Coding_SSR.py) to determine which loci are in translated regions of the genome (i.e., regions that are annotated as "CDS"). The output of this script is two files: one contains the SSR loci identified from the BLAST search, with some unnecessary columns and duplicates removed, and the other contains the regions of the Glycine max genome that are annotated as "CDS". This is the third of three files needed to run the script.

    Scaffolds

    No full text
    The Scaffolds.zip file is a zip archive containing a directory with the scaffolds corresponding to the microsatellite loci. The Scaffolds directory contains the sequences of the scaffolds on which microsatellites were identified. The files are bz2-compressed. The decompressed file is a FASTA file with all of the scaffolds corresponding to the loci in the LocusInfo directory.
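
Because the scaffold files are described as bz2-compressed FASTA, they can be read without decompressing them to disk first. The sketch below uses only the Python standard library; the function name `read_fasta_bz2` and the example file name are hypothetical, and the actual file names inside the archive are not given in the description.

```python
import bz2


def read_fasta_bz2(path):
    """Read a bz2-compressed FASTA file into a {record_id: sequence} dict."""
    records = {}
    name = None
    with bz2.open(path, "rt") as handle:  # "rt" decompresses and decodes to text
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                name = line[1:].split()[0]
                records[name] = []
            elif name is not None:
                records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    scaffolds = read_fasta_bz2("example_scaffolds.fasta.bz2")
    print(len(scaffolds), "scaffolds loaded")
```

Loading every record into memory is acceptable for a single sample; a generator that yields one record at a time would be a better fit for very large files.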

    glycine_max_454_raw_all

    No full text
    A FASTA file is necessary to run pal_finder. The file included here is a FASTA file of raw 454 genomic reads from Glycine max (NCBI Trace Archive, TI 1732557604-1733276192; Swaminathan et al., 2007).

    G_R_0764-0510-HitTable

    No full text
    The script "BLAST_to_Coding_SSR.R" (https://github.com/soltislab/transcriptome_microsats/blob/master/BLAST_to_Coding_SSR.R) uses a .gff file (annotated Glycine max genome from NCBI) and a BLAST report for SSR loci blasted against the Glycine max genome to prepare two files, which will be used in a subsequent script (Coding_SSR.py -- https://github.com/soltislab/transcriptome_microsats/blob/master/Coding_SSR.py) to determine which loci are in translated regions of the genome (i.e., regions that are annotated as "CDS"). The output of this script is two files: one contains the SSR loci identified from the BLAST search, with some unnecessary columns and duplicates removed, and the other contains the regions of the Glycine max genome that are annotated as "CDS". This is the second of three files needed to run the script.