
    Herbarium-Derived Phenological Data in North America

We present infrastructure for developing large-scale, long-term phenological datasets across multiple herbaria, together with a sample dataset acquired from the digital archives of 440 herbaria across North America and processed to evaluate phenological status. The dataset contains 2,319,672 specimen records of plants collected while reproductively active. These data have been modified to explicitly codify the observed phenological status of each specimen at the time of collection, and to remove specimens missing information essential to assessing their phenology or the corresponding climate conditions in the year and location of collection. Because collectors have used different taxonomic schemas over space and time, the data were also reconciled into a single unified taxonomic schema so that taxon names are consistent throughout the dataset. The data have further been joined with long-term and annual climate conditions in the year and location of collection, as derived from PRISM climate data (https://www.prism.oregonstate.edu/). To date, the dataset includes 2,319,672 specimens across 25,429 plant taxa. It is a living dataset that will continue to be updated as digitization efforts proceed and additional digital specimen records become available.

Code was written in Python 3.7. Multiple Python packages are required to run the code (see attached .yml file for full list). We recommend using Anaconda to construct the Python environment and install the packages required to produce this dataset, including the PhenoColl package that was developed for this project (https://doi.org/10.5281/zenodo.8323153).

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: DEB-2105932

Phenological data pertaining to flowering times in this dataset consist of 2,319,672 specimen records of plant species collected in flower, while strobilating, or while fertile (this last category applied primarily to graminoids). These data were derived from the digital archives of 440 herbaria (see Readme for full listing), and subsequently cleaned and modified using the criteria described below to facilitate their use in phenological assessment. To ensure data quality, specimens were included only if, at the time of digitization, herbarium personnel had: verified that the specimen was collected when in flower, strobilating, or fertile; recorded GPS coordinates of the location from which the specimen was collected; and provided the precise date of collection (month, day, and year). Only specimens whose reproductive status was explicitly recorded in either the DarwinCore "reproductivecondition" or "lifestage" field of the source database were included. The taxonomic nomenclature used to describe each specimen was standardized using the Taxonomic Name Resolution Service iPlant Collaborative, Version 4.0 (Boyle et al., 2013, Accessed: 30 August 2021; https://tnrs.biendata.org/). Duplicate collections of a species at the same location, day of year (DOY), and year were also removed. The resulting dataset included 2,319,672 specimens distributed throughout North America.

Climate data associated with the year and location of each specimen collection were then integrated into the dataset. All climate data were drawn from PRISM (https://www.prism.oregonstate.edu/) and incorporated both long-term normal conditions at the location of each collection and the predicted conditions in the year and location of each collection.
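
The inclusion criteria and duplicate removal described above can be illustrated with a minimal, standard-library sketch. It is not the PhenoColl implementation. The function names, the record layout (plain dictionaries keyed by Darwin Core style terms such as `scientificName`, `decimalLatitude`, `decimalLongitude`, `year`, `month`, `day`), and the accepted reproductive-status strings in `REPRODUCTIVE_VALUES` are all assumptions made for illustration; the source does not list the exact vocabulary or column names used.

```python
from datetime import date

# Hypothetical vocabulary: the source does not list the exact strings it accepted.
REPRODUCTIVE_VALUES = {"flowering", "in flower", "strobilating", "fertile"}


def is_reproductive(record):
    """True if either status field explicitly records a reproductive state."""
    for field in ("reproductiveCondition", "lifeStage"):
        value = str(record.get(field) or "").strip().lower()
        if value in REPRODUCTIVE_VALUES:
            return True
    return False


def collection_date(record):
    """Return a date only when year, month, and day are all present and valid."""
    try:
        return date(int(record["year"]), int(record["month"]), int(record["day"]))
    except (KeyError, TypeError, ValueError):
        return None


def coordinates(record):
    """Return (lat, lon) only when both coordinates are present and in range."""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        return None
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return lat, lon
    return None


def clean(records):
    """Keep complete, reproductive records; drop same-taxon duplicates that
    share location, year, and day of year (DOY)."""
    seen = set()
    kept = []
    for record in records:
        name = record.get("scientificName")
        when = collection_date(record)
        where = coordinates(record)
        if not (name and when and where and is_reproductive(record)):
            continue
        doy = when.timetuple().tm_yday
        key = (name, where, when.year, doy)
        if key in seen:
            continue  # duplicate collection of the same taxon, place, and day
        seen.add(key)
        kept.append({**record, "doy": doy})
    return kept
```

Exact matching on the status string is a deliberate choice here: substring matching would wrongly accept values such as "infertile" or "not flowering". A real pipeline would also need to treat nearby coordinates as the same location, which this sketch does not attempt.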

