
    Applications of deep convolutional neural networks to digitized natural history collections

    Natural history collections contain data that are critical for many scientific endeavors. Recent efforts in mass digitization are generating large datasets from these collections that can provide unprecedented insight. Here, we present examples of how deep convolutional neural networks can be applied in analyses of imaged herbarium specimens. We first demonstrate that a convolutional neural network can detect mercury-stained specimens across a collection with 90% accuracy. We then show that such a network can correctly distinguish two morphologically similar plant families 96% of the time. Discarding the most challenging specimen images increases accuracy to 94% and 99%, respectively. These results highlight the importance of mass digitization and deep learning approaches and reveal how they can together deliver powerful new investigative tools.
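
    The abstract does not say how the "most challenging" images were chosen. One plausible reading, assumed here purely for illustration, is to drop the images on which the classifier is least confident and recompute accuracy on the remainder. The Python sketch below shows that computation for a binary classifier; the function name `selective_accuracy`, the 0.5 decision boundary, and the distance-from-boundary confidence measure are assumptions, not the authors' method.

```python
from typing import Sequence


def selective_accuracy(labels: Sequence[int], probs: Sequence[float],
                       discard_fraction: float = 0.0) -> float:
    """Accuracy of a binary classifier after dropping its least confident predictions.

    labels: true classes (0 or 1); probs: predicted probability of class 1.
    ASSUMPTION: confidence = distance of the score from the 0.5 decision boundary.
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and the same length")
    if not 0.0 <= discard_fraction < 1.0:
        raise ValueError("discard_fraction must be in [0, 1)")
    # Order image indices from most to least confident.
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5), reverse=True)
    # Keep the most confident share; at least one image always remains.
    keep = len(order) - int(len(order) * discard_fraction)
    kept = order[:keep]
    # A prediction is class 1 when its score is at least 0.5.
    correct = sum(1 for i in kept if (probs[i] >= 0.5) == (labels[i] == 1))
    return correct / len(kept)
```

    With hypothetical labels `[1, 0, 1, 0, 1, 0]` and scores `[0.95, 0.05, 0.45, 0.55, 0.90, 0.20]`, the full-set accuracy is 4/6, while `discard_fraction=0.5` gives 1.0 because both errors in this toy set sit next to the decision boundary.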

    Parenting and child and adolescent mental health during the COVID-19 pandemic

    Background: Early work indicates that the COVID-19 pandemic has had a significant impact on the mental health of children and adolescents. Understanding which children may be more at risk for mental health problems, and which risk factors are amenable to change, is crucial. The importance of studying children’s mental health within the context of the family system is recognized. Methods: The current study investigated associations between parent factors and children’s mental health during the early phase of the COVID-19 pandemic across a number of Western countries (primarily Australia and the United Kingdom). Parents (N = 385) reported on their pandemic-related stress, mental health, and parenting behaviors, as well as on mental health changes in their 5- to 17-year-old children (N = 481) during April/May 2020. Results: Analyses revealed significant associations of parent COVID-19 pandemic stress and of parent depression, anxiety, and stress symptoms with increases in child internalizing and externalizing problems. Harsh parenting behavior was associated with trauma symptoms and increases in externalizing problems. Further, some associations were more pronounced for children with existing mental health problems and for disadvantaged and single-parent families. Limitations: The data were cross-sectional, the majority of participating parents were female, and all data were parent-reported. Conclusions: Findings suggest the importance of parents in influencing children’s mental health during the acute phase of the COVID-19 pandemic. Further work is needed to investigate longer-term impacts.

    A Pipeline for Deep Learning with Specimen Images in iDigBio - Applying and Generalizing an Examination of Mercury Use in Preparing Herbarium Specimens

    iDigBio (Matsunaga et al. 2013) currently references over 22 million media files and stores approximately 120 terabytes of those media files co-located with our compute infrastructure. Using these images for scientific research is a logistical and technical challenge: transferring large numbers of images requires programming skill, bandwidth, and storage space. While simple image transformations such as resizing and generating histograms are approachable on desktops and laptops, the neural networks commonly used for learning from images require server-based graphics processing units (GPUs) to run effectively. Using the GUODA (Global Unified Open Data Access) infrastructure, we have built a model pipeline for applying user-defined processing to any subset of the images stored in iDigBio. This pipeline runs on servers located in the Advanced Computing and Information Systems lab (ACIS) alongside the iDigBio storage system. We use Apache Spark, the Hadoop Distributed File System (HDFS), and Mesos to perform the processing. We have placed a Jupyter notebook server in front of this architecture; it provides an easy-to-use environment, with deep learning libraries for Python preloaded, in which end users can write their own models. Users can access the stored data and images, manipulate them according to their requirements, and make their work publicly available on GitHub. As an example of how this pipeline can be used in research, we applied a neural network developed at the Smithsonian Institution to identify herbarium sheets that were prepared with hazardous mercury-containing solutions (Schuettpelz et al. 2017). The model was trained with Smithsonian resources on Smithsonian images and transferred to the GUODA infrastructure hosted at ACIS, which also houses iDigBio. We then used this model to classify additional images in iDigBio, illustrating how these techniques can be applied to broad image corpora, potentially to notify other data publishers of contamination.
We present the results of this classification not as a verified research result, but as an example of the collaborative and scalable workflows this pipeline and infrastructure enable.
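
    The following Python sketch mimics, on a single machine, the pattern described above: select a subset of image records, apply a user-defined function to each, and collect the results keyed by record. It is not the GUODA code or the Spark API; a thread pool stands in for the cluster, and the record fields `uuid`, `collection`, and `media_url`, like the function name `run_pipeline`, are hypothetical placeholders rather than iDigBio's actual schema.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

# HYPOTHETICAL record layout: a flat dict of string fields per media record.
Record = Dict[str, str]


def run_pipeline(records: Iterable[Record],
                 select: Callable[[Record], bool],
                 process: Callable[[Record], object],
                 workers: int = 4) -> List[Tuple[str, object]]:
    """Apply a user-defined `process` function to the records chosen by `select`.

    A local stand-in for a distributed map; results keep the input order.
    """
    # Step 1: choose the subset of images to work on.
    subset = [r for r in records if select(r)]
    # Step 2: run the user's processing function concurrently over the subset.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(process, subset))
    # Step 3: pair each output with the identifier of the record it came from.
    return [(r["uuid"], out) for r, out in zip(subset, outputs)]


if __name__ == "__main__":
    demo = [
        {"uuid": "a1", "collection": "herbarium", "media_url": "https://example.org/a1.jpg"},
        {"uuid": "b2", "collection": "insects", "media_url": "https://example.org/b2.jpg"},
        {"uuid": "c3", "collection": "herbarium", "media_url": "https://example.org/c3.jpg"},
    ]
    # Toy "processing": extract the file name instead of running a real model.
    print(run_pipeline(demo, lambda r: r["collection"] == "herbarium",
                       lambda r: r["media_url"].rsplit("/", 1)[-1]))
```

    Run as a script, the demo prints `[('a1', 'a1.jpg'), ('c3', 'c3.jpg')]`. On a Spark cluster the same shape would roughly correspond to a filter followed by a map over a distributed dataset, with a model's inference call in place of the toy function.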

    A Pipeline for Processing Specimen Images in iDigBio - Applying and Generalizing an Examination of Mercury Use in Preparing Herbarium Specimens

    iDigBio currently references over 22 million media files and stores approximately 120 terabytes of those media files co-located with our computing infrastructure (Matsunaga et al. 2013). Using these images for scientific research is a logistical and technical challenge: transferring large numbers of images requires programming skill, bandwidth, and storage space. While simple image transformations such as resizing and generating histograms are approachable on desktops and laptops, the neural networks commonly used for learning from images require server-based graphics processing units (GPUs) to run effectively. Using the GUODA (Global Unified Open Data Access) infrastructure, we are building a model pipeline for applying user-defined processing to all or any subset of the images stored in iDigBio, running on servers located in the Advanced Computing and Information Systems lab (ACIS) alongside the iDigBio storage system. This pipeline uses Apache Spark, the Hadoop Distributed File System (HDFS), and Mesos (Collins et al. 2017). We have placed a Jupyter notebook server in front of this architecture, which provides an easy-to-use environment for end users to write their own Python or R programs. Users can access the stored data and images, manipulate them according to their requirements, and make their work publicly available on GitHub. As an example of how this pipeline can be used in research, we are applying a neural network developed at the Smithsonian Institution to identify herbarium sheets that were prepared with hazardous mercury-containing solutions (Schuettpelz, in preparation). The model was trained on Smithsonian servers using their herbarium images and is being transferred to the GUODA infrastructure hosted at the ACIS lab.
All herbarium images in iDigBio are being classified with this model, a deep convolutional neural network that detects visible mercury crystallization on digitized herbarium sheets, to illustrate the application of these techniques to larger sets of images. Such an automated detection process could be used, for instance, to notify other data publishers of possible contamination. We present the results of this classification not as a verified research result, but as an example of the collaborative and scalable workflows this pipeline and infrastructure enable.
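
    To illustrate the notification use case, the Python sketch below groups per-image mercury predictions by data publisher and lists the images whose score meets a threshold. The tuple layout `(publisher, image_id, p_mercury)`, the 0.9 threshold, and the function name `flag_by_publisher` are hypothetical choices for illustration; because the classifications are unverified, flagged images would still need manual review before a publisher is contacted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def flag_by_publisher(predictions: Iterable[Tuple[str, str, float]],
                      threshold: float = 0.9) -> Dict[str, List[str]]:
    """Group likely mercury-stained images by the publisher that supplied them.

    predictions: HYPOTHETICAL (publisher, image_id, p_mercury) tuples, where
    p_mercury is the classifier's probability that the sheet shows mercury.
    """
    flagged: Dict[str, List[str]] = defaultdict(list)
    for publisher, image_id, p_mercury in predictions:
        # Only scores at or above the (assumed) threshold are reported.
        if p_mercury >= threshold:
            flagged[publisher].append(image_id)
    # Publishers with no flagged images are omitted; IDs are sorted for stable output.
    return {pub: sorted(ids) for pub, ids in flagged.items()}
```

    For example, with the toy predictions `[("pubA", "img1", 0.97), ("pubA", "img2", 0.12), ("pubB", "img3", 0.91), ("pubB", "img4", 0.95), ("pubC", "img5", 0.40)]`, the function returns `{"pubA": ["img1"], "pubB": ["img3", "img4"]}`. A high threshold trades recall for fewer false alarms sent to publishers.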

    Digitization Coordination Workshop Report

    Many larger museums and archives have begun to take a centralized approach to digitizing their collections by creating Digitization Coordinator positions. This effort establishes a single vision for digitization that incorporates priorities, workflows, and resources to greatly improve the efficiency and throughput of digitization in collections. Smaller institutions are now starting to see the benefit of a more structured, cross-disciplinary approach to digitization, which allows for better awareness and resourcing of digitization needs. The workshop brought together natural science digitization professionals from the USA and EU to highlight lessons learned and best practices for realizing the benefits of a coordinated approach, including advocacy for digitization, accelerated digitization efficiency and, ultimately, increased access to and usability of digital collections for addressing societal challenges such as biodiversity decline. Insights, lessons learned, and initial thoughts on best practices are described, and the supporting workshop resources are shared so that others can benefit.

    The colonial legacy of herbaria

    Herbarium collections shape our understanding of the world’s flora and are crucial for addressing global change and biodiversity conservation. The formation of such natural history collections, however, is not free from sociopolitical issues of immediate relevance. Despite increasing efforts addressing issues of representation and colonialism in natural history collections, herbaria have received comparatively less attention. While it has been noted that the majority of plant specimens are housed in the global North, the extent of this disparity has not been rigorously quantified to date. Here, by analyzing over 85 million specimen records and surveying herbaria across the globe, we assess the colonial legacy of botanical collections and how we may move towards a more inclusive future. We demonstrate that colonial exploitation has contributed to an inverse relationship between where plant biodiversity exists in nature and where it is housed in herbaria. Such disparities persist in herbaria across physical and digital realms despite overt colonialism having ended over half a century ago, suggesting that ongoing digitization and decolonization efforts have yet to alleviate colonial-era discrepancies. We emphasize the need for acknowledging the inconvenient history of herbarium collections and for implementing a more equitable, global paradigm for their collection, curation, and use.

    The colonial legacy of herbaria

    Herbarium collections shape our understanding of Earth’s flora and are crucial for addressing global change issues. Their formation, however, is not free from sociopolitical issues of immediate relevance. Despite increasing efforts addressing issues of representation and colonialism in natural history collections, herbaria have received comparatively less attention. While it has been noted that the majority of plant specimens are housed in the Global North, the extent and magnitude of this disparity have not been quantified. Here we examine the colonial legacy of botanical collections, analysing 85,621,930 specimen records and assessing survey responses from 92 herbarium collections across 39 countries. We find an inverse relationship between where plant diversity exists in nature and where it is housed in herbaria. Such disparities persist across physical and digital realms despite overt colonialism ending over half a century ago. We emphasize the need for acknowledging the colonial history of herbarium collections and implementing a more equitable global paradigm for their collection, curation and use.