11 research outputs found

    Machine learning as a service for DiSSCo’s digital specimen architecture

    International mass digitization efforts through infrastructures like the European Distributed System of Scientific Collections (DiSSCo), the US resource for Digitization of Biodiversity Collections (iDigBio), the National Specimen Information Infrastructure (NSII) of China, and Australia’s digitization of National Research Collections (NRCA Digital) make geo- and biodiversity specimen data freely, fully and directly accessible. Complementary, overarching infrastructure initiatives like the European Open Science Cloud (EOSC) were established to enable mutual integration, interoperability and reusability of multidisciplinary data streams including biodiversity, Earth system and life sciences. Natural Science Collections (NSC) are of particular importance for such multidisciplinary and internationally linked infrastructures, since they provide hard scientific evidence by allowing direct traceability of derived data (e.g., images, sequences, measurements) to physical specimens and material samples in NSC. To open up the large amounts of trait and habitat data and to link these data to digital resources like sequence databases (e.g., ENA), taxonomic infrastructures (e.g., GBIF) or environmental repositories (e.g., PANGAEA), proper annotation of specimen data with rich (meta)data early in the digitization process is required, next to bridging technologies to facilitate the reuse of these data. This was addressed in recent studies, where we employed computational image processing and artificial intelligence technologies (Deep Learning) for the classification and extraction of features like organs and morphological traits from digitized collection data (with a focus on herbarium sheets). 
However, such applications of artificial intelligence are rarely integrated into the workflows of NSC management systems, which are the essential repositories for the aforementioned integration of data streams; this applies to both (sub-symbolic) machine learning and (symbolic) ontology-based annotations. This was the motivation for the development of a Deep Learning-based trait extraction and coherent Digital Specimen (DS) annotation service providing “Machine learning as a Service” (MLaaS) with a special focus on interoperability with the core services of DiSSCo, notably the DS Repository (nsidr.org) and the Specimen Data Refinery, as well as reusability within the data fabric of EOSC. Taking up the use case of detecting and classifying regions of interest (ROI) on herbarium scans, we demonstrate an MLaaS prototype for DiSSCo involving the digital object framework Cordra for the management of DS, as well as instant annotation of digital objects with extracted trait features (and ROIs) based on the DS specification openDS.
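As a rough illustration of what annotating a digital object with a detected ROI could look like, the following Python sketch builds a JSON-serialisable annotation record. All field names (`target`, `label`, `roi`, `score`), the helper `make_roi_annotation` and the example identifier are hypothetical placeholders; they are not taken from the openDS specification or the Cordra API.

```python
import json


def make_roi_annotation(specimen_id, label, box, score):
    """Build a plain-dict annotation for one region of interest.

    All keys are illustrative assumptions, not openDS terms.
    `box` is (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return {
        "target": specimen_id,  # hypothetical identifier of the digital specimen
        "label": label,         # e.g. the detected organ class
        "roi": {
            "x": x_min,
            "y": y_min,
            "width": x_max - x_min,
            "height": y_max - y_min,
        },
        "score": score,         # model confidence for this region
    }


annotation = make_roi_annotation(
    "example/specimen-1", "leaf", (120, 80, 420, 560), 0.93
)
print(json.dumps(annotation, indent=2))
```

Keeping the record a plain dictionary means it can be serialised and attached to whatever object store is in use; the actual schema would have to follow the specification of the target repository.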

    Extracting Trait Data from Digitized Herbarium Specimens Using Deep Convolutional Networks

    Herbarium collections have been the foundation of taxonomical research for centuries and are becoming increasingly important for related fields such as plant ecology or biogeography. Herbaria worldwide are estimated to include c. 400 million specimens; through their type specimens they cover, with few exceptions, all known plant taxa (c. 350 000 species), and they have a temporal dimension that is matched by only a few other botanical data sources. Presently, c. 13.5 million digitized herbarium specimens are available online via institutional websites or aggregating websites like GBIF. We used these specimen images in combination with morphological trait data obtained from TRY and the FLOPO knowledge base to train deep convolutional networks to recognize these traits as well as phenological states from specimen images. To improve trait recognition, we expanded our approach to include high-resolution scans to enable fine-grained feature extraction. Furthermore, we analyze differences in the recognizability of traits depending on trait group (e.g. leaf traits) or higher taxa. Newly mobilized trait data will be used to improve our trait databases. Our approach is described in detail, and performance in the recognition of different traits is analyzed and discussed.

    Plant organ detections and annotations on digitized herbarium scans

    This sample dataset contains digitized herbarium scans, taken from the Herbarium Senckenbergianum collection, labeled with six types of plant organs. There are two types of labels: detections and annotations. The detections were predicted using a deep-learning-based object detection model, and the annotations were made manually on the basis of those predictions.
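One common way to compare predicted detections with manual annotations is intersection over union (IoU) between bounding boxes. The Python sketch below is a minimal example under stated assumptions: boxes are given as `(x_min, y_min, x_max, y_max)` tuples, each label is paired with a box, and matching is greedy with an IoU threshold of 0.5. The dataset's actual file format and the authors' evaluation procedure are not described here, so the function names and sample values are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, annotations, threshold=0.5):
    """Greedily pair each detection with the best unused same-label annotation.

    Each item is (label, box). Returns a list of (det_idx, ann_idx, iou).
    """
    used = set()
    matches = []
    for d_idx, (d_label, d_box) in enumerate(detections):
        best_idx, best_iou = None, threshold
        for a_idx, (a_label, a_box) in enumerate(annotations):
            if a_idx in used or a_label != d_label:
                continue
            score = iou(d_box, a_box)
            if score >= best_iou:
                best_idx, best_iou = a_idx, score
        if best_idx is not None:
            used.add(best_idx)
            matches.append((d_idx, best_idx, best_iou))
    return matches


# Made-up sample boxes, not taken from the dataset.
detections = [("leaf", (10, 10, 110, 110)), ("flower", (200, 200, 260, 260))]
annotations = [("leaf", (20, 20, 120, 120)), ("stem", (200, 200, 260, 260))]
matches = match_detections(detections, annotations)
print(matches)
```

In this toy input only the leaf detection finds a partner; the flower detection overlaps a box perfectly but is rejected because the labels differ.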

    A Workflow for Data Extraction from Digitized Herbarium Specimens

    Based on our own work on species and trait recognition and complementary studies from other working groups, we present a workflow for data extraction from digitized herbarium specimens using convolutional neural networks. Digitized herbarium sheets contain preserved plant material as well as additional objects: the label containing information on the collection event; annotations such as revision labels or notes on material extraction; identifiers such as barcodes or numbers; envelopes for loose plant material; and often scale bars and color charts used in the digitization process. In order to treat these objects appropriately, segmentation techniques (Triki et al. 2018) will be applied to localize and identify the different kinds of objects for specific treatments. Detecting the presence of plant organs such as leaves, flowers or fruits is already a first step in data extraction and is potentially useful for phenological studies. Plant organs will be subject to routines for quantitative (Gaikwad et al. 2018) and qualitative (Younis et al. 2018) trait recognition. Text-based objects can be treated as described by Kirchhoff et al. (2018), using OCR techniques and considering the many collection-specific terms and abbreviations described by Schröder (2019). Additionally, species recognition (Younis et al. 2018) will be applied to help further identify incompletely identified collection items or to detect possible misidentifications. All steps described above need sufficient training data, including labelling that may be obtained from collection metadata and trait databases. 
In order to deal with newly digitized collections, unseen data or new categories, we propose the implementation of a new Deep Learning approach, so-called Lifelong Learning: past knowledge of the network is dynamically saved in latent space using an autoencoder and generatively replayed while the network is trained on new tasks. This enables the network to solve complex image processing tasks and to learn new classes and knowledge incrementally without forgetting former knowledge.

    Detection and annotation of plant organs from digitised herbarium scans using deep learning

    As herbarium specimens are increasingly becoming digitised and accessible in online repositories, advanced computer vision techniques are being used to extract information from them. The presence of certain plant organs on herbarium sheets is useful information in various scientific contexts, and automatic recognition of these organs will help mobilise such information. In our study, we use deep learning, specifically Faster R-CNN, to detect plant organs on digitised herbarium specimens. For our experiment, we manually annotated hundreds of herbarium scans with thousands of bounding boxes for six types of plant organs and used them for training and evaluating the plant organ detection model. The model worked particularly well on leaves and stems; flowers, although also present in large numbers on the sheets, were not recognised equally well.

    Nature 4.0: A networked sensor system for integrated biodiversity monitoring

    Zeuss D, Bald L, Gottwald J, et al. Nature 4.0: A networked sensor system for integrated biodiversity monitoring. Global Change Biology. 2024;30(1): e17056. Abstract: Ecosystem functions and services are severely threatened by unprecedented global loss in biodiversity. To counteract these trends, it is essential to develop systems to monitor changes in biodiversity for planning, evaluating, and implementing conservation and mitigation actions. However, the implementation of monitoring systems suffers from a trade‐off between grain (i.e., the level of detail), extent (i.e., the number of study sites), and temporal repetition. Here, we present an applied and realized networked sensor system for integrated biodiversity monitoring in the Nature 4.0 project as a solution to these challenges, which considers plants and animals not only as targets of investigation, but also as parts of the modular sensor network by carrying sensors. Our networked sensor system consists of three main closely interlinked components with a modular structure: sensors, data transmission, and data storage, which are integrated into pipelines for automated biodiversity monitoring. We present our own real‐world examples of applications, share our experiences in operating them, and provide our collected open data. Our flexible, low‐cost, and open‐source solutions can be applied for monitoring individual and multiple terrestrial plants and animals as well as their interactions. Ultimately, our system can also be applied to area‐wide ecosystem mapping tasks, thereby providing an exemplary cost‐efficient and powerful solution for biodiversity monitoring. Building upon our experiences in the Nature 4.0 project, we identified ten key challenges that need to be addressed to better understand and counteract the ongoing loss of biodiversity using networked sensor systems. 
To tackle these challenges, interdisciplinary collaboration, additional research, and practical solutions are necessary to enhance the capability and applicability of networked sensor systems for researchers and practitioners, ultimately further helping to ensure the sustainable management of ecosystems and the provision of ecosystem services.
