GinJinn: An object-detection pipeline for automated feature extraction from herbarium specimens
Premise
The generation of morphological data in evolutionary, taxonomic, and ecological studies of plants using herbarium material has traditionally been a labor-intensive task. Recent progress in machine learning using deep artificial neural networks (deep learning) for image classification and object detection has facilitated the establishment of a pipeline for the automatic recognition and extraction of relevant structures in images of herbarium specimens.
Methods and Results
We implemented an extendable pipeline based on state-of-the-art deep-learning object-detection methods to collect leaf images from herbarium specimens of two species of the genus Leucanthemum. Using 183 specimens as the training data set, our pipeline extracted one or more intact leaves in 95% of the 61 test images.
Conclusions
We establish GinJinn as a deep-learning object-detection tool for the automatic recognition and extraction of individual leaves or other structures from herbarium specimens. Our pipeline offers greater flexibility and a lower entrance barrier than previous image-processing approaches based on hand-crafted features.
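The post-detection extraction step can be illustrated with a minimal sketch, assuming a trained detector has already returned labelled bounding boxes with confidence scores. The `Detection` class, the `extract_structures` function, the 0.5 score threshold, and the list-of-rows image representation are all hypothetical and are not GinJinn's API; the sketch only shows filtering detections by class and score and cropping them from the specimen image.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical image representation: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


@dataclass
class Detection:
    """One detector output: box corners (x_min, y_min) inclusive, (x_max, y_max) exclusive."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str
    score: float


def crop(image: Image, det: Detection) -> Image:
    """Clip the box to the image bounds and return the pixels it encloses."""
    height, width = len(image), len(image[0])
    x0, x1 = max(0, det.x_min), min(width, det.x_max)
    y0, y1 = max(0, det.y_min), min(height, det.y_max)
    return [row[x0:x1] for row in image[y0:y1]]


def extract_structures(image: Image, detections: Sequence[Detection],
                       label: str = "leaf", min_score: float = 0.5) -> List[Image]:
    """Crop every detection of the requested class whose score reaches min_score."""
    return [crop(image, d) for d in detections
            if d.label == label and d.score >= min_score]
```

Clipping matters because predicted boxes can extend past the edge of the scanned sheet; raising `min_score` would trade recall for fewer spurious crops.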
Extraction and parsing of herbarium specimen data: Exploring the use of the Dublin core application profile framework
Herbaria around the world house millions of plant specimens; botanists and other researchers value these resources as ingredients in biodiversity research. Even when the specimen sheets are digitized and made available online, the critical information about the specimen stored on the sheet is not in a usable (i.e., machine-processable) form. This paper describes a current research and development project that is designing and testing high-throughput workflows that combine machine and human processes to extract and parse the specimen label data. The primary focus of the paper is the metadata needs of the workflow and the creation of structured metadata records describing the plant specimen. In the project, we are exploring the use of the new Dublin Core Metadata Initiative framework for application profiles. First articulated as the Singapore Framework for Dublin Core Application Profiles in 2007, this framework is still in its infancy. Its promise of maximum interoperability, of documenting metadata usage for maximum reusability, and of supporting metadata applications that conform to Web architectural principles provides the incentive to explore it and to add implementation experience with this new framework.
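To make the idea of a structured record concrete, here is a minimal sketch that turns already-parsed label fields into a flat XML record using DCMI Metadata Terms (the `http://purl.org/dc/terms/` namespace). The field names, the `FIELD_TO_TERM` mapping, and the `label_to_record` function are hypothetical illustrations, not the project's application profile; choosing and documenting such mappings is exactly what an application profile is for.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dcterms", DCTERMS)

# Hypothetical mapping from parsed label fields to DCMI Metadata Terms.
# In a Singapore Framework application profile, these choices (with cardinality
# and value constraints) would be documented in the Description Set Profile.
FIELD_TO_TERM = {
    "scientific_name": "subject",
    "collector": "creator",
    "collection_date": "date",
    "locality": "spatial",
    "catalog_number": "identifier",
}


def label_to_record(fields: dict) -> ET.Element:
    """Build a flat XML record holding one dcterms element per filled label field."""
    record = ET.Element("record")
    for field_name, term in FIELD_TO_TERM.items():
        value = fields.get(field_name)
        if value:  # skip fields the parser left empty
            ET.SubElement(record, f"{{{DCTERMS}}}{term}").text = value
    return record
```

Serializing the result with `ET.tostring(record, encoding="unicode")` emits elements such as `<dcterms:creator>`, because the `dcterms` prefix is registered at import time.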
Specimens as research objects: reconciliation across distributed repositories to enable metadata propagation
Botanical specimens are shared as long-term consultable research objects in a global network of specimen repositories. Multiple specimens are generated from a shared field collection event; the resulting specimens are then managed individually in separate repositories and independently augmented with research and management metadata that could be propagated to their duplicate peers. Establishing a data-derived network for metadata propagation will enable the reconciliation of closely related specimens that are currently dispersed, unconnected, and managed independently. Following a data-mining exercise applied to an aggregated dataset of 19,827,998 specimen records from 292 separate specimen repositories, 36% (7,102,710) of the specimens are assessed to participate in duplication relationships, allowing metadata to be propagated among the participants in these relationships: in total, 93,044 type citations, 1,121,865 georeferences, 1,097,168 images, and 2,191,179 scientific name determinations. The results enable the creation of networks to identify which repositories could work in collaboration. Some classes of annotation (particularly scientific name determinations) represent units of scientific work; appropriate management of these data would allow scholarly credit to accumulate to individual researchers. Potential further work in this area is discussed.
Comment: 9 pages, 1 table, 3 figures
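One simple way to picture the reconciliation is to group records that share a normalised collection-event key (collector, collector number, date) and copy a missing georeference from a duplicate peer. This is a hypothetical heuristic for illustration, not the data-mining method used in the study; the `Specimen` fields, the surname-only name normalisation, and the rule that a duplicate group must span more than one repository are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Specimen:
    repository: str
    collector: str
    number: str  # collector's field number
    date: str    # ISO 8601 collection date
    georeference: Optional[Tuple[float, float]] = None  # (latitude, longitude)


def collection_event_key(s: Specimen) -> Tuple[str, str, str]:
    """Naive key: case-folded collector surname, field number, and date."""
    # Treats "Smith, J." and "J. Smith" alike; real collector-name matching is much harder.
    parts = s.collector.split(",")[0].split()
    surname = parts[-1].casefold() if parts else ""
    return (surname, s.number.strip(), s.date)


def group_duplicates(specimens: List[Specimen]) -> List[List[Specimen]]:
    """Return groups that share a key and are held by more than one repository."""
    groups = defaultdict(list)
    for s in specimens:
        groups[collection_event_key(s)].append(s)
    return [g for g in groups.values() if len({s.repository for s in g}) > 1]


def propagate_georeferences(groups: List[List[Specimen]]) -> int:
    """Copy the first available georeference to peers lacking one; return how many were filled."""
    filled = 0
    for group in groups:
        source = next((s.georeference for s in group if s.georeference), None)
        if source is None:
            continue
        for s in group:
            if s.georeference is None:
                s.georeference = source
                filled += 1
    return filled
```

The same pattern would apply to the other propagated classes (type citations, images, determinations); this sketch does not attempt to resolve conflicting values among peers.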
High-Throughput Workflow for Computer-Assisted Human Parsing of Biological Specimen Label Data
4th International Conference on Open Repositories
This presentation was part of the session: Conference Posters
Hundreds of thousands of specimens in herbaria and natural history museums worldwide are potential candidates for digitization, making them more accessible to researchers. An herbarium contains collections of preserved plant specimens created for scientific use. Herbarium specimens are ideal natural history objects for digitization, as the plants are pressed flat, dried, and mounted on individual sheets of paper, creating a nearly two-dimensional object. Building digital repositories of herbarium specimens can increase use and exposure of the collections while simultaneously reducing physical handling. As important as the digitized specimens are, the data contained on the associated specimen labels provide critical information about each specimen (e.g., scientific name, geographic location of specimen). The volume and heterogeneity of these printed label data present challenges in transforming them into meaningful digital form to support research. The Apiary Project is addressing these challenges by exploring and developing transformation processes in a systematic workflow that yields high-quality machine-processable label data in a cost- and time-efficient manner. The University of North Texas's Texas Center for Digital Knowledge (TxCDK) and the Botanical Research Institute of Texas (BRIT), with funding from an Institute of Museum and Library Services National Leadership Grant, are conducting fundamental research with the goal of identifying how human intelligence can be combined with machine processes for effective and efficient transformation of specimen label information. The results of this research will yield a new workflow model for effective and efficient label data transformation, correction, and enhancement.
Institute of Museum and Library Services, National Leadership Grant
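The combination of machine and human steps can be sketched as confidence-threshold routing: fields that a machine step (for example, OCR followed by parsing) produced with high confidence are accepted, the rest are queued for a human, and the human corrections are merged back into the record. This is a hypothetical illustration, not the Apiary Project's workflow; the `MachineOutput` shape, the function names, and the 0.85 threshold are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical machine output: field name -> (proposed value, confidence in [0, 1]).
MachineOutput = Dict[str, Tuple[str, float]]


def route_fields(fields: MachineOutput,
                 threshold: float = 0.85) -> Tuple[Dict[str, str], List[str]]:
    """Split fields into auto-accepted values and names queued for human review."""
    accepted: Dict[str, str] = {}
    review: List[str] = []
    for name, (value, confidence) in fields.items():
        if value and confidence >= threshold:
            accepted[name] = value
        else:  # empty or low-confidence values go to a human
            review.append(name)
    return accepted, review


def merge_corrections(accepted: Dict[str, str],
                      corrections: Dict[str, str]) -> Dict[str, str]:
    """Combine machine-accepted values with human-supplied ones; human input takes precedence."""
    record = dict(accepted)
    record.update(corrections)
    return record
```

Raising the threshold sends more fields to human reviewers, trading labor cost for accuracy; lowering it does the reverse.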
Digitization workflows for flat sheets and packets of plants, algae, and fungi
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/141708/1/aps31500065.pd