14 research outputs found

    Landscape Analysis for the Specimen Data Refinery

    This report reviews the current state of the art in applied approaches to automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and identify areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.

    D3.2 DiSSCo Digitisation Guides Website - Consolidating Knowledge on Collections Mobilisation

    In order to support the digitisation activities of DiSSCo, we have considered how best to prepare collections for digitisation, digitise them, curate their associated data, publish those data, and measure the outputs of projects and programmes. We have examined options and approaches for different types and sizes of collections, when outsourcing should be considered, and what different project management approaches are most appropriate in this range of circumstances. This report describes the approach we have taken to developing an online community-edited manual, our guidelines, other relevant resources and platforms, and a set of recommendations on how to develop and this work to enhance future digitisation capacity across DiSSCo collection-holding organisations.

    Utilising the Crowd to Unlock the Data on Herbarium Specimens at the Royal Botanic Garden Edinburgh

    No full text
    Digitisation of specimens at the Royal Botanic Garden Edinburgh (RBGE) has created nearly half a million imaged specimens. With data entry from the specimen labels on herbarium sheets identified as the rate-limiting step in the digitisation workflow, the majority of specimens are databased with minimal data (filing name and geographical region), leaving a need to add further label data (collector, collecting locality, collection date etc.) to make the specimens research ready. We are exploring a number of different ways to complete data entry for specimens that have been imaged. These have included Optical Character Recognition (OCR) to identify meaningful specimen groupings that increase the speed of data entry and, more recently, citizen science platforms to provide accurate crowd-sourced transcriptions of specimen label data. We sent specimen images of the Australian flowering plants held at RBGE herbarium to DigiVol (https://volunteer.ala.org.au/institution/index/21309224), the citizen science platform developed alongside The Atlas of Living Australia. In 29 expeditions, 156 citizen scientists completed collection label data entry for RBGE’s 41,000 specimens of Australian flowering plants. We found that 95% of the transcriptions were completed by fewer than a third (27%) of the volunteers. Of the four volunteer experience levels in DigiVol we found that the middle two, Collection Managers and Scientists, transcribed fewer specimens, but also made fewer mistakes. We found that by removing the filing name from the information provided with the expedition the number of errors in the Museum Details section of the transcription decreased, as the filing name was often added as the label name, regardless of whether this was the case. The feedback we provided for each expedition was used to highlight common errors to try to reduce their occurrence as well as to inform the volunteers of what their transcriptions had revealed about this part of the collection. We explore the citizen science transcription workflow, its rate-limiting steps and how we have worked to include the citizen science and OCR data in our online herbarium catalogue.

    The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels

    At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text prior to being sorted based on Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the data from OCR aid the digitisation process, they completed a series of trials which compared the efficiency of data entry between sorted and unsorted batches of specimens. A survey was carried out to explore the opinions of the digitisation staff on the different sorting options. In total 7,200 specimens were processed. When compared to an unsorted, random set of specimens, those which were sorted based on data added from the OCR were quicker to digitise. Of the methods tested here, the most successful in terms of efficiency used a protocol which required entering data into a limited set of fields and where the records were filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, and so a familiarity with the Collector or Country is rapidly established.

    Digitisation at Three UK Herbaria Contributes Towards Food Security and Sustainable Timber Use

    No full text
    The digitisation of herbarium collections has been shown to provide a growing resource in conservation science. Mobilising the data on portals such as GBIF allows researchers to access key taxonomic, habitat and geographical data that would otherwise be unavailable unless institutions are physically visited. These data are used notably in conservation assessments, distribution studies and publication of new species (Canteiro et al. 2019). The herbarium specimens held in Royal Botanic Gardens Kew, Natural History Museum, London, and the Royal Botanic Garden Edinburgh are an unparalleled resource, estimated to hold representatives of around 85% of known plant species. By working collectively for the first time on a non-type material digitisation project, the three institutions collaborated to generate data for the subtribe Phaseolinae and rosewoods totalling 37,000 legume specimens. This pilot project was made possible through Department for Environment Food & Rural Affairs (DEFRA)-allocated, Official Development Assistance (ODA) funding. This aid money is distributed by the UK government in its “global efforts to defeat poverty, tackle instability and create prosperity in developing countries”. This project focused on two case-studies: Study i. Supporting development of dry beans as a sustainable and resilient crop. Beans from the subtribe Phaseolinae, including cowpeas, lablab and wild beans, are extremely tolerant of poor-quality soils and drought. As a consequence they are particularly suitable for low-input agricultural production systems. An estimated 14.5 million hectares of land is used for planting of cowpea each year with around 80% of that in Development Assistance Committee countries in sub-Saharan Africa. Study ii. Aiding conservation and sustainable use of rosewoods and padauk (Dalbergia L.f. and Pterocarpus Jacq.). Dalbergia is distributed throughout tropical Asia, Africa and the Americas with many species being regionally endemic. Species also vary in habit from shrubs and trees to robust lianas. Pterocarpus is also pantropically distributed in a wide variety of habitats. However, suitable habitat across the natural range of these genera is now limited for many species due to a range of threats, namely deforestation, forest conversion for agriculture/human development, and logging. The timber from many species of Dalbergia and Pterocarpus has long been prized for its high-quality wood used for construction, fine furniture, cabinet work, marquetry and inlay, ethnic carvings, pianos, guitars and other musical instruments. All Dalbergia and most of the timber species of Pterocarpus are now listed on the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) Appendix II and the Brazilian D. nigra is listed on Appendix I. There is a huge illegal trade in these genera and serial depletion across the globe is a real and substantial risk to their survival (Winfield et al. 2016). This project used novel high-throughput methodologies and acted as a pilot study for future collaborative mass digitisation efforts. Specimens were taken from the collections, barcoded and minimal data fields captured, before high resolution images were created and the specimens returned. A subset of these was further subjected to full or partial label transcription via the use of the Atlas of Living Australia's DigiVol crowdsourcing platform or via in-house data capture. The resulting datasets will be made available via GBIF and partner sites and will be used to perform gap analyses on the collections across the institutions. We will examine the benefits of combined institutional data for these groups, assess how many species are represented in total and the geographic coverage of these collections. Use of the data will be measured by the number of downloads from GBIF and observing in-house use cases. Two research projects have just begun within Kew, using the data gathered for Pterocarpus and Lablab Adans., georeferencing for which is already underway and will contribute to conservation assessments and other measurable outputs. A data paper is planned which will also assist with tracking future use of the data set and help demonstrate the impact of the digitisation.

    State of Digitisation and Gap Analysis Surveys

    No full text
    Recent developments in digitisation technologies and equipment have enabled advances in the rate of natural history specimen digitisation. However, Europe’s Natural History Collection Institutions are home to over one billion specimens and currently only a small fraction of these have been digitally catalogued, with fewer still imaged. It is clear that institutions still face huge challenges when digitising the vast number of specimens in their collections. I will present the results of two surveys that aimed to discover the main successes and challenges facing institutions in their digitisation programmes. The first survey was undertaken in 2014 within the SYNTHESYS 3 project and gathered information from project partners on their current digitisation facilities, equipment and workflows, providing some key recommendations based on these findings. The second survey was completed more recently in 2017, through the Consortium of European Taxonomic Facilities (CETAF) Digitisation Working Group. This survey aimed to discover the successful protocols and implementation of digitisation, and to identify the shortfalls in resources and protocols. Results from both surveys will be fed into the future programme of the CETAF Digitisation Working Group as well as forthcoming and proposed EU projects, including Innovation and Consolidation for large-scale Digitisation of natural heritage (ICEDIG).