108 research outputs found

    A cost analysis of transcription systems

    We compare different approaches to transcribing natural history data and summarise the advantages and disadvantages of each approach using six case studies from four different natural history collections. We summarise the main cost considerations when planning a transcription project and discuss the limitations we currently have in understanding the costs behind transcription and data quality.

    D3.2 DiSSCo Digitisation Guides Website - Consolidating Knowledge on Collections Mobilisation

    To support the digitisation activities of DiSSCo, we have considered how best to prepare collections for digitisation, digitise them, curate their associated data, publish those data, and measure the outputs of projects and programmes. We have examined options and approaches for different types and sizes of collections, when outsourcing should be considered, and which project management approaches are most appropriate in this range of circumstances. This report describes the approach we have taken to developing an online community-edited manual, our guidelines, other relevant resources and platforms, and a set of recommendations on how to develop and this work to enhance future digitisation capacity across DiSSCo collection-holding organisations.

    Understanding the users and uses of UK Natural History Collections

    UK natural science collections hold over 137 million items, an unrivalled source of data about 4.56 billion years of planetary development and hundreds of years of biological change, including the differences made by humans. However, the scientific, commercial, and societal benefits of these collections are constrained by the limits of physical access and by highly fragmented digitisation efforts, with less than 10% digitally available. Following work with Frontier Economics in 2021, which showed potential for £2 billion in benefits to the UK economy from digitising all UK natural science collections, in 2022–23 the Natural History Museum London worked, with analytical support from McKinsey and Company, to understand the impact of what has already been digitised and shared by UK natural science collections: what is the demand for these data, what are they used for, and how does this deliver efficient, effective and impactful research? This study focuses on usage via the Global Biodiversity Information Facility (GBIF), the largest source of relevant usage data, examining 7.6 million records from twelve UK institutions. While these UK collections data are just 0.3% of total GBIF occurrences, they are cited in 12% of peer-reviewed publications citing GBIF data, showing the disproportionate impact of UK collections data and the historical, geographical, and taxonomic richness that they bring. Researchers have already benefited from more than £18 million of efficiency savings from digital UK specimen data. Data from natural science collections held in the UK are uniquely impactful resources, vital to a future in which people and planet thrive, and a step change in the pace of digitisation is needed to unlock their potential for researchers, policymakers, and society.

    DiSSCo Prepare Project: Increasing the Implementation Readiness Levels of the European Research Infrastructure

    The Distributed System of Scientific Collections (DiSSCo) is a new world-class Research Infrastructure (RI) for Natural Science Collections. The DiSSCo RI aims to create a new business model for one European collection that digitally unifies all European natural science assets under common access, curation, policies and practices that ensure that all the data is easily Findable, Accessible, Interoperable and Reusable (FAIR principles). DiSSCo represents the largest ever formal agreement between natural history museums, botanic gardens and collection-holding institutions in the world. DiSSCo entered the European Roadmap for Research Infrastructures in 2018 and launched its main preparatory phase project (DiSSCo Prepare) in 2020. DiSSCo Prepare is the primary vehicle through which DiSSCo reaches the overall maturity necessary for its construction and eventual operation. DiSSCo Prepare raises DiSSCo’s implementation readiness level (IRL) across five dimensions: technical, scientific, data, organisational and financial. Each dimension of implementation readiness is separately addressed by specific Work Packages (WP) with distinct targets, actions and tasks that will deliver DiSSCo’s Construction Masterplan. This comprehensive and integrated Masterplan will be the product of the outputs of all of its content-related tasks and will be the project’s final output. It will serve as the blueprint for construction of the DiSSCo RI, including establishing it as a legal entity. DiSSCo Prepare builds on the successful completion of DiSSCo’s design study, ICEDIG, and the outcomes of other DiSSCo-linked projects such as SYNTHESYS+ and MOBILISE. This paper is an abridged version of the original DiSSCo Prepare grant proposal. It contains the overarching scientific case for DiSSCo Prepare, alongside a description of our major activities.

    Towards a Roadmap for Advancing the Catalogue of the World’s Natural History Collections

    Natural history collections are the foundations upon which all knowledge of natural history is constructed. Biological specimens are the best documentation of variation within each species, increasingly serve as curated sources for reference DNA, and are frequently our only evidence for historical species distribution. Collections represent an enormous multigenerational investment in research infrastructure for the biological sciences, but despite this importance most of the holdings of these institutions remain invisible on the Internet, inaccessible to taxonomists from other countries and hidden from computational biodiversity research. Although comprehensive digitisation of the complete holdings of each natural history collection is the long-term goal, this is an expensive and labor-intensive task and will not be completed in the near future for all collections. However, many benefits could quickly be achieved by publishing high-quality metadata on each collection to increase its visibility, provide the foundations for further digitisation and enable researchers to discover and communicate with collections of interest. This paper summarises the results from a consultation activity carried out in 2020 as part of the SYNTHESYS+ (Synthesys of Systematic Resources) project, “Developing implementation roadmaps for priority infrastructure areas as part of cooperative RI for biodiversity”. This consultation was primed through an ideas paper and introductory webinars, and conducted as a facilitated two-week online multilingual discussion around 26 topics grouped under four broad headings (Users, Content, Technology and Governance). The results of these discussions are summarised here, along with the wider context of existing and planned initiatives.

    Landscape Analysis for the Specimen Data Refinery

    This report reviews the current state of the art in applied approaches to automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.

    Towards a scientific workflow featuring Natural Language Processing for the digitisation of natural history collections [Version 1]

    We describe an effective approach to automated text digitisation with respect to natural history specimen labels. These labels contain much useful data about the specimen including its collector, country of origin, and collection date. Our approach to automatically extracting these data takes the form of a pipeline. Recommendations are made for the pipeline's component parts based on some of the state-of-the-art technologies. Optical Character Recognition (OCR) can be used to digitise text on images of specimens. However, recognising text quickly and accurately from these images can be a challenge for OCR. We show that OCR performance can be improved by prior segmentation of specimen images into their component parts. This ensures that only text-bearing labels are submitted for OCR processing as opposed to whole specimen images, which inevitably contain non-textual information that may lead to false positive readings. In our testing, Tesseract OCR version 4.0.0 offers promising text recognition accuracy with segmented images. Not all the text on specimen labels is printed. Handwritten text varies much more and does not conform to standard shapes and sizes of individual characters, which poses an additional challenge for OCR. Recently, deep learning has allowed for significant advances in this area. Google's Cloud Vision, which is based on deep learning, is trained on large-scale datasets, and is shown to be quite adept at this task. This may take us some way towards negating the need for humans to routinely transcribe handwritten text. Determining the countries and collectors of specimens has been the goal of previous automated text digitisation research activities. Our approach also focuses on these two pieces of information. An area of Natural Language Processing (NLP) known as Named Entity Recognition (NER) has matured enough to semi-automate this task. Our experiments demonstrated that existing approaches can accurately recognise location and person names within the text extracted from segmented images via Tesseract version 4.0.0. Potentially, NER could be used in conjunction with other online services, such as those of the Biodiversity Heritage Library, to map the named entities to entities in the biodiversity literature (https://www.biodiversitylibrary.org/docs/api3.html). We have highlighted the main recommendations for potential pipeline components. The document also provides guidance on selecting appropriate software solutions. These include automatic language identification, terminology extraction, and integrating all pipeline components into a scientific workflow to automate the overall digitisation process.
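
The pipeline described in this abstract (segment the specimen image, send only text-bearing labels to OCR, then run NER over the recognised text) could be sketched roughly as follows. This is an illustrative skeleton only, not the authors' implementation: `Region`, `LabelRecord`, `stub_ocr`, `toy_ner` and `run_pipeline` are hypothetical names, the OCR stage is a stub standing in for an engine such as Tesseract, and the NER stage is a tiny gazetteer-plus-pattern stand-in for a trained model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Region:
    """One segmented part of a specimen image (hypothetical structure)."""
    kind: str     # e.g. "label", "specimen", "barcode"
    content: str  # stand-in for pixel data; a real region would hold an image crop


@dataclass
class LabelRecord:
    """Text and named entities extracted from one label region."""
    text: str
    entities: Dict[str, List[str]] = field(default_factory=dict)


def keep_text_regions(regions: List[Region]) -> List[Region]:
    # Segmentation step: only text-bearing labels go on to OCR.
    return [r for r in regions if r.kind == "label"]


def stub_ocr(region: Region) -> str:
    # Assumption: placeholder for a real OCR call; it just returns the stored text.
    return region.content


# Assumption: a toy gazetteer, not a real country list.
COUNTRIES = {"Brazil", "Kenya", "France"}

# Assumption: collectors are introduced by "Coll." or "Leg." followed by initials and a surname.
COLLECTOR = re.compile(r"(?:Coll|Leg)\.\s+((?:[A-Z]\.\s*)+[A-Z][a-z]+)")


def toy_ner(text: str) -> Dict[str, List[str]]:
    # Stand-in for a trained NER model: gazetteer lookup plus one regex pattern.
    tokens = [t.strip(".,;:") for t in text.split()]
    return {
        "LOCATION": [t for t in tokens if t in COUNTRIES],
        "PERSON": COLLECTOR.findall(text),
    }


def run_pipeline(
    regions: List[Region],
    ocr: Callable[[Region], str] = stub_ocr,
    ner: Callable[[str], Dict[str, List[str]]] = toy_ner,
) -> List[LabelRecord]:
    # Chain the stages: segment -> OCR -> NER, one record per label region.
    records = []
    for region in keep_text_regions(regions):
        text = ocr(region)
        records.append(LabelRecord(text=text, entities=ner(text)))
    return records


if __name__ == "__main__":
    regions = [
        Region("specimen", "<pixels>"),
        Region("label", "Brazil, Manaus. Coll. J. Smith 1923"),
    ]
    for record in run_pipeline(regions):
        print(record.entities)
```

Because the OCR and NER stages are passed in as plain callables, either stub could be swapped for a real engine or model without changing `run_pipeline`, which mirrors the abstract's point that the component parts are chosen independently.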

    Management evaluation of metastasis in the brain (MEMBRAIN)—a United Kingdom and Ireland prospective, multicenter observational study

    Background: In recent years an increasing number of patients with cerebral metastasis (CM) have been referred to the neuro-oncology multidisciplinary team (NMDT). Our aim was to obtain a national picture of CM referrals to assess referral volume and quality and factors affecting NMDT decision making. Methods: A prospective multicenter cohort study including all adult patients referred to NMDT with 1 or more CM was conducted. Data were collected in neurosurgical units from November 2017 to February 2018. Demographics, primary disease, KPS, imaging, and treatment recommendation were entered into an online database. Results: A total of 1048 patients were analyzed from 24 neurosurgical units. Median age was 65 years (range, 21-93 years) with a median number of 3 referrals (range, 1-17 referrals) per NMDT. The most common primary malignancies were lung (36.5%, n = 383), breast (18.4%, n = 193), and melanoma (12.0%, n = 126). A total of 51.6% (n = 541) of the referrals were for a solitary metastasis and resulted in specialist intervention being offered in 67.5% (n = 365) of cases. A total of 38.2% (n = 186) of patients being referred with multiple CMs were offered specialist treatment. NMDT decision making was associated with number of CMs, age, KPS, primary disease status, and extent of extracranial disease (univariate logistic regression, P < .001) as well as sentinel location and tumor histology (P < .05). A delay in reaching an NMDT decision was identified in 18.6% (n = 195) of cases. Conclusions: This study demonstrates a changing landscape of metastasis management in the United Kingdom and Ireland, including a trend away from adjuvant whole-brain radiotherapy and specialist intervention being offered to a significant proportion of patients with multiple CMs. Poor quality or incomplete referrals cause delay in NMDT decision making.

    Inselect: Automating the Digitization of Natural History Collections

    Copyright: © 2015 Hudson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. The attached file is the published version of the article