
    A benchmark dataset of herbarium specimen images with label data

    More and more herbaria are digitising their collections. Images of specimens are made available online to facilitate access and to allow information to be extracted from them. Transcription of the data written on specimens is critical for general discoverability and enables incorporation into large aggregated research datasets. Different methods, such as crowdsourcing and artificial intelligence, are being developed to optimise transcription, but herbarium specimens pose difficulties in data extraction for many reasons. To provide developers of transcription methods with a means of optimisation, we have compiled a benchmark dataset of 1,800 herbarium specimen images with corresponding transcribed data. These images originate from nine different collections and include specimens that reflect the multiple potential obstacles that transcription methods may encounter, such as differences in language, text format (printed or handwritten), specimen age and nomenclatural type status. We are making these specimens available with a Creative Commons Zero licence waiver and with permanent online storage of the data. By doing this, we are minimising the obstacles to the use of these images for transcription training. This benchmark dataset of images may also be used where a defined and documented set of herbarium specimens is needed, such as for the extraction of morphological traits, handwriting recognition and colour analysis of specimens.
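
    As a rough illustration of how developers of transcription methods might consume such a benchmark, the Python sketch below pairs each specimen image with its transcribed label data. The directory layout, the labels.csv file and its column names are assumptions made for the example, not the published structure of the dataset.

        # Minimal sketch: iterate over benchmark specimen images and their
        # transcribed label data. The images/ directory, labels.csv file and
        # column names are hypothetical, not the dataset's actual layout.
        import csv
        from pathlib import Path

        def load_benchmark(root_dir: str):
            """Yield (image_path, transcription_row) pairs from an assumed layout."""
            root = Path(root_dir)
            with open(root / "labels.csv", newline="", encoding="utf-8") as fh:
                for row in csv.DictReader(fh):
                    image_path = root / "images" / row["filename"]  # assumed column
                    if image_path.exists():
                        yield image_path, row

        if __name__ == "__main__":
            for path, row in load_benchmark("herbarium_benchmark"):
                print(path.name, row.get("scientificName"), row.get("countryCode"))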

    Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1

    DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) mobilising and unifying bio- and geo-diversity information connected to the specimens held in natural science collections and delivering it to scientific communities and beyond. Bringing together 120 institutions across 21 countries and combining earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking, DiSSCo makes the data from natural science collections available as one virtual data cloud, connected with data emerging from new techniques and not already linked to specimens. These new data include DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data (Computer-assisted Tomography (CT), Synchrotron, etc.), to name but a few; they will lead to a wide range of end-user services, beginning with finding, accessing, using and improving data. DiSSCo will deliver the diagnostic information required for novel approaches and new services that will transform the landscape of what is possible in ways that are hard to imagine today.

With approximately 1.5 billion objects to be digitised, bringing natural science collections to the information age is expected to result in many tens of petabytes of new data over the next decades, used on average by 5,000 – 15,000 unique users every day. This requires new skills, clear policies, robust procedures and new technologies to create, work with and manage large digital datasets over their entire research data lifecycle, including their long-term storage, preservation and open access. Such processes and procedures must match and be derived from the latest thinking in open science and data management, realising the core principles of 'findable, accessible, interoperable and reusable' (FAIR).

Synthesised from the results of the ICEDIG project ('Innovation and Consolidation for Large Scale Digitisation of Natural Heritage', EU Horizon 2020 grant agreement No. 777483), the DiSSCo Conceptual Design Blueprint covers the organisational arrangements, processes and practices, the architecture, tools and technologies, culture, skills and capacity building, and governance and business model proposals for constructing the digitisation infrastructure of DiSSCo. In this context, the digitisation infrastructure of DiSSCo must be interpreted as that infrastructure (machinery, processing, procedures, personnel, organisation) offering Europe-wide capabilities for mass digitisation and digitisation-on-demand, and for the subsequent management (i.e., curation, publication, processing) and use of the resulting data. The blueprint constitutes the essential background needed to continue work to raise the overall maturity of the DiSSCo Programme across multiple dimensions (organisational, technical, scientific, data, financial) to achieve readiness to begin construction. Today, collection digitisation efforts have reached most collection-holding institutions across Europe. Much of the leadership and many of the people involved in digitisation and working with digital collections wish to take steps forward and expand the efforts to benefit further from the already noticeable positive effects.
The collective results of examining technical, financial, policy and governance aspects show the way forward to operating a large distributed initiative, i.e., the Distributed System of Scientific Collections (DiSSCo), for natural science collections across Europe. Ample examples, opportunities and needs for innovation and consolidation for large scale digitisation of natural heritage have been described. The blueprint makes one hundred and four (104) recommendations to be considered by other elements of the DiSSCo Programme of linked projects (i.e., SYNTHESYS+, COST MOBILISE, DiSSCo Prepare, and others to follow) and by the DiSSCo Programme leadership as the journey towards organisational, technical, scientific, data and financial readiness continues. Nevertheless, significant obstacles must be overcome as a matter of priority if DiSSCo is to move beyond its Design and Preparatory Phases during 2024. Specifically, these include:

Organisational: Strengthen common purpose by adopting a common framework for policy harmonisation and capacity enhancement across broad areas, especially in respect of digitisation strategy and prioritisation, digitisation processes and techniques, data and digital media publication and open access, protection of and access to sensitive data, and administration of access and benefit sharing. Pursue the joint ventures and other relationships necessary to the successful delivery of the DiSSCo mission, especially ventures with GBIF and other international and regional digitisation and data aggregation organisations, in the context of infrastructure policy frameworks such as EOSC. Proceed with the explicit aim of avoiding divergences of approach in global natural science collections data management and research.

Technical: Adopt and enhance the DiSSCo Digital Specimen Architecture and, as a matter of urgency, establish the persistent identifier scheme to be used by DiSSCo and (ideally) other comparable regional initiatives. Establish the (software) engineering development and (infrastructure) operations team and direction essential to the delivery of the services and functionalities expected from DiSSCo, such that earnest engineering can lead to an early start of DiSSCo operations.

Scientific: Establish a common digital research agenda leveraging Digital (extended) Specimens as anchoring points for all specimen-associated and -derived information, demonstrating to research institutions and policy/decision-makers the new possibilities, opportunities and value of participating in the DiSSCo research infrastructure.

Data: Adopt the FAIR Digital Object Framework and the International Image Interoperability Framework as the low entropy means to achieving uniform access to rich data (image and non-image) that is findable, accessible, interoperable and reusable (FAIR). Develop and promote best practice approaches towards achieving the best digitisation results in terms of quality (best, according to agreed minimum information and other specifications), time (highest throughput, fast) and cost (lowest, minimal per specimen).

Financial: Broaden the attractiveness (i.e., improve the bankability) of DiSSCo as an infrastructure to invest in. Plan for ways to bridge the funding gap and avoid disruptions in the critical funding path that risk interrupting core operations, especially where the gap opens between the end of preparations and the beginning of implementation due to unsolved political difficulties.
Strategically, it is vital to balance the multiple factors addressed by the blueprint against one another to achieve the desired goals of the DiSSCo Programme. Decisions cannot be taken on one aspect alone without considering the other aspects, and here the various governance structures of DiSSCo (General Assembly, advisory boards, and stakeholder forums) play a critical role over the coming years.
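
The data recommendation above names the International Image Interoperability Framework (IIIF) as the route to uniform image access. As a minimal, non-normative sketch of what that uniformity means in practice, the Python snippet below builds an image request from the standard IIIF Image API URL template; the server base URL and image identifier are placeholders, not DiSSCo endpoints.

        # Sketch: compose a request URL following the IIIF Image API template
        #   {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
        # The base URL and identifier below are placeholders, not real services.
        def iiif_image_url(base: str, identifier: str, region: str = "full",
                           size: str = "max", rotation: int = 0,
                           quality: str = "default", fmt: str = "jpg") -> str:
            return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

        # Example: a 1024-pixel-wide rendering of a hypothetical specimen image.
        print(iiif_image_url("https://images.example.org/iiif", "specimen-0001",
                             size="1024,"))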

    Shore flies (Diptera, Ephydridae) of river banks in the surroundings of Paide

    https://www.ester.ee/record=b5467831*es

    Variability of the song elements of the dark bush-cricket Pholidoptera griseoaptera

    Research Master's thesis

    "My naturesound" - nature observations with sound recordings

    Online systems for observation reporting by citizen scientists have been operating for many years. iNaturalist (California Academy of Sciences 2016), eBird (Cornell Lab of Ornithology 2016) and Observado (Observation International 2016) are well-known international systems; Artportalen (Swedish Species Information Centre 2016) and Artsobservasjoner (Norwegian Biodiversity Information Centre 2016) are Scandinavian. In addition, databases and online solutions exist that are more directly research-oriented but still offer participation by citizen scientists, such as the PlutoF (University of Tartu Natural History Museum 2016) platform. The University of Tartu Natural History Museum maintains the PlutoF platform (Abarenkov et al. 2010) for storing and managing biodiversity data, including taxon observations. In 2014, development was started to integrate an observation app, "Minu loodusheli"/"My naturesound" (University of Tartu Natural History Museum 2017b) (My naturesound, Fig. 1), within the PlutoF system. In 2017, an English-language version of the app (University of Tartu Natural History Museum 2017c) was launched that includes nearly all major sound-producing taxon groups in its taxonomy. The application also acts as a practical tool for collecting and publishing occurrence data for the Global Biodiversity Information Facility (Global Biodiversity Information Facility 2017) in standardized Darwin Core format, together with download links to the multimedia files. Although the sound-recording ability of mobile phones opens new opportunities to validate taxon occurrences, current technological solutions limit the use of recordings in biodiversity research. "My naturesound" allows the user to record taxon occurrences and to provide audio recordings as evidence. After installing the application, the user is prompted to log in with PlutoF system credentials or to register with PlutoF. The application is targeted primarily at citizen scientists, but researchers themselves can also use it as a tool for easy annotation of taxon occurrences. The dataset consists of observation data on birds, amphibians and insects reported by citizen scientists, with on-site audio recordings. The dataset makes it possible to analyze the suitability of mobile devices for recording animal vocalizations and their use in observation reporting.
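
    As an illustration of the standardized Darwin Core output mentioned above, the sketch below writes a single sound-evidenced observation as a one-row Darwin Core style CSV. All field values, including the audio URL, are invented for the example; the exact set of terms exported by PlutoF may differ.

        # Sketch: a single sound-evidenced observation expressed with Darwin Core
        # terms and written as a one-row CSV. All values and the media URL are
        # invented; PlutoF's actual export may use additional or different terms.
        import csv
        import sys

        occurrence = {
            "occurrenceID": "urn:uuid:00000000-0000-0000-0000-000000000000",
            "basisOfRecord": "HumanObservation",
            "scientificName": "Pholidoptera griseoaptera",
            "eventDate": "2017-07-15",
            "decimalLatitude": 58.37,
            "decimalLongitude": 26.72,
            "recordedBy": "Example Observer",
            "associatedMedia": "https://files.example.org/recordings/obs-1.mp3",
        }

        writer = csv.DictWriter(sys.stdout, fieldnames=list(occurrence))
        writer.writeheader()
        writer.writerow(occurrence)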

    Digitisation of private collections

    Results are presented of a study investigating solutions and procedures to incorporate private natural history collections into the international collections data infrastructure. Results are based on pilot projects carried out in three European countries, each aimed at approaches for how best to motivate and equip citizen collectors for digitisation:
    1) In Estonia, the approach was to outline tools for registering, digitising and publishing private collection data in the biodiversity data management system PlutoF.
    2) In Finland, the functionality of FinBIF, a portal offering a popular Notebook Service for citizens to store observations, has been expanded to include collection specimens related to a field gathering event.
    3) In the Netherlands, private collection owners were approached directly and asked to start digitising their collections using dedicated software, either by themselves or with the help of volunteers recruited specifically for this task.
    In addition to management tools, the pilots also looked at motivation, the persons undertaking the work, scope, planning, specific knowledge or skills required, and the platform for online publication. Future ownership, the legality of specimens residing in private collections and the use of unique identifiers are underexposed aspects affecting digitisation. Besides streamlining the overall process of digitising private collections and dealing with local, national or international challenges, developing a communication strategy is crucial in order to effectively distribute information and keep private collection owners aware of ongoing developments. Besides collection owners, other stakeholders were identified, and for each of them a roadmap is outlined aimed at further streamlining the flow of data from private collections into the international infrastructure. In conclusion, recommendations are presented, based on challenges encountered during this task, that are considered important for making significant progress towards the overall accessibility of data stored in privately held natural history collections.

    PlutoF: Biodiversity data management platform for the complete data lifecycle

    The PlutoF online platform (https://plutof.ut.ee) is built for the management of biodiversity data. The concept is to provide a common workbench where the full data lifecycle can be managed and to support seamless data sharing between single users, workgroups and institutions. Today, large and sophisticated biodiversity datasets are increasingly developed and managed by international workgroups. PlutoF's ambition is to serve such collaborative projects as well as to provide data management services to single users, museum or private collections and research institutions. Data management in PlutoF follows the logical order of the data lifecycle (Fig. 1). At first, project metadata is uploaded, including the project description, data management plan, participants, sampling areas, etc. Data upload and management activities then follow, often linked to internal data sharing. Some data analyses can be performed directly in the workbench, or data can be exported in standard formats. PlutoF also includes a data publishing module. Users can publish their data, generating a citable DOI, without datasets leaving the PlutoF workbench. PlutoF is part of the DataCite collaboration (https://datacite.org) and has so far released more than 600 000 DOIs. Another option is to publish observation or collection datasets via the GBIF (Global Biodiversity Information Facility) portal. A new feature implemented in 2019 allows users to publish High Throughput Sequencing data as taxon occurrences in GBIF. There is an additional option to send specific datasets directly to the Pensoft online journals. Ultimately, PlutoF works as a data archive, which completes the data lifecycle. In PlutoF, users can manage different data types. The most common types include specimen and living specimen data, nucleotide sequences, human observations, material samples, taxonomic backbones and ecological data. Another important feature is that these data types can be managed as single datasets or projects. PlutoF follows several biodiversity standards. Examples include Darwin Core, GGBN (Global Genome Biodiversity Network), EML (Ecological Metadata Language), MCL (Microbiological Common Language), and MIxS (Minimum Information about any (x) Sequence).
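
    Because the abstract notes that PlutoF mints dataset DOIs through DataCite, a small sketch is added here showing how metadata for any registered DOI can be retrieved from the public DataCite REST API. The DOI string itself is a placeholder, not a real PlutoF-issued identifier.

        # Sketch: fetch metadata for a dataset DOI from the public DataCite REST
        # API (https://api.datacite.org). The DOI used here is a placeholder.
        import json
        import urllib.request

        def datacite_metadata(doi: str) -> dict:
            url = f"https://api.datacite.org/dois/{doi}"
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)

        if __name__ == "__main__":
            record = datacite_metadata("10.1234/placeholder")  # hypothetical DOI
            attributes = record["data"]["attributes"]
            print(attributes["titles"], attributes["publicationYear"])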

    Future Challenges in Digitisation of Private Natural History Collections

    Specimens held in private natural history collections form an essential, but often neglected, part of the specimens held worldwide in natural history collections. When engaging in regional, national or international initiatives aimed at increasing the accessibility of biodiversity data, it is paramount to include private collections as much and as often as possible. Compared to larger collections in natural history institutions, private collections present a unique set of challenges: they are numerous, anonymous, small and diverse in all aspects of collection management. In ICEDIG, a design study for DiSSCo, these challenges were tackled in task 2 "Inventory of content and incentives for digitisation of small and private collections" under Workpackage 2 "Inventory of current criteria for prioritization of digitization". First, we need to understand the current state and content of private collections within Europe, to identify and tackle challenges more effectively. While some private collections will duplicate material already held in public collections, many are likely to fill more specialised or unusual niches, relevant to the particular collector(s). At present, there is little evidence about the content of private collections and this needs to be explored. In 2018, a European survey was carried out amongst private collection owners to gain more insight into the volume, scope and degree of digitisation of these collections. Based on this survey, all of the respondents' collections combined are estimated to contain between 9 and 33 million specimens. This is only the tip of the iceberg for private collections in Europe and underlines the importance of these private collections. Digitisation and the sharing of collection data are activities generally considered important among private collection owners. The survey also showed that for those who have not yet started digitising their collection, the provision of tools and information would be most valuable. These and other highlights of the survey will be presented. In addition, protocols for inventories of private collections will be discussed, as well as ways to keep these up to date. To enhance the inclusion of private collections in Europe's digitisation efforts, we recognise that we mainly have to focus on the challenges regarding the 'how' (work-process), and the sharing of information residing in private collections (including ownership, legal issues, sensitive data). Where necessary, we will also draw attention to the 'why' (motivation) of digitisation. A communication strategy aimed at raising awareness about digitisation, offering insight into the practicalities of implementing digitisation, as well as providing answers to issues related to sharing information, is an essential tool. Elements of a communication strategy to further engage private collection owners will be presented, as will conclusions and recommendations. Finally, digitisation and communication aspects related to private collection owners will need to be tested within the community. Therefore, a pilot project is currently (2018-2019) being carried out in Estonia, Finland and the Netherlands to digitise private collections in a variety of settings. Preliminary results will be presented, zooming in on different approaches to include data from private collections in the overall (research) infrastructures.

    Data sharing tools adopted by the European Biodiversity Observation Network Project

    A fundamental constituent of a biodiversity observation network is the technological infrastructure that underpins it. The European Biodiversity Observation Network project (EU BON) has been working with and improving upon pre-existing tools for data mobilization, sharing and description. This paper provides conceptual and practical advice for the use of these tools. We review tools for managing metadata, occurrence data and ecological data, and give a detailed description of these tools, their capabilities and limitations. This is followed by recommendations on their deployment and possible future enhancements. This is done from the perspective of the needs of the biodiversity observation community, with a view to the development of a unified user interface to these data – the European Biodiversity Portal (EBP). We describe the steps taken to develop, adapt, deploy and test these tools. This document also gives an overview of the objectives that still need to be achieved and the challenges to be addressed for the remainder of the project.
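
    The occurrence data mobilised by such tools is ultimately served through public web APIs. As one familiar example of this kind of interface (GBIF's public occurrence search, given here purely for illustration and not as an EU BON deliverable), the sketch below retrieves a few occurrence records for a species.

        # Sketch: query GBIF's public occurrence search API. This is offered only
        # as a familiar example of a mobilised-occurrence-data interface; it is
        # not one of the EU BON tools reviewed in the paper.
        import json
        import urllib.parse
        import urllib.request

        def gbif_occurrences(scientific_name: str, limit: int = 5) -> list:
            query = urllib.parse.urlencode({"scientificName": scientific_name,
                                            "limit": limit})
            url = f"https://api.gbif.org/v1/occurrence/search?{query}"
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)["results"]

        if __name__ == "__main__":
            for occ in gbif_occurrences("Pholidoptera griseoaptera"):
                print(occ.get("scientificName"), occ.get("country"), occ.get("eventDate"))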