9 research outputs found

    The UNITE database for molecular identification and taxonomic communication of fungi and other eukaryotes : sequences, taxa and classifications reconsidered

    Acknowledgements: We acknowledge Marie Zirk for her work in designing the UNITE logotype and creating the visual abstract for this article. Funding: UNITE database development is financed by the Estonian Research Council [PRG1170]; European Union's Horizon 2020 project BGE [101059492]. The PlutoF digital infrastructure is supported by the European Union's Horizon 2020 project BiCIKL [101007492]; Estonian Research Infrastructure roadmap project DiSSCo Estonia. Funding for open access charge: UNITE Community. Conflict of interest statement: None declared. Peer reviewed.

    "My naturesound" - nature observations with sound recordings

    Online systems for observation reporting by citizen scientists have been operating for many years. iNaturalist (California Academy of Sciences 2016), eBird (Cornell Lab of Ornithology 2016) and Observado (Observation International 2016) are well-known international systems, while Artportalen (Swedish Species Information Centre 2016) and Artsobservasjoner (Norwegian Biodiversity Information Centre 2016) are Scandinavian. In addition, databases and online solutions exist that are more directly research-oriented but still offer participation by citizen scientists, such as the PlutoF (University of Tartu Natural History Museum 2016) platform. The University of Tartu Natural History Museum maintains the PlutoF platform (Abarenkov et al. 2010) for storing and managing biodiversity data, including taxon observations. In 2014, development was started to integrate an observation app "Minu loodusheli"/"My naturesound" (University of Tartu Natural History Museum 2017b) (My naturesound, Fig. 1) within the PlutoF system. In 2017, an English-language version of the app (University of Tartu Natural History Museum 2017c) was launched that includes nearly all major sound-producing taxon groups in its taxonomy. The application also acts as a practical tool for collecting and publishing occurrence data for the Global Biodiversity Information Facility (Global Biodiversity Information Facility 2017) in standardized Darwin Core format together with download links to the multimedia files. Although the sound recording ability of mobile phones opens new opportunities to validate taxon occurrences, current technological solutions limit the use of recordings in biodiversity research. The "My naturesound" app allows the user to record taxon occurrences and to provide audio recordings as evidence. After installing the application, the user is prompted to log in with PlutoF system credentials or to register with PlutoF.
The application is targeted primarily at citizen scientists, but researchers themselves can also use it as a tool for easy annotation of taxon occurrences. The dataset consists of observation data of birds, amphibians and insects by citizen scientists with on-site audio recordings. The dataset gives the possibility to analyze the suitability of mobile devices for recording animal vocalizations and their use in reporting
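
The Darwin Core export described in this abstract can be illustrated with a minimal sketch. The column names are standard Darwin Core terms, but the shape of the input observation (`id`, `taxon`, `recorded_at`, `lat`, `lon`, `audio_urls`), the helper names and all values are assumptions made for illustration; this is not the app's or PlutoF's actual export code.

```python
import csv
import io

# Standard Darwin Core term names; the values used below are invented.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "associatedMedia",
]


def to_dwc_row(obs: dict) -> dict:
    """Map a hypothetical app observation to a flat Darwin Core row."""
    return {
        "occurrenceID": obs["id"],
        "basisOfRecord": "HumanObservation",
        "scientificName": obs["taxon"],
        "eventDate": obs["recorded_at"],
        "decimalLatitude": f'{obs["lat"]:.5f}',
        "decimalLongitude": f'{obs["lon"]:.5f}',
        # Multiple media links are joined with " | ", the separator
        # recommended by Darwin Core for list-valued terms.
        "associatedMedia": " | ".join(obs["audio_urls"]),
    }


def write_dwc_csv(observations: list) -> str:
    """Serialize observations as CSV text with a Darwin Core header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for obs in observations:
        writer.writerow(to_dwc_row(obs))
    return buffer.getvalue()


# Invented example observation with one audio recording as evidence.
example = {
    "id": "example-occurrence-1",
    "taxon": "Luscinia luscinia",
    "recorded_at": "2017-05-20",
    "lat": 58.38,
    "lon": 26.72,
    "audio_urls": ["https://example.org/audio/1.m4a"],
}
print(write_dwc_csv([example]))
```

Keeping the audio link in `associatedMedia` mirrors the abstract's point that the recording travels with the occurrence record as evidence.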

    A price tag on species

    Species have intrinsic value but also contribute to a wide range of ecosystem services of major economic value to humans. These values have proved hard to quantify precisely, making it all too easy to dismiss them altogether. We outline the concept of the species stock market (SSM), a system to provide a unified basis for valuation of all living species. The SSM amalgamates digitized information from natural history collections, occurrence data, and molecular sequence databases to quantify our knowledge of each species from scientific, societal, and economic points of view. The conceptual trading system will necessarily be very unlike that of the regular stock market, but the looming biodiversity crisis implores us to finally put an open and transparent price tag on symbiosis, deforestation, and pollution.

    PlutoF: Biodiversity data management platform for the complete data lifecycle

    The PlutoF online platform (https://plutof.ut.ee) is built for the management of biodiversity data. The concept is to provide a common workbench where the full data lifecycle can be managed and to support seamless data sharing between single users, workgroups and institutions. Today, large and sophisticated biodiversity datasets are increasingly developed and managed by international workgroups. PlutoF's ambition is to serve such collaborative projects as well as to provide data management services to single users, museum or private collections and research institutions. Data management in PlutoF follows the logical order of the data lifecycle (Fig. 1). At first, project metadata is uploaded, including the project description, data management plan, participants, sampling areas, etc. Data upload and management activities then follow, often linked to internal data sharing. Some data analyses can be performed directly in the workbench, or data can be exported in standard formats. PlutoF also includes a data publishing module. Users can publish their data, generating a citable DOI without datasets leaving the PlutoF workbench. PlutoF is part of the DataCite collaboration (https://datacite.org) and has so far released more than 600 000 DOIs. Another option is to publish observation or collection datasets via the GBIF (Global Biodiversity Information Facility) portal. A new feature implemented in 2019 allows users to publish High Throughput Sequencing data as taxon occurrences in GBIF. There is an additional option to send specific datasets directly to the Pensoft online journals. Ultimately, PlutoF works as a data archive, which completes the data lifecycle. In PlutoF users can manage different data types. The most common types include specimen and living specimen data, nucleotide sequences, human observations, material samples, taxonomic backbones and ecological data. Another important feature is that these data types can be managed as single datasets or projects.
PlutoF follows several biodiversity standards. Examples include Darwin Core, GGBN (Global Genome Biodiversity Network), EML (Ecological Metadata Language), MCL (Microbiological Common Language), and MIxS (Minimum Information about any (x) Sequence)

    Third-party Annotations: Linking PlutoF platform and the ELIXIR Contextual Data ClearingHouse for the reporting of source material annotation gaps and inaccuracies

    Third-party annotations are a valuable resource to improve the quality of public DNA sequences. For example, sequences in the International Nucleotide Sequence Database Collaboration (INSDC) databases often lack important features like taxon interactions, species level identification, information associated with habitat, locality, country, coordinates, etc. Therefore, initiatives to mine additional information from publications and link it to the public DNA sequences have become common practice (e.g. Tedersoo et al. 2011, Nilsson et al. 2014, Groom et al. 2021). However, third-party annotations have their own specific challenges. For example, annotations can be inaccurate and therefore must be open for permanent data management. Further, every DNA sequence (except sequences from type material) can carry different species names, which must be databased as equal scientific hypotheses. The PlutoF platform provides such data management services for third-party annotations. PlutoF is an online data management platform and computing service provider for biology and related disciplines. Registered users can enter and manage a wide range of data, e.g., taxon occurrences, metabarcoding data, taxon classifications, traits, and lab data. It also features an annotation module where third-party annotations (on material source, geolocation and habitat, taxonomic identifications, interacting taxa, etc.) can be added to any collection specimen, living culture or DNA sequence record. The UNITE Community is using these services to annotate and improve the quality of INSDC rDNA Internal Transcribed Spacer (ITS) sequence datasets. The National Center for Biotechnology Information (NCBI) is linking its ITS sequences with their annotations in PlutoF. However, an automated solution for linking annotations in PlutoF with any sequence and sample record stored in INSDC databases is still missing.
One of the ambitions of the BiCIKL Project is to solve this through operating the ELIXIR Contextual Data ClearingHouse (CDCH). CDCH offers a light and simple RESTful Application Programming Interface (API) to enable extension, correction and improvement of publicly available annotations on sample and sequence records available in ELIXIR data resources. It facilitates feeding improved or corrected annotations from databases (such as secondary databases, e.g., PlutoF, which consume and curate data from repositories) back to primary repositories (databases of the three INSDC collaborative partners). In the Biodiversity Community Integrated Knowledge Library (BiCIKL) Project, the University of Tartu Natural History Museum is leading the task of linking the two components (the web interface provided by the PlutoF platform and the CDCH APIs) to allow user-friendly and effortless reporting of errors and gaps in sequenced material source annotations. The API and web interface will be promoted to those communities (such as taxonomists, those abstracting from the literature, and those already using the community curated data) with the appropriate knowledge and tools, who will be encouraged to report their enhanced annotations back to primary repositories.

    Deliverable D8.3 Web interface for ELIXIR Contextual Data ClearingHouse

    This deliverable report describes the work steps towards building a web interface for the reporting of errors and gaps in sequenced material source annotations as part of Task 8.3 of BiCIKL. A beta version of the web interface has been published and is available to registered users of the PlutoF platform.

    Enabling Community Curation of Biological Source Annotations of Molecular Data Through PlutoF and the ELIXIR Contextual Data Clearinghouse

    Advancements in sequencing technologies have greatly contributed to the documentation of Earth’s biodiversity. However, to explore the full potential of molecular resources for biodiversity, there needs to be a good linkage between sequence data and its biological source, contributing to a network of connected data in the biodiversity research cycle. This requires a foundation of well-structured and accessible annotations in the molecular sequence repositories. The International Nucleotide Sequence Database Collaboration (INSDC), of which the European Nucleotide Archive (ENA) is the European node, holds a large amount of annotations associated with sequence data, relating to its biological source (e.g., specimens in natural history collections). However, for a number of records, these annotations may be incomplete (e.g., missing voucher information), ambiguous or even inaccurate. Therefore, we have implemented a workflow that allows third-party annotations to be attached to sequence and sample records using two existing services, the PlutoF platform and the ELIXIR Contextual Data ClearingHouse. This work was developed within the scope of the BiCIKL (Biodiversity Community Integrated Knowledge Library) project, which aims to establish open science practices in the biodiversity domain. PlutoF is an online data management platform that also provides computing services for biology-related research. PlutoF features allow registered users to enter their own data and access public data at INSDC. Users can enter and manage a range of data, such as taxonomic classifications, occurrences, etc. This platform also includes a module that allows the addition of third-party annotations (on material source, taxonomic identification, etc.) linked to specimens or sequence records. This module was already in use by the UNITE community for annotation of INSDC rDNA Internal Transcribed Spacer sequence datasets (Abarenkov et al. 2021).
These UNITE annotations are displayed in the National Center for Biotechnology Information (NCBI) records through links to the PlutoF platform. However, there was a need for an automated solution that allowed third-party annotations on any sequence or sample record at INSDC. This was implemented through the operation of the ELIXIR Contextual Data ClearingHouse (hereafter, the Clearinghouse). The Clearinghouse provides a simple RESTful Application Programming Interface (API) to support the submission of additions and improvements to current metadata attributes, such as information on material sources, on records publicly available in the ELIXIR data resources. The Clearinghouse enables the submission of these corrected metadata from databases (such as the PlutoF platform) to the primary data repositories. The workflow developed is shown in Fig. 1 and consists of the following steps: i) users annotate sequence metadata that is regularly downloaded from INSDC using NCBI’s E-utilities; ii) an annotation proposal is created and a verification notification is sent to an assigned reviewer; iii) the reviewer evaluates the annotation proposal and accepts it or rejects it with comments; iv) if the annotation proposal is accepted, the annotated fields that may be mapped to ENA fields are then pushed to the Clearinghouse using its RESTful API. The annotations, when received at ENA, are then reviewed before being displayed. This workflow is implemented through a web interface in PlutoF, which allows user-friendly and effortless reporting of corrections or additions to biological source metadata in sequence records. Overall, we expect this tool to contribute to the enrichment of metadata associated with sequence records, and therefore increase the links between the molecular and biodiversity resources, and enable sequencing data to deliver their full potential for biodiversity conservation.
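
Steps ii) to iv) of the workflow described in this abstract can be sketched as a small state machine. Everything named here is hypothetical: the `AnnotationProposal` class, the `MAPPABLE_FIELDS` set and the field names are illustrative assumptions, and neither PlutoF's data model nor the Clearinghouse API is reproduced. The download in step i), the reviewer notification and the actual HTTP submission are left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Hypothetical set of annotation fields assumed to have an ENA counterpart.
MAPPABLE_FIELDS = {"country", "collection_date", "specimen_voucher"}


@dataclass
class AnnotationProposal:
    """Step ii: a proposal awaiting verification by an assigned reviewer."""
    accession: str
    annotations: dict
    reviewer: str
    status: Status = Status.PENDING
    comments: list = field(default_factory=list)


def review(proposal, accept, comment=""):
    """Step iii: the reviewer accepts the proposal or rejects it with comments."""
    if proposal.status is not Status.PENDING:
        raise ValueError("proposal has already been reviewed")
    if comment:
        proposal.comments.append(comment)
    proposal.status = Status.ACCEPTED if accept else Status.REJECTED
    return proposal


def fields_to_push(proposal):
    """Step iv: only accepted proposals yield fields, and only mappable ones."""
    if proposal.status is not Status.ACCEPTED:
        return {}
    return {
        key: value
        for key, value in proposal.annotations.items()
        if key in MAPPABLE_FIELDS
    }


# Invented example: one mappable field and one field without an ENA mapping.
proposal = AnnotationProposal(
    accession="EXAMPLE0001",
    annotations={"country": "Estonia", "habitat_note": "spruce forest"},
    reviewer="reviewer@example.org",
)
review(proposal, accept=True)
print(fields_to_push(proposal))
```

Filtering at the last step reflects the abstract's wording that only "the annotated fields that may be mapped to ENA fields" are pushed; unmapped annotations stay in the secondary database.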

    The Taxon Hypothesis Paradigm - On the Unambiguous Detection and Communication of Taxa

    Here, we describe the taxon hypothesis (TH) paradigm, which covers the construction, identification, and communication of taxa as datasets. Defining taxa as datasets of individuals and their traits will make taxon identification and most importantly communication of taxa precise and reproducible. This will allow datasets with standardized and atomized traits to be used digitally in identification pipelines and communicated through persistent identifiers. Such datasets are particularly useful in the context of formally undescribed or even physically undiscovered species if data such as sequences from samples of environmental DNA (eDNA) are available. Implementing the TH paradigm will to some extent remove the impediment to hastily discover and formally describe all extant species in that the TH paradigm allows discovery and communication of new species and other taxa also in the absence of formal descriptions. The TH datasets can be connected to a taxonomic backbone providing access to the vast information associated with the tree of life. In parallel to the description of the TH paradigm, we demonstrate how it is implemented in the UNITE digital taxon communication system. UNITE TH datasets include rich data on individuals and their rDNA ITS sequences. These datasets are equipped with digital object identifiers (DOI) that serve to fix their identity in our communication. All datasets are also connected to a GBIF taxonomic backbone. Researchers processing their eDNA samples using UNITE datasets will, thus, be able to publish their findings as taxon occurrences in the GBIF data portal. UNITE species hypothesis (species level THs) datasets are increasingly utilized in taxon identification pipelines and even formally undescribed species can be identified and communicated by using UNITE. The TH paradigm seeks to achieve unambiguous, unique, and traceable communication of taxa and their properties at any level of the tree of life. 
It offers a rapid way to discover and communicate undescribed species in identification pipelines and data portals before they are lost to the sixth mass extinction.