20 research outputs found

    Community next steps for making globally unique identifiers work for biocollections data

    Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has neither been coalescence towards one single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided

    Biodiversity and Biocollections: Problem of Correspondence

    This text is an English translation of those sections of the original Russian paper that consider collection-related issues. The full citation of the original paper is as follows: Pavlinov I.Ya. 2016. [Bioraznoobrazie i biokollektsii: problema sootvetstvia]. In: Pavlinov I.Ya. (comp.). Aspects of Biodiversity. Archives of Zoological Museum of Lomonosov Moscow State University, Vol. 54, Pp. 733–786. The orientation of biology, as a natural science, toward studying and explaining the similarities and differences between organisms led, in the second half of the 20th century, to the recognition of a specific subject area of biological exploration, viz. biodiversity (BD). One of the important general scientific prerequisites for this shift was the understanding that (at the level of ontology) the structured diversity of living nature is its fundamental property, equivalent to the subjection of some of its manifestations to certain laws. At the level of epistemology, this led to acknowledging that the “diversificationary” approach to describing living beings is as justifiable as the previously dominant “unificationary” one. This general trend has led to a significant increase in attention to BD. From a pragmatic perspective, its leitmotif was the conservation of BD as a renewable resource, while from a scientific perspective the leitmotif was the study of BD as a specific natural phenomenon. These two points of view are united by recognition of the need for scientific substantiation of BD conservation strategy, which implies the need for a detailed study of BD itself. At the level of ontology, one of the key problems in the study of BD (leaving aside the question of its genesis) is determination of its structure, which is interpreted as a manifestation of the structure of the Earth’s biota itself.
With this, it is acknowledged that the subject area of empirical explorations is not BD as a whole (“Umgebung”) but its particular manifestations (“Umwelts”). It is proposed to recognize, within the latter: fragments of BD (especially taxa and ecosystems), hierarchical levels of BD (primarily within- and interorganismal ones), and aspects of BD (above all taxonomic and meronomic ones). Attention is drawn to a new interpretation of bioinformatics as a discipline that studies the information support of BD explorations. Biocollections are an important part of this support. The scientific value of collections lies in the fact that they make possible both the empirical inference and the testing (verification) of knowledge about BD. This makes biocollections, in their epistemological status, equivalent to experiments, and so makes studies of BD fully scientific. It is emphasized that the natural objects (naturalia) permanently kept in collections contain primary (objective) information about BD, while information retrieved from them is secondary (subjective). A collection, as an information resource, serves as a research sample in studies of BD. The collection pool, as the totality of all collection materials kept in repositories according to certain standards, can be treated as a general sample, and every single collection as a local sample. The main characteristic of a collection-as-sample is its representativeness; so the basic strategy for developing the collection pool is to maximize its representativeness as a means of ensuring that the structure of the biocollection pool corresponds to that of BD itself. The most fundamental characteristic of a collection, as an information resource, is its scientific significance.
The following three main groups of more particular characteristics are distinguished: the “proper” characteristics of every collection are its meaningfulness, informativeness, reliability, adequacy, documenting, systematicity, volume, structure, uniqueness, stability, lability; the “external” characteristics of a collection are resolution, usability, ethic constituent; the “service” characteristics of a collection are its museofication, storage system security, inclusion in metastructure, cost. In the contemporary world, development of the biocollection pool, as a specific resource for BD research, requires considerable organizational efforts, including work on its “information support” aimed at demonstrating the necessity of the existence of biocollections

    Building essential biodiversity variables (EBVs) of species distribution and abundance at a global scale

    Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a ‘Big Data’ approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence-only or presence–absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi-source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter- or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide. 
These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi-source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA-based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as approaching technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals

    Building a Portal for Scientific Collections at the University of Lisbon

    Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2020. Scientific collections, bringing together a huge number and diversity of objects and the data associated with them, constitute a valuable historical, scientific and cultural heritage. These collections are generally under the responsibility of museums and their respective curators, and it is important that there is a platform on which those responsible for them can carry out management and maintenance operations. Given the diversity of the collections, these data, belonging to different scientific domains and with different properties, pose problems of integration, availability and maintenance, problems that are increasingly relevant in a data-centric world that relies on the analysis and sharing of data. This project, focused on this challenge, aimed to develop, for the Museu Nacional de História Natural e da Ciência da Universidade de Lisboa, a platform that aggregates the very diverse collections of this institution, taking advantage of an open-source base platform called CollectiveAccess. In the course of the project, a generalized methodology was developed for any collection, covering the processes from the acquisition of the data, through its processing and correction, to its import and availability within the platform.
Specific features were also developed and implemented to address particular characteristics of the different data sets, such as a hierarchical system for taxonomy-related data, a geographic data entry system using an external API, and search features developed to meet the requirements of each data set. These functionalities and the overall performance of the system were evaluated through two usability questionnaires (System Usability Scale), via two different Google Forms. These questionnaires were aimed at two main types of users of the system: curators and the general public. In addition, comments and suggestions for improvements or additional features were requested. The results of the questionnaires were satisfactory, with classifications of A and B on the usability scale from the tests of the public and the curators, respectively. The analysis of comments and suggestions also provided an idea of possible improvements and new features to be implemented

    Towards a Post-Graduate Level Curriculum for Biodiversity Informatics. Perspectives from the Global Biodiversity Information Facility (GBIF) Community

    Biodiversity informatics is a new and evolving field, requiring efforts to develop capacity and a curriculum for this field of science. The main objective was to summarise the level of activity and the efforts towards developing biodiversity informatics curricula, for work-based training and/or academic teaching at universities, taking place within the Global Biodiversity Information Facility (GBIF) countries and its associated network. A survey approach was used to identify existing capacities and resources within the network. Most GBIF Nodes survey respondents (80%) are engaged in onsite training activities, with a focus on work-based professionals, mostly researchers, policy-makers and students. Training topics include data mobilisation, digitisation, management, publishing, analysis and use, to enable the accessibility of analogue and digital biological data that currently reside as scattered datasets. An initial assessment of academic teaching activities highlighted that countries in most regions, to varying degrees, were already engaged in the conceptualisation, development and/or implementation of formal academic programmes in biodiversity informatics, including programmes in Benin, Colombia, Costa Rica, Finland, France, India, Norway, South Africa, Sweden, Taiwan and Togo. Digital e-learning platforms were an important tool to help build capacity in many countries. In terms of the potential in the Nodes network, 60% expressed willingness to be recruited or commissioned for capacity enhancement purposes. Contributions and activities of various country nodes across the network have been highlighted and a working curriculum framework has been defined. © 2021. Parker-Allie F et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Peer reviewed

    Digital Extended Specimens: Enabling an Extensible Network of Biodiversity Data Records as Integrated Digital Objects on the Internet

    The early twenty-first century has witnessed massive expansions in availability and accessibility of digital data in virtually all domains of the biodiversity sciences. Led by an array of asynchronous digitization activities spanning ecological, environmental, climatological, and biological collections data, these initiatives have resulted in a plethora of mostly disconnected and siloed data, leaving to researchers the tedious and time-consuming manual task of finding and connecting them in usable ways, integrating them into coherent data sets, and making them interoperable. The focus to date has been on elevating analog and physical records to digital replicas in local databases prior to elevating them to ever-growing aggregations of essentially disconnected discipline-specific information. In the present article, we propose a new interconnected network of digital objects on the Internet—the Digital Extended Specimen (DES) network—that transcends existing aggregator technology, augments the DES with third-party data through machine algorithms, and provides a platform for more efficient research and robust interdisciplinary discovery

    Toward a Flexible Metadata Pipeline for Fish Specimen Images

    Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report their approaches for identifying metadata standards and protocols that support optimal flexibility. This paper reports on an initiative targeting the development of a flexible metadata pipeline for a collection containing over 300,000 digital fish specimen images, harvested from multiple data repositories and fish collections. The images and their associated metadata are being used for AI-related scientific research involving automated species identification, segmentation and trait extraction. The paper provides contextual background, followed by the presentation of a four-phased approach involving: 1. Assessment of the Problem, 2. Investigation of Solutions, 3. Implementation, and 4. Refinement. The work is part of the NSF Harnessing the Data Revolution, Biology Guided Neural Networks (NSF/HDR-BGNN) project and the HDR Imageomics Institute. An RDF graph prototype pipeline is presented, followed by a discussion of research implications and a conclusion summarizing the results. Comment: 12 pages. 5 figures. Presented at the 16th International Conference on Metadata and Semantics Research. To be published in the conference proceedings of Metadata and Semantic Research: 16th International Conference, MTSR 2022, London, United Kingdom, November 8-10, 202

    A decadal view of biodiversity informatics: challenges and priorities

    Biodiversity informatics plays a central enabling role in the research community's efforts to address scientific conservation and sustainability issues. Great strides have been made in the past decade establishing a framework for sharing data, where taxonomy and systematics has been perceived as the most prominent discipline involved. To some extent this is inevitable, given the use of species names as the pivot around which information is organised. To address the urgent questions around conservation, land-use, environmental change, sustainability, food security and ecosystem services that are facing Governments worldwide, we need to understand how the ecosystem works. So, we need a systems approach to understanding biodiversity that moves significantly beyond taxonomy and species observations. Such an approach needs to look at the whole system to address species interactions, both with their environment and with other species. It is clear that some barriers to progress are sociological, basically persuading people to use the technological solutions that are already available. This is best addressed by developing more effective systems that deliver immediate benefit to the user, hiding the majority of the technology behind simple user interfaces. An infrastructure should be a space in which activities take place and, as such, should be effectively invisible. This community consultation paper positions the role of biodiversity informatics, for the next decade, presenting the actions needed to link the various biodiversity infrastructures invisibly and to facilitate understanding that can support both business and policy-makers. The community considers the goal in biodiversity informatics to be full integration of the biodiversity research community, including citizens’ science, through a commonly-shared, sustainable e-infrastructure across all sub-disciplines that reliably serves science and society alike

    Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies

    The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers