10 research outputs found
From Raw Data to Data Standards through Quality Assessment and Semantic Annotation
Data quality and documentation are at the core of the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016). In biodiversity, and more broadly in ecology, solutions complementary to the well-known data standard approach (notably Darwin Core (Wieczorek et al. 2012)) are emerging from the intensive use of the EML (Ecological Metadata Language (Michener et al. 1997)) metadata standard. These notably capitalize on using: semantic annotation from EML metadata documents that describe data attributes, and FAIR quality assessment as proposed by the DataOne (Data Observation Network for Earth) network. Here we present this point of view by orchestrating the production of rich EML metadata (with attribute descriptions and links to terms from terminological resources) from raw data files, the generation of FAIR metrics for direct assessment of FAIRness, and the creation of data in standards such as Darwin Core. Using EML, we can describe each data attribute (e.g., name, type, unit) and associate each attribute with one or several terms from terminological resources. Using the Darwin Core vocabulary as a terminological resource, we can thus associate, in the metadata file, the original attribute terms with the corresponding Darwin Core ones. The data and their metadata files can then be processed to automatically create the files needed for a Darwin Core Archive. By acting at the metadata level, with accessible raw data files attached, we can associate raw attribute names with standardized ones and then potentially produce data in standard formats.
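As an illustration of the attribute-mapping step described in this abstract, here is a minimal Python sketch. It is not the authors' implementation: the `ANNOTATIONS` table, the raw column names (`species`, `date`, `lat`, `lon`, `observer_note`), the function `to_darwin_core` and the sample row are all hypothetical, and the mapping is hard-coded here rather than read from an EML document. Only the Darwin Core term names (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) come from the Darwin Core vocabulary.

```python
import csv
import io

# Hypothetical annotation table: raw attribute name -> Darwin Core term.
# In the workflow described above, this mapping would be derived from the
# semantic annotations stored in the EML metadata document.
ANNOTATIONS = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(raw_csv_text, annotations):
    """Rename annotated columns to Darwin Core terms and drop the others."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    # Keep only the raw columns that carry an annotation, in file order.
    kept = [name for name in reader.fieldnames if name in annotations]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([annotations[name] for name in kept])
    for row in reader:
        writer.writerow([row[name] for name in kept])
    return out.getvalue()


# Invented sample row, for illustration only.
RAW = (
    "species,date,lat,lon,observer_note\n"
    "Megaptera novaeangliae,2019-03-02,16.25,-61.55,calm sea\n"
)
OCCURRENCE_CSV = to_darwin_core(RAW, ANNOTATIONS)
print(OCCURRENCE_CSV)
```

A real Darwin Core Archive also needs a descriptor file (`meta.xml`) and a metadata document alongside the data table; this sketch covers only the column renaming that the metadata-level annotations make possible.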
Open Science for Better FAIRness: A biodiversity virtual research environment point of view
"FAIR (Findable, Accessible, Interoperable, Reusable) principles" (Wilkinson et al. 2016) and "open science" are two complementary movements in biodiversity science. Although we need to make scientific data and associated material more FAIR, this does not necessarily imply open data or open source algorithms. Here, based on the experience of the French Biodiversity Data Hub ("Pôle national de données de Biodiversité" - PNDB), an e-infrastructure for and by researchers, we showcase how focusing on openness can be a strategy to efficiently reach greater FAIRness. Following DataOne guidance, we can build a complete data/metadata ecosystem that lets us structure heterogeneous environmental information systems. Using the Galaxy analysis platform and its related initiatives (Galaxy training network, European Erasmus+ Gallantries project, bioconda, bioContainer), we illustrate how to create transparent, peer-reviewed and accessible tools and workflows, as well as collaborative training material. The Galaxy platform also facilitates the use of high performance computing infrastructure, notably through the European Open Science Cloud marketplace. Finally, through our experience contributing to open source projects like EML (Ecological Metadata Language (Michener et al. 1997)) Assembly Line, EDI (Environmental Data Initiative), or PAMPA (Indicators of Marine Protected Areas performance for managing coastal ecosystems, resources and their uses), a French platform that helps protected area managers standardize and analyse their data, we also show how building open source "doors" to these environments with R Shiny can be beneficial for all.
From Biodiversity Observation Networks to Datasets and Workflows Supporting Biodiversity Indicators, a French Biodiversity Observation Network (BON) Essential Biodiversity Variables (EBV) Operationalization Pilot using Galaxy and Ecological Metadata Language
Integrating biological data across different ecological scales is complex! The biodiversity community (scientists, policy makers, managers, citizens, NGOs) needs to build a framework of harmonized and interoperable data from raw, heterogeneous and scattered datasets. Such a framework will help the observation, measurement and understanding of the spatio-temporal dynamics of biodiversity from local to global scales. One of the most relevant approaches to reach that aim is the concept of Essential Biodiversity Variables (EBV). As we can potentially extract a lot of information from raw datasets sampled at different ecological scales, the EBV concept is a useful lever for identifying appropriate data to be collated, as well as the associated analytical workflows for processing these data. Thanks to FAIR (Findable, Accessible, Interoperable, Reusable) data and source code implementation, it is possible to make a transparent assessment of biodiversity by generating operational biodiversity indicators (that can be reused or adapted) through the EBV framework, and to help design or improve biodiversity monitoring at various scales. Through the BiodiFAIRse GO FAIR implementation network, we established how ecological and environmental sciences can benefit from existing open standards, tools and platforms used by European, Australian and United States infrastructures, particularly the Galaxy platform for source code accessibility, and the DataOne network of data catalogs and the Ecological Metadata Language standard for data management. We propose that these implementation choices can help fight the biodiversity crisis by supporting the important mission of GEO BON (Group on Earth Observations Biodiversity Observation Network): "Improve the acquisition, coordination and delivery of biodiversity observations and related services to users including decision makers and the scientific community" (GEO BON 2022).
French Biodiversity Data Hub: Linking local to global biodiversity through international initiatives and open science clouds
The French national biodiversity data hub ("Pôle National de Données de Biodiversité" - PNDB) is a national e-infrastructure created in 2018 and led by the National Museum of Natural History, contributing to the Open Science policy of the Ministry of Higher Education, Research and Innovation (MESRI). PNDB contributes to building an integrative framework that takes biodiversity into account over the long term (from the origins of life to future models), at all biological scales (from the molecule to the socio-ecosystem), and in all its interactions, by providing tools and services for the description, access, validation, analysis and reuse of biodiversity data. Given the diverse and complementary types of biodiversity research data (information systems, institutional data repositories, research infrastructures such as observatories, experimental devices, natural history collections, etc.), as well as public policy data, the missions of the PNDB are deeply rooted in the FAIR approach (making data Findable, Accessible, Interoperable, Reusable). Thanks to its nomination in 2022 as a thematic reference center of the MESRI, PNDB will contribute to promoting the FAIR approach, will increase the skills of the scientific communities around open science (e.g., through training and good practices), and will stimulate interactions between producers and users of biodiversity data. PNDB has led the French participation in GEO BON (Group on Earth Observations Biodiversity Observation Network) since 2018 and recently began sharing the lead with the coordination of public policy information systems. Thanks to this co-lead, this national BON proposes an innovative coordination of all biodiversity monitoring programs, from expertise to research, around an innovative Essential Biodiversity Variable (EBV) operationalization pilot. This pilot is made of open practical solutions providing a particularly high degree of FAIRness for biodiversity research objects, from data to source code.
PNDB is also a major European point of contact for the DataOne network, which, in combination with the strong link between PNDB and colleagues at the French Global Biodiversity Information Facility (GBIF) node, allows all types of data to be disseminated worldwide as effectively as possible.
Kakila ("who is there?"): a database of cetacean observations in the Guadeloupe Archipelago
International audience
Kakila database: Towards a FAIR community approved database of cetacean presence in the waters of the Guadeloupe Archipelago, based on citizen science
International audience. Background: In the French West Indies, more than 20 species of cetaceans have been observed over the last decades. The recognition of this marine mammal biodiversity hotspot, observed in the French Exclusive Economic Zone of the West Indies, motivated the French government to create in 2010 a marine protected area (MPA) dedicated to the conservation of marine mammals: the Agoa Sanctuary. The threats that cetacean populations face are multiple, but well documented. Cetacean conservation can only be achieved if relevant and reliable data are available, starting with occurrence data. In the Guadeloupe Archipelago, in addition to some data collected by the Agoa Sanctuary, occurrence data are mainly available through the contribution of citizen science and of local stakeholders (i.e. non-profit organisations (NPO) and whale-watchers). However, no observation network has been coordinated and no standards exist for cetacean presence data collection and management. New information: In recent years, several whale watchers and NPOs regularly collected cetacean observation data around the Guadeloupe Archipelago. Our objective was to gather datasets from three Guadeloupean whale watchers, two NPOs and the Agoa Sanctuary, all of which agreed to share their data. These heterogeneous data went through a careful process of curation and standardisation in order to create a new extended database, using a newly-designed metadata set. This aggregated dataset contains a total of 4,704 records of 21 species collected in the Guadeloupe Archipelago from 2000 to 2019. The database was called Kakila ("who is there?" in Guadeloupean Creole). The Kakila database was developed following the FAIR principles with the ultimate objective of ensuring sustainability.
All these data were transferred into the PNDB repository (Pôle National de Données de Biodiversité, Biodiversity French Data Hub, https://www.pndb.fr). In the Agoa Sanctuary and surrounding waters, marine mammals face increasing anthropogenic pressure from growing human activities. In this context, the Kakila database fulfils the need for an organised system to structure marine mammal occurrences collected by multiple local stakeholders with a common objective: to contribute to the knowledge and conservation of cetaceans living in the waters of the French Antilles. Much-needed data analysis will enable us to identify areas of high cetacean presence, to document the presence of rarer species and to determine areas of possible negative interactions with anthropogenic activities.
Whales and dolphins: fine sentinel species to study within a coastal OHM, but quite complex to approach
International audience