3 research outputs found

    A Data Quality Strategy to Enable FAIR, Programmatic Access across Large, Diverse Data Collections for High Performance Data Analysis

    To ensure seamless, programmatic access to data for High Performance Computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardization of both data and services. At the Australian National Computational Infrastructure (NCI) we have developed a Data Quality Strategy (DQS) that currently provides processes for: (1) Consistency of data structures needed for a High Performance Data (HPD) platform; (2) Quality Control (QC) through compliance with recognized community standards; (3) Benchmarking cases of operational performance tests; and (4) Quality Assurance (QA) of data through demonstrated functionality and performance across common platforms, tools and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across the different subject domains, and demonstrated the ease with which modern programmatic methods can be used to access the data, either in situ or via web services, for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high performance environments, the DQS is also used to identify the need for any extensions to the relevant international standards for interoperability and/or programmatic access.

    Recommendations for Discipline-Specific FAIRness Evaluation Derived from Applying an Ensemble of Evaluation Tools

    From a research data repository's perspective, offering research data management services in line with the FAIR principles is becoming increasingly important. However, to date there is no globally established and trusted approach to evaluating FAIRness. Here, we apply five different available FAIRness evaluation approaches to selected data archived in the World Data Center for Climate (WDCC). Two approaches are purely automatic, two are purely manual, and one applies a hybrid method (manual and automatic combined). The results of our evaluation show an overall mean FAIR score of WDCC-archived (meta)data of 0.67 out of 1, with a range of 0.5 to 0.88. Manual approaches show higher scores than automated ones, and the hybrid approach shows the highest score. Computed statistics indicate that the tested approaches show good overall agreement at the data collection level. We find that while none of the five evaluation approaches is fully fit-for-purpose to evaluate (discipline-specific) FAIRness, all have their individual strengths. Specifically, manual approaches capture contextual aspects of FAIRness relevant for reuse, whereas automated approaches focus on the strictly standardised aspects of machine actionability. Correspondingly, the hybrid method combines the advantages and eliminates the deficiencies of manual and automatic evaluation approaches. Based on our results, we recommend that future FAIRness evaluation tools be based on a mature hybrid approach. In particular, the design and adoption of the discipline-specific aspects of FAIRness will have to be carried out in concerted community efforts.

    Quality Management Framework for Climate Datasets

    Data from a variety of research programmes are increasingly used by policy makers, researchers, and the private sector to make data-driven decisions related to climate change and variability. Climate services are emerging as the link to narrow the gap between climate science and downstream users. The Global Framework for Climate Services (GFCS) of the World Meteorological Organization (WMO) offers an umbrella for the development of climate services and has identified quality assessment, along with its use in user guidance, as a key aspect of service provision. This offers an extra stimulus for discussing what type of quality information to focus on and how to present it to downstream users. Quality has become an important keyword for those working on data in both the private and public sectors, and significant resources are now devoted to quality management of processes and products. Quality management guarantees reliability and usability of the product served; it is a key element in building trust between consumers and suppliers. Untrustworthy data could lead to a negative economic impact at best and a safety hazard at worst. In a progressive commitment to establish this relation of trust, as well as to provide sufficient guidance for users, the Copernicus Climate Change Service (C3S) has made significant investments in the development of an Evaluation and Quality Control (EQC) function. This function offers a homogeneous user-driven service for the quality of the C3S Climate Data Store (CDS). Here we focus on the EQC component targeting the assessment of the CDS datasets, which include satellite and in-situ observations, reanalysis, climate projections, and seasonal forecasts. The EQC function is characterised by a two-tier review system designed to guarantee the quality of the dataset information. While the need to assess the quality of climate data is well recognised, the methodologies, the metrics, the evaluation framework, and how to present all this information to the users have never been developed before in an operational service encompassing all the main climate dataset categories. Building the underlying technical solutions poses unprecedented challenges and makes the C3S EQC approach unique. This paper describes the development and the implementation of the operational EQC function providing an overarching quality management service for the whole CDS data.

    This study is based on work carried out in the C3S_512 contract funded by the Copernicus Programme and operated by ECMWF on behalf of the European Commission (Service Contract number: ECMWF/COPERNICUS720187C3S_512_BSC). We would like to acknowledge the work of colleagues from several European institutions, the data providers and C3S, who contributed to the development of the EQC framework as well as to the QAR production. We would also like to acknowledge the focus group users, who took time to review and provide valuable feedback on the QARs, QATs, minimum requirements and the CDS quality assessment tab. The authors are grateful to the anonymous reviewers for their constructive comments, which helped improve this paper.

    Peer reviewed. Article signed by 23 authors: Carlo Lacagnina, Francisco Doblas-Reyes, Gilles Larnicol, Carlo Buontempo, André Obregón, Montserrat Costa-Surós, Daniel San-Martín, Pierre-Antoine Bretonnière, Suraj D. Polade, Vanya Romanova, Davide Putero, Federico Serva, Alba Llabrés-Brustenga, Antonio Pérez, Davide Cavaliere, Olivier Membrive, Christian Steger, Núria Pérez-Zanón, Paolo Cristofanelli, Fabio Madonna, Marco Rosoldi, Aku Riihelä, Markel García Díez. Postprint (published version).