AtMoDat: Improving the reusability of ATmospheric MOdel DATa with DataCite DOIs paving the path towards FAIR data
The generation of high-quality research data is expensive. The FAIR principles were established to foster the reuse of such data for the benefit of the scientific community and beyond. Publishing research data with metadata and DataCite DOIs in public repositories makes them findable and accessible (FA of FAIR). However, DOIs and basic metadata alone do not guarantee that the data are actually reusable beyond the originating discipline: reuse is hindered if data are saved in proprietary or undocumented file formats, if detailed discipline-specific metadata are missing, or if quality information on the data and metadata is not provided. In this contribution, we present ongoing work in the AtMoDat project, a consortium of atmospheric scientists and infrastructure providers that aims to improve the reusability of atmospheric model data.
Consistent standards are necessary to simplify the reuse of research data. Although standardization of file structure and metadata is well established for some subdomains of the earth system modeling community (e.g., CMIP), several other subdomains lack such standardization. Hence, scientists from the Universities of Hamburg and Leipzig and infrastructure operators cooperate in the AtMoDat project to advance standardization of model output files in specific subdomains of the atmospheric modeling community. Starting from the demanding CMIP6 standard, the aim is to establish an easy-to-use standard that is at least compliant with the Climate and Forecast (CF) conventions. In parallel, an existing netCDF file convention checker is being extended to check compliance with the new standards. This enhanced checker is designed to support the creation of compliant files and thus lower the hurdle for data producers to comply with the new standard. The transfer of this approach to further sub-disciplines of the earth system modeling community will be supported by a best-practice guide and other documentation. A showcase of a standard for the urban atmospheric modeling community will be presented in this session. The standard is based on the CF Conventions and adapts several global attributes and controlled vocabularies from the well-established CMIP6 standard.
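As a rough illustration of the kind of check such a file convention checker performs, the following Python sketch (using the netCDF4 library) tests whether a model output file carries a minimal set of CF/CMIP6-style global attributes. The attribute list and file name are hypothetical examples, not the actual AtMoDat checker or its rule set.

# Illustrative sketch only (not the AtMoDat checker). It inspects a netCDF file
# for a hypothetical minimal set of CF/CMIP6-style global attributes.
from netCDF4 import Dataset

# Hypothetical required attributes; the real standard defines its own list and
# controlled vocabularies.
REQUIRED_GLOBAL_ATTRS = ["Conventions", "title", "institution", "source", "contact"]

def check_global_attributes(path):
    """Return a list of human-readable problems found in the file's global attributes."""
    problems = []
    with Dataset(path, mode="r") as nc:
        present = set(nc.ncattrs())
        for attr in REQUIRED_GLOBAL_ATTRS:
            if attr not in present:
                problems.append("missing global attribute: " + attr)
        # CF compliance requires, at minimum, a Conventions attribute naming a CF version.
        if "Conventions" in present and "CF-" not in str(nc.getncattr("Conventions")):
            problems.append("Conventions attribute does not reference a CF version")
    return problems

for problem in check_global_attributes("model_output.nc"):  # placeholder file name
    print(problem)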
Additionally, the AtMoDat project aims to introduce a generic quality indicator into the DataCite metadata schema to foster further reuse of data. This quality indicator is intended to require a discipline-specific implementation of a quality standard linked to the indicator. We will present the concept of the generic quality indicator in general and in the context of urban atmospheric modeling data.
Recommendations for Discipline-Specific FAIRness Evaluation Derived from Applying an Ensemble of Evaluation Tools
From a research data repository's perspective, offering research data management services in line with the FAIR principles is becoming increasingly important. However, no globally established and trusted approach to evaluating FAIRness exists to date. Here, we apply five different available FAIRness evaluation approaches to selected data archived in the World Data Center for Climate (WDCC). Two approaches are purely automatic, two are purely manual, and one applies a hybrid method (manual and automatic combined).
The results of our evaluation show an overall mean FAIR score of WDCC-archived (meta)data of 0.67 out of 1, with a range of 0.5 to 0.88. Manual approaches yield higher scores than automated ones, and the hybrid approach yields the highest score. Computed statistics indicate that the tested approaches show good overall agreement at the data-collection level.
We find that while none of the five evaluation approaches is fully fit for purpose to evaluate (discipline-specific) FAIRness, all have individual strengths. Specifically, manual approaches capture contextual aspects of FAIRness relevant for reuse, whereas automated approaches focus on the strictly standardised aspects of machine actionability. Correspondingly, the hybrid method combines the advantages and eliminates the deficiencies of manual and automatic evaluation approaches.
Based on our results, we recommend that future FAIRness evaluation tools be based on a mature hybrid approach. In particular, the design and adoption of the discipline-specific aspects of FAIRness will have to be carried out in concerted community efforts.
ATMODAT Standard v3.0
Within the AtMoDat project (Atmospheric Model Data), a standard has been developed to improve the FAIRness of atmospheric model data published in repositories. The ATMODAT standard includes concrete recommendations related to the maturity, publication and enhanced FAIRness of atmospheric model data. The recommendations include requirements for rich metadata with controlled vocabularies, structured landing pages, the file format (netCDF) and the structure within files. Human- and machine-readable landing pages are a core element of this standard and should hold and present discipline-specific metadata at the simulation and variable level.
This standard is an updated and translated version of "Bericht über initialen Kernstandard und Kurationskriterien des AtMoDat Projektes (v2.4)".
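To make the file-level recommendations concrete, the following Python sketch (using netCDF4) writes a small netCDF file whose global attributes and variable metadata follow the CF conventions. It is a minimal, hypothetical example; the attribute values are placeholders, and the attributes and controlled vocabulary terms actually required are defined in the ATMODAT standard document itself.

# Minimal, illustrative sketch of a CF-style netCDF file; attribute values are
# placeholders, not terms from the ATMODAT controlled vocabularies.
import numpy as np
from netCDF4 import Dataset

with Dataset("example_output.nc", mode="w", format="NETCDF4") as nc:
    # Global attributes: metadata at the simulation level.
    nc.Conventions = "CF-1.8"
    nc.title = "Example urban atmospheric model run"   # placeholder
    nc.institution = "Example institute"               # placeholder
    nc.source = "ExampleModel v1.0"                    # placeholder

    # Dimension and coordinate variable.
    nc.createDimension("time", None)
    time = nc.createVariable("time", "f8", ("time",))
    time.units = "hours since 2020-01-01 00:00:00"
    time.standard_name = "time"
    time[:] = np.arange(3)

    # Data variable with CF metadata at the variable level.
    tas = nc.createVariable("tas", "f4", ("time",))
    tas.standard_name = "air_temperature"   # CF standard name (controlled vocabulary)
    tas.units = "K"
    tas[:] = np.array([285.3, 286.1, 284.9], dtype="f4")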
DATA PUBLICATION IN THE OPEN ACCESS INITIATIVE
The ‘Berlin Declaration’ was published in 2003 as a guideline for policy makers to promote the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the ‘Berlin Declaration’ should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to offer authors an incentive to publish data through long-term repositories. Data publication also requires an adequate licence model that protects the intellectual property rights of the author while allowing further use of the data by the scientific community.
Use of persistent identifiers in the publication and citation of scientific data
In the last decade, the primary data that research is based on have become a third pillar of scientific work, alongside theoretical reasoning and experiment. Greatly increased computing power and storage, together with web services and other electronic resources, have facilitated a quantum leap in new research based on the analysis of large amounts of data. However, traditional scientific communication is only slowly changing to new media other than an emulation of paper. This leaves many data inaccessible and, in the long run, exposes valuable data to the risk of loss. Most important to the availability of data is a valid citation. This means that all fields mandatory for a bibliographic citation are included. In addition, a mechanism is needed that ensures that the location of the referenced data on the Internet can be resolved in the long term. Just using URLs, i.e. doing "data management on web servers", does not help, because such links are short-lived and mostly become invalid after just a few months. Data publication on the Internet therefore needs a system of reliable pointers to each digital object as an integral part of the citation. To achieve this persistence of identifiers, many scientific publishers use Digital Object Identifiers (DOIs) for their conventional publications. The identifier is resolved through the Handle System to the valid location (URL) where the dataset can be found. This approach meets one of the prerequisites for the citability of scientific data published online. In addition, the valid bibliographic citation can be included in library catalogues. To improve access to data and to create incentives for scientists to make their data accessible, some German data centers initiated a project on the publication and citation of scientific data. The project "Publication and Citation of Scientific Data" (STD-DOI) was funded by the German Science Foundation (DFG) between 2003 and 2008. In STD-DOI, the German National Library of Science and Technology (TIB Hannover), together with the German Research Centre for Geosciences (GFZ Potsdam), the Alfred Wegener Institute for Polar and Marine Research (AWI) in Bremerhaven, the University of Bremen, the Max Planck Institute for Meteorology in Hamburg, and the DLR German Remote Sensing Data Center, set up the first system to assign DOIs to data sets and, finally, to publish them. The STD-DOI system for data publication is now used by eight data publication agents. Data publication through specific agents addresses specific user communities and caters to their requirements in the data publication process. The registration process between TIB and the publication agents is based on a SOAP web service. This presentation will show the organisational and technical aspects of the data publication process in the STD-DOI project and give examples of a successful workflow towards established data citations in the earth sciences.
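As a brief illustration of the resolution mechanism described above, the following Python sketch queries the public Handle System REST proxy at doi.org for the URL currently registered for a DOI. The DOI used here is a placeholder, not one of the data sets published through STD-DOI.

# Illustrative sketch: resolve a DOI to its registered landing-page URL via the
# public Handle System REST proxy. The DOI below is a placeholder.
import json
import urllib.request

def resolve_doi(doi):
    """Return the URL the Handle System currently stores for the given DOI."""
    with urllib.request.urlopen("https://doi.org/api/handles/" + doi) as resp:
        record = json.load(resp)
    # A handle record holds typed values; the "URL" entry is the resolvable location.
    for value in record["values"]:
        if value["type"] == "URL":
            return value["data"]["value"]
    raise ValueError("No URL value registered for " + doi)

print(resolve_doi("10.1594/EXAMPLE"))  # placeholder DOI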
Atarrabi - A Workflow System for the Publication of Environmental Data
In a research project funded by the German Research Foundation, meteorologists, data publication experts, and computer scientists optimised the publication process for meteorological data and developed software that supports metadata review. The project group placed particular emphasis on the scientific and technical quality assurance of primary data and metadata. At the end of the workflow, the software automatically registers a Digital Object Identifier (DOI) at DataCite. The software has been successfully integrated into the infrastructure of the World Data Center for Climate, but a key objective was to make the results applicable to data publication processes in other sciences as well.
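For orientation, the sketch below shows what a minimal DOI registration with mandatory DataCite metadata can look like when using DataCite's current public REST API. This is not the Atarrabi implementation, and the DOI, credentials, metadata values and landing-page URL are all placeholders.

# Hypothetical sketch of a DOI registration against the DataCite test REST API.
# It is not the Atarrabi workflow; all values below are placeholders.
import base64
import json
import urllib.request

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "doi": "10.5072/example-dataset",            # placeholder DOI under a test prefix
            "event": "publish",
            "titles": [{"title": "Example meteorological data set"}],
            "creators": [{"name": "Doe, Jane"}],
            "publisher": "Example data centre",
            "publicationYear": 2024,
            "types": {"resourceTypeGeneral": "Dataset"},
            "url": "https://example.org/landing-page",   # landing page the DOI resolves to
        },
    }
}

credentials = base64.b64encode(b"REPOSITORY_ID:PASSWORD").decode()  # placeholder credentials
req = urllib.request.Request(
    "https://api.test.datacite.org/dois",                # DataCite test endpoint
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/vnd.api+json",
        "Authorization": "Basic " + credentials,
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, json.load(resp)["data"]["id"])    # the registered DOI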