
    Facing the Challenges in simulation-based Earth System Sciences and the Role of FAIR Digital Objects

    Motivation
    Results of simulations with climate models form the most important basis for research on, and statements about, possible changes in the future global, regional and local climate. These output volumes are increasing at an exponential rate (Balaji et al. 2018, Stevens et al. 2019). Efficiently handling these amounts of data is a challenge for researchers, mainly because the development of novel data- and workflow-handling approaches has not kept pace with the growth in data volume. This problem will only become more pronounced with the ever-increasing performance of the High Performance Computing (HPC) systems used to perform weather and climate simulations (Lawrence et al. 2018). For example, in the framework of the European Commission's Destination Earth programme, the Digital Twins (Bauer et al. 2021) are expected to produce hundreds of terabytes of model output data every day at the EuroHPC computing sites.
    The described data challenge can be dissected into several aspects, two of which we focus on in this contribution. Available data in the Earth System Sciences (ESS) are increasingly made openly accessible by various institutions, such as universities, research centres and government agencies, in addition to subject-specific repositories. Yet the exploitability of weather and climate simulation output beyond the expert community, by humans and automated agents alike (as described by the FAIR data principles: F-Findable, A-Accessible, I-Interoperable, R-Reusable; Wilkinson et al. 2016), is currently very limited, if not impossible, due to disorganized metadata or incomplete provenance information. Additionally, developments regarding globally available and FAIR workflows in the spirit of the FAIR Digital Object (FDO) framework (Schultes and Wittenburg 2019, Schwardmann 2020) are only just beginning.
    Cultural Change
    To address the data challenges outlined above, current efforts at DKRZ (German Climate Computing Center) aim at a complete restructuring of the way research is performed in simulation-based climate research (Anders et al. 2022, Mozaffari et al. 2022, Weigel et al. 2020). DKRZ is well suited for this endeavor because researchers have the resources and services available to conduct the entire suite of their data-intensive workflows, ranging from planning and setting up model simulations, analyzing the model output, and reusing existing large-volume datasets, to data publication and long-term archival. At the moment, DKRZ users cannot orchestrate their workflows via a central service, but rather use a plethora of different tools to piece them together.
    Framework Environment Freva
    The central element of the new workflow environment at DKRZ will be the Freva (Free Evaluation System Framework) software infrastructure, which offers standardized data and tool solutions in ESS and is optimized for use on high-performance computer systems (Kadow et al. 2021). Freva is designed to be well suited to the use of the FDO framework.
    The crucial aspects here are:
    - the standardisation of data objects as input for analysis and processing,
    - the already implemented remote access to data via a Persistent Identifier (PID),
    - the currently still system-internal capture of analysis provenance, and
    - the possibility of sharing results, and also workflows, within research groups and up to large communities.
    It is planned to extend the functionality of Freva so that the system automatically determines the data required for a specific analysis from a researcher's research question (provided to the system via some interface), queries available databases (local disk or tape, cloud or federated resources) for those data, and retrieves the data if possible. If data are not available (yet), Freva shall be able to automatically configure, set up and submit model simulations to the HPC system, so that the required data are created and become available (cf. Fig. 1). These data will in turn be ingested into Freva's data catalog for reuse. Next, Freva shall orchestrate and document the analysis performed. Results will be provided as numerical fields, images or animations, depending on the researcher's needs. As a final step, the applied workflow and/or underlying data are published in accordance with the FAIR data guiding principles.
    FDOs - towards a global integrated Data Space
    To make the process sketched out above a reality, application of the FDO concept is essential (Schwardmann 2020, Schultes and Wittenburg 2019). There is a long tradition in the ESS community of global dissemination and reuse of large-volume climate data sets. Community standards like those developed and applied in the framework of internationally coordinated model intercomparison studies (CMIP) allow for low-barrier reuse of data (Balaji et al. 2018). Globally resolvable PIDs are provided on a regular basis. Current community ESS standards and workflows are already close to being compatible with implementing FDOs. However, we now also have to work on open points in the FDO concept, which are:
    - the clear definition of community-specific FDO requirements, including PID Kernel Type specifications,
    - the operation of data type registries, and
    - the technical implementation requirements for global access to FDOs.
    With these in place and implemented in Freva following standardized implementation recommendations, automated data queries across spatially distributed or heterogeneous local databases become possible. We introduce the concept of these implementations in Freva and use it to highlight the challenges we face. Using an example, we show our vision of the work of a scientist in the Earth System Sciences.
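    To make the envisioned retrieve-or-simulate loop concrete, the following minimal Python sketch mimics the behaviour described above: look a dataset up in a catalog, trigger a (mock) simulation if it is missing, ingest the new output for reuse, and record provenance for the analysis. All names here (find_dataset, submit_simulation, the in-memory CATALOG, the handle-style PIDs) are hypothetical illustrations under the assumptions stated in the abstract, not the actual Freva API.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    pid: str                       # globally resolvable persistent identifier
    variable: str                  # e.g. "tas" (near-surface air temperature)
    experiment: str
    provenance: list = field(default_factory=list)

# Hypothetical stand-in for Freva's data catalog; in a real deployment this
# would be a catalog service, not a Python dict.
CATALOG = {}

def find_dataset(variable, experiment):
    """Query the catalog for data matching the researcher's question."""
    for ds in CATALOG.values():
        if ds.variable == variable and ds.experiment == experiment:
            return ds
    return None

def submit_simulation(variable, experiment):
    """Mock 'configure, set up and submit' of a model run on the HPC system,
    so that the required data are created and become available."""
    ds = Dataset(pid=f"hdl:00.00000/{experiment}-{variable}",  # invented PID
                 variable=variable, experiment=experiment,
                 provenance=[f"created by simulation for '{experiment}'"])
    CATALOG[ds.pid] = ds           # ingest the new output for later reuse
    return ds

def analyse(variable, experiment):
    """Retrieve-or-create the input data, run a tool, document provenance."""
    ds = find_dataset(variable, experiment) or submit_simulation(variable, experiment)
    ds.provenance.append("analysis: global mean computed")  # placeholder tool
    return ds

result = analyse("tas", "historical")
print(result.pid, result.provenance)
```

    The design point this sketch is meant to surface: once data objects and their PIDs are standardized, "retrieve if present, simulate if absent" becomes a single orchestration step rather than a manual decision by the researcher.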

    The Vision of the FAIR Digital Object Machine and Ubiquitous FDO Services

    In addition to the previous intensive discussion of the "Data Deluge", i.e. the enormous increase in available research data, the 2022 Internet-of-Things conference confirmed that in the near future there will be billions if not trillions of smart IoT devices in a very wide range of applications and locations, many of them with computational capacities. This large number of distributed IoT devices will create continuous streams of data and will require a global framework to facilitate their integration into the Internet and to enable controlled access to their data and services, to name but a few aspects. Such a framework would also enable tracking of these IoT devices to measure their resource usage, for instance in support of the UN Sustainable Development Goals. Additionally, policy makers are committed to defining regulations that break data monopolies and increase sharing. The result will be a huge and growing domain of accessible digital data, which allows new challenges to be addressed, especially cross-sector ones. A key prerequisite for this is finding the right data across domain boundaries to support a specific task.
    Digitisation is already being called the fourth industrial revolution, and the emerging data and information are the 21st century's new resource. Currently this vision is mostly unrealised, despite the progress in providing thematic catalogs, because existing data and digital resources are not findable, accessible, interoperable, and reusable. As a result, the capacity of this new resource remains latent and mostly underutilized. No Internet-level infrastructure currently exists to facilitate the process by which all data and digital resources are made consistently and globally accessible. What exists is a patchwork of localized and limited access points to trusted data on the Internet, created by specific communities that have been funded or directed to collaborate.
    To turn digital information into a commodity, the description, access, validation, and processing of data need to become part of the Internet infrastructure, which we call the Global Integrated Data Space (GIDS). The main pillars of this approach require that data and services be globally identified and consistently accessed, with predictive descriptions and access control to make them globally findable.
    Currently, researchers partly rely on informal knowledge, such as knowing the relevant labs and persons, to maximize the chance of accessing trustworthy data, but this method limits the use of suitable data. In the future data scenario, other mechanisms will become possible. In the public information space, Google-like searches using specific key terms have become an accepted solution for finding documents for human consumption. This approach, however, does not work in a GIDS with large numbers of data contributors from a wide range of institutions and millions of IoT devices worldwide, where a wide range of data types and automatic data-processing procedures dominate. Indeed, successful labs that apply complex models describing digital surrogates can automatically leverage data and data-processing procedures from other labs. This makes the manual stitching of data and operations, still common operational practice today, too costly in both time and resources to remain a competitive option.
    A researcher looking for specific brain imaging data for a specific study has a few options:
    - rely on a network of colleagues,
    - execute Google-like searches in known registries, looking for appropriate departments and researchers,
    - execute Google-like searches for suitable data, or
    - engage an agent to execute profile matching in suitable sub-spaces.
    We assume that data creators will have the capability, and be interested, to create detailed metadata of different types, and that researchers looking for specific data will be able to specify precise profiles for the data they seek. Two key characteristics of the future data space will be operations that carry out profile matching at ultra-high speed and that lead, via self-organizing mechanisms, to various subspaces organized along certain facets. Of course, this places high requirements on the quality of the metadata being used, and requires that creators and potential consumers share knowledge about the semantic space in which they operate and about the available semantic mappings, whether used by brokers or self-provided. Metadata must be highly detailed; suitable schemas have already been developed in many communities. In addition to the usual metadata, potential users will need to specify their credentials in the form of trusted roles and their usage purposes to indicate access opportunities.
    Changing current metadata practices to yield richer metadata, as prescribed by the FAIR principles, will not be simple, especially since we seem to be far from formalizing roles and usage purposes in a broadly accepted way, but the pressure to create rich and standardized metadata will increase. It should be noted, of course, that for data streams created by IoT sensors, defining proper metadata is an action that is only required once or a few times.
    Why are FDOs special in this automatic profile-matching scenario? FDOs bundle all information required for automatic profile matching in a secure way, i.e., all metadata are available via the globally unique, resolvable and persistent identifiers (PIDs) of the FDO, and the PID security mechanisms form the basis for establishing trust. FDOs will be provided with a secure method capable of computing efficiently coded profiles that represent all properties of an FDO relevant for profile matching. This would speed up profile matching enormously (a toy sketch of such a coded-profile test follows below).
    We will address two major questions characterizing the "FDO Machine" we are envisioning:
    - Which kinds of representations could make profile matching much more efficient?
    - How could FDO-based mechanisms be used to efficiently create sub-spaces that help the emerging layer of information brokers offer specialized services addressing specialized needs, as for example requested by the UN's Sustainable Development Goals?
    Brokers might want to use specialized agents to create subspaces along many different important facets such as domain, trustworthiness, roles, etc. Such subspaces are ephemeral virtual structures on top of the huge global integrated data space.
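    As a rough illustration of how "efficiently coded profiles" could enable ultra-fast matching, the Python sketch below condenses an FDO's metadata tokens into a fixed-width bit signature and tests a consumer profile against it with a single bitwise comparison, in the spirit of a Bloom filter. The encoding, the token format and the example PIDs are all invented for illustration; the abstract does not prescribe a concrete representation.

```python
import hashlib

WIDTH = 256  # signature width in bits; a real system would tune this

def signature(tokens):
    """Hash each metadata token (e.g. 'modality=MRI') into one bit of a
    fixed-width integer; the OR of all bits is the FDO's coded profile."""
    sig = 0
    for tok in tokens:
        h = int.from_bytes(hashlib.sha256(tok.encode()).digest(), "big")
        sig |= 1 << (h % WIDTH)
    return sig

def matches(query_sig, fdo_sig):
    """Superset test: every bit requested by the query must be set in the
    FDO signature (Bloom-filter style, so rare false positives are possible)."""
    return query_sig & fdo_sig == query_sig

# Invented example records: signatures keyed by (equally invented) PIDs,
# computed once when the FDO is created or updated.
fdos = {
    "hdl:00.00000/brain-001": signature({"modality=MRI", "organ=brain", "access=open"}),
    "hdl:00.00000/heart-042": signature({"modality=CT", "organ=heart", "access=restricted"}),
}

query = signature({"modality=MRI", "organ=brain"})
print([pid for pid, sig in fdos.items() if matches(query, sig)])
# -> ['hdl:00.00000/brain-001']
```

    The point of such a representation is that matching reduces to a constant-time bitwise operation per FDO, which is what makes profile matching over millions of records, and its delegation to brokers building facet-specific subspaces, plausible at all.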

    Train-the-Trainer Concept for Research Data Management: Extension Module on the Reuse of Research Data (Version v1)

    As part of the BMBF-funded project FDMentor, a German-language train-the-trainer programme on research data management (RDM) was created, which, after the end of the project, is continuously supplemented and updated by members of the UAG Schulungen/Fortbildungen of the DINI/nestor-AG Forschungsdaten. The topics covered include both the subject-matter aspects of research data management and units on didactic fundamentals and on developing teaching and workshop concepts. With the publication of the fifth version of the train-the-trainer concept on RDM (DOI: 10.5281/zenodo.10122153), additional thematic units were designed and published as supplementary or extension modules. The workshop "Nachnutzung von Forschungsdaten" (Reuse of Research Data) is aimed at RDM trainers. Within 120 minutes, it addresses aspects that should be considered when training researchers on the reuse of data in a scientific context (data reuse to answer a research question). These include, among other things, the various ways of obtaining data, plausibility and quality checks, and licensing aspects of reusing research data. The contents are designed for interdisciplinary use and can be adapted to the RDM knowledge of the target audience. The workshop also integrates various didactic elements and thus meets the standards of the UAG Schulungen/Fortbildungen of the DINI/nestor-AG Forschungsdaten, of which all presenters are members. In addition to the concept itself, the published materials include workshop materials such as worksheets and templates, an example of a teaching script, and an explanation of the methods used. The workshop took place online on 21 November 2023, and, in an earlier version, also online on 14 February 2023 as part of the RDA Deutschland Tagung 2023 (13 to 17 February 2023, organized by RDA Deutschland e. V. in cooperation with the Helmholtz Open Science Office and the Georg-August-Universität Göttingen), https://indico.desy.de/event/37011/contributions/132887/ , and: Ariza, A., Asef, E., Jacob, J., Mühlichen, A., Peters-von Gehlen, K., Schranzhofer, H., & Trautwein-Bruns, U. (2023, February 14). Datennachnutzung in der Praxis. RDA-DE. Zenodo. https://doi.org/10.5281/zenodo.756826

    FAIR Digital Object Demonstrators 2021

    This paper gives a summary of implementation activities in the realm of FAIR Digital Objects (FDOs). It gives an idea of which software components are robust and have been in use for many years, which components are comparatively new and being tested in pilot projects, and which challenges urgently need to be addressed by the FDO community. After essentially only one year of advancing the FDO specifications within the FDO Forum, we can recognise an increasing momentum to test and integrate essential FDO components. However, many developments still occur as isolated efforts, which presents a scattered picture. It is widely agreed that it is now time to combine these different pilots into comprehensive testbeds, to identify remaining gaps, and to turn some services into components of a convincing and stable infrastructure. This step is urgently needed to convince more institutions to invest in FDO technology and thereby increase the FAIRness of the evolving global data space.

