
    Containers and Reproducibility in Scientific Research

    Numerical reproducibility has received increased emphasis in the scientific community. One reason scientific research is difficult to repeat is that different computing platforms perform mathematical operations differently. Software containers have been shown to improve reproducibility in some instances and provide a convenient way to deploy applications across a variety of computing environments. However, some software patterns or idioms produce inconsistent results because mathematical operations are performed in different orders in different environments, resulting in reproducibility errors. The performance cost of running software in containers, and of software that improves numerical reproducibility, may concern some scientists. An existing algorithm for reproducible sum reduction was implemented; its runtime performance was found to be between 0.3x and 0.5x the speed of the non-reproducible sum reduction. Finally, to evaluate the impact of containerization on performance, the runtime of the WRF (Weather Research and Forecasting) package was tested and found to be 0.98x of its performance in a native Linux environment.
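    The reproducibility error described here stems from floating-point addition not being associative: summing the same values in different orders can round differently. As a minimal sketch (not the paper's implementation), Python's `math.fsum`, which returns the correctly rounded sum of its inputs, illustrates an order-independent reduction:

    ```python
    import math
    import random

    # Generate values whose naive sums are sensitive to ordering.
    random.seed(0)
    values = [random.uniform(-1e12, 1e12) for _ in range(100_000)]

    forward = sum(values)             # left-to-right reduction
    backward = sum(reversed(values))  # same values, opposite order
    # forward and backward may differ in their low-order bits, which is
    # exactly the cross-environment inconsistency the abstract describes.

    # math.fsum tracks all intermediate rounding error and returns the
    # correctly rounded result, so it does not depend on summation order.
    assert math.fsum(values) == math.fsum(reversed(values))
    ```

    Reproducible reductions like the one benchmarked in the paper trade extra bookkeeping of rounding error for order independence, which is consistent with the reported 0.3x–0.5x slowdown.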

    HPC-oriented Canonical Workflows for Machine Learning Applications in Climate and Weather Prediction

    Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in high-performance computing (HPC) power pave the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principles are well known to many scientists, research communities are slow to adopt them. The Canonical Workflow Framework for Research (CWFR) provides a platform to ensure the FAIRness and reproducibility of these practices without overwhelming researchers. This conceptual paper envisions a holistic CWFR approach towards ML applications in weather and climate, focusing on HPC and big data. Specifically, we discuss FAIR Digital Objects (FDOs) and Research Objects (ROs) in the DeepRain project to achieve granular reproducibility. DeepRain is a project that aims to improve precipitation forecasting in Germany using ML. Our concept envisages the raster datacube providing data harmonization and fast, scalable data access. We suggest the Jupyter notebook as a single reproducible experiment. In addition, we envision JupyterHub as a scalable and distributed central platform that connects all of these elements and the HPC resources to researchers via an easy-to-use graphical interface.

    Data sharing of computer scientists: an analysis of current research information system data

    Without sufficient information about researchers' data sharing, there is a risk of mismatching FAIR data service efforts with the needs of researchers. This study describes a methodology in which departmental publications are used to analyse the ways in which computer scientists share research data. All journal articles published during 2019 by researchers in the computer science department of the case study's university were extracted for scrutiny from the current research information system. For these 193 articles, a coding framework was developed to capture the key elements of acquiring and sharing research data. Furthermore, a rudimentary classification of the main study types exhibited in the investigated articles was developed to accommodate the multidisciplinary nature of the case department's research agenda. Human interaction and intervention studies often collected original data, whereas research on novel computational methods and life sciences more frequently used openly available data. Articles that made data available for reuse were most often life science studies, whereas data sharing was least frequent in human interaction studies. The use of open code was most frequent in life science studies and novel computational methods. The findings highlight that multidisciplinary research organisations may include diverse subfields with their own cultures of data sharing, and suggest that research information system-based methods may be valuable additions to the questionnaire and interview methodologies used to elicit insight into researchers' data sharing. The collected data and coding framework are provided as open data to facilitate future research.

    Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language

    A widely used standard for portable multilingual data analysis pipelines would bring considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and the environment. Published research that used multiple computer languages in its analysis pipelines would include a complete and reusable description of that analysis, runnable on a diverse set of computing environments. Researchers would be able to collaborate on and reuse these pipelines more easily, adding or exchanging components regardless of the programming language used; collaborations with and within industry would be easier; and approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. However, without a standard for reusable and portable multilingual workflows, reusing published multilingual workflows, collaborating on open problems, and optimizing their execution are severely hampered. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although hundreds of single-vendor and other single-source systems run workflows, none is a general, community-driven, consensus-built standard.
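    For illustration only (this example is not taken from the paper), a minimal CWL tool description wraps a single command in a portable, language-neutral way:

    ```yaml
    # Minimal CWL CommandLineTool: wraps `echo` so any CWL runner
    # (e.g. cwltool) can execute it on any supported platform.
    cwlVersion: v1.2
    class: CommandLineTool
    baseCommand: echo
    inputs:
      message:
        type: string
        inputBinding:
          position: 1
    outputs: []
    ```

    Assuming the file is saved as `echo.cwl`, it could be run with a CWL implementation such as the reference runner: `cwltool echo.cwl --message hello`. Because the description declares its inputs, outputs, and command line explicitly, the same file is reusable across workflow engines regardless of what languages the wrapped tools are written in.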