
    Reproducible science: What, why, how

    Most scientific papers are not reproducible: it is hard, if not impossible, to understand how their results were derived from the data, or to regenerate them in the future (even by the same researchers). Yet traceability and reproducibility of results are indispensable elements of high-quality science, and an increasing requirement of many journals and funding sources. Reproducible studies include code that can regenerate the results from the original data. This practice not only provides a complete record of the whole analysis but also reduces the probability of errors and facilitates code reuse, thus accelerating scientific progress. Doing reproducible science also brings many benefits to the individual researcher, including saved time and effort, improved collaborations, and higher quality and impact of the final publications. In this article we introduce reproducible science, explain why it is important, and show how the reproducibility of our work can be improved. We present principles and tools for data management, analysis, version control, and software management that help achieve reproducible workflows, in the context of ecology.
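    The article's own tooling is not reproduced in this listing; purely as a minimal sketch of the practice the abstract describes (a single script that regenerates results from the original data), assuming hypothetical file names and a toy summary statistic:

```python
# regenerate_results.py -- illustrative sketch of a reproducible analysis:
# one script rebuilds every result from the raw data, so the full pipeline
# is recorded and can be rerun. File names and the "value" column are
# hypothetical assumptions, not taken from the article.
import csv
import statistics
from pathlib import Path

RAW = Path("data/raw/measurements.csv")   # original data, never edited by hand
OUT = Path("results/summary.txt")         # derived output, safe to delete and rebuild

def main() -> None:
    with RAW.open(newline="") as fh:
        values = [float(row["value"]) for row in csv.DictReader(fh)]
    OUT.parent.mkdir(parents=True, exist_ok=True)
    OUT.write_text(
        f"n={len(values)}\nmean={statistics.mean(values):.4f}\n"
        f"sd={statistics.stdev(values):.4f}\n"
    )

if __name__ == "__main__":
    main()
```

    Keeping such a script under version control alongside the raw data's provenance gives the complete, rerunnable record of the analysis that the abstract argues for.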

    DRIVER Technology Watch Report

    This report is part of the Discovery Workpackage (WP4) and is the third of four deliverables. Its objective is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. The report consists of two main parts: the first focuses on interoperability standards for enhanced publications; the second consists of three subchapters that give a landscape picture of current and emerging technologies and communities crucial to DRIVER, covering the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field.
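    The report is a survey rather than code, but the repository interoperability it covers is typically built on the OAI-PMH protocol; the following is a hedged sketch of a minimal OAI-PMH harvest, with a placeholder endpoint URL:

```python
# Minimal OAI-PMH harvesting sketch (the protocol widely used for digital
# repository interoperability). The endpoint URL is a placeholder, not a
# real DRIVER service.
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "https://repository.example.org/oai"          # placeholder endpoint
DC = "{http://purl.org/dc/elements/1.1/}"            # Dublin Core namespace

def list_titles() -> list[str]:
    # ListRecords with the mandatory Dublin Core metadata format.
    url = BASE + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    root = ET.parse(urlopen(url)).getroot()
    return [t.text for t in root.iter(DC + "title") if t.text]

if __name__ == "__main__":
    for title in list_titles():
        print(title)
```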

    Improving reproducibility and reuse of modelling results in the life sciences

    Research results are complex and include a variety of heterogeneous data. This entails major computational challenges: (i) managing simulation studies, (ii) ensuring model exchangeability, stability and validity, and (iii) fostering communication between partners. I describe techniques to improve the reproducibility and reuse of modelling results. First, I introduce a method to characterise differences between computational models. Second, I present approaches to obtain shareable and reproducible research results. My implementations have been successfully integrated into international applications. Altogether, my methods and tools foster the exchange and reuse of modelling results.
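    The thesis's actual comparison method is not described in this abstract; purely as a hypothetical illustration of what characterising differences between computational models can mean, here is a toy structured diff over two versions of a model's parameter set (all names and values are invented):

```python
# Hypothetical sketch: characterise differences between two model versions
# by diffing their parameter sets. Real model-diff tools compare the full
# model structure, not just parameters; this only illustrates the idea.
def diff_models(old: dict[str, float], new: dict[str, float]) -> dict[str, list]:
    return {
        "added":   sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }

v1 = {"k_cat": 0.3, "K_m": 1.2, "E_total": 5.0}   # invented parameters
v2 = {"k_cat": 0.35, "K_m": 1.2, "vmax": 2.0}
print(diff_models(v1, v2))
# {'added': ['vmax'], 'removed': ['E_total'], 'changed': ['k_cat']}
```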

    The ENIGMA Stroke Recovery Working Group: Big data neuroimaging to study brain–behavior relationships after stroke

    The goal of the Enhancing Neuroimaging Genetics through Meta-Analysis (ENIGMA) Stroke Recovery working group is to understand brain–behavior relationships using well-powered meta- and mega-analytic approaches. ENIGMA Stroke Recovery has data from over 2,100 stroke patients collected across 39 research studies and 10 countries around the world, comprising the largest multisite retrospective stroke data collaboration to date. This article outlines the efforts taken by the ENIGMA Stroke Recovery working group to develop neuroinformatics protocols and methods to manage multisite stroke brain magnetic resonance imaging, behavioral, and demographic data. Specifically, the processes for scalable data intake and preprocessing, multisite data harmonization, and large-scale stroke lesion analysis are described, and challenges unique to this type of big data collaboration in stroke research are discussed. Finally, future directions and limitations, as well as recommendations for improved data harmonization through prospective data collection and data management, are provided.
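    The working group's harmonization pipeline is not specified in this abstract; as a deliberately simplified stand-in for the location-scale adjustments commonly used in multisite harmonization (large consortia typically use empirical-Bayes methods such as ComBat), one can rescale each site's measurements to the pooled mean and standard deviation:

```python
# Toy multisite harmonization sketch: rescale each site's measurements to
# the pooled mean and standard deviation. This is a simplified location-
# scale adjustment, not the empirical-Bayes ComBat model used in practice.
import numpy as np

def harmonize(values: np.ndarray, sites: np.ndarray) -> np.ndarray:
    out = np.empty_like(values, dtype=float)
    grand_mu, grand_sd = values.mean(), values.std()
    for site in np.unique(sites):
        mask = sites == site
        mu, sd = values[mask].mean(), values[mask].std()
        out[mask] = (values[mask] - mu) / sd * grand_sd + grand_mu
    return out

values = np.array([1.0, 1.2, 0.9, 3.0, 3.3, 2.8])   # invented lesion volumes
sites = np.array(["A", "A", "A", "B", "B", "B"])
print(harmonize(values, sites))
```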

    Empirical Standards for Software Engineering Research

    Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair. For the complete standards, supplements and other resources, see https://github.com/acmsigsoft/EmpiricalStandard.

    The Software Heritage

    Software Heritage is the largest public archive of source code and its associated development history, as captured by modern version control systems. As of February 2023, it had archived more than 12 billion unique source code files and 2 billion commits, drawn from more than 180 million collaborative development projects. In this chapter, we describe the Software Heritage ecosystem, with an emphasis on research and open science.
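    One stable technical detail of this ecosystem can be sketched: Software Heritage names each archived file content with an intrinsic identifier (SWHID) whose core hash, per the published SWHID specification, coincides with Git's blob hash. A minimal sketch under that assumption:

```python
# Sketch: compute the SWHID of a file content. Software Heritage's content
# identifiers reuse Git's blob hashing, i.e. the SHA-1 of
# b"blob <size>\0" + content (based on the published SWHID specification).
import hashlib

def swhid_for_content(data: bytes) -> str:
    digest = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
    return f"swh:1:cnt:{digest}"

print(swhid_for_content(b"hello world\n"))
# swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```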

    I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets

    Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, which requires describing the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets for big data workflows assume that all data reside in a single location, requiring costly data marshaling and permitting errors of omission and commission because dataset members are not explicitly specified. We address these issues by proposing simple methods and tools for assembling, sharing, and analyzing large and complex datasets that scientists can easily integrate into their daily workflows. These tools combine a simple and robust method for describing data collections (BDBags), data descriptions (Research Objects), and simple persistent identifiers (Minids) to create a powerful ecosystem of tools and services for big data analysis and sharing. We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.
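    BDBags build on the BagIt packaging convention; as an illustrative sketch of the underlying idea (core BagIt layout only, without the Research Object metadata or remote-file references that full BDBags add), a dataset can be packaged so that its members are explicitly enumerated and checksummed, making omissions and corruption detectable:

```python
# Sketch of the BagIt layout underlying BDBags: payload files go under
# data/, a bagit.txt declaration marks the bag, and a manifest lists every
# member with its checksum. Real BDBags add Research Object metadata and
# can reference remote files instead of copying them.
import hashlib
from pathlib import Path

def make_bag(bag_dir: str, files: dict[str, bytes]) -> None:
    root = Path(bag_dir)
    (root / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in files.items():
        (root / "data" / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (root / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
    )
    (root / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")

make_bag("example_bag", {"genome.fa": b">seq1\nACGT\n"})   # invented payload
```

    A Minid would then give the bag a concise persistent name resolvable to its checksum, providing the unambiguous handle to the whole collection that the abstract calls for.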
    • 

    corecore