19 research outputs found

    Factors of trust in data reuse

    Purpose: The purpose of this paper is to quantitatively examine factors of trust in data reuse from the reusers’ perspectives. Design/methodology/approach: The study used a survey to test the proposed hypotheses and to empirically evaluate the research model, which was developed to examine the relationship each factor of trust has with reusers’ actual trust during data reuse. Findings: The study found that data producer (H1) and data quality (H3) were significant, as predicted, while scholarly community (H2) and data intermediary (H4) were not significantly related to reusers’ trust in data. Research limitations/implications: Further discipline-specific examinations should be conducted to complement and fully generalize the findings. Practical implications: The findings point to the need to engage data producers in the process of data curation, preferably from the early stages, and to encourage them to work with curation professionals to ensure data management quality. They also suggest the need to redefine the boundaries of current curation work, or to collaborate with other professionals who can perform the data quality assessment that relates to scientific and methodological rigor. Originality/value: By analyzing theoretical concepts in empirical research and validating the factors of trust, this study fills a gap in the data reuse literature.

    Capturing the silences in digital archaeological knowledge

    The availability and accessibility of digital data are increasingly significant in the creation of archaeological knowledge, with, for example, multiple datasets being brought together to perform extensive analyses that would not otherwise be possible. However, this makes capturing the silences in those data—what is absent as well as present, what is unknown as well as what is known—a critical challenge for archaeology in terms of the suitability and appropriateness of data for subsequent reuse. This paper reverses the usual focus on knowledge and considers the role of ignorance—the lack of knowledge, or non-knowledge—in archaeological data and knowledge creation. Examining aspects of archaeological practice in the light of different dimensions of ignorance, it proposes ways in which the silences, the range of unknowns, can be addressed within a digital environment, and the benefits that may accrue.

    A provenance-based semantic approach to support understandability, reproducibility, and reuse of scientific experiments

    Understandability and reproducibility of scientific results are vital in every field of science. Several reproducibility measures are being taken to make the data used in publications findable and accessible. However, scientists face many challenges from the beginning of an experiment to its end, particularly in data management. The explosive growth of heterogeneous research data, and the difficulty of understanding how those data were derived, is one of the research problems faced in this context. Interlinking the data, the steps and the results from the computational and non-computational processes of a scientific experiment is important for reproducibility. We introduce the notion of "end-to-end provenance management" of scientific experiments to help scientists understand and reproduce experimental results. The main contributions of this thesis are: (1) We propose a provenance model, REPRODUCE-ME, to describe scientific experiments using semantic web technologies by extending existing standards. (2) We study computational reproducibility and the important aspects required to achieve it. (3) Building on the REPRODUCE-ME provenance model and the study of computational reproducibility, we introduce our tool, ProvBook, which is designed and developed to demonstrate computational reproducibility. It provides features to capture and store the provenance of Jupyter notebooks and helps scientists compare and track the results of different executions. (4) We provide CAESAR (CollAborative Environment for Scientific Analysis with Reproducibility), a framework for end-to-end provenance management. This collaborative framework allows scientists to capture, manage, query and visualize the complete path of a scientific experiment, consisting of computational and non-computational steps, in an interoperable way. We apply our contributions to a set of scientific experiments in microscopy research projects.
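
    The abstract above centres on describing notebook executions with PROV-style semantics. As a purely illustrative sketch (the namespace URI, property names and helper function below are placeholders invented for this example, not the actual REPRODUCE-ME vocabulary or the ProvBook implementation), one could record a single Jupyter cell execution as PROV-O-style triples with rdflib roughly as follows:

        # Illustrative sketch only: one notebook cell execution recorded as
        # PROV-style RDF. The "repr" namespace and its properties are made up
        # for this example and are NOT the REPRODUCE-ME vocabulary.
        from datetime import datetime, timezone
        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF, XSD

        PROV = Namespace("http://www.w3.org/ns/prov#")
        REPR = Namespace("http://example.org/reproduce-me#")  # placeholder namespace

        def record_cell_execution(graph, cell_id, source, started, ended, output):
            """Model the cell as a prov:Entity and its execution as a prov:Activity."""
            cell = URIRef(f"http://example.org/cell/{cell_id}")
            run = URIRef(f"http://example.org/run/{cell_id}/{started.isoformat()}")
            result = URIRef(f"http://example.org/output/{cell_id}/{started.isoformat()}")
            graph.add((cell, RDF.type, PROV.Entity))
            graph.add((cell, REPR.sourceCode, Literal(source)))
            graph.add((run, RDF.type, PROV.Activity))
            graph.add((run, PROV.used, cell))
            graph.add((run, PROV.startedAtTime, Literal(started.isoformat(), datatype=XSD.dateTime)))
            graph.add((run, PROV.endedAtTime, Literal(ended.isoformat(), datatype=XSD.dateTime)))
            graph.add((result, RDF.type, PROV.Entity))
            graph.add((result, PROV.value, Literal(output)))
            graph.add((result, PROV.wasGeneratedBy, run))

        g = Graph()
        g.bind("prov", PROV)
        now = datetime.now(timezone.utc)
        record_cell_execution(g, "cell-1", "x = 2 + 2\nx", now, now, "4")
        print(g.serialize(format="turtle"))

    Comparing two such graphs produced by different executions of the same cell is then a matter of diffing the generated entities, which is roughly the comparison-and-tracking feature the abstract attributes to ProvBook.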

    Understanding Legacy Workflows through Runtime Trace Analysis

    When scientific software is written to specify processes, it takes the form of a workflow, and it is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack the structured description provided by specialized management systems, making ad-hoc workflow maintenance and reuse difficult and motivating the need for analysis methods. Analyzing ad-hoc workflows with compiler techniques does not address dynamic languages: such a program has so few constraints that its behavior cannot be predicted statically. In contrast, workflow provenance tracking has had success using run-time techniques to record data. The aim of this work is to develop a new analysis method for extracting workflow structure at run time, thus avoiding the issues raised by dynamic language features. The method captures the dataflow of an ad-hoc workflow through its execution and abstracts it with a process for simplifying repetition. An instrumentation system first processes the workflow to produce an instrumented version, capable of logging events, which is then executed on an input to produce a trace. The trace undergoes dataflow construction to produce a provenance graph. The dataflow is examined for equivalent regions, which are collected into a single unit. The workflow is thus characterized in terms of its treatment of an input. Unlike other methods, a run-time approach characterizes the workflow's actual behavior, including elements that static analysis cannot predict (for example, code dynamically evaluated based on input parameters). This also enables the characterization of dataflow through external tools. The contributions of this work are: a run-time method for recording a provenance graph from an ad-hoc Python workflow, and a method to analyze the structure of a workflow from its provenance. The methods are implemented in Python and are demonstrated on real-world Python workflows. These contributions enable users to derive graph structure from workflows. Empowered by a graphical view, users can better understand a legacy workflow. This makes the wealth of legacy ad-hoc workflows accessible, enabling workflow reuse instead of investing time and resources into creating a new workflow.
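
    The pipeline the abstract describes (instrument, execute, log events, build a provenance graph) can be illustrated in miniature. The sketch below is not the instrumentation system of the thesis: it simply wraps chosen workflow functions with a decorator, records at run time which traced call produced each value, and assembles the resulting dataflow as a networkx graph. The function names and the toy workflow are invented for the example.

        # Toy run-time dataflow capture for an ad-hoc Python workflow (illustrative only).
        import functools
        import networkx as nx

        provenance = nx.DiGraph()   # call events and the dataflow edges between them
        _produced_by = {}           # id(value) -> call node that produced it
        _calls = 0

        def traced(func):
            """Log each execution of a workflow step as a node in the provenance graph."""
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                global _calls
                _calls += 1
                call_node = f"{func.__name__}#{_calls}"
                provenance.add_node(call_node, kind="activity")
                # connect this call to the earlier calls that produced its arguments
                for arg in list(args) + list(kwargs.values()):
                    if id(arg) in _produced_by:
                        provenance.add_edge(_produced_by[id(arg)], call_node)
                result = func(*args, **kwargs)
                _produced_by[id(result)] = call_node
                return result
            return wrapper

        @traced
        def load(path):
            return [1, 2, 3]               # stand-in for reading an input file

        @traced
        def transform(rows):
            return [r * 10 for r in rows]

        @traced
        def summarize(rows):
            return sum(rows)

        summarize(transform(load("data.csv")))
        print(list(provenance.edges))
        # [('load#1', 'transform#2'), ('transform#2', 'summarize#3')]

    Tracing at run time, as here, records what the workflow actually did with a concrete input, which is exactly why the thesis argues that a dynamic approach can capture behavior that static analysis cannot.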

    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows is becoming increasingly crucial. The unprecedented amount of data available in the digital era, combined with recent advancements in Machine Learning and High-Performance Computing (HPC), has allowed computers to surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is essential for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of the methodology. Each of these contributions is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on a total of five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executing them on large-scale mixed cloud-HPC infrastructures.
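
    To make the separation of concerns concrete: the idea is that a step's business logic is declared independently of its data dependencies and of the environment it should run on, so a scheduler can make topology-aware placement decisions. The following is a deliberately simplified sketch of that idea; the Step and Environment classes, the toy scheduler and the cloud/HPC labels are invented here and are not the dissertation's actual model or implementation.

        # Illustrative sketch: workflow steps declaring data dependencies and
        # execution environments separately from their business logic.
        from dataclasses import dataclass, field
        from typing import Callable, Dict, List, Optional

        @dataclass
        class Environment:
            name: str        # e.g. "cloud-vm" or "hpc-node"
            location: str    # placement hint for topology-aware scheduling

        @dataclass
        class Step:
            name: str
            logic: Callable[..., object]                        # business logic only
            inputs: List[str] = field(default_factory=list)     # named data dependencies
            environment: Optional[Environment] = None

        def run_workflow(steps: List[Step]) -> Dict[str, object]:
            """Toy scheduler: resolve dependencies by name and run steps in order."""
            data: Dict[str, object] = {}
            for step in steps:
                args = [data[name] for name in step.inputs]
                print(f"running {step.name} on {step.environment.name} ({step.environment.location})")
                data[step.name] = step.logic(*args)
            return data

        cloud = Environment("cloud-vm", "eu-west")
        hpc = Environment("hpc-node", "cluster-a")
        workflow = [
            Step("ingest", lambda: [0.1, 0.4, 0.7], environment=cloud),
            Step("train", lambda xs: sum(xs) / len(xs), inputs=["ingest"], environment=hpc),
            Step("report", lambda m: f"mean={m:.2f}", inputs=["train"], environment=cloud),
        ]
        print(run_workflow(workflow)["report"])   # mean=0.40

    A real implementation would schedule steps onto the declared environments and move only the data that crosses environment boundaries; the point of the sketch is merely that dependencies and placement are metadata, kept separate from the code each step runs.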

    The Social and Cultural Contexts of Historic Writing Practices

    Writing is not just a set of systems for transcribing language and communicating meaning, but an important element of human practice, deeply embedded in the cultures where it is present and fundamentally interconnected with all other aspects of human life. The Social and Cultural Contexts of Historic Writing Practices explores these relationships in a number of different cultural contexts and from a range of disciplinary perspectives, including archaeological, anthropological and linguistic. It offers new ways of approaching the study of writing and integrating it into wider debates and discussions about culture, history and archaeology.