
    Active provenance for data intensive research

    The role of provenance information in data-intensive research is a significant topic of discussion among technical experts and scientists. Typical use cases addressing traceability, versioning and reproducibility of the research findings are extended with more interactive scenarios in support, for instance, of computational steering and results management. In this thesis we investigate the impact that lineage records can have on the early phases of the analysis, for instance performed through near-real-time systems and Virtual Research Environments (VREs) tailored to the requirements of a specific community. By positioning provenance at the centre of the computational research cycle, we highlight the importance of having mechanisms at the data-scientists’ side that, by integrating with the abstractions offered by the processing technologies, such as scientific workflows and data-intensive tools, facilitate the experts’ contribution to the lineage at runtime. Ultimately, by encouraging tuning and use of provenance for rapid feedback, the thesis aims at improving the synergy between different user groups to increase productivity and understanding of their processes. We present a model of provenance, called S-PROV, that uses and further extends PROV and ProvONE. The relationships and properties characterising the workflow’s abstractions and their concrete executions are re-elaborated to include aspects related to delegation, distribution and steering of stateful streaming operators. The model is supported by the Active framework for tuneable and actionable lineage ensuring the user’s engagement by fostering rapid exploitation. Here, concepts such as provenance types, configuration and explicit state management allow users to capture complex provenance scenarios and activate selective controls based on domain and user-defined metadata. 
    We outline how the traces are recorded in a new comprehensive system, called S-ProvFlow, enabling different classes of consumers to explore the provenance data with services and tools for monitoring, in-depth validation and comprehensive visual-analytics. The work of this thesis will be discussed in the context of an existing computational framework and the experience matured in implementing provenance-aware tools for seismology and climate VREs. It will continue to evolve through newly funded projects, thereby providing generic and user-centred solutions for data-intensive research.

    Scientific Workflows: Moving Across Paradigms

    Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data. A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how this challenge can be addressed by workflows and workflow management systems. It proposes a taxonomy of workflow management system (WMS) characteristics, including aspects previously overlooked. This frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies research needed to maintain progress in this area.

    Repairing Innovation: A Study of Integrating AI in Clinical Care

    Over the past two years, a multi-disciplinary team of clinicians and technologists associated with Duke University and Duke Health system have developed and implemented Sepsis Watch, a sociotechnical system combining an artificial intelligence (AI) deep learning model with new hospital protocols to raise the quality of sepsis treatment. Sepsis is a widespread and deadly condition that can develop from any infection and is one of the most common causes of death in hospitals. And while sepsis is treatable, it is notoriously difficult to diagnose consistently. This makes sepsis a prime candidate for AI-based interventions, where new approaches to patient data might raise levels of detection, treatment, and, ultimately, patient outcomes in the form of fewer deaths. As an application of AI, the deep learning model tends to eclipse the other parts of the system; in practice, Sepsis Watch is constituted by a complex combination of human labor and expertise, as well as technical and institutional infrastructures. This report brings into focus the critical role of human labor and organizational context in developing an effective clinical intervention by framing Sepsis Watch as a complex sociotechnical system, not just a machine learning model.
