Active provenance for data intensive research
The role of provenance information in data-intensive research is a significant topic of
discussion among technical experts and scientists. Typical use cases addressing traceability,
versioning and reproducibility of the research findings are extended with more
interactive scenarios in support, for instance, of computational steering and results
management. In this thesis we investigate the impact that lineage records can have on
the early phases of the analysis, for instance performed through near-real-time systems
and Virtual Research Environments (VREs) tailored to the requirements of a specific
community. By positioning provenance at the centre of the computational research
cycle, we highlight the importance of having mechanisms at the data-scientists' side
that, by integrating with the abstractions offered by the processing technologies, such
as scientific workflows and data-intensive tools, facilitate the experts' contribution to
the lineage at runtime. Ultimately, by encouraging tuning and use of provenance for
rapid feedback, the thesis aims at improving the synergy between different user groups
to increase productivity and understanding of their processes.
We present a model of provenance, called S-PROV, that uses and further extends
PROV and ProvONE. The relationships and properties characterising the workflow's
abstractions and their concrete executions are re-elaborated to include aspects related
to delegation, distribution and steering of stateful streaming operators. The model is
supported by the Active framework for tuneable and actionable lineage ensuring the
user's engagement by fostering rapid exploitation. Here, concepts such as provenance
types, configuration and explicit state management allow users to capture complex
provenance scenarios and activate selective controls based on domain and user-defined
metadata. We outline how the traces are recorded in a new comprehensive system,
called S-ProvFlow, enabling different classes of consumers to explore the provenance
data with services and tools for monitoring, in-depth validation and comprehensive
visual-analytics. The work of this thesis will be discussed in the context of an existing
computational framework and the experience matured in implementing provenance-aware
tools for seismology and climate VREs. It will continue to evolve through
newly funded projects, thereby providing generic and user-centred solutions for data-intensive
research.
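The runtime lineage capture described in this abstract can be illustrated with a minimal, hypothetical sketch in plain Python; the names and structure here are illustrative only and do not reflect the actual S-PROV model or S-ProvFlow API:

```python
# Hypothetical sketch of runtime lineage capture: each processing step
# records which inputs it consumed, which output it derived, and
# user-defined metadata that later enables selective provenance queries.
import functools
import itertools

trace = []                  # lineage records collected at runtime
_ids = itertools.count(1)   # simple generator of unique record ids

def provenance(step_name, **metadata):
    """Decorator that appends a lineage record for every invocation."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*inputs):
            output = fn(*inputs)
            trace.append({
                "id": next(_ids),
                "step": step_name,
                "inputs": list(inputs),
                "output": output,
                "metadata": metadata,  # domain terms for selective controls
            })
            return output
        return inner
    return wrap

@provenance("normalise", domain="seismology")
def normalise(x):
    return x / 10.0

@provenance("threshold", domain="seismology")
def threshold(x):
    return x > 0.5

result = threshold(normalise(7.0))
steps = [r["step"] for r in trace]  # the trace links result back to input
```

In a real system the trace would be streamed to a provenance store rather than held in memory, but the principle is the same: lineage is contributed at runtime, from inside the processing abstraction itself.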
Scientific Workflows: Moving Across Paradigms
Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data. A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how this challenge can be addressed by workflows and workflow management systems. It proposes a taxonomy of workflow management system (WMS) characteristics, including aspects previously overlooked. This frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the "fourth paradigm," and identifies research needed to maintain progress in this area.
Repairing Innovation: A Study of Integrating AI in Clinical Care
Over the past two years, a multi-disciplinary team of clinicians and technologists associated with Duke University and Duke Health system have developed and implemented Sepsis Watch, a sociotechnical system combining an artificial intelligence (AI) deep learning model with new hospital protocols to raise the quality of sepsis treatment. Sepsis is a widespread and deadly condition that can develop from any infection and is one of the most common causes of death in hospitals. And while sepsis is treatable, it is notoriously difficult to diagnose consistently. This makes sepsis a prime candidate for AI-based interventions, where new approaches to patient data might raise levels of detection, treatment, and, ultimately, patient outcomes in the form of fewer deaths. As an application of AI, the deep learning model tends to eclipse the other parts of the system; in practice, Sepsis Watch is constituted by a complex combination of human labor and expertise, as well as technical and institutional infrastructures. This report brings into focus the critical role of human labor and organizational context in developing an effective clinical intervention by framing Sepsis Watch as a complex sociotechnical system, not just a machine learning model.
- …