Data provenance tracking as the basis for a biomedical virtual research environment
In complex data analyses it is increasingly important to capture information about the usage of data sets, in addition to their preservation over time, to ensure reproducibility of results, to verify the work of others and to ensure that data have been used under appropriate conditions for specific analyses. Scientific workflow-based studies are beginning to realize the benefit of capturing this provenance of data and the activities used to process, transform and carry out studies on those data. This is especially true in biomedicine, where the collection of data through experiment is costly and/or difficult to reproduce and where those data need to be preserved over time. One way to support the development of workflows and their use in (collaborative) biomedical analyses is through the use of a Virtual Research Environment. The dynamic and distributed nature of Grid/Cloud computing, however, makes the capture and processing of provenance information a major research challenge. Furthermore, most workflow provenance management services are designed only for data-flow oriented workflows, and researchers are now realising that tracking data or workflows alone or separately is insufficient to support the scientific process. What is required for collaborative research is traceable and reproducible provenance support in a fully orchestrated Virtual Research Environment (VRE) that enables researchers to define their studies in terms of the datasets and processes used, to monitor and visualize the outcome of their analyses and to log their results so that other users can call upon that acquired knowledge to support subsequent studies. We have extended the work carried out in the neuGRID and N4U projects in providing a so-called Virtual Laboratory to provide the foundation for a generic VRE in which sets of biomedical data (images, laboratory test results, patient records, epidemiological analyses etc.) and the workflows (pipelines) used to process those data, together with their provenance data and result sets, are captured in the CRISTAL software. This paper outlines the functionality provided for a VRE by the Open Source CRISTAL software and examines how that can provide the foundations for a practice-based knowledge base for biomedicine and, potentially, for a wider research community.
A proposal on leveraging workflow technology for building process aware visual analytics system.
Workflow analysis, conducted using both cognitive workflows and process workflows, has been employed to build and improve visual analytics systems. However, workflows and the visual analytics system have to date remained computationally separate. In this paper, we propose that workflow technology be leveraged to create process aware visual analytics systems. We argue that a process aware visual analytics system would be better able to support users, collect provenance information on user activity and track user decision pathways. This will enable visual analytics systems to become process
Enabling Interactive Analytics of Secure Data using Cloud Kotta
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science---a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however, it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.
Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing,
Washington, DC USA, June 2017 (ScienceCloud 2017), 7 pages
Accelerating Science: A Computing Research Agenda
The emergence of "big data" offers unprecedented opportunities for not only
accelerating scientific advances but also enabling new modes of discovery.
Scientific progress in many disciplines is increasingly enabled by our ability
to examine natural phenomena through the computational lens, i.e., using
algorithmic or information processing abstractions of the underlying processes;
and our ability to acquire, share, integrate and analyze disparate types of
data. However, there is a huge gap between our ability to acquire, store, and
process data and our ability to make effective use of the data to advance
discovery. Despite successful automation of routine aspects of data management
and analytics, most elements of the scientific process currently require
considerable human expertise and effort. Accelerating science to keep pace with
the rate of data acquisition and data processing calls for the development of
algorithmic or information processing abstractions, coupled with formal methods
and tools for modeling and simulation of natural processes as well as major
innovations in cognitive tools for scientists, i.e., computational tools that
leverage and extend the reach of human intellect, and partner with humans on a
broad range of tasks in scientific discovery (e.g., identifying, prioritizing, and
formulating questions, designing, prioritizing and executing experiments
designed to answer a chosen question, drawing inferences and evaluating the
results, and formulating new questions, in a closed-loop fashion). This calls
for a concerted research agenda aimed at: Development, analysis, integration,
sharing, and simulation of algorithmic or information processing abstractions
of natural processes, coupled with formal methods and tools for their analyses
and simulation; Innovations in cognitive tools that augment and extend human
intellect and partner with humans in all aspects of science.
Comment: Computing Community Consortium (CCC) white paper, 17 pages