341 research outputs found
Providing traceability for neuroimaging analyses
Introduction: With the increasingly digital nature of biomedical data and the growing complexity of analyses in medical research, accurate information capture, traceability and accessibility have become crucial to medical researchers in pursuing their research goals. Grid- or Cloud-based technologies, often based on so-called Service Oriented Architectures (SOA), are increasingly seen as viable solutions for managing distributed data and algorithms in the biomedical domain. For neuroscientific analyses, especially those centred on complex image analysis, traceability of processes and datasets is essential, but until now this has not been captured in a manner that facilitates collaborative study.
Purpose and Method: Few examples exist of deployed medical systems based on Grids that provide the traceability of research data needed to facilitate complex analyses, and none have been evaluated in practice. Over the past decade, we have been working with mammographers, paediatricians and neuroscientists in three generations of projects to provide the data management and provenance services now required for 21st-century medical research. This paper outlines the findings of a requirements study and a resulting system architecture for the production of services to support neuroscientific studies of biomarkers for Alzheimer's disease.
Results: The paper proposes a software infrastructure and services that provide the foundation for such support. It introduces the use of the CRISTAL software to provide provenance management as one of a number of services delivered on a SOA, deployed to manage neuroimaging projects that have been studying biomarkers for Alzheimer's disease.
Conclusions: In the neuGRID and N4U projects a Provenance Service has been delivered that captures and reconstructs the workflow information needed to facilitate researchers in conducting neuroimaging analyses. The software enables neuroscientists to track the evolution of workflows and datasets, tracks the outcomes of various analyses, and provides provenance traceability throughout the lifecycle of their studies. As the Provenance Service has been designed to be generic, it can be applied across the medical domain as a reusable tool for supporting medical researchers, thus providing communities of researchers for the first time with the tools necessary to conduct widely distributed collaborative programmes of medical analysis.
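The capture-and-reconstruct behaviour such a Provenance Service provides can be illustrated with a minimal sketch. This is plain Python with invented names (`ProvenanceRecord`, `ProvenanceStore`, the file names), not the actual CRISTAL or neuGRID API:

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """One captured workflow step: what ran, on what, producing what."""
    step: str
    inputs: list
    outputs: list
    parameters: dict = field(default_factory=dict)

class ProvenanceStore:
    """Append-only log of workflow steps, queryable by output artefact."""
    def __init__(self):
        self.records = []

    def capture(self, record):
        self.records.append(record)

    def trace(self, artefact):
        """Reconstruct, in order, every step that led to `artefact`."""
        lineage, frontier = [], {artefact}
        for record in reversed(self.records):
            if frontier & set(record.outputs):
                lineage.append(record)
                frontier |= set(record.inputs)
        return list(reversed(lineage))

store = ProvenanceStore()
store.capture(ProvenanceRecord("skull-strip", ["scan_001.nii"], ["brain_001.nii"]))
store.capture(ProvenanceRecord("segment", ["brain_001.nii"], ["gm_001.nii"],
                               {"atlas": "AAL"}))
chain = store.trace("gm_001.nii")
print([r.step for r in chain])  # -> ['skull-strip', 'segment']
```

The walk backwards from a result to the steps and inputs that produced it is the essence of the traceability the abstract describes.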
Designing Traceability into Big Data Systems
Providing an appropriate level of accessibility and traceability to data or
process elements (so-called Items) in large volumes of data, often
Cloud-resident, is an essential requirement in the Big Data era.
Enterprise-wide data systems need to be designed from the outset to support
usage of such Items across the spectrum of business use rather than from any
specific application view. The design philosophy advocated in this paper is to
drive the design process using a so-called description-driven approach which
enriches models with meta-data and description and focuses the design process
on Item re-use, thereby promoting traceability. Details are given of the
description-driven design of big data systems at CERN, in health informatics
and in business process management. Evidence is presented that the approach
leads to design simplicity and consequent ease of management thanks to loose
typing and the adoption of a unified approach to Item management and usage.
Comment: 10 pages, 6 figures. In Proceedings of the 5th Annual International Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore, July 2015. arXiv admin note: text overlap with arXiv:1402.5764, arXiv:1402.575
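To make the description-driven idea concrete, here is a minimal sketch in which an Item's structure comes from a stored description (meta-data) rather than a compiled-in class; this loose typing is what lets new Item kinds be added at runtime and reused across applications. The classes and field names are hypothetical, not the design used at CERN:

```python
class Description:
    """Stored meta-data that defines what an Item of this kind looks like."""
    def __init__(self, name, fields):
        self.name, self.fields = name, fields

class Item:
    """A data element whose shape is validated against its Description,
    not against a hard-coded class definition."""
    def __init__(self, description, **values):
        unknown = set(values) - set(description.fields)
        if unknown:
            raise ValueError(f"fields not in description: {unknown}")
        self.description, self.values = description, values

# A new kind of Item is introduced purely by storing a new description.
scan = Description("MRIScan", ["subject", "modality", "acquired"])
item = Item(scan, subject="S001", modality="T1", acquired="2015-03-01")
print(item.description.name, item.values["subject"])  # -> MRIScan S001
```

Because applications consult the description instead of assuming a fixed schema, the same Item machinery serves every business use rather than any one application view.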
Data provenance tracking as the basis for a biomedical virtual research environment
In complex data analyses it is increasingly important to capture information about how data sets are used, in addition to preserving them over time, in order to ensure reproducibility of results, to verify the work of others and to confirm that data have been used under appropriate conditions for specific analyses. Scientific workflow-based studies are beginning to realise the benefit of capturing this provenance of data and of the activities used to process, transform and carry out studies on those data. This is especially true in biomedicine, where the collection of data through experiment is costly and/or difficult to reproduce and where that data needs to be preserved over time. One way to support the development of workflows and their use in (collaborative) biomedical analyses is through a Virtual Research Environment. The dynamic and distributed nature of Grid/Cloud computing, however, makes the capture and processing of provenance information a major research challenge. Furthermore, most workflow provenance management services are designed only for data-flow oriented workflows, and researchers are now realising that tracking data or workflows alone or separately is insufficient to support the scientific process. What is required for collaborative research is traceable and reproducible provenance support in a fully orchestrated Virtual Research Environment (VRE) that enables researchers to define their studies in terms of the datasets and processes used, to monitor and visualise the outcomes of their analyses, and to log their results so that other users can call upon that acquired knowledge to support subsequent studies. We have extended the work carried out in the neuGRID and N4U projects in providing a so-called Virtual Laboratory to provide the foundation for a generic VRE in which sets of biomedical data (images, laboratory test results, patient records, epidemiological analyses etc.) and the workflows (pipelines) used to process those data, together with their provenance data and result sets, are captured in the CRISTAL software. This paper outlines the functionality provided for a VRE by the Open Source CRISTAL software and examines how it can provide the foundations for a practice-based knowledge base for biomedicine and, potentially, for a wider research community.
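The "log results so that others can call upon that acquired knowledge" idea can be sketched as follows; the function names, pipeline identifiers and dataset names are purely illustrative and not part of the Virtual Laboratory's interface:

```python
# Each completed analysis is logged with the pipeline and dataset that
# produced it; later studies query the log before repeating work.
analysis_log = []

def log_analysis(pipeline, dataset, result):
    """Record a completed analysis so subsequent studies can reuse it."""
    analysis_log.append({"pipeline": pipeline, "dataset": dataset,
                         "result": result})

def find_prior(pipeline, dataset):
    """Return earlier results of the same pipeline on the same dataset."""
    return [e["result"] for e in analysis_log
            if e["pipeline"] == pipeline and e["dataset"] == dataset]

log_analysis("cortical-thickness-v2", "ADNI-baseline", {"mean_mm": 2.41})
print(find_prior("cortical-thickness-v2", "ADNI-baseline"))
```

A real VRE would hold far richer provenance than this, but the query-before-recompute pattern is the kernel of the knowledge base the abstract envisages.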
Designing Reusable Systems that Can Handle Change - Description-Driven Systems : Revisiting Object-Oriented Principles
In the age of the Cloud and so-called Big Data, systems must be increasingly
flexible, reconfigurable and adaptable to change in addition to being developed
rapidly. As a consequence, designing systems to cater for evolution is becoming
critical to their success. To be able to cope with change, systems must have
the capability of reuse and the ability to adapt as and when necessary to
changes in requirements. Allowing systems to be self-describing is one way to
facilitate this. To address the issues of reuse in designing evolvable systems,
this paper proposes a so-called description-driven approach to systems design.
This approach enables new versions of data structures and processes to be
created alongside the old, thereby providing a history of changes to the
underlying data models and enabling the capture of provenance data. The
efficacy of the description-driven approach is exemplified by the CRISTAL
project. CRISTAL is based on description-driven design principles; it uses
versions of stored descriptions to define various versions of data which can be
stored in diverse forms. This paper discusses the need for capturing holistic
system description when modelling large-scale distributed systems.
Comment: 8 pages, 1 figure and 1 table. Accepted by the 9th International Conference on the Evaluation of Novel Approaches to Software Engineering (ENASE'14), Lisbon, Portugal, April 2014.
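The mechanism of creating new versions of data structures alongside the old can be sketched in a few lines. This is an illustrative model of versioned descriptions, not CRISTAL's actual stored-description format:

```python
class DescriptionRegistry:
    """Keeps every version of every description; nothing is overwritten,
    so the registry doubles as a history of changes to the data model."""
    def __init__(self):
        self.versions = {}  # name -> list of field lists, index = version

    def define(self, name, fields):
        """Add a new version alongside the old ones; return its number."""
        self.versions.setdefault(name, []).append(list(fields))
        return len(self.versions[name]) - 1

    def fields(self, name, version):
        return self.versions[name][version]

reg = DescriptionRegistry()
v0 = reg.define("Patient", ["id", "dob"])
v1 = reg.define("Patient", ["id", "dob", "consent"])  # evolved structure
# Data created under the old description remains valid against version 0,
# while new data can be validated against version 1.
print(v0, v1, reg.fields("Patient", v1))
```

Because old versions stay addressable, data captured years earlier can still be interpreted against the description it was created with, which is precisely what enables provenance over an evolving model.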
Facilitating evolution during design and implementation
The volumes and complexity of data that companies need to handle are increasing at an accelerating rate. To compete effectively and ensure their commercial sustainability, it is becoming crucial for them to achieve robust traceability in both their data and the evolving designs of their systems. This is addressed by the CRISTAL software, originally developed at CERN by UWE, Bristol, for one of the particle detectors at the Large Hadron Collider and subsequently transferred into the commercial world. Companies have been able to demonstrate increased agility, generate additional revenue, and improve the efficiency and cost-effectiveness with which they develop and implement systems in areas including business process management (BPM), healthcare and accounting applications. CRISTAL's ability to manage data and its provenance at the terabyte scale, with full traceability over extended timescales, together with its description-driven approach, has provided the flexible adaptability required to future-proof dynamically evolving software for these businesses.
Towards Provenance and Traceability in CRISTAL for HEP
This paper discusses the CRISTAL object lifecycle management system and its
use in provenance data management and the traceability of system events. This
software was initially used to capture the construction and calibration of the
CMS ECAL detector at CERN for later use by physicists in their data analysis.
Some further uses of CRISTAL in different projects (CMS, neuGRID and N4U) are
presented as examples of its flexible data model. From these examples,
applications are drawn for the High Energy Physics (HEP) domain, and some
initial ideas for its use in data preservation in HEP are outlined in detail in
this paper. Currently, investigations are underway to gauge the feasibility of using
the N4U Analysis Service or a derivative of it to address the requirements of
data and analysis logging and provenance capture within the HEP long term data
analysis environment.
Comment: 5 pages, 1 figure. 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP13), 14-18 October 2013, Amsterdam, Netherlands. To appear in Journal of Physics: Conference Series.
NeuroProv: Provenance data visualisation for neuroimaging analyses
Visualisation underpins the understanding of scientific data, both through exploration and through explanation of analysed data. Provenance strengthens that understanding by showing how a result has been achieved. With the significant increase in data volumes and algorithm complexity, clinical researchers are struggling with information tracking, analysis reproducibility and the verification of scientific outputs. In addition, data coming from various heterogeneous sources with varying levels of trust in a collaborative environment adds to the uncertainty of the scientific outputs. This provides the motivation for provenance data capture and visualisation support for analyses. In this paper a system, NeuroProv, is presented to visualise provenance data in order to aid in the verification of scientific outputs and in the comparison, progression and evolution of results for neuroimaging analyses. The experimental results show the effectiveness of visualising provenance data for neuroimaging analyses.
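As a rough illustration of the kind of view such a tool offers, a captured provenance chain can be rendered as a graph description. This sketch emits Graphviz DOT and is only an assumption about one possible rendering, not NeuroProv's actual output format; the step and file names are invented:

```python
# Each tuple is (input artefact, process, output artefact) from a
# hypothetical captured neuroimaging workflow.
steps = [
    ("scan_001.nii", "skull-strip", "brain_001.nii"),
    ("brain_001.nii", "segment", "gm_001.nii"),
]

def to_dot(steps):
    """Render provenance edges as a Graphviz DOT digraph string."""
    lines = ["digraph provenance {"]
    for src, process, dst in steps:
        lines.append(f'  "{src}" -> "{dst}" [label="{process}"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(steps))
```

Feeding the resulting string to the `dot` tool would draw the lineage as a directed graph, the sort of picture that lets a researcher compare analyses and follow the evolution of a result at a glance.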