
    Data harmonization in PET imaging

    Medical imaging physics has advanced considerably in recent years, providing clinicians and researchers with increasingly detailed images that are well suited to quantitative analysis in the tradition of the hard sciences, based on the measurement and analysis of clinically relevant quantities extracted from the images themselves. This approach is known as quantitative imaging. The possibility of sharing data quickly, the development of machine learning and data mining techniques, and the increasing availability of computational power and digital data storage that characterize this age constitute a great opportunity for quantitative imaging studies. Interest in large multicentric databases that gather images from single research centers is growing year after year. Big datasets offer very interesting research perspectives, primarily because they increase the statistical power of studies. At the same time, they raise a compatibility issue between the data themselves: images acquired with different scanners and protocols can differ greatly in quality, and measures extracted from images of different quality may not be compatible with each other. Harmonization techniques have been developed to circumvent this problem. Harmonization refers to all efforts to combine data from different sources and provide users with a comparable view of data from different studies. Harmonization can be done before acquiring data, by choosing appropriate acquisition protocols a priori through a preliminary joint effort between research centers, or it can be done a posteriori, i.e. images are grouped into a single dataset and any effects on the measures caused by technical acquisition factors are then removed. Although a-priori harmonization guarantees the best results, it is often not used for practical and/or technical reasons. In this thesis I focus on a-posteriori harmonization. It is important to note that in multicentric studies, in addition to the technical variability related to scanners and acquisition protocols, there may be demographic variability that makes the samples from single centers not statistically equivalent to each other. The wide individual variability that characterizes human beings, even more pronounced when patients are enrolled from very different geographical areas, can certainly exacerbate this issue. In addition, biological processes are complex phenomena: quantitative imaging measures can be affected by numerous confounding demographic variables, even ones apparently unrelated to the measures themselves. A good harmonization method should preserve inter-individual variability while removing all the effects due to technical acquisition factors. Heterogeneity in acquisition together with great inter-individual variability makes harmonization very hard to achieve. Harmonization methods currently used in the literature preserve only the inter-subject variability described by a set of known confounding variables, while all the unknown confounding variables are wrongly removed. This can lead to incorrect harmonization, especially if the unknown confounders play an important role. The issue is emphasized in practice, as it sometimes happens that demographic variables known to play a major role are not available.
The final goal of my thesis is a proposal for a harmonization method, developed in the context of amyloid Positron Emission Tomography (PET), which aims to remove the effects of variability induced by technical factors while keeping all the inter-individual differences. Since knowing all the demographic confounders is almost impossible, both practically and theoretically, my proposal does not require knowledge of these variables. The main point is to characterize image quality through a set of quality measures evaluated in regions of interest (ROIs), which are required to be as independent as possible from anatomical and clinical variability in order to highlight exclusively the effect of technical factors on image texture. Ideally, this decouples the between-subject variability from the technical variability: the latter can be removed directly while the former is automatically preserved. Specifically, I defined and validated three quality measures based on image texture properties. In addition, I used an existing quality metric and considered the reconstruction matrix dimension to take image resolution into account. My work was performed using a multicentric dataset consisting of 1001 amyloid PET images. Before dealing specifically with harmonization, I handled some important preliminary issues: I built a relational database to organize and manage the data, and I developed an automated image pre-processing algorithm to achieve registration and quantification. This work might also be used in other imaging contexts: in particular, I believe it could be applied to fluorodeoxyglucose (FDG) PET and tau PET. The consequences of the harmonization method I developed have been explored at a preliminary level. My proposal should be considered a starting point, as I mainly dealt with the quality measures, while the harmonization of the variables itself was done with a linear regression model. Although harmonization through linear models is often used, more sophisticated techniques exist in the literature, and it would be interesting to combine them with my work. Further investigations would be desirable in the future.
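    The abstract above does not include code; the following is a minimal sketch of what a-posteriori harmonization via a linear model on quality measures could look like. All names and data here (the SUVR values and the quality-measure matrix) are illustrative placeholders, not the thesis' actual variables or implementation.

```python
# Minimal sketch of a-posteriori harmonization: regress a regional measure on
# technical quality measures and remove the fitted technical component.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_scans = 1001

# Hypothetical per-scan inputs: a regional uptake measure (e.g. SUVR) and
# technical quality measures evaluated in subject-independent ROIs.
suvr = rng.normal(1.2, 0.3, n_scans)        # measure to harmonize (synthetic)
quality = rng.normal(size=(n_scans, 4))     # e.g. texture metrics + matrix size

# Fit the technical component and subtract it, keeping the grand mean so that
# the residual inter-individual differences are preserved.
model = LinearRegression().fit(quality, suvr)
technical_effect = model.predict(quality) - suvr.mean()
suvr_harmonized = suvr - technical_effect
```

    The idea is that the variation left after subtracting the fitted technical component retains the between-subject differences, since the quality measures are designed to be as independent as possible from anatomy and clinical status.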

    NeuroProv: Provenance data visualisation for neuroimaging analyses

    Visualisation underpins the understanding of scientific data, both through exploration and through explanation of analysed data. Provenance strengthens this understanding by showing the process through which a result has been achieved. With the significant increase in data volumes and algorithm complexity, clinical researchers are struggling with information tracking, analysis reproducibility and the verification of scientific output. In addition, data coming from various heterogeneous sources with varying levels of trust in a collaborative environment add to the uncertainty of the scientific outputs. This provides the motivation for provenance data capture and visualisation support for analyses. In this paper the NeuroProv system is presented, which visualises provenance data in order to aid in the verification of scientific outputs and the comparison, progression and evolution of results for neuroimaging analyses. The experimental results show the effectiveness of visualising provenance data for neuroimaging analyses.

    Data provenance tracking as the basis for a biomedical virtual research environment

    In complex data analyses it is increasingly important to capture information about the usage of data sets, in addition to preserving them over time, to ensure reproducibility of results, to verify the work of others and to ensure that data have been used under appropriate conditions for specific analyses. Scientific workflow based studies are beginning to realize the benefit of capturing this provenance of data and of the activities used to process, transform and carry out studies on those data. This is especially true in biomedicine, where the collection of data through experiment is costly and/or difficult to reproduce and where those data need to be preserved over time. One way to support the development of workflows and their use in (collaborative) biomedical analyses is through a Virtual Research Environment. The dynamic and distributed nature of Grid/Cloud computing, however, makes the capture and processing of provenance information a major research challenge. Furthermore, most workflow provenance management services are designed only for data-flow oriented workflows, and researchers are now realising that tracking data or workflows alone or separately is insufficient to support the scientific process. What is required for collaborative research is traceable and reproducible provenance support in a fully orchestrated Virtual Research Environment (VRE) that enables researchers to define their studies in terms of the datasets and processes used, to monitor and visualize the outcome of their analyses and to log their results so that other users can call upon that acquired knowledge to support subsequent studies. We have extended the work carried out in the neuGRID and N4U projects in providing a so-called Virtual Laboratory to provide the foundation for a generic VRE in which sets of biomedical data (images, laboratory test results, patient records, epidemiological analyses etc.) and the workflows (pipelines) used to process those data, together with their provenance data and result sets, are captured in the CRISTAL software. This paper outlines the functionality provided for a VRE by the Open Source CRISTAL software and examines how that can provide the foundations for a practice-based knowledge base for biomedicine and, potentially, for a wider research community.
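    As an illustration only (not the CRISTAL data model), the sketch below shows the kind of per-step provenance record a VRE might log so that datasets, process versions and results remain traceable; all field names and values are hypothetical.

```python
# Toy provenance record for one workflow step: which inputs and which process
# version produced which outputs, by whom, and when.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    study: str
    process: str             # workflow step, e.g. a normalisation stage
    process_version: str
    inputs: list              # identifiers of input datasets
    outputs: list             # identifiers of produced results
    executed_by: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = ProvenanceRecord(
    study="pilot-study", process="normalisation", process_version="1.2",
    inputs=["mri:0042"], outputs=["mri:0042-norm"], executed_by="analyst01",
)
```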

    The Deployment of an Enhanced Model-Driven Architecture for Business Process Management

    Business systems these days need to be agile to address the needs of a changing world. Business modelling requires business process management to be highly adaptable, with the ability to support dynamic workflows, inter-application integration (potentially between businesses) and process reconfiguration. Designing systems with the in-built ability to cater for evolution is also becoming critical to their success. To handle change, systems need the capability to adapt as and when necessary to changes in users' requirements. Allowing systems to be self-describing is one way to facilitate this. Using our implementation of a self-describing system, a so-called description-driven approach, new versions of data structures or processes can be created alongside older versions, providing a log of changes to the underlying data schema and enabling the gathering of traceable (provenance) data. The CRISTAL software, which originated at CERN for handling physics data, uses versions of stored descriptions to define versions of data and workflows which can be evolved over time, thereby handling evolving system needs. It has been customised for use in business applications as the Agilium-NG product. This paper reports on how the Agilium-NG software has enabled the deployment of a unique business process management solution that can be dynamically evolved to cater for changing user requirements.
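    To make the description-driven idea concrete, here is a toy sketch, not the CRISTAL/Agilium-NG API, of a store that keeps every version of a description alongside the older ones, so the change history itself stays traceable.

```python
# Toy versioned-description store: adding a new description never overwrites
# the previous one, so older items can still reference the version they used.
class DescriptionStore:
    def __init__(self):
        self._versions = {}          # name -> list of (version, description)

    def add_version(self, name, description):
        versions = self._versions.setdefault(name, [])
        versions.append((len(versions) + 1, description))
        return len(versions)         # the new version number

    def latest(self, name):
        return self._versions[name][-1]

    def history(self, name):
        return list(self._versions[name])

store = DescriptionStore()
store.add_version("OrderWorkflow", {"steps": ["create", "approve"]})
store.add_version("OrderWorkflow", {"steps": ["create", "review", "approve"]})
print(store.history("OrderWorkflow"))    # both versions remain available
```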

    ARIANNA: A research environment for neuroimaging studies in autism spectrum disorders

    The complexity and heterogeneity of Autism Spectrum Disorders (ASD) require dedicated analysis techniques to extract the most from the interrelationships among the many variables that describe affected individuals, spanning from clinical phenotypic characterization and genetic profile to structural and functional brain images. The ARIANNA project has developed a collaborative interdisciplinary research environment that is easily accessible to the community of researchers working on ASD (https://arianna.pi.infn.it). The main goals of the project are: to analyze neuroimaging data acquired at multiple sites with multivariate approaches based on machine learning; to detect structural and functional brain characteristics that allow individuals with ASD to be distinguished from control subjects; and to identify neuroimaging-based criteria to stratify the population with ASD to support the future development of personalized treatments. Secure data handling and storage are guaranteed within the project, as well as access to fast grid/cloud-based computational resources. This paper outlines the web-based architecture, the computing infrastructure and the collaborative analysis workflows at the basis of the ARIANNA interdisciplinary working environment. It also demonstrates the full functionality of the research platform. The availability of this innovative working environment for analyzing clinical and neuroimaging information of individuals with ASD is expected to support researchers in disentangling complex data, thus facilitating their interpretation.
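    For readers unfamiliar with the multivariate approach mentioned above, the following is a hedged sketch of a cross-validated classifier separating individuals with ASD from controls; the feature matrix and labels are synthetic placeholders rather than ARIANNA data, and the model choice is illustrative.

```python
# Sketch of multivariate classification of ASD vs. controls from
# neuroimaging-derived features, evaluated with cross-validation.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 150))      # e.g. regional morphometric / connectivity features
y = rng.integers(0, 2, size=200)     # 1 = ASD, 0 = control (synthetic labels)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))
scores = cross_val_score(clf, X, y, cv=5)   # chance-level here, since the data are random
print(scores.mean())
```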

    Data Infrastructure for Medical Research

    While we are witnessing rapid growth in data across the sciences and in many applications, this growth is particularly remarkable in the medical domain, be it because of higher resolution instruments and diagnostic tools (e.g. MRI), new sources of structured data like activity trackers, the widespread use of electronic health records, or many other factors. The sheer volume of the data is not, however, the only challenge to be faced when using medical data for research. Other crucial challenges include data heterogeneity, data quality and data privacy, among others. In this article, we review solutions addressing these challenges by discussing the current state of the art in data integration, data cleaning, data privacy, and scalable data access and processing in the context of medical data. The techniques and tools we present will give practitioners (computer scientists and medical researchers alike) a starting point to understand the challenges and solutions and ultimately to analyse medical data and gain better and quicker insights.

    MRI analysis for Hippocampus segmentation on a distributed infrastructure

    Medical image computing raises new challenges due to the scale and complexity of the required analyses. Medical image databases are currently available to support clinical diagnosis; for instance, it is possible to provide diagnostic information based on an imaging biomarker by comparing a single case to a reference group (controls or patients with disease). At the same time, many sophisticated and computationally intensive algorithms have been implemented to extract useful information from medical images. Many applications would benefit greatly from scientific workflow technology thanks to its support for rapid design, implementation and reuse; however, this technology requires a distributed computing infrastructure (such as Grid or Cloud) to be executed efficiently. One of the most widely used workflow managers for medical image processing is the LONI Pipeline (LP), a graphical workbench developed by the Laboratory of Neuro Imaging (http://pipeline.loni.usc.edu). In this article we present a general approach to submit and monitor workflows on distributed infrastructures using the LONI Pipeline, including the European Grid Infrastructure (EGI) and a Torque-based batch farm, and we implement a complete segmentation pipeline for brain magnetic resonance imaging (MRI), a time-consuming and data-intensive task for which reducing the computing time is crucial to meet clinical practice constraints. The developed approach is based on web services and can be used for any medical imaging application.
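    As a rough, hypothetical illustration of one piece of such an approach, the sketch below submits a single job to a Torque batch farm with qsub and polls it with qstat; the script name and resource request are placeholders, and the paper's actual web-service and LONI Pipeline integration is considerably more involved.

```python
# Submit one pipeline step to a Torque batch farm and wait for it to finish.
import subprocess
import time

def submit(script_path):
    # qsub prints the job identifier on stdout
    out = subprocess.run(["qsub", "-l", "nodes=1:ppn=4", script_path],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def is_finished(job_id):
    # qstat exits with a non-zero code once the job has left the queue
    return subprocess.run(["qstat", job_id],
                          capture_output=True, text=True).returncode != 0

job = submit("segment_hippocampus.sh")   # hypothetical job script
while not is_finished(job):
    time.sleep(60)
```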

    Large-scale analysis of neuroimaging data on commercial clouds with content-aware resource allocation strategies

    The combined use of mice that carry genetic mutations (transgenic mouse models) of human pathology and advanced neuroimaging methods (such as magnetic resonance imaging) has the potential to radically change how we approach disease understanding, diagnosis and treatment. Morphological changes occurring in the brain of transgenic animals as a result of the interaction between environment and genotype can be assessed using advanced image analysis methods, an effort described as ‘mouse brain phenotyping’. However, the computational methods involved in the analysis of high-resolution brain images are demanding. While running such analyses on local clusters is possible, not all users have access to such infrastructure, and even for those that do, additional computational capacity can be beneficial (e.g. to meet sudden high-throughput demands). In this paper we use a commercial cloud platform for brain neuroimaging and analysis. We achieve a registration-based multi-atlas, multi-template anatomical segmentation, normally a lengthy effort, within a few hours. Naturally, performing such analyses on the cloud entails a monetary cost, and it is worthwhile identifying strategies that allocate resources intelligently. In our context a critical aspect is estimating how long each job will take. We propose a method that estimates the complexity of an image-processing task, here a registration, using statistical moments and shape descriptors of the image content, and uses this information to learn and predict the completion time of a registration. The proposed approach is easy to deploy and could serve as an alternative for laboratories that require instant access to large high-performance-computing infrastructures. To facilitate adoption by the community we publicly release the source code.
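    A minimal sketch of the content-aware idea, assuming synthetic volumes and runtimes: summarise each image with simple statistical moments and train a regressor that predicts how long its registration will take. The specific features and model below are illustrative choices, not necessarily those used in the paper.

```python
# Learn to predict registration runtime from cheap image-content statistics.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor

def moment_features(volume):
    flat = volume.ravel()
    return [flat.mean(), flat.var(), stats.skew(flat), stats.kurtosis(flat)]

rng = np.random.default_rng(0)
volumes = [rng.random((32, 32, 32)) for _ in range(50)]   # stand-in brain volumes
runtimes = rng.uniform(10, 120, size=50)                  # observed minutes (synthetic)

X = np.array([moment_features(v) for v in volumes])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, runtimes)
predicted_minutes = model.predict(X[:1])   # estimate cost before submitting a job
```

    Such an estimate can then feed a scheduler that decides how many (and which) cloud instances to provision for a batch of registrations.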

    Facilitating evolution during design and implementation

    The volumes and complexity of data that companies need to handle are increasing at an accelerating rate. In order to compete effectively and ensure their commercial sustainability, it is becoming crucial for them to achieve robust traceability in both their data and the evolving designs of their systems. This is addressed by the CRISTAL software, which was originally developed at CERN by UWE, Bristol, for one of the particle detectors at the Large Hadron Collider and has subsequently been transferred into the commercial world. Companies have been able to demonstrate increased agility, generate additional revenue, and improve the efficiency and cost-effectiveness with which they develop and implement systems in various areas, including business process management (BPM), healthcare and accounting applications. CRISTAL’s ability to manage data and its provenance at the terabyte scale, with full traceability over extended timescales, together with its description-driven approach, has provided the flexible adaptability required to future-proof dynamically evolving software for these businesses.