Towards easy and efficient processing of ultra-high resolution brain images
Ultra-high resolution 3D brain imaging is of great importance to the field of neuroscience as it provides deep insight into brain anatomy and function. Such images may range from a few hundred gigabytes to terabytes in size and typically do not fit into computer memory. The lack of accessible processing for these images is a threat to open science. This thesis aims to design a web system that handles the storage and processing of ultra-high resolution neuroimaging data.
The system architecture uses technologies such as the Hadoop Distributed File System and Apache Spark. For the seamless integration of neuroimaging pipelines into our system, we adopted NIfTI as our distributed data format and require that all neuroimaging pipelines be described in common formats such as Boutiques or BIDS.
The large images are split into chunks and later recreated from them. The effects of splitting into 2D slices and 3D blocks are investigated. Different algorithms to minimize the number of seeks were designed and implemented. Results indicate that clustered reading of blocks achieves a significant reduction in processing time, and that partitioning data into slices is most effective.
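As a sketch of the chunking step described above (not the thesis's actual implementation), splitting an in-memory volume into 3D blocks and merging them back can be written with NumPy; the block shape and exact tiling of the volume are assumptions here:

```python
import numpy as np

def split_into_blocks(img, block_shape):
    """Split a 3D array into equally sized 3D blocks (assumes exact tiling)."""
    bz, by, bx = block_shape
    Z, Y, X = img.shape
    blocks = {}
    for z in range(0, Z, bz):
        for y in range(0, Y, by):
            for x in range(0, X, bx):
                blocks[(z, y, x)] = img[z:z + bz, y:y + by, x:x + bx].copy()
    return blocks

def merge_blocks(blocks, img_shape):
    """Reassemble the original volume from its blocks."""
    out = np.empty(img_shape, dtype=next(iter(blocks.values())).dtype)
    for (z, y, x), b in blocks.items():
        bz, by, bx = b.shape
        out[z:z + bz, y:y + by, x:x + bx] = b
    return out
```

A round trip (`merge_blocks(split_into_blocks(img, shape), img.shape)`) returns the original volume; the real system performs these slices against files on HDFS rather than arrays in memory.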
The scalability of processing large images with Spark was investigated using a simple pipeline in both non-containerized and containerized form. It was found that the processing time of both scales well. As data may need to be written to and read from disk for containerized pipeline processing, the speedup provided by Spark's in-memory computing was also investigated. In-memory computing was found to provide a significant speedup; however, this speedup may be less pronounced in more compute-intensive pipelines.
A new algorithm to split and merge ultra-high resolution 3D images
Splitting and merging ultra-high resolution 3D images is a requirement for parallel or distributed processing. Naive algorithms to split and merge 3D blocks from ultra-high resolution images perform very poorly, due to the number of seeks required to reconstruct spatially adjacent blocks from linear data organizations on disk. The current solution to this problem is to use file formats that preserve spatial proximity on disk, but this comes with additional complexity. We introduce a new algorithm, called Multiple reads/writes, to split and merge ultra-high resolution 3D images efficiently from simple file formats. Multiple reads/writes only accesses contiguous bytes in the reconstructed image, which leads to substantial performance improvements over existing algorithms. We parallelize our algorithm using multi-threading, which further improves performance for data stored on a Hadoop cluster. We also show that on-the-fly lossless compression with the lz4 algorithm reduces the split and merge time further.
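One way to realize the "only contiguous reads" idea is sketched below: the flat row-major file is read front to back, one x-run at a time, and each run is scattered into the block it belongs to, so the source file is never seeked backwards. This is a simplified illustration assuming a headerless row-major file and exact tiling; the published algorithm also covers merging, multi-threading, and lz4 compression, none of which is shown.

```python
import numpy as np

def split_sequential(flat_file, img_shape, block_shape, dtype=np.uint8):
    """Split a row-major 3D volume into blocks using only sequential,
    contiguous reads of the source file (a single forward pass)."""
    Z, Y, X = img_shape
    bz, by, bx = block_shape
    itemsize = np.dtype(dtype).itemsize
    blocks = {(z0, y0, x0): np.empty(block_shape, dtype)
              for z0 in range(0, Z, bz)
              for y0 in range(0, Y, by)
              for x0 in range(0, X, bx)}
    for z in range(Z):
        for y in range(Y):
            # one contiguous read covers the x-runs of every block in this row
            row = np.frombuffer(flat_file.read(X * itemsize), dtype)
            z0, y0 = z - z % bz, y - y % by
            for x0 in range(0, X, bx):
                blocks[(z0, y0, x0)][z - z0, y - y0] = row[x0:x0 + bx]
    return blocks
```

Any binary file object works as input, e.g. `split_sequential(open("img.raw", "rb"), (Z, Y, X), (bz, by, bx))` or an `io.BytesIO` for testing.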
Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework
The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of Helal (2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. The paper explores the steps involved in tensorization, multidimensional data sources, the various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented, comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multiway analysis methods and their integration with various deep neural network models is presented using case studies in different application domains.
Comment: 34 pages, 8 figures, 4 tables
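Tensorization's basic primitive, the mode-n unfolding that multiway analysis methods operate on, can be sketched in a few lines of NumPy (the function names here are illustrative, not taken from the surveyed framework):

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: arrange the mode-`mode` fibres of the tensor
    as the columns of a matrix of shape (I_mode, prod of other dims)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for the same mode and original tensor shape."""
    moved = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(moved), 0, mode)
```

Multiway methods such as CP or Tucker decompositions are typically expressed in terms of these unfoldings, which is what lets them exploit structure a flattened 2D matrix discards.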
NiftyPET: A high-throughput software platform for high quantitative accuracy and precision PET imaging and analysis
We present a standalone, scalable and high-throughput software platform for PET image reconstruction and analysis. We focus on high-fidelity modelling of the acquisition processes to provide high accuracy and precision quantitative imaging, especially for large axial field of view scanners. All the core routines are implemented using parallel computing available from within the Python package NiftyPET, enabling easy access, manipulation and visualisation of data at any processing stage.
The pipeline of the platform starts from MR and raw PET input data and is divided into the following processing stages: (1) list-mode data processing; (2) accurate attenuation coefficient map generation; (3) detector normalisation; (4) exact forward and back projection between sinogram and image space; (5) estimation of reduced-variance random events; (6) high-accuracy fully 3D estimation of scatter events; (7) voxel-based partial volume correction; (8) region- and voxel-level image analysis.
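As a generic illustration of stage (8), region-level analysis, and not the NiftyPET API itself, the mean value per region of interest over an integer parcellation can be computed as:

```python
import numpy as np

def regional_means(image, labels):
    """Region-level analysis sketch: mean voxel value per labelled ROI.
    `labels` is an integer parcellation aligned with `image`; 0 = background."""
    return {int(r): float(image[labels == r].mean())
            for r in np.unique(labels) if r != 0}
```

In practice the parcellation would come from a registered MR-derived atlas, and uptake values from the reconstructed PET volume.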
We demonstrate the advantages of this platform using an amyloid brain scan where all the processing is executed from a single and uniform computational environment in Python. The high-accuracy acquisition modelling is achieved through span-1 (no axial compression) ray tracing for true, random and scatter events. Furthermore, the platform offers uncertainty estimation of any image-derived statistic to facilitate robust tracking of subtle physiological changes in longitudinal studies. The platform also supports the development of new reconstruction and analysis algorithms by restricting the axial field of view to any set of rings covering a region of interest, thus performing fully 3D reconstruction and corrections using real data significantly faster. All the software is available as open source with an accompanying wiki page and test data.
Enabling Scalable Neurocartography: Images to Graphs for Discovery
In recent years, advances in technology have enabled researchers to ask new questions predicated on the collection and analysis of big datasets that were previously too large to study. More specifically, many fundamental questions in neuroscience require studying brain tissue at a large scale to discover emergent properties of neural computation, consciousness, and etiologies of brain disorders. A major challenge is to construct larger, more detailed maps (e.g., structural wiring diagrams) of the brain, known as connectomes.
Although raw data exist, obstacles remain in both algorithm development and scalable image analysis to enable access to the knowledge within these data volumes. This dissertation develops, combines and tests state-of-the-art algorithms to estimate graphs and glean other knowledge across six orders of magnitude, from millimeter-scale magnetic resonance imaging to nanometer-scale electron microscopy.
This work enables scientific discovery across the community and contributes to the tools and services offered by NeuroData and the Open Connectome Project. Contributions include creating, optimizing and evaluating the first known fully-automated brain graphs in electron microscopy data and magnetic resonance imaging data; pioneering approaches to generate knowledge from X-ray tomography imaging; and identifying and solving a variety of image analysis challenges associated with building graphs suitable for discovery. These methods were applied across diverse datasets to answer questions at scales not previously explored.
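At its simplest, the images-to-graphs step described above amounts to mapping detected connections onto a parcellation and counting them. The toy sketch below (all names hypothetical, not the dissertation's pipeline) builds a symmetric adjacency matrix from connection endpoints and an integer label volume:

```python
import numpy as np

def graph_from_endpoints(endpoints, labels, n_regions):
    """Toy images-to-graphs sketch: count connections whose two endpoint
    coordinates fall in each pair of labelled regions.
    `endpoints` is a list of (coord_a, coord_b) index tuples into `labels`;
    label 0 is treated as background and ignored."""
    A = np.zeros((n_regions, n_regions), dtype=int)
    for p0, p1 in endpoints:
        r0, r1 = int(labels[p0]), int(labels[p1])
        if r0 > 0 and r1 > 0 and r0 != r1:
            A[r0 - 1, r1 - 1] += 1
            A[r1 - 1, r0 - 1] += 1
    return A
```

Real connectome estimation replaces the endpoint list with, e.g., traced streamlines (MRI) or detected synapses between segmented neurons (electron microscopy), but the final graph-building step has this shape.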
The prognostic value of advanced MR in gliomas
This work examines the prognostic value of advanced MR at selected time points during the early stages of treatment in glioma patients. In this thesis, serial imaging of glioma patients was conducted using diffusion tensor imaging (DTI), dynamic contrast-enhanced (DCE) and dynamic susceptibility contrast (DSC) MRI. A methodology for the processing and registration of multiparametric MRI was developed in order to simultaneously sample whole-tumour measurements of multiple MR parameters within the same volume of interest.
Differences between glioma grades were investigated using functional MR parameters and tested using Kruskal-Wallis tests. A 2-stage logistic regression model was developed to grade lesions from the preoperative MR, with the model retaining the apparent diffusion coefficient, radial diffusivity, anisotropic component of diffusion, vessel permeability and extravascular extracellular space parameters for glioma grading. A multi-echo single-voxel spectroscopic sequence was independently investigated for the classification of gliomas into different grades.
From preoperative MR, progression-free survival was predicted using the multiparametric MR data. Individual parameters were investigated using Kaplan-Meier survival analysis, before Cox regression modelling was used for a multiparametric analysis. Radial diffusivity, spin–lattice relaxation rate and blood volume fraction calculated from the DTI and DCE MRI were retained in the final model.
MR parameter values were also investigated during the early stages of adjuvant treatment. Patients were scanned before and after chemoradiotherapy, with the change in MR parameters as well as the absolute values investigated for their prognostic information. Cox regression analysis was also performed for the adjuvant treatment imaging, with measures of the apparent diffusion coefficient, spin–lattice relaxation rate, vessel permeability and extravascular extracellular space, derived from the DTI and DCE datasets, most predictive of progression-free survival.
In conclusion, this thesis demonstrates that multiparametric MR of gliomas during the early stages of treatment contains useful prognostic information relating to grade and progression-free survival interval.
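For context, the Kaplan-Meier estimator underlying the survival analyses above can be sketched in a few lines of pure Python; this is a simplified illustration of the estimator itself, not the thesis's statistical pipeline:

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) = prod over event times t_i <= t of
    (1 - d_i / n_i), where d_i is the number of events at t_i and n_i
    the number of subjects still at risk just before t_i.
    `events` is 1 for an observed event (e.g. progression), 0 for censoring."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    at_risk = len(times)
    surv, curve = 1.0, {}
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve[t] = surv
        at_risk -= sum(1 for ti in times if ti == t)  # events and censorings leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])` yields survival 0.75 at t=1 and 0.5 at t=2, the censored subject reducing the risk set without contributing an event.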
Proceedings of the 8th Python in Science conference
The SciPy conference provides a unique opportunity to learn about and affect what is happening in the realm of scientific computing with Python. Attendees have the opportunity to review the available tools and how they apply to specific problems. By providing a forum for developers to share their Python expertise with the wider commercial, academic, and research communities, this conference fosters collaboration and facilitates the sharing of software components, techniques and a vision for high-level language use in scientific computing.