72 research outputs found

    Sparse Matrix-Based HPC Tomography

    Full text link
    Tomographic imaging has benefited from advances in X-ray sources, detectors and optics to enable novel observations in science, engineering and medicine. These advances have come with a dramatic increase of input data in the form of faster frame rates, larger fields of view or higher resolution, so high performance solutions are currently widely used for analysis. Tomographic instruments can vary significantly from one to another, including the hardware employed for reconstruction: from single CPU workstations to large scale hybrid CPU/GPU supercomputers. Flexibility on the software interfaces and reconstruction engines are also highly valued to allow for easy development and prototyping. This paper presents a novel software framework for tomographic analysis that tackles all aforementioned requirements. The proposed solution capitalizes on the increased performance of sparse matrix-vector multiplication and exploits multi-CPU and GPU reconstruction over MPI. The solution is implemented in Python and relies on CuPy for fast GPU operators and CUDA kernel integration, and on SciPy for CPU sparse matrix computation. As opposed to previous tomography solutions that are tailor-made for specific use cases or hardware, the proposed software is designed to provide flexible, portable and high-performance operators that can be used for continuous integration at different production environments, but also for prototyping new experimental settings or for algorithmic development. The experimental results demonstrate how our implementation can even outperform state-of-the-art software packages used at advanced X-ray sources worldwide

    Multi-GPU Acceleration of Iterative X-ray CT Image Reconstruction

    Get PDF
    X-ray computed tomography is a widely used medical imaging modality for screening and diagnosing diseases and for image-guided radiation therapy treatment planning. Statistical iterative reconstruction (SIR) algorithms have the potential to significantly reduce image artifacts by minimizing a cost function that models the physics and statistics of the data acquisition process in X-ray CT. SIR algorithms have superior performance compared to traditional analytical reconstructions for a wide range of applications including nonstandard geometries arising from irregular sampling, limited angular range, missing data, and low-dose CT. The main hurdle for the widespread adoption of SIR algorithms in multislice X-ray CT reconstruction problems is their slow convergence rate and associated computational time. We seek to design and develop fast parallel SIR algorithms for clinical X-ray CT scanners. Each of the following approaches is implemented on real clinical helical CT data acquired from a Siemens Sensation 16 scanner and compared to the straightforward implementation of the Alternating Minimization (AM) algorithm of O’Sullivan and Benac [1]. We parallelize the computationally expensive projection and backprojection operations by exploiting the massively parallel hardware architecture of 3 NVIDIA TITAN X Graphical Processing Unit (GPU) devices with CUDA programming tools and achieve an average speedup of 72X over a straightforward CPU implementation. We implement a multi-GPU based voxel-driven multislice analytical reconstruction algorithm called Feldkamp-Davis-Kress (FDK) [2] and achieve an average overall speedup of 1382X over the baseline CPU implementation by using 3 TITAN X GPUs. Moreover, we propose a novel adaptive surrogate-function based optimization scheme for the AM algorithm, resulting in more aggressive update steps in every iteration. On average, we double the convergence rate of our baseline AM algorithm and also improve image quality by using the adaptive surrogate function. We extend the multi-GPU and adaptive surrogate-function based acceleration techniques to dual-energy reconstruction problems as well. Furthermore, we design and develop a GPU-based deep Convolutional Neural Network (CNN) to denoise simulated low-dose X-ray CT images. Our experiments show significant improvements in the image quality with our proposed deep CNN-based algorithm against some widely used denoising techniques including Block Matching 3-D (BM3D) and Weighted Nuclear Norm Minimization (WNNM). Overall, we have developed novel fast, parallel, computationally efficient methods to perform multislice statistical reconstruction and image-based denoising on clinically-sized datasets

    DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

    Full text link
    We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance over hardware architecture. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference, OpenMP-based algorithm, and find speedups of up to 7X (CPU).Comment: LDAV 2018, October 201

    A Streaming Multi-GPU Implementation of Image Simulation Algorithms for Scanning Transmission Electron Microscopy

    Full text link
    Simulation of atomic resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000x for PRISM and 30x for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic

    Multiple dataset visualization (MDV) framework for scalar volume data

    Get PDF
    Many applications require comparative analysis of multiple datasets representing different samples, conditions, time instants, or views in order to develop a better understanding of the scientific problem/system under consideration. One effective approach for such analysis is visualization of the data. In this PhD thesis, we propose an innovative multiple dataset visualization (MDV) approach in which two or more datasets of a given type are rendered concurrently in the same visualization. MDV is an important concept for the cases where it is not possible to make an inference based on one dataset, and comparisons between many datasets are required to reveal cross-correlations among them. The proposed MDV framework, which deals with some fundamental issues that arise when several datasets are visualized together, follows a multithreaded architecture consisting of three core components, data preparation/loading, visualization and rendering. The visualization module - the major focus of this study, currently deals with isosurface extraction and texture-based rendering techniques. For isosurface extraction, our all-in-memory approach keeps datasets under consideration and the corresponding geometric data in the memory. Alternatively, the only-polygons- or points-in-memory only keeps the geometric data in memory. To address the issues related to storage and computation, we develop adaptive data coherency and multiresolution schemes. The inter-dataset coherency scheme exploits the similarities among datasets to approximate the portions of isosurfaces of datasets using the isosurface of one or more reference datasets whereas the intra/inter-dataset multiresolution scheme processes the selected portions of each data volume at varying levels of resolution. The graphics hardware-accelerated approaches adopted for MDV include volume clipping, isosurface extraction and volume rendering, which use 3D textures and advanced per fragment operations. With appropriate user-defined threshold criteria, we find that various MDV techniques maintain a linear time-N relationship, improve the geometry generation and rendering time, and increase the maximum N that can be handled (N: number of datasets). Finally, we justify the effectiveness and usefulness of the proposed MDV by visualizing 3D scalar data (representing electron density distributions in magnesium oxide and magnesium silicate) from parallel quantum mechanical simulation

    Efficient Computation of K-Nearest Neighbor Graphs for Large High-Dimensional Data Sets on GPU Clusters

    Get PDF
    The k-Nearest Neighbor Graph (k-NNG) and the related k-Nearest Neighbor (k-NN) methods have a wide variety of applications in areas such as bioinformatics, machine learning, data mining, clustering analysis, and pattern recognition. Our application of interest is manifold embedding. Due to the large dimensionality of the input data (\u3c15k), spatial subdivision based techniques such OBBs, k-d tree, BSP etc., are not viable. The only alternative is the brute-force search, which has two distinct parts. The first finds distances between individual vectors in the corpus based on a pre-defined metric. Given the distance matrix, the second step selects k nearest neighbors for each member of the query data set. This thesis presents the development and implementation of a distributed exact k-Nearest Neighbor Graph (k-NNG) construction method. The proposed method uses Graphics Processing Units (GPUs) and exploits multiple levels of parallelism for distributed computational systems using GPUs. It is scalable for different cluster sizes, with each compute node in the cluster containing multiple GPUs. The distance computation is formulated as a basic matrix multiplication and reduction operation. The optimized CUBLAS matrix multiplication library is used for this purpose. Various distance metrics such as Euclidian, cosine, and Pearson are supported. For k-NNG construction, two different methods are presented. The first is based on an approach called batch index sorting to build the k-NNG with three sorting operations. This method uses the optimized radix sort implementation in the Thrust library for GPU. The second is an efficient implementation using the latest GPU functionalities of a variant of the quick select algorithm. Overall, the batch index sorting based k-NNG method is approximately 13x faster than a distributed MATLAB implementation. The quick select algorithm itself has a 5x speedup over state-of-the art GPU methods. This has enabled the processing of k-NNG construction on a data set containing 20 million image vectors, each with dimension 15,000, as part of a manifold embedding technique for analyzing the conformations of biomolecules

    eXplainable data processing

    Get PDF
    Seminario realizado en U & P U Patel Department of Computer Engineering, Chandubhai S. Patel Institute of Technology, Charotar University of Science And Technology (CHARUSAT), Changa-388421, Gujarat, India 2021[EN]Deep Learning y has created many new opportunities, it has unfortunately also become a means for achieving ill-intentioned goals. Fake news, disinformation campaigns, and manipulated images and videos have plagued the internet which has had serious consequences on our society. The myriad of information available online means that it may be difficult to distinguish between true and fake news, leading many users to unknowingly share fake news, contributing to the spread of misinformation. The use of Deep Learning to create fake images and videos has become known as deepfake. This means that there are ever more effective and realistic forms of deception on the internet, making it more difficult for internet users to distinguish reality from fictio

    Computational Methods and Graphical Processing Units for Real-time Control of Tomographic Adaptive Optics on Extremely Large Telescopes.

    Get PDF
    Ground based optical telescopes suffer from limited imaging resolution as a result of the effects of atmospheric turbulence on the incoming light. Adaptive optics technology has so far been very successful in correcting these effects, providing nearly diffraction limited images. Extremely Large Telescopes will require more complex Adaptive Optics configurations that introduce the need for new mathematical models and optimal solvers. In addition, the amount of data to be processed in real time is also greatly increased, making the use of conventional computational methods and hardware inefficient, which motivates the study of advanced computational algorithms, and implementations on parallel processors. Graphical Processing Units (GPUs) are massively parallel processors that have so far demonstrated a very high increase in speed compared to CPUs and other devices, and they have a high potential to meet the real-time restrictions of adaptive optics systems. This thesis focuses on the study and evaluation of existing proposed computational algorithms with respect to computational performance, and their implementation on GPUs. Two basic methods, one direct and one iterative are implemented and tested and the results presented provide an evaluation of the basic concept upon which other algorithms are based, and demonstrate the benefits of using GPUs for adaptive optics

    Extracting the Structure and Conformations of Biological Entities from Large Datasets

    Get PDF
    In biology, structure determines function, which often proceeds via changes in conformation. Efficient means for determining structure exist, but mapping conformations continue to present a serious challenge. Single-particles approaches, such as cryogenic electron microscopy (cryo-EM) and emerging diffract & destroy X-ray techniques are, in principle, ideally positioned to overcome these challenges. But the algorithmic ability to extract information from large heterogeneous datasets consisting of unsorted snapshots - each emanating from an unknown orientation of an object in an unknown conformation - remains elusive. It is the objective of this thesis to describe and validate a powerful suite of manifold-based algorithms able to extract structural and conformational information from large datasets. These computationally efficient algorithms offer a new approach to determining the structure and conformations of viruses and macromolecules. After an introduction, we demonstrate a distributed, exact k-Nearest Neighbor Graph (k-NNG) construction method, in order to establish a firm algorithmic basis for manifold-based analysis. The proposed algorithm uses Graphics Processing Units (GPUs) and exploits multiple levels of parallelism in distributed computational environment and it is scalable for different cluster sizes, with each compute node in the cluster containing multiple GPUs. Next, we present applications of manifold-based analysis in determining structure and conformational variability. Using the Diffusion Map algorithm, a new approach is presented, which is capable of determining structure of symmetric objects, such as viruses, to 1/100th of the object diameter, using low-signal diffraction snapshots. This is demonstrated by means of a successful 3D reconstruction of the Satellite Tobacco Necrosis Virus (STNV) to atomic resolution from simulated diffraction snapshots with and without noise. We next present a new approach for determining discrete conformational changes of the enzyme Adenylate kinase (ADK) from very large datasets of up to 20 million snapshots, each with ~104 pixels. This exceeds by an order of magnitude the largest dataset previously analyzed. Finally, we present a theoretical framework and an algorithmic pipeline for capturing continuous conformational changes of the ribosome from ultralow-signal (-12dB) experimental cryo-EM. Our analysis shows a smooth, concerted change in molecular structure in two-dimensional projection, which might be indicative of the way the ribosome functions as a molecular machine. The thesis ends with a summary and future prospects

    Novel high performance techniques for high definition computer aided tomography

    Get PDF
    Mención Internacional en el título de doctorMedical image processing is an interdisciplinary field in which multiple research areas are involved: image acquisition, scanner design, image reconstruction algorithms, visualization, etc. X-Ray Computed Tomography (CT) is a medical imaging modality based on the attenuation suffered by the X-rays as they pass through the body. Intrinsic differences in attenuation properties of bone, air, and soft tissue result in high-contrast images of anatomical structures. The main objective of CT is to obtain tomographic images from radiographs acquired using X-Ray scanners. The process of building a 3D image or volume from the 2D radiographs is known as reconstruction. One of the latest trends in CT is the reduction of the radiation dose delivered to patients through the decrease of the amount of acquired data. This reduction results in artefacts in the final images if conventional reconstruction methods are used, making it advisable to employ iterative reconstruction algorithms. There are numerous reconstruction algorithms available, from which we can highlight two specific types: traditional algorithms, which are fast but do not enable the obtaining of high quality images in situations of limited data; and iterative algorithms, slower but more reliable when traditional methods do not reach the quality standard requirements. One of the priorities of reconstruction is the obtaining of the final images in near real time, in order to reduce the time spent in diagnosis. To accomplish this objective, new high performance techniques and methods for accelerating these types of algorithms are needed. This thesis addresses the challenges of both traditional and iterative reconstruction algorithms, regarding acceleration and image quality. One common approach for accelerating these algorithms is the usage of shared-memory and heterogeneous architectures. In this thesis, we propose a novel simulation/reconstruction framework, namely FUX-Sim. This framework follows the hypothesis that the development of new flexible X-ray systems can benefit from computer simulations, which may also enable performance to be checked before expensive real systems are implemented. Its modular design abstracts the complexities of programming for accelerated devices to facilitate the development and evaluation of the different configurations and geometries available. In order to obtain near real execution times, low-level optimizations for the main components of the framework are provided for Graphics Processing Unit (GPU) architectures. Other alternative tackled in this thesis is the acceleration of iterative reconstruction algorithms by using distributed memory architectures. We present a novel architecture that unifies the two most important computing paradigms for scientific computing nowadays: High Performance Computing (HPC). The proposed architecture combines Big Data frameworks with the advantages of accelerated computing. The proposed methods presented in this thesis provide more flexible scanner configurations as they offer an accelerated solution. Regarding performance, our approach is as competitive as the solutions found in the literature. Additionally, we demonstrate that our solution scales with the size of the problem, enabling the reconstruction of high resolution images.El procesamiento de imágenes médicas es un campo interdisciplinario en el que participan múltiples áreas de investigación como la adquisición de imágenes, diseño de escáneres, algoritmos de reconstrucción de imágenes, visualización, etc. La tomografía computarizada (TC) de rayos X es una modalidad de imágen médica basada en el cálculo de la atenuación sufrida por los rayos X a medida que pasan por el cuerpo a escanear. Las diferencias intrínsecas en la atenuación de hueso, aire y tejido blando dan como resultado imágenes de alto contraste de estas estructuras anatómicas. El objetivo principal de la TC es obtener imágenes tomográficas a partir estas radiografías obtenidas mediante escáneres de rayos X. El proceso de construir una imagen o volumen en 3D a partir de las radiografías 2D se conoce como reconstrucción. Una de las últimas tendencias en la tomografía computarizada es la reducción de la dosis de radiación administrada a los pacientes a través de la reducción de la cantidad de datos adquiridos. Esta reducción da como resultado artefactos en las imágenes finales si se utilizan métodos de reconstrucción convencionales, por lo que es aconsejable emplear algoritmos de reconstrucción iterativos. Existen numerosos algoritmos de reconstrucción disponibles a partir de los cuales podemos destacar dos categorías: algoritmos tradicionales, rápidos pero no permiten obtener imágenes de alta calidad en situaciones en las que los datos son limitados; y algoritmos iterativos, más lentos pero más estables en situaciones donde los métodos tradicionales no alcanzan los requisitos en cuanto a la calidad de la imagen. Una de las prioridades de la reconstrucción es la obtención de las imágenes finales en tiempo casi real, con el fin de reducir el tiempo de diagnóstico. Para lograr este objetivo, se necesitan nuevas técnicas y métodos de alto rendimiento para acelerar estos algoritmos. Esta tesis aborda los desafíos de los algoritmos de reconstrucción tradicionales e iterativos, con respecto a la aceleración y la calidad de imagen. Un enfoque común para acelerar estos algoritmos es el uso de arquitecturas de memoria compartida y heterogéneas. En esta tesis, proponemos un nuevo sistema de simulación/reconstrucción, llamado FUX-Sim. Este sistema se construye alrededor de la hipótesis de que el desarrollo de nuevos sistemas de rayos X flexibles puede beneficiarse de las simulaciones por computador, en los que también se puede realizar un control del rendimiento de los nuevos sistemas a desarrollar antes de su implementación física. Su diseño modular abstrae las complejidades de la programación para aceleradores con el objetivo de facilitar el desarrollo y la evaluación de las diferentes configuraciones y geometrías disponibles. Para obtener ejecuciones en casi tiempo real, se proporcionan optimizaciones de bajo nivel para los componentes principales del sistema en las arquitecturas GPU. Otra alternativa abordada en esta tesis es la aceleración de los algoritmos de reconstrucción iterativa mediante el uso de arquitecturas de memoria distribuidas. Presentamos una arquitectura novedosa que unifica los dos paradigmas informáticos más importantes en la actualidad: computación de alto rendimiento (HPC) y Big Data. La arquitectura propuesta combina sistemas Big Data con las ventajas de los dispositivos aceleradores. Los métodos propuestos presentados en esta tesis proporcionan configuraciones de escáner más flexibles y ofrecen una solución acelerada. En cuanto al rendimiento, nuestro enfoque es tan competitivo como las soluciones encontradas en la literatura. Además, demostramos que nuestra solución escala con el tamaño del problema, lo que permite la reconstrucción de imágenes de alta resolución.This work has been mainly funded thanks to a FPU fellowship (FPU14/03875) from the Spanish Ministry of Education. It has also been partially supported by other grants: • DPI2016-79075-R. “Nuevos escenarios de tomografía por rayos X”, from the Spanish Ministry of Economy and Competitiveness. • TIN2016-79637-P Towards unification of HPC and Big Data Paradigms from the Spanish Ministry of Economy and Competitiveness. • Short-term scientific missions (STSM) grant from NESUS COST Action IC1305. • TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems from the Spanish Ministry of Economy and Competitiveness. • RTC-2014-3028-1 NECRA Nuevos escenarios clinicos con radiología avanzada from the Spanish Ministry of Economy and Competitiveness.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: José Daniel García Sánchez.- Secretario: Katzlin Olcoz Herrero.- Vocal: Domenico Tali
    corecore