94 research outputs found

    Parallel 3D Fast Wavelet Transform comparison on CPUs and GPUs

    In this paper we present several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping onto manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and well-known techniques such as tiling and blocking are exploited to optimize the use of memory. We evaluate these proposals and compare a new Fermi Tesla C2050 against an Intel Core 2 Quad Q6700. The CUDA version gives the best results, with speedups over the CPU execution times ranging from 5.3x to 7.4x for different image sizes, and up to 81x when communications are neglected. Meanwhile, OpenCL obtains solid gains, ranging from 2x on small frame sizes to 3x on larger ones.
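
    As a rough illustration of what a 3D wavelet transform computes, the sketch below applies one level of a separable Haar transform along each axis of a small volume. The Haar filter, the function names `haar_1d` and `haar_3d`, and the nested-list layout `volume[z][y][x]` are assumptions made for illustration only; the abstract does not state which filter bank the paper uses, and this is not its CUDA, OpenCL, OpenMP or Pthreads code.

```python
def haar_1d(signal):
    """One level of an averaging Haar transform on an even-length list.

    Returns the pairwise averages (low-pass half) followed by the
    pairwise half-differences (high-pass half).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low + high


def haar_3d(volume):
    """Apply haar_1d along x, then y, then z of volume[z][y][x]."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    out = [[list(row) for row in plane] for plane in volume]  # deep copy
    # x axis: transform every row
    for z in range(nz):
        for y in range(ny):
            out[z][y] = haar_1d(out[z][y])
    # y axis: transform every column of every plane
    for z in range(nz):
        for x in range(nx):
            column = haar_1d([out[z][y][x] for y in range(ny)])
            for y in range(ny):
                out[z][y][x] = column[y]
    # z axis: transform every line running through the planes
    for y in range(ny):
        for x in range(nx):
            line = haar_1d([out[z][y][x] for z in range(nz)])
            for z in range(nz):
                out[z][y][x] = line[z]
    return out
```

    Within each of the three passes, every 1D transform touches a different row, column or line, so the transforms in a pass are independent of one another. That independence is the kind of data parallelism that threads on a CPU or GPU can exploit.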

    Efficient architectures of heterogeneous FPGA-GPU for 3-D medical image compression

    The development of three-dimensional (3-D) imaging modalities has generated a massive amount of volumetric data in 3-D images such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US). Existing surveys reveal a large gap for further research in exploiting reconfigurable computing for 3-D medical image compression. This research proposes an FPGA-based co-processing solution to accelerate the mentioned medical imaging system. The HWT block is implemented on the sbRIO-9632 FPGA board, a Spartan 3 (XC3S2000) chip prototyping board. Analysis and performance evaluation of the 3-D images were conducted. Furthermore, a novel architecture is proposed for the context-based adaptive binary arithmetic coder (CABAC), the advanced entropy coding tool employed by the main and higher profiles of H.264/AVC. This research focuses on a GPU implementation of CABAC and a comparative study of 3-D medical image compression systems with and without the discrete wavelet transform (DWT). Implementation results on MRI and CT images show the GPU significantly outperforming a single-threaded CPU implementation. Overall, CT and MRI images processed with DWT outperform those processed without DWT in terms of compression ratio, peak signal-to-noise ratio (PSNR), and latency. For heterogeneous computing, MRI images of various sizes and formats, such as JPEG and DICOM, were used. Evaluation results are shown for each memory iteration; transfers from GPU to CPU consume more bandwidth or throughput. For a 786,486-byte JPEG image, the bandwidth consumed in the two directions tends to balance. Bandwidth depends on the transfer size: larger transfers incur more latency and throughput. Next, an OpenCL implementation for concurrent tasks on a dedicated FPGA is examined. The implementation findings reveal that OpenCL in batch processing mode with AOC techniques offers substantial results, with the amount of logic, area, registers, and memory increasing in proportion to the number of batches. This is because the kernel block is replicated according to the batch number; the number of memory banks therefore increases periodically with the number of kernel blocks. A comparative study found that the tree-balance and loop-unroll architecture provides better results in terms of local memory, latency, and throughput.

    On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework

    The astonishing development of diverse hardware platforms is twofold: on one side, the challenge of exascale performance for big data processing and management; on the other side, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is the general virtualization system developed in 2009 and first introduced in 2010, enabling a completely transparent layer between GPUs and VMs. This paper presents the latest achievements and developments of GVirtuS, which now supports CUDA 6.5, memory management and scheduling. Thanks to the new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs on local workstations, computing clusters and distributed cloud appliances.

    Compression of image sequences in interactive medical teleconsultations

    Interactive medical teleconsultations are an important tool in modern medical practice. Their applications include remote diagnostics, conferences, workshops and classes for students. In many cases standard medium or low-end machines are employed, and the teleconsultation systems must be able to provide a high-quality user experience with very limited resources. Particularly problematic are large datasets consisting of image sequences, which need to be accessed fluently. The main issue is insufficient internal memory; proper compression methods are therefore crucial. However, a scenario where image sequences are kept in a compressed format in internal memory and decompressed on the fly when displayed is difficult to implement due to performance issues. In this paper we present methods for both lossy and lossless compression of medical image sequences that require only compatibility with the Pixel Shader 2.0 standard, which is present even on relatively old, low-end devices. Based on an evaluation of quality, size reduction and performance, the methods are shown to be suitable and beneficial for medical teleconsultation applications.

    A CPU-GPU Hybrid Approach for Accelerating Cross-correlation Based Strain Elastography

    Elastography is a non-invasive imaging modality that uses ultrasound to estimate the elasticity of soft tissues. The resulting images are called 'elastograms'. Elastography techniques are promising as cost-effective tools in the early detection of pathological changes in soft tissues. The quality of elastographic images depends on the accuracy of the local displacement estimates. Cross-correlation based displacement estimators are precise and sensitive. However, cross-correlation based techniques are computationally intensive and may limit the use of elastography as a real-time diagnostic tool. This study investigates the use of parallel general purpose graphics processing unit (GPGPU) engines for speeding up the generation of elastograms at real-time frame rates while preserving elastographic image quality. To achieve this goal, a cross-correlation based time-delay estimation algorithm was developed in the C programming language and was profiled to locate performance bottlenecks. The hotspots were addressed by employing software pipelining and read-ahead and by eliminating redundant computations. The algorithm was then analyzed for parallelization on the GPGPU, and the stages that would map well to the GPGPU hardware were identified. By employing optimization principles for efficient memory access and efficient execution, a net improvement of 67x with respect to the original optimized C version of the estimator was achieved. For typical diagnostic depths of 3-4 cm and elastographic processing parameters, this implementation can yield elastographic frame rates on the order of 50 fps. It was also observed that not all of the stages in elastography can be offloaded to the GPGPU for computation, because some stages have sub-optimal memory access patterns. Additionally, data transfer from graphics card memory to system memory can be efficiently overlapped with concurrent CPU execution. Therefore, a hybrid model of computation in which the computational load is optimally distributed between CPU and GPGPU was identified as an optimal approach to adequately tackle the speed-quality problem in real-time imaging. The results of this research suggest that use of the GPGPU as a co-processor to the CPU may allow generation of elastograms at real-time frame rates without significant compromise in image quality, a scenario that could be very favorable in real-time clinical elastography.
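
    To make the idea of cross-correlation based time-delay estimation concrete, here is a minimal sketch of an integer-lag search using zero-mean normalized cross-correlation between a window of a pre-compression signal and candidate windows of a post-compression signal. The function names and the parameters `start`, `window` and `max_lag` are hypothetical; the abstract does not specify the estimator's details, and the study's own code was written in C and ported to a GPGPU.

```python
import math


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-length windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A flat window has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def estimate_delay(pre, post, start, window, max_lag):
    """Return the integer lag (in samples) within [-max_lag, max_lag] at which
    post best matches the reference window pre[start:start + window]."""
    reference = pre[start:start + window]
    best_lag, best_score = 0, -2.0  # any real correlation exceeds -2
    for lag in range(-max_lag, max_lag + 1):
        lo = start + lag
        if lo < 0 or lo + window > len(post):
            continue  # candidate window falls outside the signal
        score = normalized_cross_correlation(reference, post[lo:lo + window])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

    Each reference window is searched independently of the others, so in a sketch like this the per-window searches could be distributed across many threads. Which stages the study actually offloaded is not stated in the abstract.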

    Body of Knowledge for Graphics Processing Units (GPUs)

    Graphics Processing Units (GPUs) have emerged as a proven technology that enables high performance computing and parallel processing in a small form factor. GPUs enhance the traditional computer paradigm by permitting acceleration of complex mathematics and providing the capability to perform weighted calculations, such as those in artificial intelligence systems. Despite the performance enhancements provided by this type of microprocessor, there are tradeoffs with regard to reliability and radiation susceptibility, which may impact mission success. This report provides insight into GPU architecture and its potential applications in space and other similar markets. It also discusses reliability, qualification, and radiation considerations for testing GPUs.

    New Image Processing Methods for Ultrasound Musculoskeletal Applications

    In the past few years, ultrasound (US) imaging modalities have received increasing interest as diagnostic tools for orthopedic applications. The goal for many of these novel ultrasonic methods is to be able to create three-dimensional (3D) bone visualization non-invasively, safely and with high accuracy and spatial resolution. Availability of accurate bone segmentation and 3D reconstruction methods would help to correctly interpret complex bone morphology as well as facilitate quantitative analysis. However, in vivo ultrasound images of bones may have poor quality due to uncontrollable motion, high ultrasonic attenuation and the presence of imaging artifacts, which can affect the quality of the bone segmentation and reconstruction results. In this study, we investigate the use of novel ultrasonic processing methods that can significantly improve bone visualization, segmentation and 3D reconstruction in ultrasound volumetric data acquired in in vivo applications. Specifically, we investigate the use of new elastography-based, Doppler-based and statistical shape model-based methods that can be applied to ultrasound bone imaging applications with the overall major goal of obtaining fast yet accurate 3D bone reconstructions. This study is composed of three projects, all of which have the potential to contribute significantly to this major goal. The first project deals with the fast and accurate implementation of correlation-based elastography and poroelastography techniques for real-time assessment of the mechanical properties of musculoskeletal tissues. The rationale behind this project is that, in the future, elastography-based features can be used to reduce false positives in ultrasonic bone segmentation methods based on the differences between the mechanical properties of soft tissues and the mechanical properties of hard tissues. In this study, a hybrid computation model is designed, implemented and tested to achieve real-time performance without compromise in elastographic image quality. In the second project, a Power Doppler-based signal enhancement method is designed and tested with the intent of increasing the contrast between soft tissue and bone while suppressing the contrast between soft tissue and connective tissue, which is often a cause of false positives in ultrasonic bone segmentation problems. Both in vitro and in vivo experiments are performed to statistically analyze the performance of this method. In the third project, a statistical shape model based bone surface segmentation method is proposed and investigated. This method uses statistical models to determine whether a curve detected in a segmented ultrasound image belongs to a bone surface. Both in vitro and in vivo experiments are performed to statistically analyze the performance of this method. I conclude this dissertation with a discussion of possible future work in the field of ultrasound bone imaging and assessment.