76 research outputs found

    Resource-Constrained Optimizations For Synthetic Aperture Radar On-Board Image Processing

    Get PDF
    Synthetic Aperture Radar (SAR) can be used to create realistic and high-resolution 2D or 3D reconstructions of landscapes. The data capture is typically deployed using radar instruments in specially equipped, low flying planes, resulting in a large amount of raw data, which needs to be processed for image reconstruction. However, due to limited on-board processing capacities on the plane (power, size, weight, cooling, communication bandwidth to ground stations, etc.) and the need to capture many images during a single flight, the raw data must be processed on-board and then sent to the ground station efficiently as image products. In this paper we describe the processing architecture of the digital beamforming SAR (DBFSAR) of the German Areaospace Center (DLR) and the special steps that had to be taken to enable the on-board processing. We explain the required software optimizations and under which conditions their integration in the SAR imaging process leads to (near) real-time capability. We further describe the lessons learned in our work and discuss how they can be applied to other processing scenarios with limited resource availability

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    Full text link
    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie

    High Performance Multiview Video Coding

    Get PDF
    Following the standardization of the latest video coding standard High Efficiency Video Coding in 2013, in 2014, multiview extension of HEVC (MV-HEVC) was published and brought significantly better compression performance of around 50% for multiview and 3D videos compared to multiple independent single-view HEVC coding. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the possibilities of using modern parallel computing platforms and tools such as single-instruction-multiple-data (SIMD) instructions, multi-core CPU, massively parallel GPU, and computer cluster to significantly enhance the MVC encoder performance. The aforementioned computing tools have very different computing characteristics and misuse of the tools may result in poor performance improvement and sometimes even reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques at various levels of abstraction are proposed, non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in prediction unit (PU), fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for coding tree unit (CTU), optimized resource-scheduled wave-front parallel processing for CTU, and workload balanced, cluster-based multiple-view parallel are proposed. The result shows proposed parallel optimization techniques, with insignificant loss to coding efficiency, significantly improves the execution time performance. This , in turn, proves modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications

    Multi-GPU design and performance evaluation of homomorphic encryption on GPU clusters

    Get PDF
    We present a multi-GPU design, implementation and performance evaluation of the Halevi-Polyakov-Shoup (HPS) variant of the Fan-Vercauteren (FV) levelled Fully Homomorphic Encryption (FHE) scheme. Our design follows a data parallelism approach and uses partitioning methods to distribute the workload in FV primitives evenly across available GPUs. The design is put to address space and runtime requirements of FHE computations. It is also suitable for distributed-memory architectures, and includes efficient GPU-to-GPU data exchange protocols. Moreover, it is user-friendly as user intervention is not required for task decomposition, scheduling or load balancing. We implement and evaluate the performance of our design on two homogeneous and heterogeneous NVIDIA GPU clusters: K80, and a customized P100. We also provide a comparison with a recent shared-memory-based multi-core CPU implementation using two homomorphic circuits as workloads: vector addition and multiplication. Moreover, we use our multi-GPU Levelled-FHE to implement the inference circuit of two Convolutional Neural Networks (CNNs) to perform homomorphically image classification on encrypted images from the MNIST and CIFAR - 10 datasets. Our implementation provides 1 to 3 orders of magnitude speedup compared with the CPU implementation on vector operations. In terms of scalability, our design shows reasonable scalability curves when the GPUs are fully connected.This work is supported by A*STAR under its RIE2020 Advanced Manufacturing and Engineering (AME) Programmtic Programme (Award A19E3b0099).Peer ReviewedPostprint (author's final draft

    A CUDA-powered method for the feature extraction and unsupervised analysis of medical images

    Get PDF
    Funder: Università degli Studi di Milano - BicoccaAbstractImage texture extraction and analysis are fundamental steps in computer vision. In particular, considering the biomedical field, quantitative imaging methods are increasingly gaining importance because they convey scientifically and clinically relevant information for prediction, prognosis, and treatment response assessment. In this context, radiomic approaches are fostering large-scale studies that can have a significant impact in the clinical practice. In this work, we present a novel method, called CHASM (Cuda, HAralick &amp; SoM), which is accelerated on the graphics processing unit (GPU) for quantitative imaging analyses based on Haralick features and on the self-organizing map (SOM). The Haralick features extraction step relies upon the gray-level co-occurrence matrix, which is computationally burdensome on medical images characterized by a high bit depth. The downstream analyses exploit the SOM with the goal of identifying the underlying clusters of pixels in an unsupervised manner. CHASM is conceived to leverage the parallel computation capabilities of modern GPUs. Analyzing ovarian cancer computed tomography images, CHASM achieved up to 19.5×\sim 19.5\times ∼ 19.5 × and 37×\sim 37\times ∼ 37 × speed-up factors for the Haralick feature extraction and for the SOM execution, respectively, compared to the corresponding C++ coded sequential versions. Such computational results point out the potential of GPUs in the clinical research.</jats:p

    Electromagnetic ray-tracing for the investigation of multipath and vibration signatures in radar imagery

    Get PDF
    Synthetic Aperture Radar (SAR) imagery has been used extensively within UK Defence and Intelligence for many years. Despite this, the exploitation of SAR imagery is still challenging to the inexperienced imagery analyst as the non-literal image provided for exploitation requires careful consideration of the imaging geometry, the target being imaged and the physics of radar interactions with objects. It is therefore not surprising to note that in 2017 the most useful tool available to a radar imagery analyst is a contextual optical image of the same area. This body of work presents a way to address this by adopting recent advances in radar signal processing and computational geometry to develop a SAR simulator called SARCASTIC (SAR Ray-Caster for the Intelligence Community) that can rapidly render a scene with the precise collection geometry of an image being exploited. The work provides a detailed derivation of the simulator from first principals. It is then validated against a range of real-world SAR collection systems. The work shows that such a simulator can provide an analyst with the necessary tools to extract intelligence from a collection that is unavailable to a conventional imaging system. The thesis then describes a new technique that allows a vibrating target to be detected within a SAR collection. The simulator is used to predict a unique scattering signature - described as a one-sided paired echo. Finally an experiment is described that was performed by Cranfield University to specifications determined by SARCASTIC which show that the unique radar signature can actually occur within a SAR collection

    Architectural Support for Medical Imaging

    Full text link
    Advancements in medical imaging research are continuously providing doctors with better diagnostic information, removing the need for unnecessary surgeries and increasing accuracy in predicting life-threatening conditions. However, newly developed techniques are currently limited by the capabilities of existing computer hardware, restricting them to expensive, custom-designed machines that only the largest hospital systems can afford or even worse, precluding them entirely. Many of these issues are due to existing hardware being ill-suited for these types of algorithms and not designed with medical imaging in mind. In this thesis we discuss our efforts to motivate and democratize architectural support for advanced medical imaging tasks with MIRAQLE, a medical image reconstruction benchmark suite. In particular, MIRAQLE focuses on advanced image reconstruction techniques for 3D ultrasound, low-dose X-ray CT, and dynamic MRI. For each imaging modality we provide a detailed background and parallel implementations to enable future hardware development. In addition to providing baseline algorithms for these workloads, we also develop a unique analysis tool that provides image quality feedback for each simulation. This allows hardware designers to explore acceptable image quality trade-offs in algorithm-hardware co-design, potentially allowing for even more efficient solutions than hardware innovations alone could provide. We also motivate the need for such tools by discussing Sonic Millip3De, our low-power, highly parallel hardware for 3D ultrasound. Using Sonic Millip3De, we illustrate the orders-of-magnitude power efficiency improvement that better medical imaging hardware can provide, especially when developed with a hardware-software co-design. We also show validation of the design using a scaled-down FPGA proof-of-concept and discuss our further refinement of the hardware to support a wider range of applications and produce higher frame rates. Overall, with this thesis we hope to enable application specific hardware support for the critical medical imaging tasks in MIRAQLE to make them practical for wide clinical use.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/137105/1/rsamp_1.pd

    GPGPU application in fusion science

    Get PDF
    GPGPUs have firmly earned their reputation in HPC (High Performance Computing) as hardware for massively parallel computation. However their application in fusion science is quite marginal and not considered a mainstream approach to numerical problems. Computation advances have increased immensely over the last decade and continue to accelerate. GPGPU boards were always an alternative and exotic approach to problem solving and scientific programming, which was cultivated only by enthusiasts and specialized programmers. Today it is about 10 years, since the first fully programmable GPUs appeared on the market. And due to exponential growth in processing power over the years GPGPUs are not the alternative choice any more, but they became the main choice for big problem solving. Originally developed for and dominating in fields such as image and media processing, image rendering, video encoding/decoding, image scaling, stereo vision and pattern recognition GPGPUs are also becoming mainstream computation platforms in scientific fields such as signal processing, physics, finance and biology. This PhD contains solutions and approaches to two relevant problems for fusion and plasma science using GPGPU processing. First problem belongs to the realms of plasma and accelerator physics. I will present number of plasma simulations built on a PIC (Particle In Cell) method such as plasma sheath simulation, electron beam simulation, negative ion beam simulation and space charge compensation simulation. Second problem belongs to the realms of tomography and real-time control. I will present number of simulated tomographic plasma reconstructions of Fourier-Bessel type and their analysis all in real-time oriented approach, i.e. GPGPU based implementations are integrated into MARTe environment. MARTe is a framework for real-time application developed at JET (Joint European Torus) and used in several european fusion labs. These two sets of problems represent a complete spectrum of GPGPU operation capabilities. PIC based problems are large complex simulations operated as batch processes, which do not have a time constraint and operate on huge amounts of memory. While tomographic plasma reconstructions are online (realtime) processes, which have a strict latency/time constraints suggested by the time scales of real-time control and operate on relatively small amounts of memory. Such a variety of problems covers a very broad range of disciplines and fields of science: such as plasma physics, NBI (Neutral Beam Injector) physics, tokamak physics, parallel computing, iterative/direct matrix solvers, PIC method, tomography and so on. PhD thesis also includes an extended performance analysis of Nvidia GPU cards considering the applicability to the real-time control and real-time performance. In order to approach the aforementioned problems I as a PhD candidate had to gain knowledge in those relevant fields and build a vast range of practical skills such as: parallel/sequential CPU programming, GPU programming, MARTe programming, MatLab programming, IDL programming and Python programming

    Data Management for Dynamic Multimedia Analytics and Retrieval

    Get PDF
    Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially if search, analysis and analytics in large data corpora is considered. The inherently unstructured nature of the data itself and the curse of dimensionality that afflicts the representations we typically work with in its stead are cause for a broad range of issues that require sophisticated solutions at different levels. This has given rise to a huge corpus of research that puts focus on techniques that allow for effective and efficient multimedia search and exploration. Many of these contributions have led to an array of purpose-built, multimedia search systems. However, recent progress in multimedia analytics and interactive multimedia retrieval, has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search and since the concrete requirement cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread notion of staticity of data collections does not hold if one considers analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should produce and arrive at the desired outcomes of the potentially many different queries. Guided by these shortcomings and motivated by the fact that similar questions have once been answered for structured data in classical database research, this Thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing. We complement this by a cost-model that makes the often implicit trade-off between query execution speed and results quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures. All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest for converging the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other

    Feature extraction and fusion for classification of remote sensing imagery

    Get PDF
    corecore