
    HIGH PERFORMANCE COMPUTING FOR RECONNAISSANCE APPLICATIONS

    Parallel programming is vital to fully utilize the multicore architectures that dominate the processor market. The market, however, is constantly evolving, with new processors and new architectures released annually. Using an open parallel processing language such as OpenCL (Open Computing Language) enables the use of a single program across multiple architectures. It also enables evaluation across multiple devices, so the best choice can be made for a given application. In this research, OpenCL is used to evaluate the performance of two signal processing algorithms across two graphics processing units and one central processing unit. Experimental results show that, for each algorithm, a specific device clearly outperforms the others.
    Ensign, United States Navy. Approved for public release; distribution is unlimited.
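    As a rough illustration of the evaluation workflow this abstract describes, the sketch below uses PyOpenCL to build one kernel from a single source string and time it on every OpenCL device found, CPU or GPU alike. The toy scaling kernel and the profiling setup are illustrative assumptions, not the thesis code.

```python
# Minimal sketch: one OpenCL kernel source, timed on every available device.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void scale(__global const float *x, __global float *y, const float a) {
    int i = get_global_id(0);
    y[i] = a * x[i];
}
"""

def time_on_device(device, x, a=2.0):
    """Build the same kernel for one device and return the kernel time in seconds."""
    ctx = cl.Context([device])
    queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    mf = cl.mem_flags
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)
    evt = prg.scale(queue, x.shape, None, x_buf, y_buf, np.float32(a))
    evt.wait()
    return (evt.profile.end - evt.profile.start) * 1e-9  # nanoseconds -> seconds

if __name__ == "__main__":
    x = np.random.rand(1 << 20).astype(np.float32)
    for platform in cl.get_platforms():
        for device in platform.get_devices():   # CPUs and GPUs alike
            print(device.name, time_on_device(device, x))
```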

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
    Comment: Submitted to Parallel Computing, Elsevier.
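    A minimal sketch of the run-time code generation idea this abstract describes: OpenCL C source is assembled as a Python string at run time and built with PyOpenCL. The template, kernel name, and chosen operator below are illustrative assumptions, not the toolkits' own helper APIs.

```python
# Sketch of GPU run-time code generation (RTCG): the kernel text is generated
# by the host program just before it is compiled for the device.
import numpy as np
import pyopencl as cl

def make_kernel(op: str) -> str:
    """Generate OpenCL C source with the element-wise operator chosen at run time."""
    return f"""
    __kernel void combine(__global const float *a,
                          __global const float *b,
                          __global float *out) {{
        int i = get_global_id(0);
        out[i] = a[i] {op} b[i];
    }}
    """

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The "+" could just as well come from user input or a higher-level expression tree.
prg = cl.Program(ctx, make_kernel("+")).build()
prg.combine(queue, a.shape, None, a_buf, b_buf, out_buf)

out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```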

    Enhancement and Edge-Preserving Denoising: An OpenCL-Based Approach for Remote Sensing Imagery

    Image enhancement and edge-preserving denoising are relevant steps before classification or other postprocessing techniques for remote sensing images. However, multisensor array systems are able to simultaneously capture several low-resolution images of the same area on different wavelengths, forming a high spatial/spectral resolution image and raising a series of new challenges. In this paper, an Open Computing Language (OpenCL) based parallel implementation approach is presented for near real-time enhancement based on Bayesian maximum entropy (BME), as well as an edge-preserving denoising algorithm for remote sensing imagery, which uses the local linear Stein's unbiased risk estimate (LLSURE). BME was selected for its results on synthetic aperture radar image enhancement, whereas LLSURE has shown better noise removal properties than other commonly used methods. Within this context, image processing methods are algorithmically adapted via parallel computing techniques and efficiently implemented using CPUs and commodity graphics processing units (GPUs). Experimental results demonstrate the reduction of the computational load of real-world image processing for a near real-time GPU-adapted implementation.
    ITESO, A.C.
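    The per-pixel parallel mapping that this kind of GPU implementation relies on can be sketched as follows; a simple 3x3 box average stands in for the BME and LLSURE operators, which are not reproduced here. The kernel and image layout are assumptions for illustration only.

```python
# Sketch of the per-pixel GPU mapping: one work-item per pixel, each reading a
# small neighbourhood. A 3x3 box average is used as a stand-in local operator.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void box3x3(__global const float *src, __global float *dst,
                     const int width, const int height) {
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width || y >= height) return;
    float acc = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = clamp(x + dx, 0, width - 1);   // replicate borders
            int yy = clamp(y + dy, 0, height - 1);
            acc += src[yy * width + xx];
        }
    dst[y * width + x] = acc / 9.0f;
}
"""

ctx = cl.create_some_context()   # picks a CPU or GPU device
queue = cl.CommandQueue(ctx)
img = np.random.rand(512, 512).astype(np.float32)
mf = cl.mem_flags
src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=img)
dst = cl.Buffer(ctx, mf.WRITE_ONLY, img.nbytes)

prg = cl.Program(ctx, KERNEL_SRC).build()
# Global size is (width, height); the row-major layout matches NumPy's default.
prg.box3x3(queue, (img.shape[1], img.shape[0]), None, src, dst,
           np.int32(img.shape[1]), np.int32(img.shape[0]))
out = np.empty_like(img)
cl.enqueue_copy(queue, out, dst)
```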

    DPIVSoft-OpenCL: a multicore CPU-GPU accelerated open source code for 2D Particle Image Velocimetry

    We present a translation of the original Matlab DPIVSoft code to a complete open source code implemented in Python, to perform Particle Image Velocimetry (PIV) in two dimensions, in parallel, with interrogation window shifting and a double-pass window deformation approach using multiple iterations for each pass. The added value of the code is the use of the Open Computing Language (OpenCL) library to parallelize the original code on multiple Intel Central Processing Units (CPUs) and/or Graphics Processing Units (GPUs), so it can be run on all commercially available GPUs. Examples of flow applications are included in the text using synthetic images generated from DNS data from the Johns Hopkins Turbulence Database (JHTDB) (Perlman, 2007), showing about a 90x speedup over the previous Matlab implementation for a given test case.
    This research has been supported by a grant from the Ministerio de Economía y Competitividad of Spain (Grant No. DPI2016-76151-C2-1-R), partially by project B4-2019-11, 0837002010 from the Universidad de Málaga, and by project PID2021-124692OA-I00 from the Ministerio de Ciencia e Innovación. Partial funding for open access charge: Universidad de Málaga / CBU
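    For readers unfamiliar with PIV, the core per-window computation is a cross-correlation whose peak gives the local displacement between the two frames. The sketch below shows that step in plain NumPy as an assumed stand-in; DPIVSoft-OpenCL performs the equivalent work inside parallel OpenCL kernels, with window shifting and deformation layered on top.

```python
# Sketch of the basic PIV step: the cross-correlation peak of one interrogation
# window pair gives the integer displacement of the particle pattern.
import numpy as np

def window_displacement(win_a: np.ndarray, win_b: np.ndarray) -> tuple[int, int]:
    """Return the integer (dy, dx) displacement of win_b relative to win_a."""
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    # Circular cross-correlation via FFT; the peak location is the displacement.
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices above N/2 correspond to negative shifts (wrap-around).
    dy = peak[0] if peak[0] <= a.shape[0] // 2 else peak[0] - a.shape[0]
    dx = peak[1] if peak[1] <= a.shape[1] // 2 else peak[1] - a.shape[1]
    return dy, dx

# Synthetic check: shift a random window by (3, -2) and recover the displacement.
rng = np.random.default_rng(0)
win_a = rng.random((32, 32))
win_b = np.roll(win_a, shift=(3, -2), axis=(0, 1))
print(window_displacement(win_a, win_b))   # -> (3, -2)
```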

    General‐purpose computation on GPUs for high performance cloud computing

    This is the peer reviewed version of the following article: Expósito, R. R., Taboada, G. L., Ramos, S., Touriño, J., & Doallo, R. (2013). General-purpose computation on GPUs for high performance cloud computing. Concurrency and Computation: Practice and Experience, 25(12), 1628-1642, which has been published in final form at https://doi.org/10.1002/cpe.2845. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
    [Abstract] Cloud computing is offering new approaches for High Performance Computing (HPC), as it provides dynamically scalable resources as a service over the Internet. In addition, General-Purpose computation on Graphics Processing Units (GPGPU) has gained much attention from scientific computing in multiple domains, thus becoming an important programming model in HPC. Compute Unified Device Architecture (CUDA) has been established as a popular programming model for GPGPU, removing the need to use graphics APIs for computing applications. Open Computing Language (OpenCL) is an emerging alternative, not only for GPGPU but for any parallel architecture. GPU clusters, usually programmed with a hybrid parallel paradigm mixing Message Passing Interface (MPI) with CUDA/OpenCL, are currently gaining high popularity. Therefore, cloud providers are deploying clusters with multiple GPUs per node and high-speed network interconnects in order to make them a feasible option for HPC as a Service (HPCaaS). This paper evaluates GPGPU for high performance cloud computing on a public cloud computing infrastructure, Amazon EC2 Cluster GPU Instances (CGI), equipped with NVIDIA Tesla GPUs and a 10 Gigabit Ethernet network. The analysis of the results, obtained using up to 64 GPUs and 256 processor cores, has shown that GPGPU is a viable option for high performance cloud computing, despite the significant impact that virtualized environments still have on network overhead, which hampers the adoption of GPGPU for communication-intensive applications.
    Ministerio de Ciencia e Innovación; TIN2010-1673
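    A hedged sketch of the hybrid MPI + CUDA/OpenCL pattern the abstract mentions, written here with mpi4py and PyOpenCL: each MPI rank binds to one GPU, computes on its slice of the data, and the ranks then reduce their partial results over the network. The device layout and toy kernel are assumptions, not taken from the paper.

```python
# Hybrid MPI + OpenCL sketch: one process per GPU, reduction across ranks.
# Run with e.g.: mpiexec -n 4 python hybrid_sketch.py
import numpy as np
import pyopencl as cl
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Bind this rank to one device (round-robin over the GPUs visible on the node).
devices = [d for p in cl.get_platforms()
           for d in p.get_devices(device_type=cl.device_type.GPU)]
if not devices:   # fall back to any OpenCL device (e.g. a CPU) if no GPU is visible
    devices = [d for p in cl.get_platforms() for d in p.get_devices()]
device = devices[rank % len(devices)]
ctx = cl.Context([device])
queue = cl.CommandQueue(ctx)

# Each rank squares its own slice of a distributed vector on its device.
n_local = 1 << 20
x = np.random.rand(n_local).astype(np.float32)
prg = cl.Program(ctx, """
__kernel void square(__global const float *x, __global float *y) {
    int i = get_global_id(0);
    y[i] = x[i] * x[i];
}
""").build()
mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)
prg.square(queue, x.shape, None, x_buf, y_buf)
y = np.empty_like(x)
cl.enqueue_copy(queue, y, y_buf)

# MPI gathers the partial sums across ranks (the communication-sensitive part).
total = comm.reduce(float(y.sum()), op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares over", size, "ranks:", total)
```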

    gpucc: An Open-Source GPGPU Compiler

    Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA's CUDA and the cross-platform OpenCL standard. Until now, there has not been a fully open-source compiler targeting the CUDA environment, hampering general compiler and architecture research and making deployment difficult in datacenter or supercomputer environments. In this paper, we present gpucc, an LLVM-based, fully open-source, CUDA-compatible compiler for high performance computing. It performs various general and CUDA-specific optimizations to generate high performance code. The Clang-based frontend supports modern language features such as those in C++11 and C++14. Compile time is 8% faster than with NVIDIA's toolchain (nvcc), and gpucc reduces compile time by up to 2.4x for pathological compilations (>100 secs), which tend to dominate build times in parallel build environments. Compared to nvcc, gpucc's runtime performance is on par on several open-source benchmarks, such as Rodinia (0.8% faster), SHOC (0.5% slower), and Tensor (3.7% faster). It outperforms nvcc on internal large-scale end-to-end benchmarks by up to 51.0%, with a geometric mean of 22.9%.