
    HIGH PERFORMANCE COMPUTING FOR RECONNAISSANCE APPLICATIONS

    Parallel programming is vital to fully utilize the multicore architectures that dominate the processor market. The market, however, is constantly evolving, with new processors and new architectures released annually. Using an open parallel processing language such as OpenCL (Open Computing Language) enables the use of a single program across multiple architectures. It also enables evaluation across multiple devices, so the best choice can be made for a given application. In this research, OpenCL is used to evaluate the performance of two signal processing algorithms across two graphics processing units and one central processing unit. Experimental results show that, for each algorithm, a specific device clearly outperforms the others.
    Ensign, United States Navy. Approved for public release; distribution is unlimited.
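    As a rough illustration of the evaluation workflow this abstract describes, the sketch below uses PyOpenCL to build one kernel from a single source string and time it on every OpenCL device found, CPU or GPU alike. The toy scaling kernel and the profiling setup are illustrative assumptions, not the thesis code.

```python
# Minimal sketch: one OpenCL kernel source, timed on every available device.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void scale(__global const float *x, __global float *y, const float a) {
    int i = get_global_id(0);
    y[i] = a * x[i];
}
"""

def time_on_device(device, x, a=2.0):
    """Build the same kernel for one device and return the kernel time in seconds."""
    ctx = cl.Context([device])
    queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    mf = cl.mem_flags
    x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
    y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)
    evt = prg.scale(queue, x.shape, None, x_buf, y_buf, np.float32(a))
    evt.wait()
    return (evt.profile.end - evt.profile.start) * 1e-9  # nanoseconds -> seconds

if __name__ == "__main__":
    x = np.random.rand(1 << 20).astype(np.float32)
    for platform in cl.get_platforms():
        for device in platform.get_devices():   # CPUs and GPUs alike
            print(device.name, time_on_device(device, x))
```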

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
    Comment: Submitted to Parallel Computing, Elsevier.
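    A minimal sketch of the run-time code generation idea this abstract describes: OpenCL C source is assembled as a Python string at run time and built with PyOpenCL. The template, kernel name, and chosen operator below are illustrative assumptions, not the toolkits' own helper APIs.

```python
# Sketch of GPU run-time code generation (RTCG): the kernel text is generated
# by the host program just before it is compiled for the device.
import numpy as np
import pyopencl as cl

def make_kernel(op: str) -> str:
    """Generate OpenCL C source with the element-wise operator chosen at run time."""
    return f"""
    __kernel void combine(__global const float *a,
                          __global const float *b,
                          __global float *out) {{
        int i = get_global_id(0);
        out[i] = a[i] {op} b[i];
    }}
    """

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The "+" could just as well come from user input or a higher-level expression tree.
prg = cl.Program(ctx, make_kernel("+")).build()
prg.combine(queue, a.shape, None, a_buf, b_buf, out_buf)

out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```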

    Enhancement and Edge-Preserving Denoising: An OpenCL-Based Approach for Remote Sensing Imagery

    Image enhancement and edge-preserving denoising are relevant steps before classification or other postprocessing techniques for remote sensing images. However, multisensor array systems are able to simultaneously capture several low-resolution images of the same area on different wavelengths, forming a high spatial/spectral resolution image and raising a series of new challenges. In this paper, an Open Computing Language (OpenCL) based parallel implementation approach is presented for near real-time enhancement based on Bayesian maximum entropy (BME), as well as an edge-preserving denoising algorithm for remote sensing imagery, which uses the local linear Stein's unbiased risk estimate (LLSURE). BME was selected for its results on synthetic aperture radar image enhancement, whereas LLSURE has shown better noise removal properties than other commonly used methods. Within this context, image processing methods are algorithmically adapted via parallel computing techniques and efficiently implemented using CPUs and commodity graphics processing units (GPUs). Experimental results demonstrate the reduction of the computational load of real-world image processing for a near real-time GPU-adapted implementation.
    ITESO, A.C.
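    The per-pixel parallel mapping that this kind of GPU implementation relies on can be sketched as follows; a simple 3x3 box average stands in for the BME and LLSURE operators, which are not reproduced here. The kernel and image layout are assumptions for illustration only.

```python
# Sketch of the per-pixel GPU mapping: one work-item per pixel, each reading a
# small neighbourhood. A 3x3 box average is used as a stand-in local operator.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void box3x3(__global const float *src, __global float *dst,
                     const int width, const int height) {
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width || y >= height) return;
    float acc = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = clamp(x + dx, 0, width - 1);   // replicate borders
            int yy = clamp(y + dy, 0, height - 1);
            acc += src[yy * width + xx];
        }
    dst[y * width + x] = acc / 9.0f;
}
"""

ctx = cl.create_some_context()   # picks a CPU or GPU device
queue = cl.CommandQueue(ctx)
img = np.random.rand(512, 512).astype(np.float32)
mf = cl.mem_flags
src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=img)
dst = cl.Buffer(ctx, mf.WRITE_ONLY, img.nbytes)

prg = cl.Program(ctx, KERNEL_SRC).build()
# Global size is (width, height); the row-major layout matches NumPy's default.
prg.box3x3(queue, (img.shape[1], img.shape[0]), None, src, dst,
           np.int32(img.shape[1]), np.int32(img.shape[0]))
out = np.empty_like(img)
cl.enqueue_copy(queue, out, dst)
```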

    DPIVSoft-OpenCL: a multicore CPU-GPU accelerated open source code for 2D Particle Image Velocimetry

    We present a translation of the original Matlab DPIVSoft code to a complete open source code implemented in Python, to perform Particle Image Velocimetry (PIV) in two dimensions, in parallel, with interrogation window shifting and a double-pass window deformation approach using multiple iterations for each pass. The added value of the code is the use of the Open Computing Language (OpenCL) library to parallelize the original code on multiple Intel Central Processing Units (CPUs) and/or Graphics Processing Units (GPUs), so it can be run on all commercially available GPUs. Examples of flow applications are included in the text using synthetic images generated from DNS data from the Johns Hopkins Turbulence Database (JHTDB) (Perlman, 2007), showing about a 90x speedup over the previous Matlab implementation for a given test case.
    This research has been supported by a grant from the Ministerio de Economía y Competitividad of Spain (Grant No. DPI2016-76151-C2-1-R), partially by project B4-2019-11, 0837002010 from the Universidad de Málaga, and by project PID2021-124692OA-I00 from the Ministerio de Ciencia e Innovación. Partial funding for open access charge: Universidad de Málaga / CBU
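    For readers unfamiliar with PIV, the core per-window computation is a cross-correlation whose peak gives the local displacement between the two frames. The sketch below shows that step in plain NumPy as an assumed stand-in; DPIVSoft-OpenCL performs the equivalent work inside parallel OpenCL kernels, with window shifting and deformation layered on top.

```python
# Sketch of the basic PIV step: the cross-correlation peak of one interrogation
# window pair gives the integer displacement of the particle pattern.
import numpy as np

def window_displacement(win_a: np.ndarray, win_b: np.ndarray) -> tuple[int, int]:
    """Return the integer (dy, dx) displacement of win_b relative to win_a."""
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    # Circular cross-correlation via FFT; the peak location is the displacement.
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices above N/2 correspond to negative shifts (wrap-around).
    dy = peak[0] if peak[0] <= a.shape[0] // 2 else peak[0] - a.shape[0]
    dx = peak[1] if peak[1] <= a.shape[1] // 2 else peak[1] - a.shape[1]
    return dy, dx

# Synthetic check: shift a random window by (3, -2) and recover the displacement.
rng = np.random.default_rng(0)
win_a = rng.random((32, 32))
win_b = np.roll(win_a, shift=(3, -2), axis=(0, 1))
print(window_displacement(win_a, win_b))   # -> (3, -2)
```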

    General‐purpose computation on GPUs for high performance cloud computing

    This is the peer reviewed version of the following article: Expósito, R. R., Taboada, G. L., Ramos, S., Touriño, J., & Doallo, R. (2013). General-purpose computation on GPUs for high performance cloud computing. Concurrency and Computation: Practice and Experience, 25(12), 1628-1642, which has been published in final form at https://doi.org/10.1002/cpe.2845. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
    [Abstract] Cloud computing is offering new approaches for High Performance Computing (HPC), as it provides dynamically scalable resources as a service over the Internet. In addition, General-Purpose computation on Graphics Processing Units (GPGPU) has gained much attention from scientific computing in multiple domains, thus becoming an important programming model in HPC. Compute Unified Device Architecture (CUDA) has been established as a popular programming model for GPGPU, removing the need to use graphics APIs for computing applications. Open Computing Language (OpenCL) is an emerging alternative, not only for GPGPU but for any parallel architecture. GPU clusters, usually programmed with a hybrid parallel paradigm mixing Message Passing Interface (MPI) with CUDA/OpenCL, are currently gaining high popularity. Therefore, cloud providers are deploying clusters with multiple GPUs per node and high-speed network interconnects in order to make them a feasible option for HPC as a Service (HPCaaS). This paper evaluates GPGPU for high performance cloud computing on a public cloud computing infrastructure, Amazon EC2 Cluster GPU Instances (CGI), equipped with NVIDIA Tesla GPUs and a 10 Gigabit Ethernet network. The analysis of the results, obtained using up to 64 GPUs and 256 processor cores, has shown that GPGPU is a viable option for high performance cloud computing, despite the significant impact that virtualized environments still have on network overhead, which hampers the adoption of GPGPU for communication-intensive applications.
    Ministerio de Ciencia e Innovación; TIN2010-1673
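    A hedged sketch of the hybrid MPI + CUDA/OpenCL pattern the abstract mentions, written here with mpi4py and PyOpenCL: each MPI rank binds to one GPU, computes on its slice of the data, and the ranks then reduce their partial results over the network. The device layout and toy kernel are assumptions, not taken from the paper.

```python
# Hybrid MPI + OpenCL sketch: one process per GPU, reduction across ranks.
# Run with e.g.: mpiexec -n 4 python hybrid_sketch.py
import numpy as np
import pyopencl as cl
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Bind this rank to one device (round-robin over the GPUs visible on the node).
devices = [d for p in cl.get_platforms()
           for d in p.get_devices(device_type=cl.device_type.GPU)]
if not devices:   # fall back to any OpenCL device (e.g. a CPU) if no GPU is visible
    devices = [d for p in cl.get_platforms() for d in p.get_devices()]
device = devices[rank % len(devices)]
ctx = cl.Context([device])
queue = cl.CommandQueue(ctx)

# Each rank squares its own slice of a distributed vector on its device.
n_local = 1 << 20
x = np.random.rand(n_local).astype(np.float32)
prg = cl.Program(ctx, """
__kernel void square(__global const float *x, __global float *y) {
    int i = get_global_id(0);
    y[i] = x[i] * x[i];
}
""").build()
mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
y_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)
prg.square(queue, x.shape, None, x_buf, y_buf)
y = np.empty_like(x)
cl.enqueue_copy(queue, y, y_buf)

# MPI gathers the partial sums across ranks (the communication-sensitive part).
total = comm.reduce(float(y.sum()), op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares over", size, "ranks:", total)
```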

    gpucc: An Open-Source GPGPU Compiler

    Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA's CUDA and the cross-platform OpenCL standard. Until now, there has not been a fully open-source compiler targeting the CUDA environment, hampering general compiler and architecture research and making deployment difficult in datacenter or supercomputer environments. In this paper, we present gpucc, an LLVM-based, fully open-source, CUDA-compatible compiler for high performance computing. It performs various general and CUDA-specific optimizations to generate high performance code. The Clang-based frontend supports modern language features such as those in C++11 and C++14. Compile time is 8% faster than with NVIDIA's toolchain (nvcc), and gpucc reduces compile time by up to 2.4x for pathological compilations (>100 secs), which tend to dominate build times in parallel build environments. Compared to nvcc, gpucc's runtime performance is on par on several open-source benchmarks, such as Rodinia (0.8% faster), SHOC (0.5% slower), and Tensor (3.7% faster). It outperforms nvcc on internal large-scale end-to-end benchmarks by up to 51.0%, with a geometric mean of 22.9%.