
    GPU Computing Taxonomy

    Over the past few years, a number of efforts have been made to obtain benefits from graphics processing unit (GPU) devices by using them for parallel computing. The main advantage of GPU computing is that it provides cheap parallel processing environments for those who need to solve single program, multiple data (SPMD) problems. In this chapter, a GPU computing taxonomy is proposed that classifies GPU computing into four classes according to the strategy used for combining CPUs and GPUs.
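
    The SPMD idea underlying this taxonomy can be shown with a minimal CUDA sketch (an illustration only, not code from the chapter): one program is written once and executed by thousands of threads, each applying it to its own data element.

        // Minimal SPMD sketch: every thread runs the same kernel on a different element.
        __global__ void saxpy(int n, float a, const float *x, float *y)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
            if (i < n)
                y[i] = a * x[i] + y[i];                      // one element per thread
        }

        // Host side: launch enough 256-thread blocks to cover all n elements.
        // saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);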

    GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications

    A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applications is presented. The work's objectives are to (1) develop stable and efficient algorithms that utilize multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control, and (3) develop parallel CFD applications for Navier-Stokes and lattice Boltzmann analysis methods. Special consideration is given to performing the linear algebra algorithms on particular matrix types (banded, dense, diagonal, sparse, symmetric and triangular). Benchmarks are performed for all analyses, with baseline CPU times used to determine speed-up factors and measure the computational capability of the GPU-accelerated algorithms. The GPU-implemented algorithms and the optimization techniques used in this work are measured against preexisting work and test matrices available in the NIST Matrix Market. The CFD analysis strengthens the assessment of this work by providing a direct engineering application that benefits from the matrix optimization techniques and accelerated algorithms. Overall, this work develops optimizations for selected linear algebra and matrix computations on modern GPU architectures with CUDA, applied directly to mathematical and engineering applications through CFD analysis.
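
    As a hedged sketch of the kind of GPU-accelerated matrix computation described here (not the thesis code itself), the following CUDA/cuBLAS routine performs a dense matrix-vector product and makes explicit the memory allocation and host/device transfer steps whose costs such work sets out to optimize:

        #include <cuda_runtime.h>
        #include <cublas_v2.h>

        // Dense y = A*x on the GPU (A is m x n, column-major), including the
        // host/device allocation and transfer steps around the cuBLAS call.
        void gemv_gpu(int m, int n, const float *A, const float *x, float *y)
        {
            float *dA, *dx, *dy;
            cudaMalloc((void **)&dA, sizeof(float) * m * n);             // GPU memory allocation
            cudaMalloc((void **)&dx, sizeof(float) * n);
            cudaMalloc((void **)&dy, sizeof(float) * m);
            cudaMemcpy(dA, A, sizeof(float) * m * n, cudaMemcpyHostToDevice);  // CPU -> GPU transfer
            cudaMemcpy(dx, x, sizeof(float) * n, cudaMemcpyHostToDevice);

            cublasHandle_t handle;
            cublasCreate(&handle);
            const float alpha = 1.0f, beta = 0.0f;
            cublasSgemv(handle, CUBLAS_OP_N, m, n, &alpha, dA, m, dx, 1, &beta, dy, 1);
            cublasDestroy(handle);

            cudaMemcpy(y, dy, sizeof(float) * m, cudaMemcpyDeviceToHost);      // GPU -> CPU transfer
            cudaFree(dA); cudaFree(dx); cudaFree(dy);
        }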

    Power And Hotspot Modeling For Modern GPUs

    As General Purpose GPUs (GPGPUs) are increasingly becoming a prominent component of high performance computing platforms, power and thermal dissipation are receiving more attention. The trade-offs among performance, power, and heat must be well modeled and evaluated from the early stages of GPU design. This necessitates a tool that allows GPU architects to quickly and accurately evaluate their designs. There are a few models for GPU power, but most of them estimate power at a higher level than the architecture and therefore miss hardware reconfigurability. In this thesis, we propose a framework that models power and heat dissipation at the hardware architecture level, which allows individual hardware components to be configured and investigated. Our framework is also capable of visualizing the heat map of the processor over different clock cycles. To the best of our knowledge, this is the first comprehensive framework that integrates and visualizes power consumption and heat dissipation of GPUs.
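
    A minimal sketch of what an architecture-level power model of this kind typically computes (an assumption-laden illustration, not the framework proposed in the thesis): per-component dynamic power of the standard form P = alpha * C * V^2 * f, driven by activity counts, plus leakage, summed over configurable hardware units before being handed to a thermal model.

        // Illustrative per-component power estimate; component list, capacitances
        // and activity counters are hypothetical placeholders.
        typedef struct {
            const char *name;     // e.g. "register file", "L2 cache", "DRAM controller"
            double c_eff;         // effective switched capacitance (F)
            double leakage;       // static power (W)
        } Component;

        double total_power(const Component *units, const double *activity,
                           int n_units, double voltage, double freq_hz)
        {
            double total = 0.0;
            for (int i = 0; i < n_units; ++i) {
                // dynamic power: activity factor * C * V^2 * f
                double dynamic = activity[i] * units[i].c_eff * voltage * voltage * freq_hz;
                total += dynamic + units[i].leakage;   // per-component power fed to the thermal model
            }
            return total;
        }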

    Optimizing Parallel Reduction In Cuda To Reach GPU Peak Performance

    GPUs are massively multithreaded, many-core chips with hundreds of cores, thousands of concurrent threads, and high compute performance and memory bandwidth. Nowadays, GPUs are used not only for graphics processing but also to accelerate numerically intensive high performance computing applications in parallel. This thesis aims primarily to demonstrate programming model approaches that can maximize the performance of GPUs. This is accomplished by showing how close the implementation can get to peak memory bandwidth and what speed-up can be obtained from the GPU when it is used for parallel computation. The programming environment used is NVIDIA's CUDA, a parallel architecture and programming model for general-purpose computing on a GPU.
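
    For context, the kernel below is one common variant of the shared-memory tree reduction that such work optimizes (a hedged sketch of the general technique, not the exact kernels evaluated in the thesis): each block loads two elements per thread, reduces them in shared memory with sequential addressing, and writes one partial sum per block.

        // Shared-memory tree reduction; assumes blockDim.x is a power of two.
        __global__ void reduce_sum(const float *in, float *out, int n)
        {
            extern __shared__ float sdata[];
            unsigned tid = threadIdx.x;
            unsigned i = blockIdx.x * blockDim.x * 2 + tid;

            // Each thread loads and adds two elements, halving the number of idle threads.
            float v = 0.0f;
            if (i < n)              v  = in[i];
            if (i + blockDim.x < n) v += in[i + blockDim.x];
            sdata[tid] = v;
            __syncthreads();

            // Tree reduction with sequential addressing (avoids shared-memory bank conflicts).
            for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
                if (tid < s)
                    sdata[tid] += sdata[tid + s];
                __syncthreads();
            }
            if (tid == 0)
                out[blockIdx.x] = sdata[0];   // one partial sum per block; reduce again or finish on the CPU
        }

        // Launch example: reduce_sum<<<blocks, 256, 256 * sizeof(float)>>>(d_in, d_out, n);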

    A CPU-GPU Hybrid Approach for Accelerating Cross-correlation Based Strain Elastography

    Elastography is a non-invasive imaging modality that uses ultrasound to estimate the elasticity of soft tissues. The resulting images are called 'elastograms'. Elastography techniques are promising as cost-effective tools for the early detection of pathological changes in soft tissues. The quality of elastographic images depends on the accuracy of the local displacement estimates. Cross-correlation based displacement estimators are precise and sensitive. However, cross-correlation based techniques are computationally intensive, which may limit the use of elastography as a real-time diagnostic tool. This study investigates the use of parallel general purpose graphics processing unit (GPGPU) engines for speeding up the generation of elastograms at real-time frame rates while preserving elastographic image quality. To achieve this goal, a cross-correlation based time-delay estimation algorithm was developed in the C programming language and was profiled to locate performance bottlenecks. The hotspots were addressed by employing software pipelining, read-ahead, and the elimination of redundant computations. The algorithm was then analyzed for parallelization on the GPGPU, and the stages that would map well to the GPGPU hardware were identified. By employing optimization principles for efficient memory access and efficient execution, a net improvement of 67x with respect to the original optimized C version of the estimator was achieved. For typical diagnostic depths of 3-4 cm and typical elastographic processing parameters, this implementation can yield elastographic frame rates on the order of 50 fps. It was also observed that not all of the stages in elastography can be offloaded to the GPGPU, because some stages have sub-optimal memory access patterns. Additionally, data transfer from graphics card memory to system memory can be efficiently overlapped with concurrent CPU execution. Therefore, a hybrid model of computation, in which the computational load is distributed between the CPU and the GPGPU, was identified as the most suitable approach to tackle the speed-quality problem in real-time imaging. The results of this research suggest that using the GPGPU as a co-processor to the CPU may allow the generation of elastograms at real-time frame rates without significant compromise in image quality, a scenario that could be very favorable for real-time clinical elastography.
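
    The CPU-GPU overlap described above can be sketched with a standard CUDA streams pattern (an illustration under assumed names, not the study's implementation; cross_correlate and cpu_stage are hypothetical stand-ins for the elastography stages): the kernel and the device-to-host copy are issued asynchronously on a stream, CPU-side stages run concurrently, and the host synchronizes only when the GPU results are needed.

        #include <cuda_runtime.h>

        /* Hypothetical placeholders for the elastography stages (bodies omitted). */
        __global__ void cross_correlate(const float *pre, const float *post, float *ncc, int n) { /* ... */ }
        static void cpu_stage(float *ncc, int n) { /* ... */ }

        /* Hybrid pattern: GPU kernel and async device-to-host copy overlap with CPU work. */
        void hybrid_frame(const float *d_pre, const float *d_post, float *d_ncc, int n)
        {
            size_t bytes = (size_t)n * sizeof(float);
            float *h_ncc;
            cudaMallocHost((void **)&h_ncc, bytes);      /* pinned host memory enables async copies */

            cudaStream_t stream;
            cudaStreamCreate(&stream);

            cross_correlate<<<(n + 255) / 256, 256, 0, stream>>>(d_pre, d_post, d_ncc, n);
            cudaMemcpyAsync(h_ncc, d_ncc, bytes, cudaMemcpyDeviceToHost, stream);

            /* ...CPU-side stages of the pipeline run here, overlapped with the copy... */

            cudaStreamSynchronize(stream);               /* GPU results are now in h_ncc */
            cpu_stage(h_ncc, n);                         /* stages that map poorly to the GPU stay on the CPU */

            cudaStreamDestroy(stream);
            cudaFreeHost(h_ncc);
        }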

    UVaFTLE: Lagrangian finite time Lyapunov exponent extraction for fluid dynamic applications

    The determination of Lagrangian Coherent Structures (LCS) is becoming very important in several disciplines, including cardiovascular engineering, aerodynamics, and geophysical fluid dynamics. From the computational point of view, the extraction of LCS consists of two main steps: the flowmap computation and the resolution of the Finite Time Lyapunov Exponents (FTLE). In this work, we focus on the design, implementation, and parallelization of the FTLE resolution. We offer an in-depth analysis of this procedure, as well as an open-source C implementation (UVaFTLE) parallelized using OpenMP directives to attain a fair parallel efficiency in shared-memory environments. We have also implemented CUDA kernels that allow UVaFTLE to leverage as many NVIDIA GPU devices as desired in order to reach the best parallel efficiency. For the sake of reproducibility and in order to contribute to open science, our code is publicly available through GitHub. Moreover, we also provide Docker containers to ease its usage.
    Funding: Ministerio de Economía, Industria y Competitividad, Consejo Asesor de Educación de Castilla y León and European Regional Development Fund (FEDER) programmes: PCAS project (TIN2017-88614-R) and PROPHET-2 project (VA226P20); Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación and "European Union NextGenerationEU/PRTR" (MCIN/AEI/10.13039/501100011033), grant TED2021-130367B-I00; Junta de Castilla y León, project VA182P20; Red Española de Supercomputación (RES), projects IM-2022-2-0015 and IM-2022-3-0021. Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE) under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, action 20007-CL - Apoyo Consorcio BUCL
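
    As a rough illustration of the FTLE resolution step that UVaFTLE parallelizes (a generic sketch under common assumptions, not the code in the repository): for a 2-D flow, the flowmap gradient at each mesh point yields the Cauchy-Green tensor C = (grad phi)^T (grad phi), and the exponent is FTLE = ln(sqrt(lambda_max(C))) / |T|.

        #include <math.h>

        /* Generic FTLE evaluation at one 2-D mesh point, given the four
           flowmap-gradient components (e.g. from centered differences on the mesh)
           and the integration time T. Not UVaFTLE's actual routine. */
        double ftle_2d(double dphix_dx, double dphix_dy,
                       double dphiy_dx, double dphiy_dy, double T)
        {
            /* Cauchy-Green tensor C = J^T J (symmetric 2x2) */
            double c11 = dphix_dx * dphix_dx + dphiy_dx * dphiy_dx;
            double c12 = dphix_dx * dphix_dy + dphiy_dx * dphiy_dy;
            double c22 = dphix_dy * dphix_dy + dphiy_dy * dphiy_dy;

            /* Largest eigenvalue of the symmetric 2x2 tensor */
            double mean = 0.5 * (c11 + c22);
            double diff = 0.5 * (c11 - c22);
            double lmax = mean + sqrt(diff * diff + c12 * c12);

            return log(sqrt(lmax)) / fabs(T);
        }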