Research on Image Blur Algorithm Optimization Using OpenCL

Abstract

Modern GPUs generally provide dedicated hardware (such as texture units, rasterization units, and various on-chip caches) to accelerate the processing and display of 2D images. The corresponding programming models (CUDA and OpenCL) define specific programming interfaces, namely CUDA's texture memory and OpenCL's image objects, so that image applications can exploit this hardware. Taking the optimization of a typical image blur algorithm on AMD GPUs as an example, this paper examines where OpenCL image objects are applicable in optimizing image algorithms, and in particular analyzes their advantages and disadvantages relative to the more general optimization approach based on global memory plus on-chip local memory. The experimental results show that image objects yield a good performance improvement only when the image has four channels and the amount of data that must be cached during computation is small; in all other cases, global memory combined with local memory achieves good performance. The optimized algorithm achieves a speedup of 200x to 1000x over a carefully implemented CPU version, and a speedup of 1.3x to 5x over the corresponding functions of the NVIDIA NPP library.
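To make "the amount of data that must be cached" concrete, the following is a minimal CPU-side sketch, not the paper's implementation. The abstract does not state which blur kernel is used, so a box filter with edge clamping is assumed here, and the names `box_blur` and `tile_footprint` are hypothetical. The second function estimates how many bytes a square work-group would have to stage (its tile plus a halo of `radius` pixels on each side) if it cached its inputs in local memory.

```python
def box_blur(image, radius):
    """Blur a 2D single-channel image (list of rows) with a (2r+1)x(2r+1) box filter.

    Assumption: out-of-range coordinates are clamped to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    window = (2 * radius + 1) ** 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), height - 1)  # clamp row index
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), width - 1)  # clamp column index
                    total += image[yy][xx]
            out[y][x] = total / window
    return out


def tile_footprint(tile, radius, channels, bytes_per_channel=1):
    """Bytes a tile x tile work-group must stage (tile plus halo) to blur its pixels."""
    side = tile + 2 * radius
    return side * side * channels * bytes_per_channel


if __name__ == "__main__":
    impulse = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    print(box_blur(impulse, 1)[1][1])
    # Footprint grows quadratically with the blur radius.
    print(tile_footprint(16, 1, 4), tile_footprint(16, 8, 4))
```

Under these assumptions, a 16x16 tile of a four-channel, one-byte-per-channel image needs 1296 bytes at radius 1 and 4096 bytes at radius 8, which illustrates why the size of the filter window affects the choice between the two caching strategies.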
