Research on Image Integral Algorithm Optimization Based on OpenCL

Abstract

The image integral (integral image) algorithm is widely used in fast feature detection, so accelerating it on GPUs is of practical importance. However, because GPU hardware architectures are complex and differ from one platform to another, optimizing the algorithm on a GPU and then achieving performance portability across GPU platforms is difficult. Based on an analysis of the underlying hardware architectures of different GPU platforms, this paper examines how different optimization methods affect performance on each platform from several perspectives, including off-chip memory bandwidth utilization, compute resource utilization, and data locality. On this basis, an OpenCL implementation of the image integral algorithm is presented. Experimental results show that the optimized algorithm achieves speedups of 11.26 and 12.38 times on AMD and NVIDIA GPUs, respectively, and that the optimized GPU kernel outperforms the corresponding CUDA function in the NVIDIA NPP library by 55.01% and 65.17%, respectively. These results verify the effectiveness and performance portability of the proposed optimization methods.

Supported by the National Natural Science Foundation of China (61133005, 61272136, 61100073), the National 863 Program of China (2012AA010902, 2012AA010903), and the ISCAS-AMD Joint Fusion Software Center.
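For readers unfamiliar with the algorithm named in the abstract, the following is a minimal serial sketch of an integral image (summed-area table) in Python. It is not the paper's OpenCL kernel, which the abstract does not show. The function names `integral_image` and `box_sum` are hypothetical, and the split into a row pass followed by a column pass is an assumption chosen because each pass consists of independent prefix sums, the kind of work that a GPU implementation would typically parallelize.

```python
def integral_image(img):
    """Return the inclusive summed-area table of a 2D list of numbers.

    Entry [y][x] of the result is the sum of img[0..y][0..x].
    """
    h = len(img)
    w = len(img[0]) if h else 0

    # Pass 1: inclusive prefix sum along each row (rows are independent).
    table = []
    for row in img:
        acc, out = 0, []
        for value in row:
            acc += value
            out.append(acc)
        table.append(out)

    # Pass 2: inclusive prefix sum down each column (columns are independent).
    for x in range(w):
        for y in range(1, h):
            table[y][x] += table[y - 1][x]
    return table


def box_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] (inclusive) from four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

Once the table is built, the sum over any axis-aligned rectangle costs four lookups regardless of the rectangle's size, which is why the structure is useful in fast feature detection.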
