15 research outputs found

    Degradation of HPAM-containing wastewater with integrated process of UASB and immobilized microorganism reactor

    Get PDF
    Polymer flooding has become an important enhanced oil recovery technique, but the application of partially hydrolyzed polyacrylamide (HPAM) in oilfields raises considerable environmental problems. Based on water-quality and biodegradability analyses of the HPAM-containing wastewater, the biodegradability of the wastewater was first adjusted, and laboratory-scale biochemical treatment was then simulated with a "flotation - up-flow anaerobic sludge bed (UASB) - hydrolysis acidification - immobilized microorganism reactor" integrated process. The simulation had a static and a dynamic part. In the static experiment, the HPAM degradation ratio reached 89.7% after 6 d of treatment. In the dynamic experiment, after 2 d of treatment with the integrated process, the HPAM degradation ratio was 88.65%, total oil removal was 99.40%, and total effluent COD removal was 93.40%. The degradation products were analyzed by scanning electron microscopy (SEM) and infrared spectroscopy, which showed that HPAM macromolecules were broken into smaller molecules and that the amide groups of HPAM were converted into carboxyl groups. Supported by the National 863 Program (2008AA06Z304).
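    For reference, the degradation and removal ratios quoted above follow the usual influent/effluent convention; the abstract does not state the formula, so the definition below is our assumption of the standard form rather than something given in the paper.

        % Degradation (or removal) ratio from influent concentration C_0
        % and effluent concentration C_e (assumed standard definition)
        \eta = \frac{C_0 - C_e}{C_0} \times 100\%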

    Research on Image Integral Algorithm Optimization Based on OpenCL

    No full text
    The integral image algorithm is widely used in fast feature detection, so accelerating it on GPUs has clear practical value. However, because GPU hardware architectures are complex and differ from vendor to vendor, optimizing the integral image algorithm on one GPU and then carrying that performance over to other GPU platforms is difficult. Starting from an analysis of the underlying hardware architectures of different GPU platforms, this paper examines how different optimization methods affect performance on each platform from several angles, including off-chip memory bandwidth utilization, compute resource utilization and data locality, and on this basis implements an OpenCL-based integral image algorithm. Experimental results show that the optimized algorithm achieves speedups of 11.26x on an AMD GPU and 12.38x on an NVIDIA GPU, and that the optimized kernels also outperform the corresponding function in the NVIDIA NPP library by 55.01% and 65.17%, respectively, confirming the effectiveness and performance portability of the proposed optimizations. Supported by the National Natural Science Foundation of China (61133005, 61272136, 61100073), the National 863 Program (2012AA010902, 2012AA010903) and the ISCAS-AMD Joint Fusion Software Center.
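    As a point of reference for the kind of kernel being tuned, the sketch below is a minimal, unoptimized OpenCL C integral image computed as a row prefix sum followed by a column prefix sum. It is not the paper's optimized implementation; kernel names, buffer layout and data types are our assumptions.

        /* Naive baseline: one work-item scans one row, then a second kernel
         * scans each column in place. The serial loops and strided column
         * accesses are exactly the costs the paper's optimizations target. */
        __kernel void row_prefix_sum(__global const uchar *src,
                                     __global uint *dst,
                                     const int width, const int height)
        {
            int y = get_global_id(0);
            if (y >= height) return;
            uint sum = 0;
            for (int x = 0; x < width; ++x) {
                sum += src[y * width + x];   /* running sum along the row */
                dst[y * width + x] = sum;
            }
        }

        __kernel void col_prefix_sum(__global uint *dst,
                                     const int width, const int height)
        {
            int x = get_global_id(0);
            if (x >= width) return;
            uint sum = 0;
            for (int y = 0; y < height; ++y) {
                sum += dst[y * width + x];   /* accumulate down the column */
                dst[y * width + x] = sum;
            }
        }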

    Parallelism and Research on Functions with Continuously Independent Data and Intensive Memory Access Using OpenCL

    No full text
    "Continuously independent data" means that when consecutive elements of the destination matrix are computed, the source elements used are also consecutive and independent of one another; a "memory-access-intensive" function is one that does little computation but moves a large amount of data. Taking the bitwise function as an example, this paper studies and implements the parallelization and optimization of such continuously independent, memory-access-intensive functions on GPU platforms under the OpenCL framework. After examining how vectorization, work-group organization, instruction selection and other optimizations affect performance on different GPU hardware platforms, the function was ported across platforms with its performance preserved. Excluding data transfer, the optimized function achieves an average speedup of roughly 40x over the CPU version in the OpenCV library on an AMD Radeon HD 5850, roughly 90x on an AMD Radeon HD 7970 and roughly 60x on an NVIDIA Tesla C2050; on the Tesla C2050 it is also about 1.5x faster than the CUDA implementation of the same function in OpenCV. Supported by the National Natural Science Foundation of China (60303020, 60533020; key project 60503020; Young Scientists Fund 61100072), the National 863 Program (2012AA010902) and the ISCAS-AMD Joint Fusion Software Center.
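    To make the continuously independent, memory-bound pattern concrete, here is a minimal OpenCL C sketch of a vectorized element-wise bitwise AND; each work-item does one wide load per operand and one wide store, so the kernel is limited by off-chip bandwidth. The vector width, argument names and the assumption that the element count is a multiple of 16 are ours, not taken from the paper.

        /* Vectorized bitwise AND over two byte matrices flattened to 1-D.
         * n16 is the number of uchar16 elements (total bytes / 16). */
        __kernel void bitwise_and_vec16(__global const uchar16 *a,
                                        __global const uchar16 *b,
                                        __global uchar16 *dst,
                                        const int n16)
        {
            int i = get_global_id(0);
            if (i < n16)
                dst[i] = a[i] & b[i];   /* component-wise AND on the vector */
        }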

    Research on Mean Shift Algorithm Using OpenCL on Multiple Many-Core Platforms

    No full text
    OpenCL is a general-purpose programming standard aimed at multiple platforms and has been used to accelerate many applications, but because platforms differ in hardware and software, a generic optimization does not necessarily deliver good acceleration on all of them. Taking the optimization of the mean shift algorithm on GPU and APU platforms as an example, this paper examines how much each optimization method contributes on each platform, on the one hand studying the computational characteristics of the platforms and on the other weighing the strengths and weaknesses of the individual optimizations in search of the best overall solution. Experiments show that, compared with the unoptimized parallel version, the optimized algorithm achieves speedups of 9.68x, 5.74x and 1.27x on an AMD (Radeon HD) 5850, an NVIDIA Tesla C2050 and an AMD APU A6-3650, respectively; compared with the serial program it achieves 79.73x, 93.88x and 2.22x; and on the first two platforms the OpenCL version is 1.27x and 1.24x faster than the CUDA version of the OpenCV program. Supported by the National Natural Science Foundation of China (60303020, 60533020; key project 60503020; Young Scientists Fund 61100072), the National 863 Program (2012AA010902) and the ISCAS-AMD Joint Fusion Software Center.
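    For orientation, the sketch below shows a simplified per-pixel mean shift filtering kernel for a grayscale image, i.e. the basic iteration that the paper parallelizes; the grayscale simplification, the parameters and the convergence test are our assumptions rather than the paper's implementation.

        /* Each work-item iteratively shifts a joint spatial/intensity window
         * (radius sp in space, sr in intensity) toward the local mean and
         * writes the converged mean intensity for its pixel. */
        __kernel void mean_shift_gray(__global const uchar *src,
                                      __global uchar *dst,
                                      const int width, const int height,
                                      const int sp, const float sr,
                                      const int max_iter)
        {
            int x0 = get_global_id(0), y0 = get_global_id(1);
            if (x0 >= width || y0 >= height) return;

            float cx = x0, cy = y0, cv = src[y0 * width + x0];
            for (int it = 0; it < max_iter; ++it) {
                float sx = 0, sy = 0, sv = 0, cnt = 0;
                int xc = (int)cx, yc = (int)cy;
                for (int dy = -sp; dy <= sp; ++dy)
                    for (int dx = -sp; dx <= sp; ++dx) {
                        int x = xc + dx, y = yc + dy;
                        if (x < 0 || x >= width || y < 0 || y >= height) continue;
                        float v = src[y * width + x];
                        if (fabs(v - cv) <= sr) {   /* inside the intensity window */
                            sx += x; sy += y; sv += v; cnt += 1.0f;
                        }
                    }
                if (cnt == 0.0f) break;
                float nx = sx / cnt, ny = sy / cnt, nv = sv / cnt;
                int done = fabs(nx - cx) < 0.5f && fabs(ny - cy) < 0.5f &&
                           fabs(nv - cv) < 1.0f;
                cx = nx; cy = ny; cv = nv;
                if (done) break;
            }
            dst[y0 * width + x0] = convert_uchar_sat(cv);
        }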

    Research on Laplace Image Enhancement Algorithm Optimization Based on OpenCL

    No full text
    OpenCL is a general-purpose programming framework for heterogeneous computing platforms; however, because hardware architectures differ, how to achieve performance portability on top of functional portability across platforms remains an open question. Existing optimization studies generally target a single hardware platform and therefore rarely run efficiently on other platforms. Starting from an analysis of the underlying hardware architectures of different GPU platforms, this paper examines how different optimization methods affect performance on each platform in terms of global memory access efficiency, effective utilization of GPU compute resources and hardware resource limits, and on this basis implements an OpenCL-based Laplace image enhancement algorithm. Experimental results show that, excluding data transfer time, the optimized algorithm achieves speedups of 3.7x to 136.1x (56.7x on average) on both AMD and NVIDIA GPUs, and the optimized kernels outperform the corresponding function in the NVIDIA NPP library by 12.3% to 346.7% (143.1% on average), confirming the effectiveness and performance portability of the proposed optimization methods.
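    As a concrete baseline for the algorithm being optimized, the following minimal OpenCL C kernel applies a 3x3 Laplacian and adds the response back to the centre pixel (standard Laplacian sharpening). It is a naive global-memory version, not the paper's tuned kernels, and the kernel and buffer names are assumptions.

        /* Centre-positive 3x3 Laplacian (4 at the centre, -1 at the four
         * neighbours); adding it to the source pixel sharpens edges. Border
         * pixels are skipped for brevity. */
        __kernel void laplace_enhance(__global const uchar *src,
                                      __global uchar *dst,
                                      const int width, const int height)
        {
            int x = get_global_id(0), y = get_global_id(1);
            if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

            int c = src[y * width + x];
            int lap = 4 * c
                    - src[y * width + (x - 1)] - src[y * width + (x + 1)]
                    - src[(y - 1) * width + x] - src[(y + 1) * width + x];
            dst[y * width + x] = convert_uchar_sat(c + lap);
        }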

    Multi-source localization with binary sensor networks

    No full text
    To address the relatively high computational complexity of multi-source localization models, the multi-source detection model of a binary sensor network is studied on the basis of the Neyman-Pearson criterion. For the case of two signal sources, the Fisher criterion is used to divide the sensors into two groups, each associated with one of the sources, and on this basis a weighted subtract-on-negative add-on-positive (WSNAP) algorithm is proposed to localize the multiple sources. Simulation results show that the Fisher criterion separates the alarmed sensors into two groups with a high success rate, and that, compared with the centroid algorithm and the add-positive (AP) algorithm, the proposed method has lower computational complexity and higher localization accuracy; the conclusions are further verified against a database.
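    Since the abstract does not spell out the WSNAP weighting, the sketch below only illustrates the centroid baseline that WSNAP is compared against: the source position is estimated as the mean position of the alarmed binary sensors. The C struct layout and function names are our assumptions.

        #include <stddef.h>

        /* One binary sensor: its position and whether it reported a detection. */
        typedef struct { double x, y; int alarmed; } sensor_t;

        /* Centroid baseline: average the positions of the alarmed sensors.
         * Returns 0 on success, -1 if no sensor has alarmed. */
        int centroid_estimate(const sensor_t *s, size_t n, double *ex, double *ey)
        {
            double sx = 0.0, sy = 0.0;
            size_t cnt = 0;
            for (size_t i = 0; i < n; ++i) {
                if (s[i].alarmed) {
                    sx += s[i].x;
                    sy += s[i].y;
                    ++cnt;
                }
            }
            if (cnt == 0) return -1;
            *ex = sx / (double)cnt;
            *ey = sy / (double)cnt;
            return 0;
        }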

    Research and Application of Microbial Paraffin-removal Technology

    No full text
    Paraffin-degrading and biosurfactant-producing strains were isolated and purified from waxy crude oil produced in the Daqing Oilfield; both were identified as Bacillus sp. Using the degradation rate of solid paraffin as the indicator, the two strains were co-inoculated at different ratios. At the optimum ratio of paraffin-degrading to biosurfactant-producing strain of 5:3, after 7 d of cultivation the paraffin removal rate reached 59%, the paraffin prevention rate reached 57.4%, the crude oil viscosity was reduced by 44.7%, the crude oil solidification point dropped by 3.4 °C, and the surface tension of the culture broth was reduced by 46.5%. The microbial paraffin-removal technology was field-tested in three wells of the Yushulin Oilfield on the periphery of Daqing: in well 12-36 daily oil production increased by 41.2%, the hot-washing cycle was extended from 40 d to 149 d, and four hot washes were saved; in well 13-39 daily oil production increased by 33.3%, the hot-washing cycle was extended from 45 d to 158 d, and five hot washes were saved; in well 14-43 daily oil production increased by 37.5%, the hot-washing cycle was extended from 30 d to 122 d, and five hot washes were saved. Supported by the Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12531064).

    Research on Image Remap Algorithm Optimization Based on OpenCL

    No full text
    The image remap algorithm is a typical image transformation algorithm, widely used in image scaling, warping, rotation and related operations. As image sizes and resolutions keep growing, ever higher performance is demanded of it. Taking full account of the differences between the hardware architectures of different GPU platforms, this paper systematically studies how the remap algorithm can be implemented efficiently on different GPU platforms under the OpenCL framework. It examines the effect of several optimizations on each platform, including off-chip memory access optimization, vectorized computation and reduction of dynamic instructions, and discusses the possibility of porting performance between GPU platforms. Experimental results show that, excluding data transfer time, the optimized algorithm achieves speedups of 114.3x to 491.5x over the CPU version and 1.01x to 1.86x over the CUDA version (the existing GPU implementation) on an AMD HD 5850 GPU, and 100.7x to 369.8x over the CPU version and 0.95x to 1.58x over the CUDA version on an NVIDIA C2050 GPU, confirming the effectiveness and performance portability of the proposed optimization methods.
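    To illustrate the gather pattern whose irregular reads the paper optimizes, here is a minimal nearest-neighbour remap kernel in OpenCL C; the nearest-neighbour interpolation, constant border handling and argument names are assumptions, and the paper's optimized variants (e.g. image objects, vectorized fetches) are not reproduced.

        /* Each output pixel fetches its value from the source coordinates
         * given by the map_x/map_y arrays; out-of-range lookups get 0. */
        __kernel void remap_nearest(__global const uchar *src,
                                    __global uchar *dst,
                                    __global const float *map_x,
                                    __global const float *map_y,
                                    const int src_w, const int src_h,
                                    const int dst_w, const int dst_h)
        {
            int x = get_global_id(0), y = get_global_id(1);
            if (x >= dst_w || y >= dst_h) return;

            int idx = y * dst_w + x;
            int sx = (int)(map_x[idx] + 0.5f);   /* round to nearest source pixel */
            int sy = (int)(map_y[idx] + 0.5f);

            uchar v = 0;                          /* constant border value */
            if (sx >= 0 && sx < src_w && sy >= 0 && sy < src_h)
                v = src[sy * src_w + sx];
            dst[idx] = v;
        }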