
    Particle Swarm Optimization with CUDA

    In recent years, the particle swarm optimization (PSO) algorithm has been widely used as an effective method for solving complex and difficult optimization problems. Because PSO works on an iteratively updated population, it can be very inefficient, in terms of both runtime and speed of convergence, on large-scale optimization problems, including those that require a very large population to search the solution space. The main reason is that the optimization process requires a large number of function evaluations, which are usually run serially. This article implements PSO in parallel on a graphics processing unit (GPU) to improve its running efficiency and speed. The results on the GPU show that the algorithm's performance increases greatly when it is parallelized and the kernel implementation is changed. Specifically, this study implements parallel PSO on the CUDA framework, evaluates and compares its speed, and then proposes a new method within the CUDA framework to further accelerate PSO on the GPU.
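    The bottleneck described above, many serial function evaluations per iteration, is what the GPU port parallelizes. Below is a minimal CUDA sketch of that idea, one thread per particle evaluating the fitness function; the sphere objective and launch configuration are illustrative assumptions, not the paper's code.

        // Minimal sketch (not the paper's implementation): each CUDA thread
        // evaluates the fitness of one particle, so a whole population is
        // evaluated in a single kernel launch instead of a serial loop.
        #include <cuda_runtime.h>

        __device__ float sphere(const float *x, int dim) {   // illustrative objective
            float s = 0.0f;
            for (int d = 0; d < dim; ++d) s += x[d] * x[d];
            return s;
        }

        __global__ void evaluate_population(const float *positions, float *fitness,
                                            int num_particles, int dim) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < num_particles)
                fitness[i] = sphere(&positions[i * dim], dim);
        }

        // Host-side launch, one thread per particle:
        // evaluate_population<<<(num_particles + 255) / 256, 256>>>(d_pos, d_fit,
        //                                                           num_particles, dim);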

    Agent-based modelling and Swarm Intelligence in systems engineering

    The aim of this doctoral thesis is to evaluate the usefulness of agent-based modelling, Swarm Intelligence optimization algorithms, and parallel programming on graphics cards in the field of systems engineering and automatic control. A literature review was carried out and a development framework for agent-based modelling was built. This technique was used to model an activated sludge reactor (part of a wastewater treatment process). A complementary notation was developed for describing agent-based models from a systems engineering point of view. An agent-based optimization algorithm following the Swarm Intelligence philosophy is also presented. Parallelization techniques on graphics cards were applied to reduce the simulation times of the models and algorithms. It is therefore a thesis that integrates several technologies.

    Towards enhancing coding productivity for GPU programming using static graphs

    The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have been increasing significantly in terms of performance and memory capacity. However, there are still some problems in terms of scalability and limitations to the amount of work that a GPU can perform at a time. To minimize the overhead associated with the launch of GPU kernels, as well as to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including CUDA math libraries) and the OpenACC programming model. We use as test cases two different, well-known and widely used problems in HPC and AI: the Conjugate Gradient method and Particle Swarm Optimization. In the first test case (Conjugate Gradient) we focus on the integration of Static Graphs with CUDA. In this case, we are able to significantly outperform the NVIDIA reference code, reaching an acceleration of up to 11× thanks to a better implementation, which can benefit from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with the use of CUDA Graph, again achieving accelerations of up to one order of magnitude, with average speedups ranging from 2× to 4×, and performance very close to a reference, optimized CUDA code. Our main target is to achieve a higher coding productivity model for GPU programming by using Static Graphs, which provide, in a very transparent way, a better exploitation of the GPU capacity. The combination of Static Graphs with two of the currently most important GPU programming models (CUDA and OpenACC) considerably reduces the execution time with respect to using CUDA and OpenACC alone, achieving accelerations of more than one order of magnitude. Finally, we propose an interface to incorporate the concept of Static Graphs into the OpenACC specification. This research was funded by the EPEEC project from the European Union's Horizon 2020 Research and Innovation programme under grant agreement No. 801051. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, accessed on 13 April 2022).
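    The CUDA Graph API mentioned above lets a fixed sequence of kernel launches be recorded once and then replayed as a single unit, which is where the launch-overhead savings come from. The following is a minimal, self-contained sketch of that stream-capture pattern under assumed placeholder kernels and sizes; it is not the authors' code.

        // Minimal sketch of the CUDA Graph stream-capture pattern: a fixed
        // sequence of kernel launches is recorded once and then replayed with
        // a single cudaGraphLaunch per iteration, amortizing launch overhead.
        // The kernel and problem size are placeholders, not the paper's code.
        #include <cuda_runtime.h>

        __global__ void update_kernel(float *x, int n) {      // placeholder work
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] = 0.5f * x[i] + 1.0f;
        }

        int main() {
            const int n = 1 << 20, iters = 100;
            float *d_x;
            cudaMalloc(&d_x, n * sizeof(float));

            cudaStream_t stream;
            cudaStreamCreate(&stream);

            cudaGraph_t graph;
            cudaGraphExec_t graph_exec;

            // Record the kernel sequence once.
            cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
            update_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);
            update_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);
            cudaStreamEndCapture(stream, &graph);
            cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);  // CUDA 10/11-style call

            // Replay the whole recorded sequence with one launch per iteration.
            for (int i = 0; i < iters; ++i)
                cudaGraphLaunch(graph_exec, stream);
            cudaStreamSynchronize(stream);

            cudaGraphExecDestroy(graph_exec);
            cudaGraphDestroy(graph);
            cudaStreamDestroy(stream);
            cudaFree(d_x);
            return 0;
        }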

    Static Graphs for Coding Productivity in OpenACC

    The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. To do so, we have combined the new CUDA Graph API with the OpenACC programming model. We use as a test case a well-known and widely used problem in HPC and AI: Particle Swarm Optimization. We complement the OpenACC functionality with the use of CUDA Graph, achieving accelerations of more than one order of magnitude and a performance very close to a reference, optimized CUDA code. Finally, we propose a new specification to incorporate the concept of Static Graphs into the OpenACC specification. This project has received funding from the EPEEC project from the European Union's Horizon 2020 Research and Innovation programme under grant agreement No. 801051.

    A General Framework for Accelerating Swarm Intelligence Algorithms on FPGAs, GPUs and Multi-core CPUs

    Swarm intelligence algorithms (SIAs) have demonstrated excellent performance when solving optimization problems, including many real-world problems. However, because of their expensive computational cost on some complex problems, SIAs need to be accelerated effectively for better performance. This paper presents a high-performance general framework to accelerate SIAs (FASI). Unlike previous work, which accelerates SIAs only by enhancing parallelization, FASI considers both the memory architectures of the hardware platforms and the dataflow of SIAs, and it reschedules the framework of SIAs as a converged dataflow to improve memory access efficiency. FASI achieves higher acceleration by matching the algorithm framework to the hardware architectures. We also design deeply optimized parallelization and convergence structures for FASI based on the characteristics of specific hardware platforms. We take the quantum-behaved particle swarm optimization (QPSO) algorithm as a case study to evaluate FASI. The results show that FASI improves the throughput of SIAs and provides better performance through optimized hardware implementations. In our experiments, FASI achieves a maximum throughput of 290.7 Mbit/s, which is higher than several existing systems, and FASI on FPGAs achieves a better speedup than on GPUs and multi-core CPUs. In terms of optimization time, FASI on a Xilinx Kintex UltraScale xcku040 is up to 123 times, and no less than 1.45 times, faster than on an Intel Core i7-6700 CPU / NVIDIA GTX1080 GPU. Finally, we compare the differences of deploying FASI on the various hardware platforms and provide guidelines for improving acceleration performance according to the hardware architecture.
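    For context on the QPSO case study, the per-particle update that such a framework parallelizes is compact and data-parallel. Below is a minimal CUDA sketch of the standard quantum-behaved PSO position update, one thread per particle dimension; the uniform random inputs are assumed to be pre-generated on the device (e.g. with cuRAND), and the kernel is an illustration, not the FASI implementation.

        // Minimal sketch (not the FASI implementation) of the standard QPSO
        // position update, parallelized with one thread per particle-dimension.
        // u1, u2, u3 are uniform random numbers in (0, 1], assumed to be
        // pre-generated on the device (e.g. with cuRAND).
        #include <cuda_runtime.h>
        #include <math.h>

        __global__ void qpso_update(float *x, const float *pbest, const float *gbest,
                                    const float *mbest, const float *u1,
                                    const float *u2, const float *u3,
                                    float beta, int num_particles, int dim) {
            int idx = blockIdx.x * blockDim.x + threadIdx.x;   // particle-dimension index
            if (idx >= num_particles * dim) return;
            int d = idx % dim;

            // Local attractor between the personal best and the global best.
            float phi = u1[idx];
            float p   = phi * pbest[idx] + (1.0f - phi) * gbest[d];

            // Contraction-expansion step around the mean-best position.
            float L    = beta * fabsf(mbest[d] - x[idx]);
            float step = L * logf(1.0f / u2[idx]);

            x[idx] = (u3[idx] < 0.5f) ? p + step : p - step;   // random sign
        }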