Search CORE

1,372 research outputs found

Parallelization Strategies for Ant Colony Optimisation on GPUs

Author: Amos Martyn
Cecilia Jose M.
Garcia Jose M.
Nisbet Andy
Ujaldon Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for the solution of a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is there- fore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages: Tour construction and Pheromone update. The former has been previously implemented on the GPU, using a task-based parallelism approach. However, up until now, the latter has always been implemented on the CPU. In this paper, we discuss several parallelisation strategies for both stages of the ACO algorithm on the GPU. We propose an alternative data-based parallelism scheme for Tour construction, which fits better on the GPU architecture. We also describe novel GPU programming strategies for the Pheromone update stage. Our results show a total speed-up exceeding 28x for the Tour construction stage, and 20x for Pheromone update, and suggest that ACO is a potentially fruitful area for future research in the GPU domain.Comment: Accepted by 14th International Workshop on Nature Inspired Distributed Computing (NIDISC 2011), held in conjunction with the 25th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS 2011

arXiv.org e-Print Archive

CiteSeerX

Northumbria University Research Portal

Crossref

Accelerating ant colony optimization-based edge detection on the GPU using CUDA

Author: Dawson L.
Stewart I.A.
Publication venue
Publication date: 01/07/2014
Field of study

Ant Colony Optimization (ACO) is a nature-inspired metaheuristic that can be applied to a wide range of optimization problems. In this paper we present the first parallel implementation of an ACO-based (image processing) edge detection algorithm on the Graphics Processing Unit (GPU) using NVIDIA CUDA. We extend recent work so that we are able to implement a novel data-parallel approach that maps individual ants to thread warps. By exploiting the massively parallel nature of the GPU, we are able to execute significantly more ants per ACO-iteration allowing us to reduce the total number of iterations required to create an edge map. We hope that reducing the execution time of an ACO-based implementation of edge detection will increase its viability in image processing and computer vision

Durham Research Online

Crossref

GPU accelerated Nature Inspired Methods for Modelling Large Scale Bi-Directional Pedestrian Movement

Author: Dutta Sankha Baran
Friesen Marcia
McLeod Robert
Publication venue
Publication date: 16/12/2014
Field of study

Pedestrian movement, although ubiquitous and well-studied, is still not that well understood due to the complicating nature of the embedded social dynamics. Interest among researchers in simulating pedestrian movement and interactions has grown significantly in part due to increased computational and visualization capabilities afforded by high power computing. Different approaches have been adopted to simulate pedestrian movement under various circumstances and interactions. In the present work, bi-directional crowd movement is simulated where an equal numbers of individuals try to reach the opposite sides of an environment. Two movement methods are considered. First a Least Effort Model (LEM) is investigated where agents try to take an optimal path with as minimal changes from their intended path as possible. Following this, a modified form of Ant Colony Optimization (ACO) is proposed, where individuals are guided by a goal of reaching the other side in a least effort mode as well as a pheromone trail left by predecessors. The basic idea is to increase agent interaction, thereby more closely reflecting a real world scenario. The methodology utilizes Graphics Processing Units (GPUs) for general purpose computing using the CUDA platform. Because of the inherent parallel properties associated with pedestrian movement such as proximate interactions of individuals on a 2D grid, GPUs are well suited. The main feature of the implementation undertaken here is that the parallelism is data driven. The data driven implementation leads to a speedup up to 18x compared to its sequential counterpart running on a single threaded CPU. The numbers of pedestrians considered in the model ranged from 2K to 100K representing numbers typical of mass gathering events. A detailed discussion addresses implementation challenges faced and averted

arXiv.org e-Print Archive

Crossref

Applying ACO To Large Scale TSP Instances

Author: A DeléVacq
JH Holland
JM Cecilia
M Dorigo
M Dorigo
M Randall
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/09/2017
Field of study

Ant Colony Optimisation (ACO) is a well known metaheuristic that has proven successful at solving Travelling Salesman Problems (TSP). However, ACO suffers from two issues; the first is that the technique has significant memory requirements for storing pheromone levels on edges between cities and second, the iterative probabilistic nature of choosing which city to visit next at every step is computationally expensive. This restricts ACO from solving larger TSP instances. This paper will present a methodology for deploying ACO on larger TSP instances by removing the high memory requirements, exploiting parallel CPU hardware and introducing a significant efficiency saving measure. The approach results in greater accuracy and speed. This enables the proposed ACO approach to tackle TSP instances of up to 200K cities within reasonable timescales using a single CPU. Speedups of as much as 1200 fold are achieved by the technique

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Working with OpenCL to Speed Up a Genetic Programming Financial Forecasting Algorithm: Initial Results

Author: Backus J.
Edwards R.
Koza J.
Martinez-Jaramillo S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/07/2014
Field of study

The genetic programming tool EDDIE has been shown to be a successful financial forecasting tool, however it has suffered from an increase in execution time as new features have been added. Speed is an important aspect in financial problems, especially in the field of algorithmic trading, where a delay in taking a decision could cost millions. To offset this performance loss, EDDIE has been modified to take advantage of multi-core CPUs and dedicated GPUs. This has been achieved by modifying the candidate solution evaluation to use an OpenCL kernel, allowing the parallel evaluation of solutions. Our computational results have shown improvements in the running time of EDDIE when the evaluation was delegated to the OpenCL kernel running on a multi-core CPU, with speed ups up to 21 times faster than the original EDDIE algorithm. While most previous works in the literature reported significantly improvements in performance when running an OpenCL kernel on a GPU device, we did not observe this in our results. Further investigation revealed that memory copying overheads and branching code in the kernel are potentially causes of the (under-)performance of the OpenCL kernel when running on the GPU device

CiteSeerX

Crossref

Kent Academic Repository

Generic Techniques in General Purpose GPU Programming with Applications to Ant Colony and Image Processing Algorithms

Author: DAWSON LAURENCE,JAMES
Publication venue
Publication date: 01/01/2015
Field of study

In 2006 NVIDIA introduced a new unified GPU architecture facilitating general-purpose computation on the GPU. The following year NVIDIA introduced CUDA, a parallel programming architecture for developing general purpose applications for direct execution on the new unified GPU. CUDA exposes the GPU's massively parallel architecture of the GPU so that parallel code can be written to execute much faster than its sequential counterpart. Although CUDA abstracts the underlying architecture, fully utilising and scheduling the GPU is non-trivial and has given rise to a new active area of research. Due to the inherent complexities pertaining to GPU development, in this thesis we explore and find efficient parallel mappings of existing and new parallel algorithms on the GPU using NVIDIA CUDA. We place particular emphasis on metaheuristics, image processing and designing reusable techniques and mappings that can be applied to other problems and domains. We begin by focusing on Ant Colony Optimisation (ACO), a nature inspired heuristic approach for solving optimisation problems. We present a versatile improved data-parallel approach for solving the Travelling Salesman Problem using ACO resulting in significant speedups. By extending our initial work, we show how existing mappings of ACO on the GPU are unable to compete against their sequential counterpart when common CPU optimisation strategies are employed and detail three distinct candidate set parallelisation strategies for execution on the GPU. By further extending our data-parallel approach we present the first implementation of an ACO-based edge detection algorithm on the GPU to reduce the execution time and improve the viability of ACO-based edge detection. We finish by presenting a new color edge detection technique using the volume of a pixel in the HSI color space along with a parallel GPU implementation that is able to withstand greater levels of noise than existing algorithms

Durham e-Theses

A Comprehensive Survey on Particle Swarm Optimization Algorithm and Its Applications

Author: Genlin Ji
Shuihua Wang
Yudong Zhang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015
Field of study

Particle swarm optimization (PSO) is a heuristic global optimization method, proposed originally by Kennedy and Eberhart in 1995. It is now one of the most commonly used optimization techniques. This survey presented a comprehensive investigation of PSO. On one hand, we provided advances with PSO, including its modifications (including quantum-behaved PSO, bare-bones PSO, chaotic PSO, and fuzzy PSO), population topology (as fully connected, von Neumann, ring, star, random, etc.), hybridization (with genetic algorithm, simulated annealing, Tabu search, artificial immune system, ant colony algorithm, artificial bee colony, differential evolution, harmonic search, and biogeography-based optimization), extensions (to multiobjective, constrained, discrete, and binary optimization), theoretical analysis (parameter selection and tuning, and convergence analysis), and parallel implementation (in multicore, multiprocessor, GPU, and cloud computing forms). On the other hand, we offered a survey on applications of PSO to the following eight fields: electrical and electronic engineering, automation control systems, communication theory, operations research, mechanical engineering, fuel and energy, medicine, chemistry, and biology. It is hoped that this survey would be beneficial for the researchers studying PSO algorithms

Crossref

Directory of Open Access Journals