GPU accelerated Nature Inspired Methods for Modelling Large Scale Bi-Directional Pedestrian Movement
Pedestrian movement, although ubiquitous and well-studied, is still not
well understood due to the complicating nature of the embedded social dynamics.
Interest among researchers in simulating pedestrian movement and interactions
has grown significantly in part due to increased computational and
visualization capabilities afforded by high power computing. Different
approaches have been adopted to simulate pedestrian movement under various
circumstances and interactions. In the present work, bi-directional crowd
movement is simulated where an equal number of individuals try to reach the
opposite sides of an environment. Two movement methods are considered. First, a
Least Effort Model (LEM) is investigated where agents try to take an optimal
path with as few changes from their intended path as possible. Following
this, a modified form of Ant Colony Optimization (ACO) is proposed, where
individuals are guided by a goal of reaching the other side in a least effort
mode as well as a pheromone trail left by predecessors. The basic idea is to
increase agent interaction, thereby more closely reflecting a real world
scenario. The methodology utilizes Graphics Processing Units (GPUs) for general
purpose computing using the CUDA platform. Because of the inherent parallel
properties associated with pedestrian movement such as proximate interactions
of individuals on a 2D grid, GPUs are well suited. The main feature of the
implementation undertaken here is that the parallelism is data driven. The data
driven implementation leads to a speedup of up to 18x compared to its sequential
counterpart running on a single-threaded CPU. The number of pedestrians
considered in the model ranged from 2K to 100K, representing numbers typical of
mass gathering events. A detailed discussion addresses implementation
challenges faced and averted
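
The pheromone-guided, least-effort movement described in this abstract can be illustrated with a small CPU-only sketch. It is not the paper's implementation: the names `step` and `make_pheromone`, the scoring rule (pheromone weight minus an effort penalty for diagonal moves), the deposit amount, and the first-come conflict rule are all assumptions made for illustration. Each agent reads only the previous grid, which is the kind of per-cell independence a data-driven GPU mapping would rely on.

```python
# Hypothetical sketch: all names, weights, and rules below are assumptions,
# not taken from the paper.
EMPTY, RIGHT, LEFT = 0, 1, -1


def make_pheromone(rows, cols):
    """One pheromone layer per walking direction, initialised to zero."""
    return {d: [[0.0] * cols for _ in range(rows)] for d in (RIGHT, LEFT)}


def step(grid, pheromone, alpha=1.0, evaporation=0.1):
    """Advance every agent by at most one cell and return the new grid.

    grid[r][c] is EMPTY, RIGHT, or LEFT. Agents read only the old grid,
    so each cell could be processed independently; the `claimed` set
    stands in for the atomic operation a GPU kernel would need when two
    agents want the same empty cell.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [[EMPTY] * cols for _ in range(rows)]
    claimed = set()
    deposits = []

    for r in range(rows):
        for c in range(cols):
            d = grid[r][c]
            if d == EMPTY:
                continue
            target, best_score = (r, c), float("-inf")
            # Straight ahead costs no effort; diagonal moves cost 1.0.
            for dr, effort in ((0, 0.0), (-1, 1.0), (1, 1.0)):
                nr, nc = r + dr, c + d
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if grid[nr][nc] != EMPTY or (nr, nc) in claimed:
                    continue
                score = alpha * pheromone[d][nr][nc] - effort
                if score > best_score:
                    target, best_score = (nr, nc), score
            if target != (r, c):
                claimed.add(target)
                deposits.append((d, target))
            new_grid[target[0]][target[1]] = d  # blocked agents stay put

    # Evaporate old trails, then deposit on the cells just entered.
    for d in (RIGHT, LEFT):
        for row in pheromone[d]:
            for c in range(cols):
                row[c] *= 1.0 - evaporation
    for d, (r, c) in deposits:
        pheromone[d][r][c] += 1.0
    return new_grid
```

Calling `step` repeatedly on a grid with `RIGHT` agents in the first column and `LEFT` agents in the last column moves the two groups toward each other; agents that find the straight cell taken sidestep diagonally, and later agents of the same group are drawn toward cells their predecessors entered.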
A Review on GPU Based Parallel Computing for NP Problems
Many optimization problems today are NP problems, and solving them requires parallel metaheuristic algorithms. Graph-theoretic problems are among the most commonly studied combinatorial problems. This paper presents a new approach to solving such combinatorial problems with GPU-based parallel computing using the CUDA architecture. The problems are compared with respect to transfer rate, effective memory utilization, speedup, and similar measures in order to obtain the best possible solution. Different algorithms are applied to the optimization problems to achieve efficient memory utilization, synchronized execution, reduced run time, and increased execution speedup. As a result, the speedup factor improves and the best solution is obtained
Generic Techniques in General Purpose GPU Programming with Applications to Ant Colony and Image Processing Algorithms
In 2006 NVIDIA introduced a new unified GPU architecture facilitating general-purpose computation on the GPU. The following year NVIDIA introduced CUDA, a parallel programming architecture for developing general purpose applications for direct execution on the new unified GPU. CUDA exposes the GPU's massively parallel architecture so that parallel code can be written to execute much faster than its sequential counterpart. Although CUDA abstracts the underlying architecture, fully utilising and scheduling the GPU is non-trivial and has given rise to a new active area of research. Due to the inherent complexities pertaining to GPU development, in this thesis we explore and find efficient parallel mappings of existing and new parallel algorithms on the GPU using NVIDIA CUDA. We place particular emphasis on metaheuristics, image processing and designing reusable techniques and mappings that can be applied to other problems and domains.
We begin by focusing on Ant Colony Optimisation (ACO), a nature inspired heuristic approach for solving optimisation problems. We present a versatile improved data-parallel approach for solving the Travelling Salesman Problem using ACO resulting in significant speedups. By extending our initial work, we show how existing mappings of ACO on the GPU are unable to compete against their sequential counterpart when common CPU optimisation strategies are employed and detail three distinct candidate set parallelisation strategies for execution on the GPU. By further extending our data-parallel approach we present the first implementation of an ACO-based edge detection algorithm on the GPU to reduce the execution time and improve the viability of ACO-based edge detection. We finish by presenting a new color edge detection technique using the volume of a pixel in the HSI color space along with a parallel GPU implementation that is able to withstand greater levels of noise than existing algorithms
Accelerating ant colony optimization-based edge detection on the GPU using CUDA
Ant Colony Optimization (ACO) is a nature-inspired metaheuristic that can be applied to a wide range of optimization problems. In this paper we present the first parallel implementation of an ACO-based (image processing) edge detection algorithm on the Graphics Processing Unit (GPU) using NVIDIA CUDA. We extend recent work so that we are able to implement a novel data-parallel approach that maps individual ants to thread warps. By exploiting the massively parallel nature of the GPU, we are able to execute significantly more ants per ACO-iteration allowing us to reduce the total number of iterations required to create an edge map. We hope that reducing the execution time of an ACO-based implementation of edge detection will increase its viability in image processing and computer vision
Parallelization Strategies for Ant Colony Optimisation on GPUs
Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic
for the solution of a wide variety of problems. As a population-based
algorithm, its computation is intrinsically massively parallel, and it is
therefore theoretically well-suited for implementation on Graphics Processing
Units (GPUs). The ACO algorithm comprises two main stages: Tour construction
and Pheromone update. The former has been previously implemented on the GPU,
using a task-based parallelism approach. However, up until now, the latter has
always been implemented on the CPU. In this paper, we discuss several
parallelisation strategies for both stages of the ACO algorithm on the GPU. We
propose an alternative data-based parallelism scheme for Tour construction,
which fits better on the GPU architecture. We also describe novel GPU
programming strategies for the Pheromone update stage. Our results show a total
speed-up exceeding 28x for the Tour construction stage, and 20x for Pheromone
update, and suggest that ACO is a potentially fruitful area for future research
in the GPU domain.
Comment: Accepted by the 14th International Workshop on Nature Inspired
Distributed Computing (NIDISC 2011), held in conjunction with the 25th
IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS
2011)
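
The two ACO stages named in this abstract, Tour construction and Pheromone update, can be sketched sequentially for the Travelling Salesman Problem. This is a textbook Ant System formulation, not the GPU mapping proposed in the paper; the function names and the default values of `alpha`, `beta`, and `rho` are assumptions chosen for illustration.

```python
import random


def construct_tour(dist, tau, alpha=1.0, beta=2.0, rng=random):
    """Build one ant's tour by roulette-wheel selection.

    The next city j is drawn with probability proportional to
    tau[i][j]**alpha * (1 / dist[i][j])**beta over unvisited cities.
    Off-diagonal entries of dist are assumed to be positive.
    """
    n = len(dist)
    start = rng.randrange(n)
    tour, visited = [start], {start}
    while len(tour) < n:
        i = tour[-1]
        candidates = [j for j in range(n) if j not in visited]
        weights = [
            (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta)
            for j in candidates
        ]
        j = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(j)
        visited.add(j)
    return tour


def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def update_pheromone(tau, tours, lengths, rho=0.5):
    """Evaporate all trails, then let each ant deposit 1/length on its edges."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    for tour, length in zip(tours, lengths):
        for k in range(len(tour)):
            a, b = tour[k], tour[(k + 1) % len(tour)]
            tau[a][b] += 1.0 / length
            tau[b][a] += 1.0 / length  # symmetric TSP assumed
```

In this sequential form each ant is one unit of work during construction, which corresponds to the task-based view the abstract mentions. The per-city weight computation inside `construct_tour` and the per-edge updates inside `update_pheromone` are the finer-grained, data-level operations that a data-parallel scheme would distribute across threads.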
Improvement of Fuzzy Geographically Weighted Clustering-Ant Colony Optimization Performance using Context-Based Clustering and CUDA Parallel Programming
Geo-demographic analysis (GDA) is the study of population characteristics by geographical area. Fuzzy Geographically Weighted Clustering (FGWC) is an effective algorithm used in GDA. Improvement of FGWC has been done by integrating a metaheuristic algorithm, Ant Colony Optimization (ACO), as a global optimization tool to increase the clustering accuracy in the initial stage of the FGWC algorithm. However, using ACO in FGWC increases the time to run the algorithm compared to the standard FGWC algorithm. In this paper, context-based clustering and CUDA parallel programming are proposed to improve the performance of the improved algorithm (FGWC-ACO). Context-based clustering is a method that focuses on the grouping of data based on certain conditions, while CUDA parallel programming is a method that uses the graphical processing unit (GPU) as a parallel processing tool. The Indonesian Population Census 2010 was used as the experimental dataset. It was shown that the proposed methods were able to improve the performance of FGWC-ACO without reducing the clustering quality of the original method. The clustering quality was evaluated using the clustering validity index
- …