106 research outputs found

    Generic Techniques in General Purpose GPU Programming with Applications to Ant Colony and Image Processing Algorithms

    In 2006 NVIDIA introduced a new unified GPU architecture facilitating general-purpose computation on the GPU. The following year NVIDIA introduced CUDA, a parallel programming architecture for developing general-purpose applications for direct execution on the new unified GPU. CUDA exposes the GPU's massively parallel architecture so that parallel code can be written to execute much faster than its sequential counterpart. Although CUDA abstracts the underlying architecture, fully utilising and scheduling the GPU is non-trivial and has given rise to a new and active area of research. Due to the inherent complexities of GPU development, in this thesis we explore and find efficient parallel mappings of existing and new parallel algorithms on the GPU using NVIDIA CUDA. We place particular emphasis on metaheuristics, image processing, and designing reusable techniques and mappings that can be applied to other problems and domains. We begin by focusing on Ant Colony Optimisation (ACO), a nature-inspired heuristic approach for solving optimisation problems. We present a versatile improved data-parallel approach for solving the Travelling Salesman Problem using ACO, resulting in significant speedups. Extending our initial work, we show how existing mappings of ACO on the GPU are unable to compete against their sequential counterparts when common CPU optimisation strategies are employed, and we detail three distinct candidate-set parallelisation strategies for execution on the GPU. By further extending our data-parallel approach, we present the first implementation of an ACO-based edge detection algorithm on the GPU, reducing the execution time and improving the viability of ACO-based edge detection. We finish by presenting a new colour edge detection technique using the volume of a pixel in the HSI colour space, along with a parallel GPU implementation that is able to withstand greater levels of noise than existing algorithms.
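
    As a concrete illustration of the data-parallel ACO mapping described above, the sketch below shows one plausible CUDA formulation of the pheromone evaporation and deposit steps. It is a minimal sketch under assumed data layouts (a row-major n x n pheromone matrix, one tour of n cities per ant), not the thesis's actual kernels.

    #include <cuda_runtime.h>

    // tau <- (1 - rho) * tau for every entry of the pheromone matrix.
    __global__ void evaporate(float *pher, int n, float rho) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < n * n) pher[idx] *= (1.0f - rho);
    }

    // One block per ant, one thread per tour edge; ants deposit
    // concurrently, so the updates must be atomic.
    __global__ void deposit(float *pher, const int *tours,
                            const float *tourLen, int n, int nAnts) {
        int ant = blockIdx.x, step = threadIdx.x;
        if (ant < nAnts && step < n) {
            int from = tours[ant * n + step];
            int to   = tours[ant * n + (step + 1) % n];
            float delta = 1.0f / tourLen[ant];      // shorter tour, larger deposit
            atomicAdd(&pher[from * n + to], delta);
            atomicAdd(&pher[to * n + from], delta); // symmetric TSP
        }
    }

    // Host-side launches (assumes n <= 1024 so one block covers a tour):
    //   evaporate<<<(n * n + 255) / 256, 256>>>(pher, n, rho);
    //   deposit<<<nAnts, n>>>(pher, tours, tourLen, n, nAnts);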

    Recent Advances on GPU Computing in Operations Research

    In the last decade, Graphics Processing Units (GPUs) have gained increasing popularity as accelerators for High Performance Computing (HPC) applications. Recent GPUs are not only powerful graphics engines but also highly threaded parallel computing processors that can achieve sustained speedups compared with CPUs. In this context, researchers have sought to exploit the capability of this architecture to solve difficult problems in many domains of science and engineering. In this article, we present recent advances on GPU computing in Operations Research, focusing in particular on Integer Programming and Linear Programming.
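
    A representative building block from this literature is the dense simplex tableau update, which maps naturally onto the GPU with one thread per tableau entry. The kernel below is a generic, hedged sketch under assumed names (T, pr, pc), not code from any surveyed paper; the pivot row and column themselves would be rescaled in a separate pass.

    // Update every non-pivot entry of a dense simplex tableau T
    // (rows x cols, row-major) after the pivot element T[pr][pc]
    // has been chosen by the ratio test.
    __global__ void pivotUpdate(float *T, int rows, int cols, int pr, int pc) {
        int i = blockIdx.y * blockDim.y + threadIdx.y;  // tableau row
        int j = blockIdx.x * blockDim.x + threadIdx.x;  // tableau column
        if (i < rows && j < cols && i != pr && j != pc) {
            // Gaussian-elimination step around the pivot; every read touches
            // only the pivot row or column, which this kernel never writes.
            T[i * cols + j] -= T[i * cols + pc] * T[pr * cols + j]
                             / T[pr * cols + pc];
        }
    }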

    Data-driven prognostics and logistics optimisation: A deep learning journey

    Accelerating local search algorithms for travelling salesman problem using GPU effectively

    The main purpose of this study is to demonstrate the advantages of using the GPU to solve computationally hard optimization problems. To this end, the 2-opt and 3-opt local search methods for the Travelling Salesman Problem were implemented in parallel. These search techniques evaluate every valid exchange of a given neighbourhood, which requires large numbers of calculations and comparisons. By parallelizing these methods on the GPU, performance increased remarkably compared with the CPU. Because of the GPU's distinctive execution model and complicated memory hierarchy, implementations can be difficult, and imprecise use of the GPU causes a considerable decrease in algorithm performance. Therefore, in addition to comparing GPU and CPU performance, the effect of GPU resource allocation on GPU performance was examined. Several experiments on travelling salesman problems of various sizes were run with resources allocated in different ways, and from these experiments a technique for utilizing GPU resources effectively was identified. Although GPU devices evolve day by day, some of their resources still have quite restricted capacity; for large-scale problems, the GPU's limited on-chip memory proved insufficient. To overcome this issue, some helpful approaches were proposed: essentially, the problem was divided into parts, and parallelism was applied to each part separately. In sum, the aim of this research is to give useful insights into effective GPU usage and to familiarize researchers in the optimization area with it.
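
    To make the mapping concrete, the sketch below shows the common one-thread-per-move GPU formulation of 2-opt evaluation against a distance matrix; the names and launch geometry are illustrative assumptions, not the paper's code. The most negative gain would then be found with a parallel reduction before the exchange is applied.

    // Evaluate all candidate 2-opt moves (i, j) of the current tour in
    // parallel, one thread per move. D is the row-major n x n distance
    // matrix; a negative gain means the exchange shortens the tour.
    // Entries of gain[] outside the valid range should be pre-filled with 0.
    __global__ void eval2opt(const float *D, const int *tour, int n,
                             float *gain) {
        int i = blockIdx.y * blockDim.y + threadIdx.y;
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (i + 1 < j && j + 1 < n) {
            int a = tour[i], b = tour[i + 1];   // edge (a, b) is removed
            int c = tour[j], d = tour[j + 1];   // edge (c, d) is removed
            // Reversing tour[i+1 .. j] adds edges (a, c) and (b, d).
            gain[i * n + j] = (D[a * n + c] + D[b * n + d])
                            - (D[a * n + b] + D[c * n + d]);
        }
    }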

    HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading)

    Full-graph training of graph neural networks (GNNs) has emerged as a promising training method for its effectiveness, but it requires extensive memory and computation resources. To accelerate this training process, researchers have proposed employing multi-GPU processing. However, the scalability of existing frameworks is limited because they must keep the training data for every layer in GPU memory. To efficiently train on large graphs, we present HongTu, a scalable full-graph GNN training system running on GPU-accelerated platforms. HongTu stores vertex data in CPU memory and offloads training to GPUs. HongTu employs a memory-efficient full-graph training framework that reduces runtime memory consumption by using partition-based training and recomputation-caching-hybrid intermediate data management. To address the increased host-GPU communication caused by duplicated neighbor access among partitions, HongTu employs a deduplicated communication framework that converts redundant host-GPU communication into efficient inter/intra-GPU data access. Further, HongTu uses a cost-model-guided graph reorganization method to minimize communication overhead. Experimental results on a 4×A100 GPU server show that HongTu effectively supports billion-scale full-graph GNN training while reducing host-GPU data communication by 25%-71%. Compared to the full-graph GNN system DistGNN running on 16 CPU nodes, HongTu achieves speedups ranging from 7.8X to 20.2X. For small graphs where the training data fits into the GPUs, HongTu achieves performance comparable to existing GPU-based GNN systems.
    Comment: 28 pages, 11 figures, SIGMOD202
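
    The underlying offloading pattern, vertex data resident in host memory with partitions streamed to the GPU and transfers overlapped against compute, can be sketched in plain CUDA as below. Buffer sizes, names, and the placeholder kernel are assumptions for illustration, not HongTu's implementation.

    #include <cuda_runtime.h>

    __global__ void trainPartition(float *feat, int nVerts, int featDim) {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v < nVerts) { /* per-partition forward/backward work goes here */ }
    }

    int main() {
        const int nParts = 8, vertsPerPart = 1 << 18, featDim = 64;
        size_t bytes = (size_t)vertsPerPart * featDim * sizeof(float);

        float *hostFeat;                            // full vertex data stays on the CPU
        cudaMallocHost(&hostFeat, nParts * bytes);  // pinned memory enables async copies

        float *devBuf[2];                           // double buffer on the GPU
        cudaStream_t s[2];
        for (int k = 0; k < 2; ++k) {
            cudaMalloc(&devBuf[k], bytes);
            cudaStreamCreate(&s[k]);
        }

        for (int p = 0; p < nParts; ++p) {
            int k = p % 2;
            // In-stream ordering makes each copy wait for the kernel that
            // last used this buffer, while a copy in one stream overlaps
            // the kernel still running in the other stream.
            cudaMemcpyAsync(devBuf[k],
                            hostFeat + (size_t)p * vertsPerPart * featDim,
                            bytes, cudaMemcpyHostToDevice, s[k]);
            trainPartition<<<(vertsPerPart + 255) / 256, 256, 0, s[k]>>>(
                devBuf[k], vertsPerPart, featDim);
        }
        cudaDeviceSynchronize();

        for (int k = 0; k < 2; ++k) {
            cudaFree(devBuf[k]);
            cudaStreamDestroy(s[k]);
        }
        cudaFreeHost(hostFeat);
        return 0;
    }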

    A Deep Reinforcement Learning-based Hyperheuristic for the Flexible Traveling Repairman Problem with Drones

    Master's thesis in informatics (INF399, MAMN-PROG, MAMN-IN)