106 research outputs found

    Generic Techniques in General Purpose GPU Programming with Applications to Ant Colony and Image Processing Algorithms

    In 2006 NVIDIA introduced a new unified GPU architecture facilitating general-purpose computation on the GPU. The following year NVIDIA introduced CUDA, a parallel programming architecture for developing general-purpose applications for direct execution on the new unified GPU. CUDA exposes the GPU's massively parallel architecture so that parallel code can be written to execute much faster than its sequential counterpart. Although CUDA abstracts the underlying architecture, fully utilising and scheduling the GPU is non-trivial and has given rise to a new and active area of research. Due to the inherent complexities of GPU development, in this thesis we explore and find efficient parallel mappings of existing and new parallel algorithms on the GPU using NVIDIA CUDA. We place particular emphasis on metaheuristics, image processing, and designing reusable techniques and mappings that can be applied to other problems and domains. We begin by focusing on Ant Colony Optimisation (ACO), a nature-inspired heuristic approach for solving optimisation problems. We present a versatile improved data-parallel approach for solving the Travelling Salesman Problem using ACO, resulting in significant speedups. Extending our initial work, we show how existing mappings of ACO on the GPU are unable to compete against their sequential counterparts when common CPU optimisation strategies are employed, and we detail three distinct candidate-set parallelisation strategies for execution on the GPU. By further extending our data-parallel approach, we present the first implementation of an ACO-based edge detection algorithm on the GPU, reducing the execution time and improving the viability of ACO-based edge detection. We finish by presenting a new colour edge detection technique using the volume of a pixel in the HSI colour space, along with a parallel GPU implementation that is able to withstand greater levels of noise than existing algorithms.
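
    As a concrete illustration of the data-parallel ACO mapping described above, the sketch below shows one plausible CUDA formulation of the pheromone evaporation and deposit steps. It is a minimal sketch under assumed data layouts (a row-major n x n pheromone matrix, one tour of n cities per ant), not the thesis's actual kernels.

    #include <cuda_runtime.h>

    // tau <- (1 - rho) * tau for every entry of the pheromone matrix.
    __global__ void evaporate(float *pher, int n, float rho) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < n * n) pher[idx] *= (1.0f - rho);
    }

    // One block per ant, one thread per tour edge; ants deposit
    // concurrently, so the updates must be atomic.
    __global__ void deposit(float *pher, const int *tours,
                            const float *tourLen, int n, int nAnts) {
        int ant = blockIdx.x, step = threadIdx.x;
        if (ant < nAnts && step < n) {
            int from = tours[ant * n + step];
            int to   = tours[ant * n + (step + 1) % n];
            float delta = 1.0f / tourLen[ant];      // shorter tour, larger deposit
            atomicAdd(&pher[from * n + to], delta);
            atomicAdd(&pher[to * n + from], delta); // symmetric TSP
        }
    }

    // Host-side launches (assumes n <= 1024 so one block covers a tour):
    //   evaporate<<<(n * n + 255) / 256, 256>>>(pher, n, rho);
    //   deposit<<<nAnts, n>>>(pher, tours, tourLen, n, nAnts);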

    Recent Advances on GPU Computing in Operations Research

    In the last decade, Graphics Processing Units (GPUs) have gained increasing popularity as accelerators for High Performance Computing (HPC) applications. Recent GPUs are not only powerful graphics engines but also highly threaded parallel computing processors that can achieve sustained speedups compared with CPUs. In this context, researchers have sought to exploit the capability of this architecture to solve difficult problems in many domains of science and engineering. In this article, we present recent advances on GPU computing in Operations Research, focusing in particular on Integer Programming and Linear Programming.
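
    A representative building block from this literature is the dense simplex tableau update, which maps naturally onto the GPU with one thread per tableau entry. The kernel below is a generic, hedged sketch under assumed names (T, pr, pc), not code from any surveyed paper; the pivot row and column themselves would be rescaled in a separate pass.

    // Update every non-pivot entry of a dense simplex tableau T
    // (rows x cols, row-major) after the pivot element T[pr][pc]
    // has been chosen by the ratio test.
    __global__ void pivotUpdate(float *T, int rows, int cols, int pr, int pc) {
        int i = blockIdx.y * blockDim.y + threadIdx.y;  // tableau row
        int j = blockIdx.x * blockDim.x + threadIdx.x;  // tableau column
        if (i < rows && j < cols && i != pr && j != pc) {
            // Gaussian-elimination step around the pivot; every read touches
            // only the pivot row or column, which this kernel never writes.
            T[i * cols + j] -= T[i * cols + pc] * T[pr * cols + j]
                             / T[pr * cols + pc];
        }
    }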

    Data-driven prognostics and logistics optimisation: A deep learning journey

    Accelerating local search algorithms for travelling salesman problem using GPU effectively

    The main purpose of this study is to demonstrate the advantages of using the GPU to solve computationally hard optimization problems. To this end, the 2-opt and 3-opt local search methods for the Travelling Salesman Problem were implemented in parallel. These search techniques evaluate every valid exchange of a given neighbourhood, which requires large numbers of calculations and comparisons. By parallelizing these methods on the GPU, performance increased remarkably compared with the CPU. Because of the GPU's distinctive execution model and complicated memory hierarchy, implementations can be difficult, and imprecise use of the GPU causes a considerable decrease in algorithm performance. Therefore, in addition to comparing GPU and CPU performance, the effect of GPU resource allocation on GPU performance was examined. Several experiments on travelling salesman problems of various sizes were run with resources allocated in different ways, and from these experiments a technique for utilizing GPU resources effectively was identified. Although GPU devices evolve day by day, some of their resources still have quite restricted capacity; for large-scale problems, the GPU's limited on-chip memory proved insufficient. To overcome this issue, some helpful approaches were proposed: essentially, the problem was divided into parts, and parallelism was applied to each part separately. In sum, the aim of this research is to give useful insights into effective GPU usage and to familiarize researchers in the optimization area with it.
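
    To make the mapping concrete, the sketch below shows the common one-thread-per-move GPU formulation of 2-opt evaluation against a distance matrix; the names and launch geometry are illustrative assumptions, not the paper's code. The most negative gain would then be found with a parallel reduction before the exchange is applied.

    // Evaluate all candidate 2-opt moves (i, j) of the current tour in
    // parallel, one thread per move. D is the row-major n x n distance
    // matrix; a negative gain means the exchange shortens the tour.
    // Entries of gain[] outside the valid range should be pre-filled with 0.
    __global__ void eval2opt(const float *D, const int *tour, int n,
                             float *gain) {
        int i = blockIdx.y * blockDim.y + threadIdx.y;
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (i + 1 < j && j + 1 < n) {
            int a = tour[i], b = tour[i + 1];   // edge (a, b) is removed
            int c = tour[j], d = tour[j + 1];   // edge (c, d) is removed
            // Reversing tour[i+1 .. j] adds edges (a, c) and (b, d).
            gain[i * n + j] = (D[a * n + c] + D[b * n + d])
                            - (D[a * n + b] + D[c * n + d]);
        }
    }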

    HongTu: Scalable Full-Graph GNN Training on Multiple GPUs (via communication-optimized CPU data offloading)

    Full-graph training of graph neural networks (GNNs) has emerged as a promising training method for its effectiveness, but it requires extensive memory and computation resources. To accelerate this training process, researchers have proposed employing multi-GPU processing. However, the scalability of existing frameworks is limited because they must keep the training data for every layer in GPU memory. To efficiently train on large graphs, we present HongTu, a scalable full-graph GNN training system running on GPU-accelerated platforms. HongTu stores vertex data in CPU memory and offloads training to GPUs. HongTu employs a memory-efficient full-graph training framework that reduces runtime memory consumption by using partition-based training and recomputation-caching-hybrid intermediate data management. To address the increased host-GPU communication caused by duplicated neighbor access among partitions, HongTu employs a deduplicated communication framework that converts redundant host-GPU communication into efficient inter/intra-GPU data access. Further, HongTu uses a cost-model-guided graph reorganization method to minimize communication overhead. Experimental results on a 4×A100 GPU server show that HongTu effectively supports billion-scale full-graph GNN training while reducing host-GPU data communication by 25%-71%. Compared to the full-graph GNN system DistGNN running on 16 CPU nodes, HongTu achieves speedups ranging from 7.8X to 20.2X. For small graphs where the training data fits into the GPUs, HongTu achieves performance comparable to existing GPU-based GNN systems.
    Comment: 28 pages, 11 figures, SIGMOD202
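
    The underlying offloading pattern, vertex data resident in host memory with partitions streamed to the GPU and transfers overlapped against compute, can be sketched in plain CUDA as below. Buffer sizes, names, and the placeholder kernel are assumptions for illustration, not HongTu's implementation.

    #include <cuda_runtime.h>

    __global__ void trainPartition(float *feat, int nVerts, int featDim) {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v < nVerts) { /* per-partition forward/backward work goes here */ }
    }

    int main() {
        const int nParts = 8, vertsPerPart = 1 << 18, featDim = 64;
        size_t bytes = (size_t)vertsPerPart * featDim * sizeof(float);

        float *hostFeat;                            // full vertex data stays on the CPU
        cudaMallocHost(&hostFeat, nParts * bytes);  // pinned memory enables async copies

        float *devBuf[2];                           // double buffer on the GPU
        cudaStream_t s[2];
        for (int k = 0; k < 2; ++k) {
            cudaMalloc(&devBuf[k], bytes);
            cudaStreamCreate(&s[k]);
        }

        for (int p = 0; p < nParts; ++p) {
            int k = p % 2;
            // In-stream ordering makes each copy wait for the kernel that
            // last used this buffer, while a copy in one stream overlaps
            // the kernel still running in the other stream.
            cudaMemcpyAsync(devBuf[k],
                            hostFeat + (size_t)p * vertsPerPart * featDim,
                            bytes, cudaMemcpyHostToDevice, s[k]);
            trainPartition<<<(vertsPerPart + 255) / 256, 256, 0, s[k]>>>(
                devBuf[k], vertsPerPart, featDim);
        }
        cudaDeviceSynchronize();

        for (int k = 0; k < 2; ++k) {
            cudaFree(devBuf[k]);
            cudaStreamDestroy(s[k]);
        }
        cudaFreeHost(hostFeat);
        return 0;
    }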

    A Deep Reinforcement Learning-based Hyperheuristic for the Flexible Traveling Repairman Problem with Drones

    Master's thesis in informatics (INF399, MAMN-PROG, MAMN-IN)