16 research outputs found

A Review on GPU Based Parallel Computing for NP Problems

Author: Swati S. Dhable, Santosh Kumar
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/12/2016

Many of today's optimization problems are NP problems, and solving them calls for parallel metaheuristic algorithms. Graph-theoretic problems are among the most commonly studied combinatorial problems. This paper presents a new approach to solving such combinatorial problems with GPU-based parallel computing on the CUDA architecture. The problems are compared with respect to transfer rate, effective memory utilization, speedup and related measures in order to obtain the best possible solution. Different algorithms are applied to the optimization problems to achieve efficient memory use, synchronized execution, reduced run time and increased speedup. As a result, the speedup factor improves and the best solution is obtained.

International Journal on Recent and Innovation Trends in Computing and Communication

Optimización PSO paralelizada para scheduling de flow-shop

Author: Blanco Anibal M., Frutos Mariano, Iparraguirre Javier, Salmieri Leandro N.
Publication date: 20/02/2020

The flow-shop scheduling problem (production scheduling in a continuous-flow factory) is NP-hard, even for a small number of jobs and machines. Because of its great industrial interest, it has been studied intensively in recent decades with the aim of designing algorithms that provide good-quality solutions in acceptable computing times for instances of practical interest. This work presents an algorithm based on particle swarm optimization (PSO) for the flow-shop scheduling problem. A parallelized version was also implemented that uses NVIDIA graphics cards with CUDA technology to accelerate execution.

Sociedad Argentina de Informática e Investigación Operativ
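
A flow-shop schedule is usually scored by its makespan, the time at which the last job leaves the last machine. The sketch below assumes the common permutation flow-shop formulation (every job visits the machines in the same order), which the abstract does not specify. The function name `makespan` and the `proc_times[job][machine]` table are hypothetical and are not taken from the paper. It shows the kind of evaluation a PSO particle's job order would need.

```python
def makespan(order, proc_times):
    """Makespan of a permutation flow-shop schedule.

    order:      sequence of job indices, in processing order
    proc_times: proc_times[job][machine] = processing time (assumed layout)
    """
    n_machines = len(proc_times[0])
    # finish[m] holds the time at which machine m becomes free.
    finish = [0] * n_machines
    for job in order:
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the
            # job has left the previous machine (finish[m - 1] was already
            # updated for this job in the previous loop step).
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc_times[job][m]
    return finish[-1]


if __name__ == "__main__":
    times = [[3, 2], [1, 4]]  # illustrative data: 2 jobs, 2 machines
    print(makespan([0, 1], times))
    print(makespan([1, 0], times))
```

Each candidate order can be evaluated independently of the others, which is one reason this kind of fitness function is a natural target for GPU parallelization.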

Optimización PSO paralelizada para scheduling de flow-shop

Author: Blanco Anibal M., Frutos Mariano, Iparraguirre Javier, Salmieri Leandro N.
Publication date: 01/09/2019

Servicio de Difusión de la Creación Intelectual

Optimización PSO paralelizada para scheduling de flow-shop

Author: Blanco Anibal M., Frutos Mariano, Iparraguirre Javier, Salmieri Leandro N.
Publication date: 01/09/2019

Methodology for modified whale optimization algorithm for solving appliances scheduling problem

Author: Hairani Norfazlirda, Khalid Khalizul, Mohd Bakeri Noorhadila, Mohd Nawi Mohd Nasrun, Omar Mohd Faizal
Publication venue: 'Akademia Baru Publishing'
Publication date: 01/01/2020

The Whale Optimization Algorithm (WOA) is one of the newest metaheuristic algorithms used for solving certain NP-hard problems. WOA is known for slow convergence, and its computational cost grows exponentially with multiple objectives and a large number of requests from n users. These constraints limit its ability to solve and optimize Demand Side Management (DSM) cases, such as the energy consumption associated with indoor comfort index parameters, which consist of thermal comfort, air quality, humidity and visual comfort. To address these issues, the proposed work will first justify and validate the constraints related to the appliances scheduling problem, and then propose a new model, the Cluster-based Multi-Objective WOA with a multiple restart strategy. To achieve these objectives, different initialization strategies and cluster-based approaches will be used to tune the main parameter of WOA under different MapReduce applications, which helps to control exploration and exploitation. The proposed model will be tested on a set of well-known test functions and finally applied to a real case project, i.e. the appliances scheduling problem. It is anticipated that the approach can expedite the convergence of the metaheuristic technique while producing quality solutions.

UUM Repository

Efficient architectures of heterogeneous fpga-gpu for 3-d medical image compression

Author: Muharam Azlan
Publication date: 01/04/2019

Advances in three-dimensional (3-D) imaging modalities have generated a massive amount of volumetric data in 3-D images such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US). Existing surveys reveal a large gap for further research in exploiting reconfigurable computing for 3-D medical image compression. This research proposes an FPGA-based co-processing solution to accelerate such medical imaging systems. The HWT block is implemented on the sbRIO-9632 FPGA board, a prototyping board with a Spartan 3 (XC3S2000) chip. Analysis and performance evaluation of the 3-D images were conducted. Furthermore, the research considers a novel architecture of the context-based adaptive binary arithmetic coder (CABAC), the advanced entropy coding tool employed by the main and higher profiles of H.264/AVC. It focuses on a GPU implementation of CABAC and a comparative study of 3-D medical image compression systems with and without the discrete wavelet transform (DWT). Implementation results on MRI and CT images show the GPU significantly outperforming a single-threaded CPU implementation. Overall, CT and MRI modalities with DWT outperform images without the DWT process in terms of compression ratio, peak signal-to-noise ratio (PSNR) and latency. For heterogeneous computing, MRI images of various sizes and formats, such as JPEG and DICOM, were used. Evaluation results are shown for each memory iteration; transfers from GPU to CPU consume more bandwidth or throughput. For the JPEG format of size 786, 486 bytes, the bandwidth consumed in both directions tends to balance. Bandwidth is relative to the transfer size: larger transfers incur more latency and throughput. Next, an OpenCL implementation for concurrent tasks via a dedicated FPGA is presented. The implementation findings reveal that OpenCL in batch processing mode with AOC techniques offers substantial results, with the amount of logic, area, registers and memory increasing proportionally to the number of batches. This is because the kernel block is copied according to the batch number, so the memory banks increase periodically in relation to the kernel blocks. The comparative study found that the tree-balance and loop-unrolling architecture performs better in terms of local memory, latency and throughput.

UTHM Institutional Repository

Accelerating supply chains with Ant Colony Optimization across range of hardware solutions

Author: Dzalbs I, Kalganova T
Publication venue: 'Elsevier BV'
Publication date: 22/01/2020

This pre-print, arXiv:2001.08102v1 [cs.NE], was published subsequently by Elsevier in Computers and Industrial Engineering, vol. 147, 106610, pp. 1-14 on 29 Jun 2020 and is available at https://doi.org/10.1016/j.cie.2020.106610

The Ant Colony algorithm has been applied to various optimization problems; however, most of the previous work on scaling and parallelism focuses on Travelling Salesman Problems (TSPs). Although useful for benchmarks and for comparing new ideas, the algorithmic dynamics do not always transfer to complex real-life problems, where additional meta-data is required during solution construction. This paper looks at a real-life outbound supply chain problem using Ant Colony Optimization (ACO) and its scaling dynamics with two parallel ACO architectures: Independent Ant Colonies (IAC) and Parallel Ants (PA). Results showed that PA was able to reach a higher solution quality in fewer iterations as the number of parallel instances increased. Furthermore, speed performance was measured across three different hardware solutions: a 16-core CPU, a 68-core Xeon Phi and up to 4 GeForce GPUs. State-of-the-art ACO vectorization techniques such as SS-Roulette were implemented using C++ and CUDA. Although GPUs are excellent for TSP, it was concluded that they are not suitable for the given supply chain problem because of the meta-data access footprint required. Furthermore, compared to their sequential counterpart, the vectorized CPU AVX2 implementation achieved a 25.4x speedup on CPU, while Xeon Phi with its AVX512 instruction set reached 148x on PA with Vectorized (PAwV). PAwV is therefore able to scale at least up to 1024 parallel instances on the supply chain network problem solved.
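
For orientation, the selection step that roulette-style ACO techniques build on can be sketched in a few lines. This is a plain sequential roulette-wheel selection, not the vectorized SS-Roulette variant named in the abstract and not the authors' C++/CUDA code. The function name `roulette_select` and the idea that `weights` holds per-candidate desirability values (for example, pheromone combined with a heuristic) are assumptions made for illustration.

```python
import random


def roulette_select(weights, rng=random):
    """Pick an index with probability proportional to its weight.

    weights: non-negative desirability values, one per candidate (assumed)
    rng:     any object with a random() method returning a float in [0, 1)
    """
    total = sum(weights)
    threshold = rng.random() * total
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    # Fallback for floating-point round-off or all-zero weights.
    return len(weights) - 1


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs behave the same
    picks = [roulette_select([1.0, 3.0, 6.0], rng) for _ in range(1000)]
    print([picks.count(i) for i in range(3)])
```

The sequential scan over cumulative weights is the part that vectorized variants restructure so that many candidates can be compared at once.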

arXiv.org e-Print Archive

Brunel University Research Archive