124 research outputs found

    Performance of Parallel Computing in Bubble Sort Algorithm

    The performance of an algorithm can be improved by using a parallel computing approach. In this study, the performance of the bubble sort algorithm was evaluated on computers with various specifications. Experimental results show that parallel programs reduced execution time by 61%-65% compared to their serial counterparts.
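    A common way to parallelize bubble sort is odd-even transposition sort: in each phase, disjoint adjacent pairs are compared, so all comparisons within a phase can run concurrently. The sketch below illustrates the general technique, not the study's exact implementation; without OpenMP support the pragma is ignored and the sort simply runs serially.

    ```cpp
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Odd-even transposition sort: n phases alternating between even-indexed
    // and odd-indexed pairs. Pairs within a phase are disjoint, so the inner
    // loop is safe to parallelize.
    void odd_even_sort(std::vector<int>& a) {
        const int n = static_cast<int>(a.size());
        for (int phase = 0; phase < n; ++phase) {
            int start = phase % 2;            // alternate even/odd pairs
            #pragma omp parallel for          // iterations touch disjoint pairs
            for (int i = start; i + 1 < n; i += 2)
                if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
        }
    }

    int main() {
        std::vector<int> v = {5, 1, 4, 2, 8, 0, 3};
        odd_even_sort(v);
        for (int x : v) printf("%d ", x);   // 0 1 2 3 4 5 8
        printf("\n");
    }
    ```

    After n phases the array is guaranteed sorted, which is what makes the per-phase parallelism correct without any locking.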

    Parallel software applications in high-energy physics

    Parallel programming allows the speed of computations to be increased by using multiple processors or computers working jointly on the same task. In parallel programming, difficulties that are not present in sequential programming can be encountered, for instance communication between processors. The way a parallel program is written depends strictly on the architecture of the parallel system. An efficient program of this kind not only performs its computations faster than its sequential version, but also uses CPU time effectively. Parallel programming has been present in high-energy physics for years. The lecture is an introduction to parallel computing in general. It discusses the motivation for parallel computations, hardware architectures of parallel systems, and the key concepts of parallel programming. It also relates parallel computing to high-energy physics and presents a parallel programming application in the field, namely PROOF
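    The motivation for parallel computations that such a lecture typically covers can be quantified with Amdahl's law: if only a fraction p of a program parallelizes, the achievable speedup on n processors is bounded. A minimal sketch (the parallel fraction 0.95 is an illustrative assumption):

    ```cpp
    #include <cstdio>

    // Amdahl's law: with a perfectly parallelizable fraction p running on
    // n processors, overall speedup = 1 / ((1 - p) + p / n).
    double amdahl_speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main() {
        // Even a highly parallel code (p = 0.95) saturates near 1/(1-p) = 20x:
        for (int n : {1, 4, 16, 64, 1024})
            printf("n=%4d  speedup=%.2f\n", n, amdahl_speedup(0.95, n));
    }
    ```

    This is why effective CPU-time usage, and not just raw processor count, matters for the efficiency the abstract mentions.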

    Leveraging the potential of task-based programming with OpenMP task graphs

    Thesis with International Doctorate mention. The task execution model is widely used in computer engineering; it helps developers design, develop, and understand software systems. OpenMP is the de facto programming model for parallelizing sequential algorithms on shared-memory machines. Coupled with task parallelism, OpenMP can conveniently parallelize both structured and unstructured applications, and it also allows users to offload work onto accelerators as target tasks. However, the runtime overhead incurred by the OpenMP tasking model is an important concern for users developing OpenMP task programs. This work focuses on improving the OpenMP tasking model. First, we analyzed the performance overhead and bottlenecks of mainstream task implementations and proposed a solution in the OpenMP specification to tackle them. Specifically, we observed that a significant portion of the overhead in the tasking model stems from thread contention: multiple threads compete for simultaneous access to shared resources, such as task queues, causing these threads to stall. As the number of cores in modern architectures increases, this further hampers the scalability of OpenMP tasking. We propose a mechanism that creates graphs representing sets of OpenMP tasks. Once built, executing such graphs incurs less runtime overhead by drastically reducing access to shared resources. This mechanism is exposed to users as a new OpenMP directive, namely taskgraph. Second, we implemented the proposed solution, the taskgraph directive, in both the GCC compiler (prototype implementation) and the LLVM compiler (complete implementation). Initially, our focus was on the GCC compiler, particularly its runtime system, libgomp. Our prototype implementation in this compiler demonstrated promising performance improvements using taskgraph.
However, it also revealed a performance bottleneck in libgomp: all tasks are scheduled into a common queue, leading to significant contention and resulting in poor performance and scalability compared to LLVM. The complete implementation of the taskgraph framework is in the LLVM compiler. Our modifications range from the front end to the middle end of the compiler, in addition to its runtime library, libomp. This framework allows users to declare taskgraph directives in OpenMP C/C++ code to create graphs conveniently, at either compile time or run time. The experiments, carried out on nodes of the MareNostrum4 supercomputer, show that the taskgraph framework outperforms the original task implementations of GCC and LLVM. Finally, we enhance the OpenMP offloading mechanism by leveraging taskgraph. In particular, we implemented the transformation of taskgraphs into CUDA graphs. Consequently, our framework improves the interoperability of OpenMP with other programming models (in this case, CUDA) and improves the performance of the OpenMP accelerator model by alleviating synchronization overhead. With these contributions, this thesis improves both the OpenMP tasking and accelerator models. The framework has been used by other Ph.D. students in their research; for example, Cyril Cetre from Thales Research and Technology successfully improved the performance of a cyber-physical application by using static generation of CUDA graphs, as presented in this manuscript. Furthermore, the OpenMP Language Committee accepted our proposal to include the taskgraph directive in the OpenMP Specification v6.0. This thesis also contributed to the upstream LLVM repository; these commits mainly concern the record-and-replay mechanism of taskgraph, and they also serve as a basis for the official taskgraph implementation in LLVM.
We hope that, with these endeavors, this work will promote the use of OpenMP tasks in general.
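    The core idea can be pictured with a short sketch. The syntax below follows the thesis's taskgraph proposal and is hypothetical (compilers without taskgraph support, or compiling without OpenMP enabled, ignore the pragmas and run the tasks as ordinary code); the array and iteration counts are illustrative:

    ```cpp
    #include <cstdio>
    #include <vector>

    // Sketch of the proposed taskgraph directive: the first encounter records
    // the set of tasks as a graph; later encounters replay the graph, avoiding
    // repeated task creation and contention on shared task queues.
    void run_iterations(std::vector<int>& data, int iters) {
        for (int it = 0; it < iters; ++it) {
            #pragma omp parallel
            #pragma omp single
            #pragma omp taskgraph   // hypothetical syntax per the thesis proposal
            {
                for (int i = 0; i < static_cast<int>(data.size()); ++i) {
                    #pragma omp task firstprivate(i)
                    data[i] += i + 1;   // each element updated by one task
                }
            }
        }
    }

    int main() {
        std::vector<int> data(4, 0);
        run_iterations(data, 3);
        for (int x : data) printf("%d ", x);   // 3 6 9 12
        printf("\n");
    }
    ```

    Because the loop body inside the region is the same on every iteration, recording it once and replaying it is what removes the per-iteration tasking overhead the abstract describes.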

    Parallel Programming Recipes

    Parallel programming has become vital for the success of commercial applications, since Moore's Law will now be used to double the number of processors (or cores) per chip every technology generation. The performance of applications depends on how software executions are mapped onto the multi-core chip and how efficiently they use the cores. Increasing parallelism in software development is now necessary, not only to take advantage of multi-core capability but also to adapt and survive in the new silicon implementations. This project provides the performance characteristics of parallelism for some common algorithms or computations using different parallel languages. Based on concrete experiments, in which each algorithm is implemented in different languages and the program's performance is measured, the project provides recipes for the problem computations. The central problems and algorithms of the project are: arithmetic algebra: Maclaurin series calculation for e^x, dot product of two vectors of size n; sort algorithms: bubble sort, odd-even sort; graphics: graphics rendering. The languages are chosen based on commonality in the current market and ease of use, i.e., OpenMP, MPI, and OpenCL. The purpose of this study is to give the reader broad knowledge of parallel programming and of the comparisons, in terms of performance and implementation cost, across languages and application types. It is hoped to be useful for programmers and computer architects in deciding which language to use for certain applications/problems and in estimating project costs. It is also hoped that the project can be expanded in the future so that more languages/technologies as well as applications can be analyzed
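    The Maclaurin-series problem listed above parallelizes naturally because each term of e^x = sum over n of x^n/n! can be computed independently. A minimal OpenMP sketch (the term count and function name are illustrative, not the project's code):

    ```cpp
    #include <cmath>
    #include <cstdio>

    // Approximate e^x with its Maclaurin series. Each iteration computes its
    // term from scratch, so iterations are independent and the loop can be a
    // parallel reduction; without OpenMP the pragma is ignored and it runs
    // serially with the same result.
    double exp_maclaurin(double x, int terms) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int n = 0; n < terms; ++n) {
            double term = 1.0;                       // n-th term: x^n / n!
            for (int k = 1; k <= n; ++k) term *= x / k;
            sum += term;
        }
        return sum;
    }

    int main() {
        printf("e ~= %.10f (libm: %.10f)\n", exp_maclaurin(1.0, 20), std::exp(1.0));
    }
    ```

    The trade-off is typical of such recipes: the independent-term form does redundant multiplication (O(terms^2) work) in exchange for a dependency-free loop, whereas the serial recurrence term_n = term_{n-1} * x/n is cheaper but cannot be parallelized directly.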

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for taking decisions. The definition of real-time depends on the application under study, with required response times ranging from microseconds up to several hours in the case of very compute-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in a RICH detector when seeds from external trackers are not available
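    One classic seedless approach to ring reconstruction, shown here only as an illustrative baseline and not as the GAP algorithm itself, is an algebraic least-squares circle fit: every hit (x, y) on a ring satisfies x^2 + y^2 = 2*cx*x + 2*cy*y + c, which is linear in (cx, cy, c), so a single small linear solve recovers the ring without any seed.

    ```cpp
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Circle { double cx, cy, r; };

    // Algebraic (Kasa-style) circle fit: build the 3x3 normal equations for
    // the unknowns (2*cx, 2*cy, c) and solve them by Cramer's rule.
    Circle fit_ring(const std::vector<double>& xs, const std::vector<double>& ys) {
        double Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0, Sxz = 0, Syz = 0, Sz = 0;
        const int n = static_cast<int>(xs.size());
        for (int i = 0; i < n; ++i) {
            double x = xs[i], y = ys[i], z = x * x + y * y;
            Sx += x; Sy += y; Sxx += x * x; Syy += y * y; Sxy += x * y;
            Sxz += x * z; Syz += y * z; Sz += z;
        }
        double A[3][3] = {{Sxx, Sxy, Sx}, {Sxy, Syy, Sy}, {Sx, Sy, double(n)}};
        double b[3] = {Sxz, Syz, Sz};
        auto det3 = [](double m[3][3]) {
            return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                 - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                 + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
        };
        double d = det3(A), sol[3];
        for (int col = 0; col < 3; ++col) {            // Cramer: replace column col with b
            double M[3][3];
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 3; ++j) M[i][j] = (j == col) ? b[i] : A[i][j];
            sol[col] = det3(M) / d;
        }
        double cx = sol[0] / 2, cy = sol[1] / 2;
        return {cx, cy, std::sqrt(sol[2] + cx * cx + cy * cy)};
    }

    int main() {
        // Synthetic hits on a ring of radius 3 centred at (1, -2).
        const double PI = std::acos(-1.0);
        std::vector<double> xs, ys;
        for (int k = 0; k < 12; ++k) {
            double t = 2 * PI * k / 12;
            xs.push_back(1 + 3 * std::cos(t));
            ys.push_back(-2 + 3 * std::sin(t));
        }
        Circle c = fit_ring(xs, ys);
        printf("center=(%.3f, %.3f)  r=%.3f\n", c.cx, c.cy, c.r);
    }
    ```

    Because each hit contributes independently to the accumulated sums, the accumulation step maps naturally onto a GPU, which is the kind of data-parallel structure trigger-level algorithms exploit.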