34 research outputs found
Actes du deuxième colloque africain sur la recherche en informatique = Proceedings of the second African Conference on research in computer science
In this work we give a lower bound on the parallel execution time of the 2D-grid task precedence graph on a MIMD multiprocessor architecture model. Communication costs are taken into account and are assumed to be a linear function of the size of the data exchanged between processors. We then show that the lower bound can be attained using a distributed-memory multiprocessor architecture: the ring of processors. (Author's abstract)
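As a rough illustration of the quantities this abstract refers to, the sketch below models a linear communication cost and computes a naive lower bound for a 2D-grid precedence graph. It is not the bound derived in the paper: it assumes each task (i, j) depends on (i-1, j) and (i, j-1), that all tasks take the same time, and it ignores communications in the bound itself. All function and parameter names are hypothetical.

```python
def comm_cost(size, startup, per_unit):
    """Linear communication cost: fixed start-up plus a per-unit term (assumed form)."""
    return startup + per_unit * size


def longest_chain_tasks(n1, n2):
    """Number of tasks on the longest dependency chain of an n1 x n2 grid,
    assuming task (i, j) waits for (i-1, j) and (i, j-1)."""
    depth = [[1] * n2 for _ in range(n1)]
    for i in range(n1):
        for j in range(n2):
            preds = []
            if i > 0:
                preds.append(depth[i - 1][j])
            if j > 0:
                preds.append(depth[i][j - 1])
            if preds:
                depth[i][j] = 1 + max(preds)
    return depth[n1 - 1][n2 - 1]


def naive_lower_bound(n1, n2, p, t_task):
    """Larger of the critical-path time and the perfectly shared work on p processors."""
    critical_path = longest_chain_tasks(n1, n2) * t_task
    shared_work = n1 * n2 * t_task / p
    return max(critical_path, shared_work)
```

With few processors the shared-work term dominates; with many processors the critical path of n1 + n2 - 1 tasks does.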
Well balanced sparse matrix-vector multiplication on a parallel heterogeneous system
This paper discusses well-balanced implementations of sparse matrix-vector multiplication on heterogeneous environments. A new heuristic is proposed for balancing the computing load over the processors in proportion to their power. This is done by defining a distribution model that splits the sparse matrix into k-way partitions in order to minimize the total execution time. An implementation of sparse matrix-vector multiplication in a heterogeneous environment, using the parallel object-oriented programming model POP-C++, shows that this ID-partitioning heuristic greatly improves the performance of the product in comparison with block row decomposition.
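The idea of balancing load in proportion to processor power can be sketched as follows. This is a simple greedy split of consecutive rows by nonzero count, not the heuristic proposed in the paper; `partition_rows`, `row_nnz` and `powers` are hypothetical names, and relative processor power is assumed to be given as plain numbers.

```python
def partition_rows(row_nnz, powers):
    """Split consecutive rows into len(powers) blocks so that the number of
    nonzeros in each block is roughly proportional to the processor's power."""
    total_nnz = sum(row_nnz)
    total_power = sum(powers)
    bounds = []
    start = 0
    acc = 0          # nonzeros assigned so far
    cum_power = 0    # power of the processors served so far
    for k, power in enumerate(powers):
        cum_power += power
        if k == len(powers) - 1:
            end = len(row_nnz)  # last processor takes all remaining rows
        else:
            target = total_nnz * cum_power / total_power
            end = start
            while end < len(row_nnz) and acc + row_nnz[end] <= target:
                acc += row_nnz[end]
                end += 1
        bounds.append((start, end))
        start = end
    return bounds
```

For example, with eight rows of four nonzeros each and two processors of relative power 1 and 3, the first processor receives two rows and the second receives six.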
Real-time motion tracking using optical flow on multiple GPUs
Motion tracking algorithms are widely used in computer vision research. However, new video standards, especially high-resolution ones, mean that current implementations, even on modern hardware, no longer meet the needs of real-time processing. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have recently been proposed. Although they demonstrate the great potential of the GPU platform, hardly any can process high-definition video sequences efficiently. Thus, a need arose for a tool that addresses this problem. In this paper we present software that implements optical flow motion tracking using the Lucas-Kanade algorithm. It is also integrated with the Harris corner detector, so the algorithm can perform sparse tracking, i.e. tracking of the meaningful pixels only. This substantially lowers the computational burden of the method. Moreover, both parts of the algorithm, i.e. corner selection and tracking, are implemented on the GPU; as a result, the software is very fast, allowing real-time motion tracking on videos in Full HD or even 4K format. To deliver the highest performance, it also supports multi-GPU systems, where it scales very well.
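For readers unfamiliar with sparse Lucas-Kanade tracking, a minimal single-window CPU sketch in NumPy is given below. It is not the authors' GPU implementation: it has no image pyramid and no iteration, the tracked point is supplied by the caller rather than by a Harris detector, and the name `lucas_kanade_window` and its parameters are hypothetical.

```python
import numpy as np


def lucas_kanade_window(prev, curr, y, x, half=7):
    """Estimate the flow (u, v) of the point (y, x) between two grayscale
    frames by solving Ix*u + Iy*v = -It in least squares over a window."""
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x)
    Iy, Ix = np.gradient(prev)
    It = curr - prev
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # array [u, v]: displacement along x and along y


def synthetic_frame(dx=0.0, dy=0.0, size=64):
    """Smooth test pattern shifted by (dx, dy) pixels."""
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float64)
    return np.sin(0.2 * (xs - dx)) + np.sin(0.15 * (ys - dy))
```

Tracking only a handful of corner points means solving one small 2-unknown system per point instead of one per pixel, which is where the reduction in computational burden comes from.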
Optimal scheduling and granularity for a 2D-grid precedence graph on a MIMD computer
In this paper, we address the validation of these results on existing parallel distributed-memory architectures. We start by giving a lower bound on the execution time for a fixed job granularity, when communication time is less than job computation time. We further show that this lower bound can be reached using a ring of processors. These results are derived directly from [2]. We deduce the optimal job granularity and the corresponding optimal execution time and number of processors. Experimental results obtained on an Intel Paragon with 24 processors are discussed. In our developments, we consider a multicomputer composed of p identical processors that exchange messages at a constant cost t_com and can overlap computation with communication. We suppose moreover that communications between two processors are processed sequentially and not pipelined. Let us first state the problem and describe the methodology of its solution, proposed in [3]. A 2D-grid precedence graph of n_1 by n_2 elementary tasks T_{i,j}, 0 ≤ i < n_1, 0 ≤ j < n
Tradeoff to minimize extra-computations and stopping criterion tests for parallel iterative schemes
Integration of Grid Cost Model into ISS/VIOLA Meta-Scheduler Environment
The Broker and the cost function model of the ISS/VIOLA Meta-Scheduling System implementation are described in detail. The Broker includes all the algorithmic steps needed to determine a well-suited machine for an application component. This choice is based on a deterministic cost function model with a set of parameters that can be adapted to policies set by computing centres or application owners. All the quantities needed for the cost function can be found in the DataWarehouse or are available through the schedulers of the different machines forming the Grid. An ISS-Simulator has been designed to simulate the real-life scheduling of existing clusters and to virtually include new parallel machines. It will be used to validate the cost model and to tune the different free parameters.
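To make the idea of a parameterised, deterministic cost function concrete, here is a minimal sketch of choosing a machine by minimum cost. The cost terms, the weights `w_time` and `w_money`, and the `Machine` fields are all assumptions made for illustration; the abstract does not state the actual ISS/VIOLA cost function.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    exec_hours: float      # predicted execution time of the component (assumed input)
    wait_hours: float      # predicted queue waiting time (assumed input)
    price_per_hour: float  # charge per hour of execution (assumed input)


def cost(machine, w_time=1.0, w_money=1.0):
    """Weighted sum of turnaround time and monetary charge (hypothetical form)."""
    turnaround = machine.wait_hours + machine.exec_hours
    charge = machine.price_per_hour * machine.exec_hours
    return w_time * turnaround + w_money * charge


def choose_machine(machines, **weights):
    """Return the machine with the lowest cost under the given policy weights."""
    return min(machines, key=lambda m: cost(m, **weights))
```

Changing the weights models different policies: setting `w_money=0.0` makes the choice depend on turnaround time only, which can select a different machine from the default weighting.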
Least squares spline regression with block-diagonal variance matrices
SIGLE. Available from British Library Document Supply Centre, DSC:6029.319(NPL-DITC--40/84) / BLDSC - British Library Document Supply Centre. GB, United Kingdom