
    Processing Dependent Tasks on a Heterogeneous GPU Resource Architecture

    Abstract: In this dissertation, a heterogeneous GPU system means a system consisting of several different types of GPUs. Many problems in science and engineering can be represented as a two-dimensional grid in which updating each grid point depends on its nearest neighbors' values. The grid may be too large to be handled on a single computing node, and applying a distributed, heterogeneous processor system introduces two crucial issues: minimizing inter-processor communication and load balancing. Firstly, a novel partitioning algorithm for heterogeneous processors (NPHP) is proposed, which uses the grid shape to divide blocks as close to square as possible, minimizing communication cost. Secondly, a functional performance model with communication (FPMC) is proposed to estimate the absolute speeds of the processors accurately, so that the workload can be divided in proportion to the speeds of the GPUs. Based on these two partitioning algorithms, a heterogeneous GPU system (HG) is implemented. HG differs from other distributed GPU systems in that it can process dependent tasks, i.e., tasks in HG can communicate with each other. Furthermore, a dynamic component is designed and implemented in the HG system, so neighbor relationships can change at run time; with this architecture HG can handle more complex task-dependent applications. To validate the approach, an HG system running heat transfer and Gaussian elimination is implemented. The experimental results demonstrate that the heterogeneous GPU system has an essential advantage over traditional homogeneous GPU and CPU systems. For the static-neighbor application, heat transfer, HG is at least 8 times faster than an MPI program running on CPUs. For the dynamic-neighbor application, Gaussian elimination, HG achieves a 2.75 times speedup. Several optimizations are also proposed and implemented to improve performance: NPHP reduces communication cost by at least 10%, FPMC improves load balance by 10% on average, and a data-reuse technique in the computing kernel, which uses shared memory to reduce global memory accesses, yields a 7 times speedup.
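The speed-proportional workload division that the FPMC enables can be sketched as follows. This is a minimal illustration, not code from the dissertation; the function name and the round-off handling are assumptions.

```python
def partition_rows(total_rows, speeds):
    """Split total_rows among processors in proportion to their measured speeds.

    Leftover rows (from rounding down) are handed to the processors with the
    largest fractional shares, so the counts always sum to total_rows exactly.
    """
    total_speed = sum(speeds)
    shares = [total_rows * s / total_speed for s in speeds]
    counts = [int(x) for x in shares]
    remainder = total_rows - sum(counts)
    # Assign leftover rows to processors with the largest fractional share.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts
```

For example, dividing 10 rows among three processors with relative speeds 1, 1, and 2 gives the fastest processor half the rows and splits the rest evenly.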

    Decomposing and packing polygons / Dania el-Khechen.

    In this thesis, we study three different problems in the field of computational geometry: the partitioning of a simple polygon into two congruent components, the partitioning of squares and rectangles into equal-area components while minimizing the perimeter of the cuts, and the packing of the maximum number of squares in an orthogonal polygon. For the first problem, we present polynomial-time algorithms which, given a simple polygon P, partition it, if possible, into two congruent and possibly nonsimple components P1 and P2: an O(n² log n) time algorithm for properly congruent components and an O(n³) time algorithm for mirror congruent components. In our analysis of the second problem, we experimentally find new bounds on the optimal partitions of squares and rectangles into equal-area components. Visualizing the best solutions found allows us to conjecture some characteristics of a class of optimal solutions. Finally, for the third problem, we present three linear-time algorithms for packing the maximum number of unit squares in three subclasses of orthogonal polygons: staircase polygons, pyramids, and Manhattan skyline polygons. We also study a special case of the problem where the given orthogonal polygon has vertices with integer coordinates and the squares to pack are 2 × 2 squares. We model the latter problem as a binary integer program and develop a system that produces and visualizes optimal solutions. Observing such solutions aided us in proving some characteristics of a class of optimal solutions.
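On tiny instances, the 2 × 2 packing problem on an integer grid can be solved exhaustively. The sketch below is a brute-force stand-in for the thesis's binary integer program, useful only for checking small cases; the function name and cell-set encoding are illustrative assumptions.

```python
def max_2x2_squares(cells):
    """Maximum number of disjoint axis-aligned 2x2 squares that fit inside a
    grid region given as a set of unit cells (row, col).

    Exhaustive branch-on-each-placement search; exponential, so suitable
    only for tiny instances (the thesis models this exactly as a binary
    integer program instead).
    """
    cells = set(cells)
    placements = [
        (r, c)
        for (r, c) in cells
        if {(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)} <= cells
    ]

    def best(i, free):
        if i == len(placements):
            return 0
        r, c = placements[i]
        square = {(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)}
        # Either skip this placement, or take it if its cells are still free.
        result = best(i + 1, free)
        if square <= free:
            result = max(result, 1 + best(i + 1, free - square))
        return result

    return best(0, cells)
```

For instance, a 2 × 4 rectangle of unit cells admits two disjoint 2 × 2 squares, while a 2 × 3 rectangle admits only one.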

    Matrix-Matrix Multiplication on Heterogeneous Platforms

    In this paper, we address the issue of implementing matrix-matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations, and collections of heterogeneous clusters. Intuitively, the problem is to load-balance the work across resources of different speeds while minimizing the communication volume. We formally state this problem and prove its NP-completeness. Next we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic, and we assess its practical usefulness through MPI experiments.
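As a small illustration of the column-based idea, the communication cost of one candidate layout can be evaluated as below: the unit square is cut into vertical columns, each column is cut into rectangles whose areas are proportional to processor speeds, and the sum of half-perimeters models communication volume. This is a sketch under those normalizing assumptions; the function and parameter names are not taken from the paper.

```python
def column_layout_cost(speeds, columns):
    """Half-perimeter communication cost of a column-based partition of the
    unit square.

    `columns` lists processor indices per column. Each processor's rectangle
    has area proportional to its speed (total area normalized to 1), so a
    column's width is the sum of its processors' relative speeds, and each
    rectangle's height is its relative speed divided by that width.
    """
    total = sum(speeds)
    rel = [s / total for s in speeds]
    cost = 0.0
    for col in columns:
        width = sum(rel[i] for i in col)   # column width (unit height overall)
        for i in col:
            height = rel[i] / width        # rectangle height inside the column
            cost += width + height         # half-perimeter of this rectangle
    return cost
```

With four equal-speed processors arranged as two columns of two, every rectangle is 0.5 × 0.5 and the total half-perimeter cost is 4.0; the heuristic in the paper searches over such column arrangements to minimize this quantity.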