
    Parametric micro-level performance models for parallel computing and parallel implementation of hydrostatic MM5

    This dissertation presents parametric micro-level performance models and a parallel implementation of the hydrostatic version of MM5.

    Parametric micro-level (PM) performance models are introduced to address the important issue of how to model parallel performance realistically. These models can be used to predict execution times and identify performance bottlenecks. Accurate prediction and analysis of execution times is achieved by incorporating precise details of interprocessor communication, memory operations, auxiliary instructions, and the effects of communication and computation schedules. The parameters provide the flexibility to study various algorithmic and architectural issues. The development and verification process, the parameters, and the scope of applicability of these models are discussed. A coherent view of performance is obtained from the execution profiles generated by PM models. The models are targeted at a large class of numerical algorithms commonly implemented on both SIMD and MIMD machines. Specific models are presented for matrix multiplication, LU decomposition, and FFT on a 2-D processor array with distributed memory. A case study covers comparisons of parallel machines and of parallel algorithms. In the comparison of parallel machines, PM models are used to analyze execution times so as to relate performance to the architectural attributes of a machine. In the comparison of parallel algorithms, PM models are used to study the performance of two LU decomposition algorithms, non-blocked and blocked, and to identify the tradeoffs between them. This analysis is useful for determining an optimal block size for the blocked algorithm. The case study is carried out on the MasPar MP-1 and MP-2 machines.

    The dissertation also describes the parallel implementation of the hydrostatic version of MM5 (the fifth-generation Mesoscale Model), which has been widely used for climate studies. The model was parallelized in a machine-independent manner using the Runtime System Library (RSL), a runtime library that handles message passing and index transformation. The dissertation discusses validation of the parallel implementation of MM5 against field data and presents performance results. The parallel model was tested on the IBM SP1, a distributed-memory parallel computer.
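    The abstract does not reproduce the model equations themselves. As a rough illustration of the general idea behind a parametric execution-time model (a prediction built from separate computation and communication terms driven by machine parameters), the Python sketch below estimates the time for matrix multiplication on a p x p processor mesh. The parameter names (t_flop, t_startup, t_word) and the Cannon-style communication schedule are assumptions for illustration only, not the dissertation's actual PM-model terms; varying the block size in such a model is the kind of parameter study the abstract describes for the blocked LU algorithm.

        def predicted_time_matmul(n, p, t_flop, t_startup, t_word):
            """Rough parametric estimate for C = A * B on a p x p processor mesh.

            Assumes an n x n problem block-distributed over p * p processors
            (local blocks of size (n/p) x (n/p)) and a Cannon-style schedule:
            p shift-and-multiply steps, each sending one local block along
            processor rows and one along processor columns.  All terms are
            illustrative, not the dissertation's PM-model notation.
            """
            b = n // p                                 # local block edge length
            comp = p * (2 * b ** 3) * t_flop           # p local block multiplications
            words_per_shift = b * b                    # one b x b block per message
            comm = p * 2 * (t_startup + words_per_shift * t_word)  # row + column shifts
            return comp + comm

        # Example with made-up machine constants: 1024 x 1024 matrices on a 32 x 32 mesh.
        print(predicted_time_matmul(1024, 32, t_flop=1e-8, t_startup=5e-5, t_word=1e-7))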

    An efficient parallelization of a real scientific application

    Bibliography: leaves 137-145.
    In the past decade the cost of computing has come down considerably, making high-powered computing more easily affordable. As a result, many institutions and organisations now have networks of high-powered workstations. Such networks provide a large, generally untapped source of computing power that can be used for running large scientific applications which previously could only be run on supercomputers. This dissertation shows that a substantial improvement in performance can be achieved by parallelizing a real scientific application for a heterogeneous network of Sun and Silicon Graphics workstations connected by Ethernet, but that the improvement is affected by a number of factors, including communication delays, load balancing, and the number of slaves used. It also shows that performance can be improved by sending more, shorter messages and by overlapping communication with computation. Part of the thesis concerns the difficulties involved in evaluating parallel performance on a heterogeneous network. The dissertation shows that conventional measures such as speedup and efficiency are not appropriate for evaluating the performance of a heterogeneous system, and that linear speed gives a much more representative indication of the actual performance achieved. The new concepts of perfect linear speed and linear efficiency are also proposed to help evaluate the improvement in parallel performance on a heterogeneous system.
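    The abstract contrasts conventional speedup and efficiency with the thesis's linear-speed measures but does not give their definitions. The sketch below shows the conventional metrics and one plausible capacity-weighted alternative for a heterogeneous pool; the weighting scheme and the names relative_speeds and capacity_weighted_speedup are assumptions for illustration, not necessarily the dissertation's definitions of linear speed or linear efficiency.

        def conventional_metrics(t_serial, t_parallel, n_workers):
            """Classic speedup and efficiency.  Both implicitly assume n identical
            processors, which is why they can be misleading for a mixed pool of
            Sun and Silicon Graphics workstations of different power."""
            speedup = t_serial / t_parallel
            efficiency = speedup / n_workers
            return speedup, efficiency

        def capacity_weighted_speedup(t_serial, t_parallel, relative_speeds):
            """Illustrative heterogeneous-aware measure (an assumption, not the
            thesis's exact definition): normalise the observed speedup by the
            pool's aggregate capacity rather than by the raw worker count.
            relative_speeds[i] is worker i's speed relative to the machine that
            produced t_serial; a result near 1.0 would indicate near-perfect
            use of the pool."""
            speedup = t_serial / t_parallel
            return speedup / sum(relative_speeds)

        # Example: serial run took 600 s; a parallel run on 4 workstations of mixed
        # power (1.0x, 1.0x, 0.5x, 0.5x the serial machine's speed) took 250 s.
        print(conventional_metrics(600.0, 250.0, 4))                          # (2.4, 0.6)
        print(capacity_weighted_speedup(600.0, 250.0, [1.0, 1.0, 0.5, 0.5]))  # 0.8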