    A case study in synchronous parallel discrete event simulation

    This paper considers the suitability of SPED, a synchronous parallel discrete event simulator, for the study of message-passing networks. The simulation algorithm is described, and its potential performance is assessed, showing that, under some simplifying assumptions, SPED might offer speedups directly proportional to the number of processors used in the simulation. An implementation of SPED on a distributed memory parallel system is used to study a model of an interconnection network for a multicomputer. Experiments show that SPED performs nearly as expected as long as the event density imposed on the LPs is above a certain threshold. Below that threshold, the overhead due to synchronization plus communication dominates the execution time, and the achieved speedups degrade. Two ways to improve the performance of SPED are proposed: a method to reduce the number of messages interchanged during the simulation, and a new algorithm for synchronous PDES, called PTD-NB (Parallel Time Driven, No Barriers), which reduces the synchronization overhead by removing barrier operations and can easily be implemented in multicomputer systems without support for global synchronization operations.
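The barrier-per-step structure that the abstract attributes to synchronous PDES can be sketched as follows. This is a minimal illustration of a time-driven step loop with global barriers, not the authors' SPED implementation; the LP count, time horizon, and phase structure are assumptions.

```python
import threading

N_LPS = 4          # number of logical processes (hypothetical)
END_TIME = 10      # simulated time horizon (hypothetical)

barrier = threading.Barrier(N_LPS)
events_processed = [0] * N_LPS

def lp_loop(lp_id):
    t = 0
    while t < END_TIME:
        # Phase 1: process all events scheduled for time t on this LP.
        events_processed[lp_id] += 1   # stand-in for real event handling
        barrier.wait()                 # global synchronization point
        # Phase 2: exchange messages generated at time t (omitted here).
        barrier.wait()                 # second barrier before advancing
        t += 1                         # all LPs advance the clock together

threads = [threading.Thread(target=lp_loop, args=(i,)) for i in range(N_LPS)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(events_processed)  # each LP handled one batch per time step
```

The two `barrier.wait()` calls per step are exactly the overhead that PTD-NB removes; when the per-step event density is low, their cost dominates the useful work in Phase 1.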

    Data distribution independent parallel programs for matrix multiplication

    This report considers the problem of writing data distribution independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads for distributed memory parallel computers. The functionality and execution time of DDI programs are independent of initial data distributions. First, modular mappings, which can be used to derive many equally optimal and functionally equivalent programs, are briefly reviewed. Relations between modular mappings and input data distributions are then established. These relations are the basis of a systematic approach to the derivation of DDI programs, which is illustrated for matrix-matrix multiplication (c = a x b). Conditions on data distributions that correspond to an optimal modular mapping are: (1) the first row of the inverse of the distribution pattern matrix of array 'a' should be equal to the second row of the inverse of the distribution pattern matrix of array 'b'; (2) the second row of the inverse of the distribution pattern matrix of array 'a' should be linearly independent of the first row of the inverse of the distribution pattern matrix of array 'b'; and (3) each distribution pattern matrix of arrays 'a', 'b', and 'c' should have at least one zero entry. It is shown that only twelve programs suffice to accomplish redistribution-free execution for the many input data distributions that satisfy the above conditions. When DDI matrix multiplication programs are used in an algorithm with multiple matrix products, half of the data redistributions otherwise required can be eliminated.
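The three conditions above are mechanical to check for concrete pattern matrices. The sketch below verifies them for hypothetical 2x2 distribution pattern matrices using exact rational arithmetic; the example matrices are illustrative choices, not taken from the report.

```python
from fractions import Fraction

def inv2(m):
    """Exact inverse of a 2x2 matrix using rationals."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    assert det != 0, "pattern matrix must be invertible"
    return [[Fraction(d) / det, Fraction(-b) / det],
            [Fraction(-c) / det, Fraction(a) / det]]

def independent(u, v):
    """True if the 2-vectors u and v are linearly independent."""
    return u[0] * v[1] - u[1] * v[0] != 0

def ddi_conditions(Pa, Pb, Pc):
    ia, ib = inv2(Pa), inv2(Pb)
    cond1 = ia[0] == ib[1]             # row 1 of inv(Pa) == row 2 of inv(Pb)
    cond2 = independent(ia[1], ib[0])  # row 2 of inv(Pa) indep. of row 1 of inv(Pb)
    cond3 = all(any(e == 0 for row in M for e in row) for M in (Pa, Pb, Pc))
    return cond1 and cond2 and cond3

# Hypothetical pattern matrices chosen to satisfy all three conditions.
Pa = [[1, 0], [0, 1]]
Pb = [[0, 1], [1, -1]]   # its inverse is [[1, 1], [1, 0]]
Pc = [[1, 0], [0, 1]]
print(ddi_conditions(Pa, Pb, Pc))  # True
```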

    Modular mappings of rectangular algorithms

    Affine space-time mappings have been extensively studied for systolic array design and parallelizing compilation. However, there are practically important cases that require other types of transformations. This paper considers so-called modular mappings, described by linear transformations modulo a constant vector. Sufficient conditions for these mappings to be one-to-one are investigated for rectangular domains of arbitrary dimensions. It is shown that a sufficient condition for a modular mapping to be one-to-one is that its (n x n) coefficient matrix T has entries tii = ±1 and tij = 0 for i > j, where > is a total order on {1, 2, ..., n}, n = domain dimension. These conditions are strengthened and extended for particular types of rectangular domains and affine transformations modulo a constant vector. The results of this paper can be used to identify a space of valid modular mappings of specific algorithms into time and space. They are illustrated by examples which include Cannon's matrix multiplication algorithm.
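The one-to-one property is easy to check by exhaustion on a small rectangular domain. The sketch below brute-force tests a modular mapping x -> T x mod m for injectivity; the particular T, moduli, and domain extents are illustrative assumptions, not examples from the paper.

```python
from itertools import product

def modular_map(T, m, x):
    """Apply the linear map T to point x, componentwise modulo m."""
    return tuple(sum(T[i][j] * x[j] for j in range(len(x))) % m[i]
                 for i in range(len(T)))

def one_to_one(T, m, dims):
    """Brute-force injectivity test over the rectangular domain
    {0..dims[0]-1} x ... x {0..dims[n-1]-1}."""
    points = product(*(range(d) for d in dims))
    images = {modular_map(T, m, x) for x in points}
    total = 1
    for d in dims:
        total *= d
    return len(images) == total

# Lower-triangular T with unit diagonal, matching the sufficient
# condition stated above (tii = ±1, tij = 0 for i > j under the order).
T = [[1, 0], [3, 1]]
dims = (4, 4)
m = dims  # map each coordinate modulo the domain extent
print(one_to_one(T, m, dims))  # True
```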

    Experimental Evaluation of Affine Schedules for Matrix Multiplication on the MasPar Architecture

    This paper reports an experimental study on the suitability of systolic algorithm scheduling methods for the automatic parallelization of algorithms on SIMD computers. We consider matrix multiplication, a simple yet fundamental algorithm that is both computation and communication intensive, on the MasPar MP-1 architecture. We comparatively study different scheduling methods and the blocking of the best resulting algorithms. We have found that modeling a machine like the MasPar is very challenging because of its hierarchical communication capabilities. The two major performance factors that we have identified are the communication time and the use of the memory (registers vs. indexed memory access)…
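The blocking mentioned above, and the register-vs-memory tradeoff, can be illustrated with a plain blocked matrix multiplication. This is a generic sketch of the technique, not the MasPar code; the block size and matrix shapes are arbitrary illustrative choices.

```python
def matmul_blocked(A, B, n, bs):
    """Blocked n x n matrix multiply with block size bs.

    Blocking improves locality: each bs x bs tile of B is reused across
    a whole tile of rows of A, and the scalar a = A[i][k] is hoisted out
    of the inner loop (the "register" half of the tradeoff above)."""
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]  # hoisted: one load per inner loop
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C

n, bs = 4, 2
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * n + j for j in range(n)] for i in range(n)]
ref = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]
print(matmul_blocked(A, B, n, bs) == ref)  # True
```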

    Performance and Interoperability Issues in Incorporating Cluster Management Systems within a Wide-Area Network-Computing Environment

    This paper describes the performance and interoperability issues that arise when integrating cluster management systems into a wide-area network-computing environment, and provides solutions in the context of the Purdue University Network Computing Hubs (PUNCH). The described solution provides users with a single point of access to resources spread across administrative domains, and an intelligent translation process makes it possible for users to submit jobs to different types of cluster management systems in a transparent manner. The approach does not require any modifications to the cluster management software; however, call-back and caching capabilities that would improve performance and make such systems more interoperable with wide-area computing systems are discussed. Cluster management systems manage access to workstations, servers, and specialized machines distributed across local-area networks. From a user's perspective, the systems provide a cent…
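The translation idea described above, one generic job description mapped onto the submission syntax of different cluster management systems, can be sketched as follows. The command templates are hypothetical illustrations of the concept, not PUNCH's actual back-end translators.

```python
# Hypothetical submission-command templates, one per cluster
# management system (these are plausible but assumed forms).
TEMPLATES = {
    "pbs":    "qsub -l nodes={nodes} -l walltime={walltime} {script}",
    "lsf":    "bsub -n {nodes} -W {walltime} {script}",
    "condor": "condor_submit {script}",  # resources live in the submit file
}

def translate(job, cms):
    """Render one generic job description for a given target system."""
    return TEMPLATES[cms].format(**job)

job = {"nodes": 4, "walltime": "01:00:00", "script": "run.sh"}
print(translate(job, "pbs"))
# qsub -l nodes=4 -l walltime=01:00:00 run.sh
```

A single point of access then reduces to choosing the template for whichever system manages the target resources, leaving the cluster management software itself unmodified.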