
    List Scheduling with and without Communication Delays

    Empirical results have shown that the classical critical path (CP) list scheduling heuristic for task graphs is a fast and practical heuristic when communication cost is zero. In the first part of this paper we study the theoretical properties of the CP heuristic that lead to its near-optimum performance in practice. In the second part we extend the CP analysis to the problem of ordering task execution when the processor assignment is given and communication cost is nonzero. We propose two new list scheduling heuristics, RCP and RCP3, which use critical path information and ready-list priority scheduling. We show that the performance properties of RCP and RCP3 when communication cost is nonzero are similar to those of CP when communication cost is zero. Finally, we present an extensive experimental study and optimality analysis of the heuristics which verifies our theoretical results.

    1 Introduction

    The processor scheduling problem is of considerable importance in parallel processing. Given a…
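
    As a hedged illustration of the kind of heuristic analyzed here, the sketch below implements classical critical-path list scheduling with zero communication cost: each task's priority is its bottom level (the longest path, by execution cost, from the task to an exit task), and ready tasks are dispatched in priority order to the earliest-free processor. The graph representation (`succs`, `cost`) and the tie-breaking are illustrative assumptions, not the paper's exact CP, RCP, or RCP3 formulations.

```python
# A minimal sketch of critical-path (CP) list scheduling with zero
# communication cost. `succs` (task -> successors) and `cost` (task ->
# execution time) are assumed input names, not taken from the paper.
import heapq

def bottom_levels(succs, cost):
    """CP priority of a task: its own cost plus the longest path below it."""
    memo = {}
    def bl(t):
        if t not in memo:
            memo[t] = cost[t] + max((bl(s) for s in succs[t]), default=0)
        return memo[t]
    return {t: bl(t) for t in succs}

def cp_schedule(succs, cost, num_procs):
    """Dispatch ready tasks by descending CP priority to the earliest-free processor."""
    level = bottom_levels(succs, cost)
    preds = {t: [] for t in succs}
    for t, ss in succs.items():
        for s in ss:
            preds[s].append(t)
    remaining = {t: len(preds[t]) for t in succs}
    ready = [(-level[t], t) for t in succs if remaining[t] == 0]
    heapq.heapify(ready)
    free = [0.0] * num_procs              # next free time of each processor
    start, finish, proc = {}, {}, {}
    while ready:
        _, t = heapq.heappop(ready)
        p = min(range(num_procs), key=free.__getitem__)
        s = max([free[p]] + [finish[q] for q in preds[t]])
        start[t], finish[t], proc[t] = s, s + cost[t], p
        free[p] = finish[t]
        for u in succs[t]:
            remaining[u] -= 1
            if remaining[u] == 0:
                heapq.heappush(ready, (-level[u], u))
    return start, proc, max(finish.values())

# Example: a four-task fork-join DAG on two processors.
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
print(cp_schedule(succs, cost, num_procs=2))  # makespan 7 = critical path a-b-d
```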

    A Fast Static Scheduling Algorithm for DAGs on an Unbounded Number of Processors

    Scheduling parallel tasks on an unbounded number of completely connected processors when communication overhead is taken into account is NP-complete. Assuming that task duplication is not allowed, we propose a fast heuristic algorithm, called the dominant sequence clustering algorithm (DSC), for this scheduling problem. The DSC algorithm is superior to several other algorithms from the literature in terms of both computational complexity and parallel time. We present experimental results for scheduling general directed acyclic task graphs (DAGs) and compare the performance of several algorithms. Moreover, we show that DSC is optimal for special classes of DAGs such as join, fork, and coarse-grain tree graphs.

    1 Introduction

    Scheduling parallel tasks with precedence relations on distributed-memory multiprocessors has been found to be much more difficult than the classical scheduling problem; see Graham [14] and Lenstra and Kan [15]. This is because transferring data between processors…
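
    To make the objective concrete, the sketch below evaluates the parallel time of a given clustering on unbounded processors, which is the quantity DSC works to minimize: each cluster runs sequentially on its own processor, an edge inside a cluster costs zero, and a cross-cluster edge incurs its communication delay. This is an illustrative evaluation routine under assumed names (`succs`, `cost`, `comm`, `cluster`), not the DSC algorithm itself.

```python
# A hedged sketch of the parallel time of a clustered DAG on unbounded,
# completely connected processors: intra-cluster edges cost zero and each
# cluster is serialized on one processor. All names here are illustrative.
def parallel_time(succs, cost, comm, cluster, topo_order):
    preds = {t: [] for t in succs}
    for t, ss in succs.items():
        for s in ss:
            preds[s].append(t)
    finish = {}
    cluster_free = {}                     # when each cluster's processor frees up
    for t in topo_order:                  # any topological order of the DAG
        data_ready = max(
            (finish[p] + (0 if cluster[p] == cluster[t] else comm[(p, t)])
             for p in preds[t]),
            default=0.0,
        )
        s = max(data_ready, cluster_free.get(cluster[t], 0.0))
        finish[t] = s + cost[t]
        cluster_free[cluster[t]] = finish[t]
    return max(finish.values())

# Example: zeroing the a->b and b->d edges by clustering a, b, d together.
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 1, "b": 2, "c": 2, "d": 1}
comm = {("a", "b"): 3, ("a", "c"): 3, ("b", "d"): 3, ("c", "d"): 3}
cluster = {"a": 0, "b": 0, "d": 0, "c": 1}
print(parallel_time(succs, cost, comm, cluster, ["a", "b", "c", "d"]))  # 10.0
```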

    On The Granularity And Clustering Of Directed Acyclic Task Graphs

    Clustering has been used as a compile-time preprocessing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling on an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, the impact of granularity on clustering strategies is analyzed. A clustering is called linear if every cluster is one simple directed path in the task graph; otherwise it is called nonlinear. For coarse-grain directed acyclic task graphs (DAGs), on a completely connected architecture with an unbounded number of processors and under the assumption that task duplication is not allowed, the following property is shown: for every nonlinear clustering there exists a linear clustering with less than or equal parallel time. This property, along with a performance bound for linear clustering algorithms, shows that linear clustering is the best choice for coarse-grain DAGs. It provides a theoretical justification…
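
    One plausible rendering of such a granularity measure is sketched below (hedged, since the paper's exact generalization is not reproduced here): for each edge, take the smaller of the two adjacent task weights divided by the edge's communication weight, and let the DAG's granularity be the minimum over all edges; a DAG is then called coarse grain when this ratio is at least 1.

```python
# A hedged, illustrative granularity measure in the spirit of Stone's
# definition: min over edges of (smaller adjacent task weight) /
# (communication weight). Coarse grain <=> granularity >= 1.
def granularity(succs, cost, comm):
    g = float("inf")
    for t, ss in succs.items():
        for s in ss:                      # examine each edge (t, s)
            g = min(g, min(cost[t], cost[s]) / comm[(t, s)])
    return g

# Example: a three-task chain whose computation dominates communication.
succs = {"a": ["b"], "b": ["c"], "c": []}
cost = {"a": 4, "b": 6, "c": 6}
comm = {("a", "b"): 2, ("b", "c"): 3}
print(granularity(succs, cost, comm))     # 2.0 -> coarse grain
```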

    Scalable Parallelization of Harmonic Balance Simulation

    A new approach to parallelizing harmonic balance simulation is presented. The technique leverages circuit substructure to expose potential parallelism in the form of a directed acyclic graph (DAG) of computations. This DAG is then allocated and scheduled using various linear clustering techniques. The result is a highly scalable and efficient approach to harmonic balance simulation. Two large examples, one from the integrated-circuit regime and another from the communication regime, executed on three different parallel computers, are used to demonstrate the efficacy of the approach.

    PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors

    We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear in the size of the task graph, and preliminary experiments show performance comparable to the "best" hand-written programs.

    1 Introduction

    In this paper, we consider static scheduling and code generation for message passing architectures. There are generally three distinct ways of addressing the programming difficulties of distributed memory architectures. The first approach considers the problem of automatic parallelization and scheduling from sequential programs. The emphasis has been on the development of compilers or software tools that assist in programming parallel architectures [2, 16, 18, 19]. Since message passing architectures require coarse-grain parallelism to be efficient, one difficulty is the identification of parallelism, especially at the procedural…

    A Parallel Programming Tool for Scheduling on Distributed Memory Multiprocessors

    PYRROS is a tool for scheduling and parallel code generation for distributed memory message passing architectures [35]. In this paper, we discuss several compile-time optimization techniques used in PYRROS. The scheduling part of PYRROS optimizes both data and program mapping so that the parallel time is minimized. The communication and storage optimization part facilitates the generation of efficient parallel code. The related issues of partitioning and the "owner computes rule" are discussed, and the importance of program scheduling is demonstrated.

    1 Introduction

    One of the obstacles in the development of parallelizing compilers is the automatic identification of embedded parallelism in a sequential program. This is because the dependence analysis problem is NP-hard. Significant progress has been made in obtaining approximate solutions [31]. However, the false dependencies found in approximate solutions can have a negative impact on effective parallelization. This was demonstrated…
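
    As a hedged illustration of the "owner computes rule" mentioned above (the array, block distribution, and all names below are hypothetical, not PYRROS internals), each processor executes only the assignments whose left-hand-side elements it owns:

```python
# Hypothetical illustration of the "owner computes rule": the processor that
# owns the left-hand-side array element executes the assignment.
def owner(i, n, num_procs):
    """Block distribution: which processor owns element i of an n-element array."""
    block = (n + num_procs - 1) // num_procs
    return i // block

def local_update(a, b, my_rank, num_procs):
    """Each processor runs the full loop; only owned iterations do work."""
    n = len(a)
    for i in range(n):
        if owner(i, n, num_procs) == my_rank:   # owner of a[i] computes a[i]
            a[i] = 2 * b[i]
```

    Note that rigidly applying this rule ties the computation mapping to the data mapping; the abstract's emphasis on program scheduling reflects that optimizing both mappings together can do better.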

    Scheduling Program Task Graphs on MIMD Architectures

    Scheduling is the mapping of parallel tasks onto a set of physical processors together with the determination of the starting time of each task. In this paper, we discuss several static scheduling techniques used for distributed memory architectures. We also give an overview of PYRROS [38], a software system that uses these scheduling algorithms to generate parallel code for message passing architectures.

    Performance Bounds for Column-Block Partitioning of Parallel Gaussian Elimination and Gauss-Jordan Methods

    Column-block partitioning is commonly used in the parallelization of the Gaussian elimination (GE) and Gauss-Jordan (GJ) algorithms. It is therefore of interest to know the performance bounds of such partitioning on scalable distributed-memory parallel architectures. In this paper, we use a graph-theoretic approach to derive asymptotic performance lower bounds of column-block partitioning for both GE and GJ. The new contribution is the incorporation of communication cost in the analysis, which results in the derivation of sharper lower bounds. We use our scheduling system PYRROS to experimentally compare the actual run times…
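
    For concreteness, the sketch below gives one hedged reading of column-block partitioning for GE (no pivoting; the wrap mapping, block size, and names are illustrative assumptions, not the paper's exact scheme): columns are grouped into blocks that are wrap-mapped onto processors, and at each elimination step the owner of the pivot column's block forms the multipliers while every processor updates only the columns it owns.

```python
# A hedged sketch of column-block-partitioned Gaussian elimination (no
# pivoting). The wrap (cyclic) block mapping and all names are illustrative.
import numpy as np

def column_block_owner(j, block_size, num_procs):
    return (j // block_size) % num_procs        # wrap mapping of column blocks

def ge_column_block(A, block_size, num_procs):
    n = A.shape[0]
    for k in range(n - 1):
        # The owner of column k's block would form the multipliers and
        # broadcast them to the other processors.
        mult = A[k + 1:, k] / A[k, k]
        for p in range(num_procs):              # each processor updates its own columns
            for j in range(k + 1, n):
                if column_block_owner(j, block_size, num_procs) == p:
                    A[k + 1:, j] -= mult * A[k, j]
        A[k + 1:, k] = mult                     # keep the multipliers (L factor)
    return A

# Example: a 3x3 system with unit block size on two (simulated) processors.
A = np.array([[4.0, 2.0, 1.0], [2.0, 5.0, 3.0], [1.0, 3.0, 6.0]])
print(ge_column_block(A.copy(), block_size=1, num_procs=2))
```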