    A note on the O(n)-storage implementation of the GKO algorithm

    We propose a new O(n)-space implementation of the GKO-Cauchy algorithm for the solution of linear systems with a Cauchy-like matrix. Despite its slightly higher computational cost, this new algorithm makes more efficient use of the processor cache memory; as a result, it outperforms the existing algorithms for matrices larger than about 500 to 1000. We present an application involving Cauchy-like matrices with a non-reconstructible main diagonal. In this special case, the O(n)-space algorithms can be adapted to provide an efficient implementation of basic linear algebra operations in terms of the low displacement-rank generators.
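    For reference, the classical GKO idea (Gaussian elimination with partial pivoting carried out on the displacement generators rather than on the matrix entries) can be sketched as follows. This is a minimal illustration, assuming the displacement equation diag(t) C - C diag(s) = G B with all t[i] != s[j]. The names `cauchy_like`, `gko_lu` and `gko_solve` are hypothetical. The sketch keeps the full L and U factors, so it is the O(n^2)-storage baseline rather than the O(n)-space variant described in the abstract.

```python
import numpy as np

def cauchy_like(t, s, G, B):
    """Dense Cauchy-like matrix C[i, j] = (G[i] @ B[:, j]) / (t[i] - s[j])."""
    return (G @ B) / (t[:, None] - s[None, :])

def gko_lu(t, s, G, B):
    """Generator-based Gaussian elimination with partial pivoting (GKO-style).

    Returns perm, L, U with C[perm] == L @ U up to rounding, where C is the
    Cauchy-like matrix defined by (t, s, G, B). Assumes t[i] != s[j] for all i, j.
    Inputs are copied, so the caller's arrays are not modified.
    """
    t = np.array(t, dtype=float)
    s = np.array(s, dtype=float)
    G = np.array(G, dtype=float)
    B = np.array(B, dtype=float)
    n = t.size
    perm = np.arange(n)
    L = np.eye(n)
    U = np.zeros((n, n))
    for k in range(n):
        # First column of the current Schur complement, built from generators only.
        col = (G[k:] @ B[:, k]) / (t[k:] - s[k])
        q = k + int(np.argmax(np.abs(col)))
        if q != k:
            # Row swap: permute nodes, generator rows, and the computed part of L.
            t[[k, q]] = t[[q, k]]
            G[[k, q]] = G[[q, k]]
            perm[[k, q]] = perm[[q, k]]
            L[[k, q], :k] = L[[q, k], :k]
            col[[0, q - k]] = col[[q - k, 0]]
        d = col[0]  # pivot
        # First row of the Schur complement (after the swap); row[0] == d.
        row = (G[k] @ B[:, k:]) / (t[k] - s[k:])
        U[k, k:] = row
        L[k + 1:, k] = col[1:] / d
        # Generator update: the Schur complement is again Cauchy-like.
        G[k + 1:] -= np.outer(L[k + 1:, k], G[k])
        B[:, k + 1:] -= np.outer(B[:, k], row[1:] / d)
    return perm, L, U

def gko_solve(t, s, G, B, b):
    """Solve C x = b via C[perm] = L U (general solves used here for brevity)."""
    perm, L, U = gko_lu(t, s, G, B)
    y = np.linalg.solve(L, np.asarray(b, dtype=float)[perm])
    return np.linalg.solve(U, y)
```

    Each elimination step only touches O((n - k) r) generator entries, so the factorization costs O(n^2 r) operations for displacement rank r; what this sketch does not avoid is the O(n^2) storage of L and U.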

    A Fast Static Scheduling Algorithm for DAGs on an Unbounded Number of Processors

    Scheduling parallel tasks on an unbounded number of completely connected processors is NP-complete when communication overhead is taken into account. Assuming that task duplication is not allowed, we propose a fast heuristic algorithm for this scheduling problem, called the dominant sequence clustering algorithm (DSC). The DSC algorithm is superior to several other algorithms from the literature in terms of both computational complexity and parallel time. We present experimental results for scheduling general directed acyclic task graphs (DAGs) and compare the performance of several algorithms. Moreover, we show that DSC is optimal for special classes of DAGs such as join, fork and coarse-grain tree graphs.
    1 Introduction
    Scheduling parallel tasks with precedence relations over distributed-memory multiprocessors has been found to be much more difficult than the classical scheduling problem, see Graham [14] and Lenstra and Kan [15]. This is because data transferring between processor..
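    The quantity DSC works on can be illustrated with a small evaluator. The parallel time of a clustered DAG is the length of its longest path (the dominant sequence), where edges between tasks in the same cluster cost nothing. This is a simplified sketch, not the DSC algorithm itself. It assumes every cluster is linear (its tasks already form a chain of edges), so no extra ordering edges are needed inside a cluster. The names `parallel_time`, `weights`, `comm` and `cluster` are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

def parallel_time(weights, comm, cluster):
    """Length of the dominant sequence of a clustered DAG.

    weights: task -> execution time
    comm:    (u, v) -> communication cost of edge u -> v
    cluster: task -> cluster id
    Assumes each cluster is linear, i.e. its tasks already lie on one chain of edges.
    """
    preds = defaultdict(list)
    for (u, v), cost in comm.items():
        preds[v].append((u, cost))
    graph = {v: [u for u, _ in preds[v]] for v in weights}
    tlevel = {}  # earliest start time of each task
    for v in TopologicalSorter(graph).static_order():
        tlevel[v] = max(
            (tlevel[u] + weights[u] + (0 if cluster[u] == cluster[v] else cost)
             for u, cost in preds[v]),
            default=0,
        )
    # Parallel time = latest finish time = length of the dominant sequence.
    return max(tlevel[v] + weights[v] for v in weights)
```

    On a fork-join graph with tasks a, b, c, d (weights 1, 2, 2, 1) and every edge costing 5, the unclustered parallel time is 14. Zeroing the edges a->b and b->d by clustering {a, b, d} still gives 14, because the path through c becomes the new dominant sequence. This illustrates why the dominant sequence has to be re-examined after each edge-zeroing step.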

    List Scheduling with and without Communication Delays

    Empirical results have shown that the classical critical path (CP) list scheduling heuristic for task graphs is fast and practical when communication cost is zero. In the first part of this paper we study the theoretical properties of the CP heuristic that lead to near-optimal performance in practice. In the second part we extend the CP analysis to the problem of ordering task execution when the processor assignment is given and communication cost is nonzero. We propose two new list scheduling heuristics, RCP and RCP*, which use critical path information and ready-list priority scheduling. We show that, when communication is nonzero, the performance properties of RCP and RCP* are similar to those of CP when communication is zero. Finally, we present an extensive experimental study and optimality analysis of the heuristics, which verifies our theoretical results.
    1 Introduction
    The processor scheduling problem is of considerable importance in parallel processing. Given a..
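    Under zero communication cost, the classical CP heuristic can be sketched as ready-list scheduling ordered by bottom level, which is the longest path from a task to an exit task, including the task itself. The sketch below is an illustration under stated assumptions, not the paper's code. It assumes p identical processors and no communication delays, and the name `cp_list_schedule` is hypothetical. The paper's extensions for a fixed processor assignment with nonzero communication are not shown.

```python
import heapq
from collections import defaultdict

def cp_list_schedule(weights, edges, p):
    """Critical-path list scheduling of a DAG on p identical processors,
    assuming zero communication cost.

    weights: task -> execution time; edges: iterable of (u, v) precedence pairs.
    Returns (schedule, makespan) with schedule[task] = (processor, start_time).
    """
    succ = defaultdict(list)
    npred = {v: 0 for v in weights}
    for u, v in edges:
        succ[u].append(v)
        npred[v] += 1

    # Priority = bottom level: longest path from the task to an exit, inclusive.
    blevel = {}
    def bl(v):
        if v not in blevel:
            blevel[v] = weights[v] + max((bl(w) for w in succ[v]), default=0)
        return blevel[v]

    ready = [(-bl(v), v) for v in weights if npred[v] == 0]
    heapq.heapify(ready)
    idle = list(range(p))
    running = []  # heap of (finish_time, task, processor)
    schedule = {}
    now = 0
    while ready or running:
        # Start the highest-priority ready tasks on any idle processors.
        while ready and idle:
            _, v = heapq.heappop(ready)
            proc = idle.pop()
            schedule[v] = (proc, now)
            heapq.heappush(running, (now + weights[v], v, proc))
        # Jump to the next completion time; free processors and release successors.
        now = running[0][0]
        while running and running[0][0] == now:
            _, v, proc = heapq.heappop(running)
            idle.append(proc)
            for w in succ[v]:
                npred[w] -= 1
                if npred[w] == 0:
                    heapq.heappush(ready, (-bl(w), w))
    return schedule, now
```

    Using a heap for the ready list keeps each priority lookup at O(log n). The bottom levels are computed once up front because, with zero communication, they do not change as tasks are placed.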

    An approach to machine-independent parallel programming

    Scalable Parallelization of Harmonic Balance Simulation

    A new approach to parallelizing harmonic balance simulation is presented. The technique leverages circuit substructure to expose potential parallelism in the form of a directed acyclic graph (DAG) of computations. This DAG is then allocated and scheduled using various linear clustering techniques. The result is a highly scalable and efficient approach to harmonic balance simulation. Two large examples, one from the integrated circuit regime and another from the communication regime, executed on three different parallel computers, are used to demonstrate the efficacy of the approach.
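    Linear clustering groups tasks along paths, so each cluster is a chain of dependent tasks whose internal communication is eliminated once the chain runs on one processor. Below is a minimal sketch of one greedy variant. It assumes task weights and edge communication costs are known, and it repeatedly peels off the longest remaining path as a cluster. The name `linear_clustering` is hypothetical. The paper's specific clustering techniques, its circuit-derived DAG, and the subsequent mapping of clusters to processors are not reproduced here.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

def linear_clustering(weights, comm):
    """Greedy linear clustering of a DAG.

    Repeatedly takes the longest remaining path (task weights plus communication
    costs) and makes it one cluster, until every task is clustered.
    weights: task -> execution time; comm: (u, v) -> communication cost of edge u -> v.
    """
    succ = defaultdict(list)
    preds = defaultdict(list)
    for (u, v), cost in comm.items():
        succ[u].append((v, cost))
        preds[v].append(u)
    order = list(TopologicalSorter({v: preds[v] for v in weights}).static_order())
    remaining = set(weights)
    clusters = []
    while remaining:
        # best[v]: longest path starting at v that uses only unclustered tasks.
        best, nxt = {}, {}
        for v in reversed(order):
            if v not in remaining:
                continue
            best[v], nxt[v] = weights[v], None
            for w, cost in succ[v]:
                if w in remaining and weights[v] + cost + best[w] > best[v]:
                    best[v], nxt[v] = weights[v] + cost + best[w], w
        # Head of the current critical path (earliest in topological order on ties).
        v = max((u for u in order if u in remaining), key=lambda u: best[u])
        path = []
        while v is not None:
            path.append(v)
            v = nxt[v]
        clusters.append(path)
        remaining.difference_update(path)
    return clusters
```

    Each returned cluster follows actual DAG edges, so its tasks can run back to back on one processor without any intra-cluster communication.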