A note on the O(n)-storage implementation of the GKO algorithm
We propose a new O(n)-space implementation of the GKO-Cauchy algorithm for
the solution of linear systems with a Cauchy-like matrix. Despite its slightly
higher computational cost, this new algorithm makes more efficient use of the
processor cache memory. Thus, for matrices of size larger than about 500-1000,
it outperforms the existing algorithms.
We present an application of Cauchy-like matrices with
non-reconstructible main diagonal. In this special instance, the O(n)-space
algorithms can be adapted nicely to provide an efficient implementation of
basic linear algebra operations in terms of the low displacement-rank
generators.
A Fast Static Scheduling Algorithm for DAGs on an Unbounded Number of Processors
Scheduling parallel tasks on an unbounded number of completely connected processors when communication overhead is taken into account is NP-complete. Assuming that task duplication is not allowed, we propose a fast heuristic algorithm, called the dominant sequence clustering algorithm (DSC), for this scheduling problem. The DSC algorithm is superior to several other algorithms from the literature in terms of both computational complexity and parallel time. We present experimental results for scheduling general directed acyclic task graphs (DAGs) and compare the performance of several algorithms. Moreover, we show that DSC is optimal for special classes of DAGs such as join, fork, and coarse-grain tree graphs. 1 Introduction Scheduling parallel tasks with precedence relations over distributed memory multiprocessors has been found to be much more difficult than the classical scheduling problem, see Graham [14] and Lenstra and Kan [15]. This is because data transferring between processor..
List Scheduling with and without Communication Delays
Empirical results have shown that the classical critical path (CP) list scheduling heuristic for task graphs is a fast and practical heuristic when communication cost is zero. In the first part of this paper we study the theoretical properties of the CP heuristic that lead to near-optimal performance in practice. In the second part we extend the CP analysis to the problem of ordering the task execution when the processor assignment is given and communication cost is nonzero. We propose two new list scheduling heuristics, RCP and RCP*, that use critical path information and ready list priority scheduling. We show that the performance properties of RCP and RCP*, when communication is nonzero, are similar to those of CP when communication is zero. Finally, we present an extensive experimental study and optimality analysis of the heuristics, which verifies our theoretical results. 1 Introduction The processor scheduling problem is of considerable importance in parallel processing. Given a..
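As a rough illustration of the critical path (CP) list scheduling idea mentioned in the abstract above, the following Python sketch schedules a task graph with zero communication cost. It is a simplified textbook-style version, not the paper's algorithm: the function names, the tie-breaking rule, and the choice of the earliest-free processor are all assumptions made for this example. Each task's priority is its bottom level, that is, the length of the longest path from the task to an exit task.

```python
from collections import defaultdict


def bottom_levels(weights, succs):
    """Longest path from each task to an exit task, including its own weight."""
    memo = {}

    def level(task):
        if task not in memo:
            memo[task] = weights[task] + max(
                (level(s) for s in succs[task]), default=0
            )
        return memo[task]

    for task in weights:
        level(task)
    return memo


def cp_list_schedule(weights, edges, num_procs):
    """Hypothetical helper: CP-priority list scheduling, zero communication cost.

    Returns ({task: (processor, start_time)}, makespan).
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    prio = bottom_levels(weights, succs)
    unmet = {t: len(preds[t]) for t in weights}  # unscheduled predecessors
    ready = [t for t in weights if unmet[t] == 0]
    proc_free = [0] * num_procs                  # time each processor is free
    finish, schedule = {}, {}

    while ready:
        # Highest bottom level first; ties broken by task name (an assumption).
        ready.sort(key=lambda t: (-prio[t], t))
        task = ready.pop(0)
        proc = min(range(num_procs), key=lambda p: proc_free[p])
        # A task starts once its processor is free and all predecessors are done.
        start = max([proc_free[proc]] + [finish[u] for u in preds[task]])
        finish[task] = start + weights[task]
        proc_free[proc] = finish[task]
        schedule[task] = (proc, start)
        for s in succs[task]:
            unmet[s] -= 1
            if unmet[s] == 0:
                ready.append(s)

    return schedule, max(finish.values(), default=0)


if __name__ == "__main__":
    # Diamond-shaped task graph: a -> {b, c} -> d
    weights = {"a": 2, "b": 3, "c": 1, "d": 2}
    edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    schedule, makespan = cp_list_schedule(weights, edges, num_procs=2)
    print(schedule, makespan)
```

On the diamond graph in the usage example, the makespan equals the critical path length a, b, d (2 + 3 + 2 = 7), since with zero communication cost the off-path task c fits on the second processor without delaying d.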
Scalable Parallelization of Harmonic Balance Simulation
A new approach to parallelizing harmonic balance simulation is presented. The technique leverages circuit substructure to expose potential parallelism in the form of a directed, acyclic graph (dag) of computations. This dag is then allocated and scheduled using various linear clustering techniques. The result is a highly scalable and efficient approach to harmonic balance simulation. Two large examples, one from the integrated circuit regime and another from the communication regime, executed on three different parallel computers are used to demonstrate the efficacy of the approach.