Search CORE

28,497 research outputs found

Algorithms for Hierarchical and Semi-Partitioned Parallel Scheduling

Author: Bonifaci Vincenzo
Dangelo Gianlorenzo
Marchetti-Spaccamela Alberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

We propose a model for scheduling jobs in a parallel machine setting that takes into account the cost of migrations by assuming that the processing time of a job may depend on the specific set of machines among which the job is migrated. For the makespan minimization objective, the model generalizes classical scheduling problems such as unrelated parallel machine scheduling, as well as novel ones such as semi-partitioned and clustered scheduling. In the case of a hierarchical family of machines, we derive a compact integer linear programming formulation of the problem and leverage its fractional relaxation to obtain a polynomial-time 2-approximation algorithm. Extensions that incorporate memory capacity constraints are also discussed

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Principles for problem aggregation and assignment in medium scale multiprocessors

Author: Nicol David M.
Saltz Joel H.
Publication venue
Publication date
Field of study

One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior

NASA Technical Reports Server

An analysis of scatter decomposition

Author: Nicol David M.
Saltz Joel H.
Publication venue
Publication date
Field of study

A formal analysis of a powerful mapping technique known as scatter decomposition is presented. Scatter decomposition divides an irregular computational domain into a large number of equal sized pieces, and distributes them modularly among processors. A probabilistic model of workload in one dimension is used to formally explain why, and when scatter decomposition works. The first result is that if correlation in workload is a convex function of distance, then scattering a more finely decomposed domain yields a lower average processor workload variance. The second result shows that if the workload process is stationary Gaussian and the correlation function decreases linearly in distance until becoming zero and then remains zero, scattering a more finely decomposed domain yields a lower expected maximum processor workload. Finally it is shown that if the correlation function decreases linearly across the entire domain, then among all mappings that assign an equal number of domain pieces to each processor, scatter decomposition minimizes the average processor workload variance. The dependence of these results on the assumption of decreasing correlation is illustrated with situations where a coarser granularity actually achieves better load balance

NASA Technical Reports Server

Directed taskgraph scheduling using simulated annealing

Author: D'Hollander Erik
Devis Yves
Publication venue: 'Informa UK Limited'
Publication date: 01/01/1991
Field of study

Ghent University Academic Bibliography

Recommended from our members

EXTEND-L : an input language for extensible register transfer compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Publication venue: eScholarship, University of California
Publication date: 15/04/1988
Field of study

This report discusses the model and input language for EXTEND, a synthesis system that permits extensible register transfer synthesis. EXTEND-L fills the need for a language that bridges the gap between existing behavioral input descriptions, which are too abstract, and structural schematics, which cannot capture the high-level behavior. The report first discusses previous work in behavioral synthesis and summarizes the deficiencies of these behavioral specifications. The report then describes the proposed langauge in detail, and concludes with a few examples that show its utility

eScholarship - University of California

Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)

Author: Abreu Salvator
Codognet Philippe
Diaz Daniel
Publication venue: 'Open Publishing Association'
Publication date: 12/09/2009
Field of study

We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (total of 16 SPUs per blade). This algorithm was chosen because it fits very well the Cell/BE architecture and requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that this achieves mostly linear time speedups, even sometimes super-linear. This is possible because the parallel implementation might explore simultaneously different parts of the search space and therefore converge faster towards the best sub-space and thus towards a solution. Besides getting speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical

arXiv.org e-Print Archive

Directory of Open Access Journals

HAL-Paris1

Effects of partitioning and scheduling sparse matrix factorization on communication and load balance

Author: Naik Vijay K.
Venugopal Sesh
Publication venue
Publication date
Field of study

A block based, automatic partitioning and scheduling methodology is presented for sparse matrix factorization on distributed memory systems. Using experimental results, this technique is analyzed for communication and load imbalance overhead. To study the performance effects, these overheads were compared with those obtained from a straightforward 'wrap mapped' column assignment scheme. All experimental results were obtained using test sparse matrices from the Harwell-Boeing data set. The results show that there is a communication and load balance tradeoff. The block based method results in lower communication cost whereas the wrap mapped scheme gives better load balance

NASA Technical Reports Server