83,921 research outputs found
Relating goal scheduling, precedence, and memory management in and-parallel execution of logic programs
The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models.
Theory and Engineering of Scheduling Parallel Jobs
Scheduling is very important for an efficient utilization of modern parallel computing systems. In this thesis, four main research areas for scheduling are investigated: the interplay and distribution of decision makers, the efficient schedule computation, efficient scheduling for the memory hierarchy and energy-efficiency. The main result is a provably fast and efficient scheduling algorithm for malleable jobs. Experiments show the importance and possibilities of scheduling considering the memory hierarchy
Critical Path Scheduling Parallel Programs on an Unbounded Number of Processors
In this paper we present an efficient algorithm for compile-time scheduling and clustering of parallel programs onto parallel processing systems with distributed memory, called Dynamic Critical Path Scheduling (DCPS). DCPS is superior to several other algorithms from the literature in terms of computational complexity, processor consumption, and solution quality. DCPS has a time complexity of $O(e + v \log v)$, as opposed to $O((e + v) \log v)$ for the DSC algorithm, which is the best known algorithm. Experimental results demonstrate the superiority of DCPS over the DSC algorithm.
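The abstract does not spell out DCPS itself, so the following is not that algorithm. It is only a minimal sketch of the quantity that critical-path schedulers typically rank tasks by: the longest path from each task to an exit of the task graph, counting computation and communication costs. The function name, the dictionary-based graph encoding, and the example costs are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def critical_path_length(weights, edges):
    """Return (critical path length, per-task bottom level).

    weights: {task: computation cost}
    edges:   {(u, v): communication cost of edge u -> v}
    Both encodings are hypothetical, chosen for this sketch.
    """
    succs = defaultdict(list)
    indeg = {t: 0 for t in weights}
    for (u, v), comm in edges.items():
        succs[u].append((v, comm))
        indeg[v] += 1

    # Kahn's algorithm: topological order of the task graph.
    order = []
    queue = deque(t for t in weights if indeg[t] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(weights):
        raise ValueError("task graph contains a cycle")

    # Bottom level: longest path from a task to any exit task,
    # including the task's own computation cost.
    blevel = {}
    for u in reversed(order):
        tail = max((comm + blevel[v] for v, comm in succs[u]), default=0)
        blevel[u] = weights[u] + tail
    return max(blevel.values()), blevel


if __name__ == "__main__":
    w = {"a": 2, "b": 3, "c": 1, "d": 4}
    e = {("a", "b"): 1, ("a", "c"): 5, ("b", "d"): 2, ("c", "d"): 1}
    length, levels = critical_path_length(w, e)
    print(length, levels)
```

This pass visits every task and edge once, so it runs in O(e + v) time; the logarithmic factors quoted in the abstract would come from the priority structures a full scheduler maintains on top of it.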
Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan
We consider energy-efficient scheduling on multiprocessors, where the speed
of each processor can be individually scaled, and a processor consumes power
$s^{\alpha}$ when running at speed $s$, for $\alpha > 1$. A scheduling algorithm
needs to decide at any time both processor allocations and processor speeds for
a set of parallel jobs with time-varying parallelism. The objective is to
minimize the sum of the total energy consumption and a certain performance
metric, which in this paper includes total flow time and makespan. For both
objectives, we present instantaneous parallelism clairvoyant (IP-clairvoyant)
algorithms that are aware of the instantaneous parallelism of the jobs at any
time but not their future characteristics, such as remaining parallelism and
work. For total flow time plus energy, we present an $O(1)$-competitive
algorithm, which significantly improves upon the best known non-clairvoyant
algorithm and is the first constant competitive result on multiprocessor speed
scaling for parallel jobs. In the case of makespan plus energy, which is
considered for the first time in the literature, we present an
$O(\ln^{1-1/\alpha} P)$-competitive algorithm, where $P$ is the total number of
processors. We show that this algorithm is asymptotically optimal by providing
a matching lower bound. In addition, we also study non-clairvoyant scheduling
for total flow time plus energy, and present an algorithm that is
$O(\ln P)$-competitive for jobs with arbitrary release time and
$O(\ln^{1/\alpha} P)$-competitive for jobs with identical release time. Finally,
we prove an $\Omega(\ln^{1/\alpha} P)$ lower bound on the competitive ratio of
any non-clairvoyant algorithm, matching the upper bound of our algorithm for
jobs with identical release time.
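To see why a "performance plus energy" objective has a well-defined optimum, here is a small worked example. It is not taken from the paper: it assumes the standard speed-scaling power model (power $s^{\alpha}$ at speed $s$, with $\alpha > 1$) and a single sequential job of work $w$ run at one constant speed.

```latex
% Assumed setting: one job, work w, constant speed s, power s^alpha.
\begin{align*}
  \text{flow time} &= \frac{w}{s}, &
  \text{energy} &= s^{\alpha} \cdot \frac{w}{s} = w\, s^{\alpha - 1}, \\
  f(s) &= \frac{w}{s} + w\, s^{\alpha - 1}, &
  f'(s) &= -\frac{w}{s^{2}} + (\alpha - 1)\, w\, s^{\alpha - 2}. \\
\intertext{Setting $f'(s) = 0$ gives}
  s^{\alpha} &= \frac{1}{\alpha - 1}, &
  s^{*} &= (\alpha - 1)^{-1/\alpha}.
\end{align*}
```

For $\alpha = 3$ this gives $s^{*} = 2^{-1/3} \approx 0.79$: running faster than that costs more in energy than it saves in flow time, and running slower does the opposite.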
Novel neighborhood search for multiprocessor scheduling with pipelining
Presents a neighborhood search algorithm for heterogeneous multiprocessor scheduling in which loop pipelining is used to exploit parallelism between iterations. The method adopts a realistic model for interprocessor communication where resource contention is taken into consideration. The schedule representation scheme is flexible so that communication scheduling can be performed in a generic manner. Based on a general time formulation of the schedule performance, the algorithm improves an initial schedule in an efficient way. Experimental results show that significant improvement over existing methods can be obtained. Using the scheduling results, a parallel software video encoder was implemented and real-time performance was achieved.
A Massively Parallel Implementation of Multilevel Monte Carlo for Finite Element Models
The Multilevel Monte Carlo (MLMC) method has proven to be an effective
variance-reduction statistical method for Uncertainty Quantification (UQ) in
Partial Differential Equation (PDE) models, combining model computations at
different levels to create an accurate estimate. Still, the computational
complexity of the resulting method is extremely high, particularly for 3D
models, which requires advanced algorithms for the efficient exploitation of
High Performance Computing (HPC). In this article we present a new
implementation of the MLMC in massively parallel computer architectures,
exploiting parallelism within and between each level of the hierarchy. The
numerical approximation of the PDE is performed using the finite element method
but the algorithm is quite general and could be applied to other discretization
methods as well, although the focus is on parallel sampling. The two key
ingredients of an efficient parallel implementation are a good processor
partition scheme together with a good scheduling algorithm to assign work to
different processors. We introduce a multiple partition of the set of
processors that permits the simultaneous execution of different levels and we
develop a dynamic scheduling algorithm to exploit it. The problem of finding
the optimal scheduling of distributed tasks in a parallel computer is an
NP-complete problem. We propose and analyze a new greedy scheduling algorithm
to assign samples and we show that it is a 2-approximation, which is the best
that may be expected under general assumptions. On top of this result we design
a distributed memory implementation using the Message Passing Interface (MPI)
standard. Finally we present a set of numerical experiments illustrating its
scalability properties. Comment: 21 pages, 13 figures
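The abstract does not give the details of the paper's greedy algorithm, which also has to cope with multiple processor partitions. As a rough illustration of the general idea only, the sketch below shows classic greedy list scheduling on identical workers: each sample goes to whichever worker is currently least loaded. This simpler rule is known to be within a factor of 2 of the optimal makespan. The function name and the sample costs are assumptions, not taken from the paper.

```python
import heapq


def greedy_schedule(costs, num_workers):
    """Assign each task, in the given order, to the least-loaded worker.

    costs:       list of task running times (hypothetical sample costs)
    num_workers: number of identical workers
    Returns (makespan, {worker: [task indices]}).
    """
    if num_workers < 1:
        raise ValueError("need at least one worker")

    # Min-heap of (current load, worker id); ties break on worker id.
    loads = [(0, w) for w in range(num_workers)]
    heapq.heapify(loads)
    assignment = {w: [] for w in range(num_workers)}

    for idx, cost in enumerate(costs):
        load, worker = heapq.heappop(loads)
        assignment[worker].append(idx)
        heapq.heappush(loads, (load + cost, worker))

    makespan = max(load for load, _ in loads)
    return makespan, assignment


if __name__ == "__main__":
    costs = [3, 1, 4, 1, 5, 9, 2, 6]
    makespan, plan = greedy_schedule(costs, 3)
    # No schedule can beat the average load or the longest single task.
    lower_bound = max(sum(costs) / 3, max(costs))
    print(makespan, plan, makespan <= 2 * lower_bound)
```

In an MPI setting, a coordinator rank could apply the same rule when handing samples to idle groups of processors, but that mapping is an assumption rather than something the abstract states.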