    Relating goal scheduling, precedence, and memory management in and-parallel execution of logic programs

    The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models.

    Theory and Engineering of Scheduling Parallel Jobs

    Scheduling is essential for the efficient utilization of modern parallel computing systems. In this thesis, four main research areas for scheduling are investigated: the interplay and distribution of decision makers, efficient schedule computation, efficient scheduling for the memory hierarchy, and energy efficiency. The main result is a provably fast and efficient scheduling algorithm for malleable jobs. Experiments show the importance and possibilities of scheduling that takes the memory hierarchy into account.

    Critical Path Scheduling Parallel Programs on an Unbounded Number of Processors

    In this paper we present an efficient algorithm for compile-time scheduling and clustering of parallel programs onto parallel processing systems with distributed memory, called Dynamic Critical Path Scheduling (DCPS). DCPS is superior to several other algorithms from the literature in terms of computational complexity, processor consumption, and solution quality. DCPS has a time complexity of $O(e + v \log v)$, as opposed to the $O((e + v) \log v)$ of the DSC algorithm, which is the best known algorithm. Experimental results demonstrate the superiority of DCPS over the DSC algorithm.

    Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan

    We consider energy-efficient scheduling on multiprocessors, where the speed of each processor can be individually scaled, and a processor consumes power $s^{\alpha}$ when running at speed $s$, for $\alpha > 1$. A scheduling algorithm needs to decide at any time both processor allocations and processor speeds for a set of parallel jobs with time-varying parallelism. The objective is to minimize the sum of the total energy consumption and a certain performance metric, which in this paper includes total flow time and makespan. For both objectives, we present instantaneous parallelism clairvoyant (IP-clairvoyant) algorithms that are aware of the instantaneous parallelism of the jobs at any time but not their future characteristics, such as remaining parallelism and work. For total flow time plus energy, we present an $O(1)$-competitive algorithm, which significantly improves upon the best known non-clairvoyant algorithm and is the first constant competitive result on multiprocessor speed scaling for parallel jobs. In the case of makespan plus energy, which is considered for the first time in the literature, we present an $O(\ln^{1-1/\alpha} P)$-competitive algorithm, where $P$ is the total number of processors. We show that this algorithm is asymptotically optimal by providing a matching lower bound. In addition, we also study non-clairvoyant scheduling for total flow time plus energy, and present an algorithm that is $O(\ln P)$-competitive for jobs with arbitrary release time and $O(\ln^{1/\alpha} P)$-competitive for jobs with identical release time. Finally, we prove an $\Omega(\ln^{1/\alpha} P)$ lower bound on the competitive ratio of any non-clairvoyant algorithm, matching the upper bound of our algorithm for jobs with identical release time.
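
As a worked illustration of the power model above (not taken from the paper), assume a single sequential job of work $w$ run at a constant speed $s$ on one processor. The symbols $w$, $T$, $E$, and $s^{*}$ are introduced here for illustration only. The job's flow time is $T(s)$, its energy is power times duration, and minimizing flow time plus energy gives a speed that depends only on $\alpha$:

```latex
\begin{align*}
T(s) &= \frac{w}{s}, \\
E(s) &= s^{\alpha} \cdot \frac{w}{s} = w\, s^{\alpha - 1}, \\
\frac{d}{ds}\bigl(T(s) + E(s)\bigr)
  &= -\frac{w}{s^{2}} + (\alpha - 1)\, w\, s^{\alpha - 2} = 0
  \;\Longrightarrow\; s^{*} = (\alpha - 1)^{-1/\alpha}.
\end{align*}
```

For example, with $\alpha = 2$ this gives $s^{*} = 1$ and a total cost of $w + w = 2w$. This single-job case only shows the trade-off between speed and energy; the multiprocessor setting with time-varying parallelism studied in the abstract is considerably harder.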

    Novel neighborhood search for multiprocessor scheduling with pipelining

    Presents a neighborhood search algorithm for heterogeneous multiprocessor scheduling in which loop pipelining is used to exploit parallelism between iterations. The method adopts a realistic model for interprocessor communication where resource contention is taken into consideration. The schedule representation scheme is flexible so that communication scheduling can be performed in a generic manner. Based on a general time formulation of the schedule performance, the algorithm improves an initial schedule in an efficient way. Experimental results show that significant improvement over existing methods can be obtained. Using the scheduling results, a parallel software video encoder was implemented and real-time performance was achieved.

    A Massively Parallel Implementation of Multilevel Monte Carlo for Finite Element Models

    The Multilevel Monte Carlo (MLMC) method has proven to be an effective variance-reduction statistical method for Uncertainty Quantification (UQ) in Partial Differential Equation (PDE) models, combining model computations at different levels to create an accurate estimate. Still, the computational complexity of the resulting method is extremely high, particularly for 3D models, which requires advanced algorithms for the efficient exploitation of High Performance Computing (HPC). In this article we present a new implementation of the MLMC in massively parallel computer architectures, exploiting parallelism within and between each level of the hierarchy. The numerical approximation of the PDE is performed using the finite element method but the algorithm is quite general and could be applied to other discretization methods as well, although the focus is on parallel sampling. The two key ingredients of an efficient parallel implementation are a good processor partition scheme together with a good scheduling algorithm to assign work to different processors. We introduce a multiple partition of the set of processors that permits the simultaneous execution of different levels and we develop a dynamic scheduling algorithm to exploit it. The problem of finding the optimal scheduling of distributed tasks in a parallel computer is an NP-complete problem. We propose and analyze a new greedy scheduling algorithm to assign samples and we show that it is a 2-approximation, which is the best that may be expected under general assumptions. On top of this result we design a distributed memory implementation using the Message Passing Interface (MPI) standard. Finally we present a set of numerical experiments illustrating its scalability properties. Comment: 21 pages, 13 figures.
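
To make the idea of a greedy assignment with a factor-2 guarantee concrete, here is a minimal sketch of classic greedy list scheduling on identical workers: each task goes to whichever worker currently has the least load. This is a textbook technique whose makespan is known to be within a factor of 2 - 1/m of optimal on m identical machines; it is not claimed to be the algorithm of the paper above, which also handles processor partitions and MLMC levels. The function name `greedy_schedule` and its parameters are hypothetical.

```python
import heapq


def greedy_schedule(task_costs, num_workers):
    """Assign each task, in the given order, to the least-loaded worker.

    Illustrative sketch only (hypothetical helper, not the paper's algorithm).
    Returns (assignment, loads), where assignment[i] is the worker index
    chosen for task i and loads[w] is the total cost placed on worker w.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Min-heap of (current_load, worker_index); ties go to the lower index.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)

    assignment = []
    loads = [0] * num_workers
    for cost in task_costs:
        load, worker = heapq.heappop(heap)
        assignment.append(worker)
        loads[worker] = load + cost
        heapq.heappush(heap, (loads[worker], worker))
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = greedy_schedule([3, 1, 4, 1, 5], num_workers=2)
    print("assignment:", assignment)
    print("loads:", loads, "makespan:", max(loads))
```

The heap keeps each assignment at O(log m) time, so scheduling n tasks costs O(n log m). Sorting tasks by decreasing cost before calling the function is a common refinement that usually tightens the result further.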