
    Independent and Divisible Task Scheduling on Heterogeneous Star-shaped Platforms with Limited Memory

    In this paper, we consider the problem of allocating and scheduling a collection of independent, equal-sized tasks on heterogeneous star-shaped platforms. We also address the same problem for divisible tasks. In both cases, we take memory constraints into account. We prove strong NP-completeness results for different objective functions, namely makespan minimization and throughput maximization, on simple star-shaped platforms. We propose an approximation algorithm based on the unconstrained version of the problem (i.e., with unlimited memory). We introduce several heuristics, which are evaluated and compared through extensive simulations. An unexpected conclusion drawn from these experiments is that classical scheduling heuristics that try to greedily minimize the completion time of each task are outperformed by the simple heuristic of assigning each task to the available processor with the smallest communication time, regardless of its computation power (hence a "bandwidth-centric" distribution).
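
    A minimal Python sketch of the bandwidth-centric selection rule this abstract describes. Only the rule itself is shown: the one-port communication model, tasks freeing memory as they complete, and all parameter names are simplifications or assumptions here, not the paper's algorithm.

```python
def bandwidth_centric_assign(num_tasks, comm_time, mem_capacity):
    """Greedily send each task to the worker with the smallest
    communication time that still has buffer space, ignoring compute
    speed.  Static simplification: buffers never empty over time."""
    buffered = [0] * len(comm_time)
    assignment = []
    for _ in range(num_tasks):
        candidates = [w for w in range(len(comm_time))
                      if buffered[w] < mem_capacity[w]]
        if not candidates:
            break  # every buffer is full in this simplified model
        w = min(candidates, key=lambda i: comm_time[i])
        buffered[w] += 1
        assignment.append(w)
    return assignment

# Tasks fill the fastest links first, regardless of compute power:
print(bandwidth_centric_assign(6, comm_time=[2.0, 1.0, 3.0],
                               mem_capacity=[2, 2, 2]))  # [1, 1, 0, 0, 2, 2]
```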

    Optimizing egalitarian performance in the side-effects model of colocation for data center resource management

    In data centers, up to dozens of tasks are colocated on a single physical machine. Machines are used more efficiently, but tasks' performance deteriorates, as colocated tasks compete for shared resources. As tasks are heterogeneous, the resulting performance dependencies are complex. In our previous work [18], we proposed a new combinatorial optimization model that uses two parameters of a task, its size and its type, to characterize how a task influences the performance of other tasks allocated to the same machine. In this paper, we study the egalitarian optimization goal: maximizing the worst-off performance. This problem generalizes the classic makespan minimization on multiple processors (P||Cmax). We prove that polynomially-solvable variants of multiprocessor scheduling become NP-hard and hard to approximate when the number of types is not constant. For a constant number of types, we propose a PTAS, a fast approximation algorithm, and a series of heuristics. We simulate the algorithms on instances derived from a trace of one of Google's clusters. Algorithms aware of jobs' types lead to better performance compared with algorithms solving P||Cmax. The notion of type enables us to model the degradation of performance caused by using standard combinatorial optimization methods. Types add a layer of additional complexity. However, our results, namely approximation algorithms and good average-case performance, show that types can be handled efficiently.
    Comment: Author's version of a paper published in the Euro-Par 2017 proceedings; extends the published paper with additional results and proofs.
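
    As an illustration of the egalitarian objective, here is a hedged Python sketch of a generic greedy heuristic in the spirit of the paper, not its published PTAS or heuristics: each task, described by the (size, type) pair of the side-effects model, is placed on the machine that leaves the worst-off performance as high as possible, for a caller-supplied and here hypothetical performance function.

```python
def greedy_egalitarian(tasks, num_machines, perf):
    """Place each (size, type) task on the machine that keeps the
    worst-off performance maximal.  `perf(task, colocated)` is a
    hypothetical callback giving the performance of `task` when
    colocated with the tasks in `colocated`."""
    machines = [[] for _ in range(num_machines)]

    def worst_off(config):
        values = [perf(t, m) for m in config for t in m]
        return min(values) if values else float("inf")

    for task in sorted(tasks, key=lambda t: t[0], reverse=True):
        best = max(
            range(num_machines),
            key=lambda i: worst_off(
                machines[:i] + [machines[i] + [task]] + machines[i + 1:]),
        )
        machines[best].append(task)
    return machines

# With perf = negated machine load, the rule reduces to greedy list
# scheduling for the classic P||Cmax problem mentioned in the abstract:
placement = greedy_egalitarian([(7, 0), (5, 0), (4, 0), (3, 0)], 2,
                               perf=lambda t, m: -sum(s for s, _ in m))
print([sum(s for s, _ in m) for m in placement])  # [10, 9]
```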

    Revisiting Matrix Product on Master-Worker Platforms

    This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix product is well understood for homogeneous 2D arrays of processors (e.g., Cannon's algorithm and the ScaLAPACK outer-product algorithm), three key hypotheses render our work original and innovative:
    - Centralized data. We assume that all matrix files originate from, and must be returned to, the master.
    - Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers.
    - Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and re-used for subsequent updates (as in ScaLAPACK).
    We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on various platforms at Ecole Normale Superieure de Lyon and the University of Tennessee. We note, however, that in this first version of the report, experiments are limited to homogeneous platforms.
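
    For readers unfamiliar with the outer-product formulation mentioned above, the following Python/NumPy sketch shows only the data dependence it exploits; the tile size and names are illustrative, and this is not the paper's resource-selection or communication-ordering algorithm. C is accumulated as a sum of low-rank updates, so a worker only needs one block-column of A, one block-row of B and its tile of C in memory at any time.

```python
import numpy as np

def blocked_outer_product(A, B, block):
    """Accumulate C = A @ B as a sum of rank-`block` updates.  In the
    master-worker setting, each round would ship one block-column of A
    and one block-row of B to the enrolled workers; here the update is
    applied locally just to show the memory footprint per round."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for s in range(0, k, block):
        C += A[:, s:s + block] @ B[s:s + block, :]
    return C

# Sanity check against the direct product on small random matrices.
rng = np.random.default_rng(0)
A, B = rng.random((6, 8)), rng.random((8, 4))
assert np.allclose(blocked_outer_product(A, B, block=2), A @ B)
```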

    Utilizing Divisible Load Scheduling Theorem in Round Robin Algorithm for Load Balancing In Cloud Environment

    Cloud computing is a new paradigm in computing that promises a shift from an organization having to invest heavily in limited, internally managed IT resources to a model where the organization can buy or rent resources that are managed by a cloud provider and pay per use. With the rapid growth of cloud computing, one area that is paramount to cloud service providers is the design of an effective load balancing algorithm that assigns tasks to the best Virtual Machines (VMs) in a way that provides satisfactory performance to both cloud users and providers. The Round Robin (RR) algorithm is one such load balancing algorithm for cloud environments. In this paper, we first analyse various Round Robin load balancing algorithms. Second, a new VM load balancing algorithm, the Divisible Weighted Round Robin (DWRR) load balancing algorithm, is proposed and implemented. The proposed algorithm applies the Divisible Load Scheduling Theorem within the Round Robin load balancing algorithm. To evaluate the performance of DWRR, we used the CloudSim simulator to compare it against the existing Round Robin variants. After a thorough comparison, the results showed that DWRR outperforms the Round Robin variants (Weighted Round Robin and Round Robin with server affinity) in terms of execution time (makespan) with the least complexity.
    Keywords: scheduling algorithm, performance, cloud computing, load balancing algorithm, Divisible Load Scheduling Theory
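
    A minimal sketch of the divisible-load idea underlying DWRR as summarized above (not the published algorithm; communication costs are ignored and all names and units are illustrative): the divisible workload is split among VMs in proportion to their processing speeds, so that every VM finishes at roughly the same time.

```python
def divisible_weighted_shares(total_load, vm_speeds):
    """Split a divisible workload among VMs in proportion to their
    (assumed) relative processing speeds."""
    total_speed = sum(vm_speeds)
    return [total_load * speed / total_speed for speed in vm_speeds]

# A 900 MI workload over VMs rated 100, 200 and 300 MIPS gets shares of
# 150, 300 and 450 MI, so every VM needs the same 1.5 s of compute time.
print(divisible_weighted_shares(900, [100, 200, 300]))  # [150.0, 300.0, 450.0]
```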

    Efficient Parallel Video Encoding on Heterogeneous Systems

    Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014), Porto (Portugal), August 27-28, 2014.
    In this study we propose an efficient method for collaborative H.264/AVC inter-loop encoding on heterogeneous CPU+GPU systems. The method relies on a specifically developed, extensive library of highly optimized parallel algorithms for both CPU and GPU architectures, covering all inter-loop modules. To minimize the overall encoding time, the method integrates adaptive load balancing for the most computationally intensive inter-prediction modules, based on dynamically built functional performance models of the heterogeneous devices and inter-loop modules. The proposed method also introduces efficient communication-aware techniques that maximize data reuse and decrease the overhead of expensive data transfers in collaborative video encoding. The experimental results show that the proposed method is able to achieve real-time video encoding for very demanding video coding parameters, i.e., full HD video format, a 64×64-pixel search area and exhaustive motion estimation.
    This work was supported by national funds through FCT – Fundação para a Ciência e a Tecnologia, under projects PEst-OE/EEI/LA0021/2013, PTDC/EEI-ELC/3152/2012 and PTDC/EEA-ELC/117329/2010.
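
    A hedged Python sketch of the adaptive load-balancing idea described above, not the authors' implementation: the device names, the row-based work unit and the smoothing constant are assumptions. The inter-prediction workload is divided among devices in proportion to throughputs measured at run time, and the per-device performance model is refreshed after each frame.

```python
def split_by_performance(total_rows, throughput):
    """Divide `total_rows` work units among devices proportionally to
    their measured throughput (work units per second)."""
    total = sum(throughput.values())
    shares = {dev: int(round(total_rows * tp / total))
              for dev, tp in throughput.items()}
    drift = total_rows - sum(shares.values())      # fix rounding residue
    if drift:
        shares[max(shares, key=shares.get)] += drift
    return shares

def update_throughput(throughput, dev, rows_done, seconds, alpha=0.5):
    """Refresh the functional performance model with an exponentially
    smoothed observation from the frame just encoded."""
    throughput[dev] = alpha * (rows_done / seconds) + (1 - alpha) * throughput[dev]

# 68 macroblock rows of a 1080p frame split between a CPU and a GPU:
print(split_by_performance(68, {"cpu": 20.0, "gpu": 60.0}))  # {'cpu': 17, 'gpu': 51}
```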