
    Independent and Divisible Task Scheduling on Heterogeneous Star-shaped Platforms with Limited Memory

    In this paper, we consider the problem of allocating and scheduling a collection of independent, equal-sized tasks on heterogeneous star-shaped platforms. We also address the same problem for divisible tasks. In both cases, we take memory constraints into account. We prove strong NP-completeness results for different objective functions, namely makespan minimization and throughput maximization, on simple star-shaped platforms. We propose an approximation algorithm based on the unconstrained version of the problem (with unlimited memory). We also introduce several heuristics, which are evaluated and compared through extensive simulations. An unexpected conclusion drawn from these experiments is that classical list-scheduling heuristics that greedily try to minimize the completion time of each task are outperformed by the simple heuristic that assigns each task to the available processor with the smallest communication time, regardless of computation power (hence a "bandwidth-centric" distribution).
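    As a rough illustration of the two policies compared in this abstract, the following Python sketch simulates a one-port master that sends equal-sized tasks, one at a time, to workers with bounded memory. The platform model (per-task send time `c`, per-task compute time `w`, buffer size `mem` counted in tasks), the function `schedule`, and the tie-breaking rules are assumptions made for illustration; they are not the paper's exact model or heuristics.

```python
def schedule(n_tasks, platform, policy):
    """Simulate a one-port master on a star platform and return the makespan.

    Hypothetical model: platform is a list of (c, w, mem) tuples, where c is the
    time to send one task to the worker, w the time to compute it, and mem the
    number of tasks the worker can hold (received but not yet finished).
    policy is "bandwidth" (bandwidth-centric) or "greedy" (min completion time).
    """
    if policy not in ("bandwidth", "greedy"):
        raise ValueError("unknown policy")
    finish = [[] for _ in platform]  # per-worker completion times, in order
    master_free = 0.0                # the master sends one task at a time
    for _ in range(n_tasks):
        options = []
        for i, (c, w, mem) in enumerate(platform):
            f = finish[i]
            # A memory slot is free once the task mem positions back has finished.
            slot_free = f[-mem] if len(f) >= mem else 0.0
            start = max(master_free, slot_free)
            arrive = start + c
            cpu_free = f[-1] if f else 0.0
            done = max(arrive, cpu_free) + w
            options.append((start, c, done, arrive, i))
        if policy == "bandwidth":
            # Earliest available worker; ties broken by smallest communication time.
            _, _, done, arrive, i = min(options, key=lambda o: (o[0], o[1]))
        else:
            # Greedy: minimize this task's own completion time.
            _, _, done, arrive, i = min(options, key=lambda o: (o[2], o[1]))
        finish[i].append(done)
        master_free = arrive
    return max(max(f) for f in finish if f)
```

    For example, `schedule(2, [(1, 10, 1), (2, 1, 1)], "greedy")` returns 6.0 while the bandwidth-centric policy returns 11.0, so a single tiny instance can go either way; the conclusion stated above comes from the paper's extensive simulations, not from any one instance.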

    Revisiting Matrix Product on Master-Worker Platforms

    This paper aims to design efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix product is well understood for homogeneous 2D arrays of processors (e.g., Cannon's algorithm and the ScaLAPACK outer-product algorithm), three key hypotheses make our work original and innovative:
    - Centralized data. We assume that all matrix files originate from, and must be returned to, the master.
    - Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers.
    - Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and reused for subsequent updates (as in ScaLAPACK).
    We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (for both input and result messages), and we report a set of numerical experiments on various platforms at École Normale Supérieure de Lyon and the University of Tennessee. Note, however, that in this first version of the report, the experiments are limited to homogeneous platforms.
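    The outer-product formulation mentioned in this abstract can be sketched in Python: C is accumulated as a sum over k of block products A_ik·B_kj, so each individual update touches only three q×q blocks, which is what makes a memory-limited worker feasible. The helper names and the pure-Python block layout are assumptions for illustration; the sketch does not model resource selection, communication ordering, or the paper's actual algorithms.

```python
def matmul(X, Y):
    """Naive dense product of two list-of-lists matrices."""
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def block(M, bi, bj, q):
    """Extract the q x q block at block coordinates (bi, bj)."""
    return [row[bj * q:(bj + 1) * q] for row in M[bi * q:(bi + 1) * q]]


def blocked_outer_product(A, B, q):
    """C = A @ B for n x n matrices split into q x q blocks (q must divide n).

    Hypothetical illustration: step k is one outer-product update in which
    every C_ij block receives A_ik * B_kj; a worker performing one such update
    only needs A_ik, B_kj and C_ij in memory.
    """
    n = len(A)
    if n % q:
        raise ValueError("block size must divide the matrix order")
    nb = n // q
    C = [[0] * n for _ in range(n)]
    for k in range(nb):
        for bi in range(nb):
            for bj in range(nb):
                P = matmul(block(A, bi, k, q), block(B, k, bj, q))
                for r in range(q):
                    for s in range(q):
                        C[bi * q + r][bj * q + s] += P[r][s]
    return C
```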

    Scheduling multiple bags of tasks on heterogeneous master-worker platforms: centralized versus distributed solutions

    Multiple applications that execute concurrently on heterogeneous platforms compete for CPU and network resources. In this paper, we consider the problem of scheduling applications to ensure fair and efficient execution on master-worker platforms where communication is restricted to a tree embedded in the network. The goal is to obtain the best throughput while enforcing some fairness between applications. We show how to derive an asymptotically optimal periodic schedule by solving a linear program that expresses all problem constraints. For single-level trees, the optimal solution can be computed analytically. For large-scale platforms, gathering the global knowledge needed by the linear programming approach might be unrealistic. One solution is to adapt the multi-commodity flow algorithm of Awerbuch and Leighton, but it still requires some global knowledge. Thus, we also investigate heuristic solutions that use only local information, and we test them through simulations. The best of our heuristics achieves optimal performance on about two-thirds of our test cases, but performs far worse in a few cases.
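    For intuition about why single-level trees are tractable, consider a simplified case with a single bag of identical tasks and a one-port master. If worker i receives tasks at rate rho_i, the steady state requires rho_i ≤ 1/w_i (computation) and the sum of rho_i·c_i ≤ 1 (the master's port), and maximizing the total rate is a fractional knapsack that a greedy, bandwidth-centric allocation (fastest links first) solves exactly. This sketch ignores the multi-application fairness objective the paper actually addresses and is not the paper's analytic formula; the names `c`, `w`, and `steady_state_throughput` are assumptions.

```python
def steady_state_throughput(workers):
    """Optimal steady-state task rate for one bag on a star (simplified model).

    Hypothetical model: workers is a list of (c, w) pairs, where c is the time
    the master spends sending one task and w the worker's time per task.
    Maximizes sum(rho) subject to rho_i <= 1/w_i and sum(rho_i * c_i) <= 1.
    """
    rates = [0.0] * len(workers)
    budget = 1.0  # fraction of the master's port time still unused
    # Serve workers in order of increasing communication time (bandwidth-centric).
    for i in sorted(range(len(workers)), key=lambda i: workers[i][0]):
        if budget <= 0:
            break
        c, w = workers[i]
        rho = min(1.0 / w, budget / c)  # limited by the worker's CPU or the port
        rates[i] = rho
        budget -= rho * c
    return sum(rates), rates
```

    With two workers `(1, 2)` and `(2, 1)`, the first (cheaper link) is fed at its full compute rate of 0.5 tasks per time unit, leaving half of the port time for the second, which therefore gets 0.25, for a total throughput of 0.75.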

    MPEG-4 Software Video Encoding

    A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy of the University of London. This thesis presents a software model that allows a parallel decomposition of the MPEG-4 video encoder onto shared-memory architectures in order to reduce its total encoding time. Since a video sequence consists of video objects, each of which is likely to have different encoding requirements, the model incorporates a scheduler that (a) always selects the most appropriate video object for encoding and (b) employs a mechanism for dynamically allocating video objects to the system processors based on video object size information. Further spatial parallelism within video objects is exploited by applying the single program, multiple data (SPMD) paradigm within the different modules of the MPEG-4 video encoder. Because not all macroblocks have the same processing requirements, the model also introduces a data partitioning scheme that generates tiles with identical processing requirements. Since macroblock data dependencies preclude data parallelism in the shape encoder, the model also introduces a new mechanism that enables parallelism using a circular-pipeline macroblock technique. The encoding time depends partly on the encoder's computational complexity, so this thesis also addresses motion estimation, whose complexity has a significant impact on that of the encoder. In particular, two fast motion estimation algorithms were developed for the model that significantly reduce its computational complexity. The thesis includes experimental results on a four-processor shared-memory platform, the Origin200.
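    For context on the motion-estimation cost discussed above, here is a minimal exhaustive (full-search) block-matching sketch using the sum of absolute differences (SAD), a common baseline that fast motion estimation algorithms try to beat. It is not one of the thesis's two algorithms, which are not described here; the frame representation (lists of luma values), block size `n`, and search range `r` are assumptions. Each block costs up to (2r+1)² candidate positions of n² differences each, which is why faster search strategies matter.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n, r):
    """Return (cost, dx, dy) of the best motion vector within +/- r pixels.

    Hypothetical baseline: every candidate displacement that keeps the
    reference block inside the frame is evaluated.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w and
                      0 <= by + dy and by + dy + n <= h)
            if inside:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best
```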

    Decoupling User Interface Design Using Libraries of Reusable Components

    The integration of electronic and mechanical hardware, software, and interaction design presents a challenging design space for researchers developing physical user interfaces and interactive artifacts. In the academic research community, physical user interfaces and interactive artifacts are currently designed and prototyped predominantly either as one-off instances built from the ground up or with functionally rich hardware toolkits and prototyping systems. During prototyping, an integral design of the electronic hardware of the interface or artifact is frequently constraining, because of the tight couplings between the different design realms and the typical need for iteration as the design matures. Several current toolkit designs have consequently embraced modular designs based on component sharing and component swapping, with a view to increasing flexibility and researcher freedom by disentangling and softening these cause-effect couplings. Encouraged by the early successes of these toolkits, this research strives to further enhance these freedoms by pursuing an alternative style and dimension of hardware modularity. A further motivation is to facilitate the design and development of certain classes of interfaces and interactive artifacts for which current electronic design approaches are argued to be too constraining (e.g., with respect to scale and complexity). This goal of a new platform architecture, however, raises conceptual and technical challenges on the embedded-system networking front. In response, this research investigates and extends the growing field of multi-module distributed embedded systems. We identify and characterize a sub-class of these systems, which we call embedded aggregates, and then outline and develop a framework for realizing this class of systems.
Toward this end, this thesis examines several architectures, topologies, and communication protocols, making the case for, and taking substantial steps toward, a suite of networking protocols and control algorithms that support embedded aggregates. We define a set of protocols, mechanisms, and communication packets that together form the underlying framework for the aggregates. Following the aggregate design, we develop blades and tiles to support user interface researchers.

    Scheduling for Large Scale Distributed Computing Systems: Approaches and Performance Evaluation Issues

    Although our everyday life and society now depend heavily on communication and computation infrastructures, scientists and engineers have always been among the main consumers of computing power. This document provides a coherent overview of the research I have conducted over the last 15 years on the management and performance evaluation of large-scale distributed computing infrastructures, such as clusters, grids, desktop grids, and volunteer computing platforms, when used for scientific computing. In the first part of this document, I present how I have addressed scheduling problems arising on distributed platforms (such as computing grids), with a particular emphasis on heterogeneity and multi-user issues, hence the connection with game theory. Most of these problems are relaxed from a classical combinatorial optimization formulation into a continuous form, which makes it easy to account for key platform characteristics such as heterogeneity or complex topology while providing efficient, practical, and distributed solutions. The second part presents my main contributions to the SimGrid project, a toolkit for building simulators of distributed applications (originally designed for evaluating scheduling algorithms). It comprises a unified presentation of how the questions of validation and scalability have been addressed in SimGrid, as well as thoughts on specific challenges related to methodology and to the application of SimGrid in the HPC context.

    Computational Methods in Science and Engineering : Proceedings of the Workshop SimLabs@KIT, November 29 - 30, 2010, Karlsruhe, Germany

    In this proceedings volume, we provide a compilation of article contributions covering, in equal measure, applications from different research fields, ranging from capacity to capability computing. Besides classical computing aspects such as parallelization, the focus of these proceedings is on multi-scale approaches and on methods for tackling algorithmic and data complexity. Practical aspects of using the HPC infrastructure and the tools and software available at the SCC are also presented.

    An Overview on Wireless Sensor Networks Technology and Evolution

    Wireless sensor networks (WSNs) enable new applications and, owing to several constraints, require non-conventional paradigms for protocol design. Because devices must combine low complexity with low energy consumption (i.e., a long network lifetime), a proper balance between communication and signal/data processing capabilities must be found. This has motivated a huge effort in research, standardization, and industrial investment in this field over the last decade. This survey paper gives an overview of WSN technologies, main applications and standards, features of WSN design, and evolutions. In particular, some distinctive applications, such as those based on environmental monitoring, are discussed and design strategies highlighted; a case study based on a real implementation is also reported. Trends and possible evolutions are traced. Emphasis is placed on IEEE 802.15.4, the technology that enables many WSN applications. Some examples of the performance of 802.15.4-based networks are shown and discussed as a function of the size of the WSN and of the type of data exchanged among nodes.
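    To give a feel for why performance depends on network size and data type, the sketch below computes the on-air time of a single IEEE 802.15.4 frame in the 2.4 GHz band (250 kb/s, a 5-byte synchronization header plus a 1-byte PHY header, and at most 127 bytes of PHY payload) and a naive lower bound for one reporting round in which every node sends one frame. The function names are hypothetical, and the bound deliberately ignores CSMA/CA backoff, acknowledgments, interframe spacing, collisions, and multi-hop forwarding, all of which increase real delays.

```python
PHY_RATE_BPS = 250_000  # 2.4 GHz O-QPSK PHY data rate
SHR_PHR_BYTES = 6       # 4-byte preamble + 1-byte SFD + 1-byte PHY header
MAX_PSDU_BYTES = 127    # maximum PHY payload (MAC frame) size


def frame_airtime_ms(psdu_bytes):
    """On-air time of one frame carrying psdu_bytes of PHY payload, in ms."""
    if not 0 < psdu_bytes <= MAX_PSDU_BYTES:
        raise ValueError("PSDU must be 1..127 bytes")
    return (SHR_PHR_BYTES + psdu_bytes) * 8 / PHY_RATE_BPS * 1000


def round_time_lower_bound_ms(n_nodes, psdu_bytes):
    """Hypothetical lower bound: n_nodes frames sent back to back on one channel."""
    return n_nodes * frame_airtime_ms(psdu_bytes)
```

    A maximum-size frame occupies the channel for about 4.26 ms, so even with perfect scheduling a single-hop network of 100 such nodes needs more than 0.4 s per round; larger payloads and more nodes stretch the round proportionally.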