1,123 research outputs found

    Ishu bunsan shisutemu ni okeru kabun tasuku no sukejulingu

    Get PDF
    制度:新 ; 報告番号:甲2691号 ; 学位の種類:博士(国際情報通信学) ; 授与年月日:2008/7/30 ; 早大学位記番号:新486

    Divisible load scheduling of image processing applications on the heterogeneous star and tree networks using a new genetic algorithm

    Get PDF
    The divisible load scheduling of image processing applications on the heterogeneous star and multi-level tree networks is addressed in this paper. In our platforms, processors and network links have different speeds. In addition, computation and communication overheads are considered. A new genetic algorithm for minimizing the processing time of low-level image applications using divisible load theory is introduced. The closed-form solution for the processing time, the image fractions that should be allocated to each processor, the optimum number of participating processors, and the optimal sequence for load distribution are derived. The new concept of equivalent processor in tree network is introduced and the effect of different image and kernel sizes on processing time and speed up are investigated. Finally, to indicate the efficiency of our algorithm, several numerical experiments are presented

    Agent-Based Load Balancing on Homogeneous Minigrids: Macroscopic Modeling and Characterization

    Get PDF
    Abstract—In this paper, we present a macroscopic characterization of agent-based load balancing in homogeneous minigrid environments. The agent-based load balancing is regarded as agent distribution from a macroscopic point of view. We study two quantities on minigrids: the number and size of teams where agents (tasks) queue. In macroscopic modeling, the load balancing mechanism is characterized using differential equations. We show that the load balancing we concern always converges to a steady state. Furthermore, we show that load balancing with different initial distributions converges to the same steady state gradually. Also, we prove that the steady state becomes an even distribution if and only if agents have complete knowledge about agent teams on minigrids. Utility gains and efficiency are introduced to measure the quality of load balancing. Through numerical simulations, we discuss the utility gains and efficiency of load balancing in different cases and give a series of analysis. In order to maximize the utility gain and the efficiency, we theoretically discuss the optimization of agents ’ strategies. Finally, in order to validate our proposed agentbased load balancing mechanism, we develop a computing platform, called Simulation System for Grid Task Distribution (SSGTD). Through experimentation, we note that our experimental results in general confirm our theoretical proofs and numerical simulation results on the proposed equation system. In addition, we find a very interesting phenomenon, that is, our agent-based load balancing mechanism is topology-independent

    The effect of start-up delays in scheduling divisible loads on bus networks: An alternate approach

    Get PDF
    AbstractIn this paper, scheduling of divisible loads in a bus network is considered. The objective is to minimize the processing time by including the overhead component due to start-up time that could degrade the performance of the system, in addition to the inherent communication and computation delays. These overheads are considered to be constant additive factors to the communication and computation components. A closed-form expression for optimal processing time is derived. Using this closed-form expression, this paper analytically proves significant results regarding the optimal sequence of load distribution and optimal number of processors. Numerical examples are presented to illustrate the analysis

    Adaptive structured parallelism

    Get PDF
    Algorithmic skeletons abstract commonly-used patterns of parallel computation, communication, and interaction. Parallel programs are expressed by interweaving parameterised skeletons analogously to the way in which structured sequential programs are developed, using well-defined constructs. Skeletons provide top-down design composition and control inheritance throughout the program structure. Based on the algorithmic skeleton concept, structured parallelism provides a high-level parallel programming technique which allows the conceptual description of parallel programs whilst fostering platform independence and algorithm abstraction. By decoupling the algorithm specification from machine-dependent structural considerations, structured parallelism allows programmers to code programs regardless of how the computation and communications will be executed in the system platform.Meanwhile, large non-dedicated multiprocessing systems have long posed a challenge to known distributed systems programming techniques as a result of the inherent heterogeneity and dynamism of their resources. Scant research has been devoted to the use of structural information provided by skeletons in adaptively improving program performance, based on resource utilisation. This thesis presents a methodology to improve skeletal parallel programming in heterogeneous distributed systems by introducing adaptivity through resource awareness. As we hypothesise that a skeletal program should be able to adapt to the dynamic resource conditions over time using its structural forecasting information, we have developed ASPara: Adaptive Structured Parallelism. ASPara is a generic methodology to incorporate structural information at compilation into a parallel program, which will help it to adapt at execution

    Revisiting Matrix Product on Master-Worker Platforms

    Get PDF
    This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix-product is well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm and ScaLAPACK outer product algorithm), there are three key hypotheses that render our work original and innovative: - Centralized data. We assume that all matrix files originate from, and must be returned to, the master. - Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers. - Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix panels can be stored in the worker memories and re-used for subsequent updates (as in ScaLAPACK). We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on various platforms at Ecole Normale Superieure de Lyon and the University of Tennessee. However, we point out that in this first version of the report, experiments are limited to homogeneous platforms

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems

    Get PDF
    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Heterogeneous systems have a very high potential performance but present difficulties in their programming. OmpSs is a well known framework for task based parallel applications, which is an interesting tool to simplify the programming of these systems. However, it does not support the co-execution of a single OpenCL kernel instance on several compute devices. To overcome this limitation, this paper presents an extension of the OmpSs framework that solves two main objectives: the automatic division of datasets among several devices and the management of their memory address spaces. To adapt to different kinds of applications, the data division can be performed by the novel HGuided load balancing algorithm or by the well known Static and Dynamic. All this is accomplished with negligible impact on the programming. Experimental results reveal that there is always one load balancing algorithm that improves the performance and energy consumption of the system.This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016- 76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P. The Spanish Government through the Programa Severo Ochoa (SEV-2015-0493). The European Research Council under grant agreement No 321253 European Community’s Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc Projects, grant agreement n 288777, 610402 and 671697 and the European HiPEAC Network.Peer ReviewedPostprint (published version
    corecore