931 research outputs found

    Using Content-Addressable Networks for Load Balancing in Desktop Grids

    Get PDF
    Desktop grids combine Peer-to-Peer and Grid computing techniques to improve the robustness, reliability and scalability of job execution infrastructures. However, efficiently matching incoming jobs to available system resources and achieving good load balance in a fully decentralized and heterogeneous computing environment is a challenging problem. In this paper, we extend our prior work with a new decentralized algorithm for maintaining approximate global load information, and a job pushing mechanism that uses the global information to push jobs towards underutilized portions of the system. The resulting system more effectively balances load and improves overall system throughput. Through a comparative analysis of experimental results across different system configurations and job profiles, performed via simulation, we show that our system can reliably execute Grid applications on a distributed set of resources both with low cost and with good load balance

    Master/worker parallel discrete event simulation

    Get PDF
    The execution of parallel discrete event simulation across metacomputing infrastructures is examined. A master/worker architecture for parallel discrete event simulation is proposed providing robust executions under a dynamic set of services with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-sharing clients without significant interaction by users. Research questions and challenges associated with issues and limitations with the work distribution paradigm, targeted computational domain, performance metrics, and the intended class of applications to be used in this context are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated with subsequent optimizations to increase the efficiency of large-scale simulation execution through distributed master service design and intrinsic overhead reduction. New techniques for addressing challenges associated with optimistic parallel discrete event simulation across metacomputing such as rollbacks and message unsending with an inherently different computation paradigm utilizing master services and time windows are proposed and examined. Results indicate that a master/worker approach utilizing loosely coupled resources is a viable means for high throughput parallel discrete event simulation by enhancing existing computational capacity or providing alternate execution capability for less time-critical codes.Ph.D.Committee Chair: Fujimoto, Richard; Committee Member: Bader, David; Committee Member: Perumalla, Kalyan; Committee Member: Riley, George; Committee Member: Vuduc, Richar

    Requirements of the SALTY project

    Get PDF
    This document is the first external deliverable of the SALTY project (Self-Adaptive very Large disTributed sYstems), funded by the ANR under contract ANR-09-SEGI-012. It is the result of task 1.1 of the Work Package (WP) 1 : Requirements and Architecture. Its objective is to identify and collect requirements from use cases that are going to be developed in WP 4 (Use cases and Validation). Based on the study and classification of the use cases, requirements against the envisaged framework are then determined and organized in features. These features will aim at guide and control the advances in all work packages of the project. As a start, features are classified, briefly described and related scenarios in the defined use cases are pinpointed. In the following tasks and deliverables, these features will facilitate design by assigning priorities to them and defining success criteria at a finer grain as the project progresses. This report, as the first external document, has no dependency to any other external documents and serves as a reference to future external documents. As it has been built from the use cases studies that have been synthesized in two internal documents of the project, extracts from the two documents are made available as appendices (cf. appen- dices B and C)

    DECENTRALIZED AND SCALABLE RESOURCE MANAGEMENT FOR DESKTOP GRIDS

    Get PDF
    The recent growth of the Internet and the CPU power of personal computers and workstations enables desktop grid computing to achieve tremendous computing power with low cost, through opportunistic sharing of resources. However, traditional server-client Grid architectures have inherent problems in robustness, reliability and scalability. Researchers have therefore recently turned to Peer-to-Peer (P2P) algorithms in an attempt to address these issues. I have designed and evaluated a set of protocols that implement a scalable P2P desktop grid computing system for executing Grid applications on widely distributed sets of resources. Such infrastructure must be decentralized, robust, highly available and scalable, while effectively mapping application instances to available resources throughout the system (called matchmaking). First of all, I address the problem of efficient matchmaking of jobs to available system resources by employing customized Content-Addressable Network (CAN) where each resource type corresponds to a distinct dimension. With this approach, incoming jobs are matched with system nodes through proximity in an N-dimensional resource space. Second, I provide comprehensive load balancing mechanisms that can greatly improve overall system throughput and response time without using any centralized control or information about the system. Finally, to remove any hot spots in the system where a small number of nodes are processing a lot of system maintenance work, I have designed a set of optimizations to minimize overall system overheads and distribute them fairly among available system nodes. My ultimate goal is to ensure that no node in the system becomes much more heavily loaded than others, either because of executing jobs or from system maintenance tasks. This is because every node in our system is a peer, so that no node is acting as a pure server or a pure client. Throughout extensive experimental results, I show that the resulting P2P desktop grid computing system is scalable and effective so that it can efficiently match any type of resource requirements for jobs simultaneously, while balancing load among multiple candidate nodes

    Asynchronous Teams and Tasks in a Message Passing Environment

    Get PDF
    As the discipline of scientific computing grows, so too does the "skills gap" between the increasingly complex scientific applications and the efficient algorithms required. Increasing demand for computational power on the march towards exascale requires innovative approaches. Closing the skills gap avoids the many pitfalls that lead to poor utilisation of resources and wasted investment. This thesis tackles two challenges: asynchronous algorithms for parallel computing and fault tolerance. First I present a novel asynchronous task invocation methodology for Discontinuous Galerkin codes called enclave tasking. The approach modifies the parallel ordering of tasks that allows for efficient scaling on dynamic meshes up to 756 cores. It ensures high levels of concurrency and intermixes tasks of different computational properties. Critical tasks along domain boundaries are prioritised for an overlap of computation and communication. The second contribution is the teaMPI library, forming teams of MPI processes exchanging consistency data through an asynchronous "heartbeat". In contrast to previous approaches, teaMPI operates fully asynchronously with reduced overhead. It is also capable of detecting individually slow or failing ranks and inconsistent data among replicas. Finally I provide an outlook into how asynchronous teams using enclave tasking can be combined into an advanced team-based diffusive load balancing scheme. Both concepts are integrated into and contribute towards the ExaHyPE project, a next generation code that solves hyperbolic equation systems on dynamically adaptive cartesian grids
    corecore