Using Content-Addressable Networks for Load Balancing in Desktop Grids
Desktop grids combine Peer-to-Peer and Grid computing techniques to improve
the robustness, reliability and scalability of job execution
infrastructures.
However, efficiently matching incoming jobs to available system resources
and achieving good load balance in a fully decentralized and heterogeneous
computing environment is a challenging problem.
In this paper, we extend our prior work with a new decentralized algorithm
for maintaining approximate global load information, and a job pushing
mechanism that uses the global information to push jobs towards
underutilized portions of the system.
The resulting system more effectively balances load and improves overall
system throughput.
Through a comparative analysis of experimental results across different
system configurations and job profiles, performed via simulation, we show
that our system can reliably execute Grid applications on a distributed set
of resources with both low cost and good load balance.
Master/worker parallel discrete event simulation
The execution of parallel discrete event simulation across metacomputing infrastructures is examined. A master/worker architecture for parallel discrete event simulation is proposed, providing robust execution under a dynamic set of services with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-sharing clients without significant interaction by users. Research questions and challenges concerning the issues and limitations of the work distribution paradigm, the targeted computational domain, performance metrics, and the intended class of applications to be used in this context are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated, with subsequent optimizations to increase the efficiency of large-scale simulation execution through distributed master service design and intrinsic overhead reduction. New techniques for addressing challenges associated with optimistic parallel discrete event simulation across metacomputing, such as rollbacks and message unsending, with an inherently different computation paradigm utilizing master services and time windows are proposed and examined. Results indicate that a master/worker approach utilizing loosely coupled resources is a viable means for high throughput parallel discrete event simulation by enhancing existing computational capacity or providing alternate execution capability for less time-critical codes.
Ph.D. Committee Chair: Fujimoto, Richard; Committee Member: Bader, David; Committee Member: Perumalla, Kalyan; Committee Member: Riley, George; Committee Member: Vuduc, Richar
Requirements of the SALTY project
This document is the first external deliverable of the SALTY project (Self-Adaptive very Large disTributed sYstems), funded by the ANR under contract ANR-09-SEGI-012. It is the result of task 1.1 of Work Package (WP) 1: Requirements and Architecture. Its objective is to identify and collect requirements from the use cases that are going to be developed in WP 4 (Use cases and Validation). Based on the study and classification of the use cases, requirements for the envisaged framework are then determined and organized into features. These features are intended to guide and control the advances in all work packages of the project. As a start, the features are classified and briefly described, and related scenarios in the defined use cases are pinpointed. In the following tasks and deliverables, these features will facilitate design: priorities will be assigned to them and success criteria will be defined at a finer grain as the project progresses. This report, as the first external document, has no dependency on any other external document and serves as a reference for future external documents. As it has been built from the use case studies that have been synthesized in two internal documents of the project, extracts from the two documents are made available as appendices (cf. appendices B and C).
Modern approaches to modeling user requirements on resource and task allocation in hierarchical computational grids
Peer Reviewed. Postprint (published version)
DECENTRALIZED AND SCALABLE RESOURCE MANAGEMENT FOR DESKTOP GRIDS
The recent growth of the Internet and of the CPU power of personal
computers and workstations enables desktop grid computing to
achieve tremendous computing power at low cost, through
opportunistic sharing of resources. However, traditional
server-client Grid architectures have inherent problems in robustness,
reliability and scalability. Researchers have therefore recently
turned to Peer-to-Peer (P2P) algorithms in an attempt to address these
issues.
I have designed and evaluated a set of protocols that implement a
scalable P2P desktop grid computing system for executing Grid
applications on widely distributed sets of resources. Such
infrastructure must be decentralized, robust, highly available and
scalable, while effectively mapping application instances to available
resources throughout the system (called matchmaking).
First, I address the problem of efficiently matching jobs
to available system resources by employing a customized
Content-Addressable Network (CAN) where each resource type corresponds
to a distinct dimension. With this approach, incoming jobs are matched
with system nodes through proximity in an N-dimensional resource
space. Second, I provide comprehensive load balancing mechanisms that
can greatly improve overall system throughput and response time
without using any centralized control or information about the
system. Finally, to remove any hot spots in the system where a small
number of nodes are processing a lot of system maintenance work, I
have designed a set of optimizations to minimize overall system
overheads and distribute them fairly among available system nodes. My
ultimate goal is to ensure that no node in the system becomes much
more heavily loaded than others, either because of executing jobs or
from system maintenance tasks. This matters because every node in our
system is a peer: no node acts as a pure server or a pure
client.
Through extensive experimental results, I show that the resulting
P2P desktop grid computing system is scalable and effective: it
can efficiently match any type of resource requirements for jobs
simultaneously, while balancing load among multiple candidate nodes.
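
The matchmaking idea in this abstract (one dimension per resource type, jobs matched to nodes by proximity in an N-dimensional resource space) can be illustrated with a small sketch. This is not the thesis's protocol: a real CAN partitions the space into zones and routes requests between neighboring peers, whereas the sketch below scans a local list of nodes. The names `Node` and `match`, the three example dimensions (CPU GHz, memory GB, disk GB), and the unnormalized Euclidean distance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Node:
    """A peer placed at a point in the resource space (hypothetical type)."""
    name: str
    resources: tuple  # one coordinate per resource type, e.g. (cpu_ghz, mem_gb, disk_gb)


def match(job_req, nodes):
    """Return the feasible node nearest to the job's requirement point.

    A node is feasible if it meets or exceeds the requirement in every
    dimension. Among feasible nodes, the one with the smallest Euclidean
    distance to the requirement point is chosen. Returns None if no node
    qualifies. Assumption: dimensions are compared without normalization.
    """
    feasible = [
        n for n in nodes
        if all(have >= need for have, need in zip(n.resources, job_req))
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda n: dist(n.resources, job_req))


# Example nodes (made-up values, for illustration only).
nodes = [
    Node("a", (2.0, 4.0, 100.0)),
    Node("b", (3.0, 16.0, 500.0)),
    Node("c", (1.0, 2.0, 50.0)),
]

chosen = match((1.5, 3.0, 80.0), nodes)
print(chosen.name if chosen else "no feasible node")
```

Choosing the nearest feasible node, rather than the most powerful one, keeps larger nodes free for jobs that actually need them; whether the original system applies this exact policy is not stated in the abstract.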
Asynchronous Teams and Tasks in a Message Passing Environment
As the discipline of scientific computing grows, so too does the "skills gap" between the increasingly complex scientific applications and the efficient algorithms required. Increasing demand for computational power on the march towards exascale requires innovative approaches. Closing the skills gap avoids the many pitfalls that lead to poor utilisation of resources and wasted investment. This thesis tackles two challenges: asynchronous algorithms for parallel computing and fault tolerance. First, I present a novel asynchronous task invocation methodology for Discontinuous Galerkin codes called enclave tasking. The approach modifies the parallel ordering of tasks, which allows for efficient scaling on dynamic meshes on up to 756 cores. It ensures high levels of concurrency and intermixes tasks of different computational properties. Critical tasks along domain boundaries are prioritised for an overlap of computation and communication. The second contribution is the teaMPI library, which forms teams of MPI processes that exchange consistency data through an asynchronous "heartbeat". In contrast to previous approaches, teaMPI operates fully asynchronously with reduced overhead. It is also capable of detecting individually slow or failing ranks and inconsistent data among replicas. Finally, I provide an outlook on how asynchronous teams using enclave tasking can be combined into an advanced team-based diffusive load balancing scheme. Both concepts are integrated into and contribute towards the ExaHyPE project, a next-generation code that solves hyperbolic equation systems on dynamically adaptive Cartesian grids.