16 research outputs found

    Efficient Checkpointing for Heterogeneous Collaborative Environments: Representation, Coordination, and Automation.

    Checkpointing can be used to adapt resource utilization in heterogeneous distributed environments. In checkpointing, the state of a process is captured and later restored on a computer to restart execution from the point where the state was captured. This capability can be applied to process migration, in which resource utilization is adapted toward high performance by moving a running process from one computer to another. For a heterogeneous environment, problems in checkpointing can be categorized into three domains regarding mechanisms to capture and restore the execution state, memory state, and communication state of a process. Although a few solutions have been proposed, a well-defined solution does not yet exist. This thesis presents a practical solution to capture and restore the process state in heterogeneous distributed environments. The solution is based on three novel mechanisms: the data transfer mechanism, the memory space representation model and its associated data collection and restoration mechanisms, and the reliable communication and process migration protocols. These mechanisms define the machine-independent representations of the execution state, the memory state, and the communication state. They work in coordination to perform process migration in a heterogeneous environment. A software system is designed and implemented to automatically migrate a process. A number of process migration experiments are conducted on sequential and collaborative processes. Experimental results support the correctness and practicality of our solution.

    A checkpointing mechanism for virtual clusters using memory-bound time-multiplexed data transfers

    Transparent hypervisor-level checkpoint-restart mechanisms for virtual clusters (VCs) or clusters of virtual machines (VMs) offer an attractive fault tolerance capability for cloud data centers. However, existing mechanisms have suffered from high checkpoint downtimes and overheads. This paper introduces Mekha, a novel hypervisor-level, in-memory coordinated checkpoint-restart mechanism for VCs that leverages precopy live migration. During a VC checkpoint event, Mekha creates a shadow VM for each VM and employs a novel memory-bound timed-multiplex data (MTD) transfer mechanism to replicate the state of each VM to its corresponding shadow VM. We also propose a global ending condition that enables the checkpoint coordinator to control the termination of the MTD algorithm for every VM in a VC, thereby reducing overall checkpoint latency. Furthermore, the checkpoint protocols of Mekha are designed based on barrier synchronizations and virtual time, ensuring the global consistency of checkpoints and utilizing existing data retransmission capabilities to handle message loss. We conducted several experiments to evaluate Mekha using a message passing interface (MPI) application from the NASA advanced supercomputing (NAS) parallel benchmark. The results demonstrate that Mekha significantly reduces checkpoint downtime compared to traditional checkpoint mechanisms. Consequently, Mekha effectively decreases checkpoint overheads while offering efficiency and practicality, making it a viable solution for cloud computing environments

    Heterogeneous Strong Computation Migration

    The continuous increase in performance requirements, for both scientific computation and industry, motivates the need for a powerful computing infrastructure. The Grid appeared as a solution for inexpensive execution of heavy applications in a parallel and distributed manner. It allows combining resources independently of their physical location and architecture to form a global resource pool available to all grid users. However, grid environments are highly unstable and unpredictable. Adaptability is a crucial issue in this context, in order to guarantee an appropriate quality of service to users. Migration is a technique frequently used for achieving adaptation. The objective of this report is to survey the problem of strong migration in heterogeneous environments such as grids, the related implementation issues, and the current solutions. Comment: This is the pre-peer reviewed version of the following article: Milanés, A., Rodriguez, N. and Schulze, B. (2008), State of the art in heterogeneous strong migration of computations. Concurrency and Computation: Practice and Experience, 20: 1485-1508, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/cpe.1287/abstrac

    Communication and Process Migration Protocols for Distributed Heterogeneous Computing

    Communication and Process Migration Protocols instituted in an independent layer of a virtual machine environment allow for heterogeneous or homogeneous process migration. The protocols manage message traffic for processes communicating in the virtual machine environment. For migrating processes, they manage message traffic so that no message is lost during migration and proper message order is maintained. In addition to correctness of migration operations, low overhead and high efficiency are achieved for supporting scalable, point-to-point communications. Sponsorship: Illinois Institute of Technology. United States Paten
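The two guarantees named in this abstract (no lost messages, preserved order) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the protocol described in the patent: the `Endpoint` class and the `begin_migration` and `finish_migration` functions are hypothetical names, and a single sender with FIFO delivery is assumed.

```python
from collections import deque


class Endpoint:
    """Hypothetical mailbox for one process on one host (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()    # messages received but not yet consumed
        self.held = deque()     # messages that arrive while migration is in progress
        self.migrating = False
        self.forward_to = None  # set once the process has moved elsewhere

    def deliver(self, msg):
        if self.forward_to is not None:
            # The process has moved: pass late arrivals on to its new location.
            self.forward_to.deliver(msg)
        elif self.migrating:
            # Migration in progress: hold the message instead of dropping it.
            self.held.append(msg)
        else:
            self.inbox.append(msg)


def begin_migration(old):
    """Stop normal delivery at the old location."""
    old.migrating = True


def finish_migration(old, new):
    """Move pending messages to the new location, oldest first."""
    new.inbox.extend(old.inbox)  # unconsumed messages came first
    old.inbox.clear()
    new.inbox.extend(old.held)   # then those that arrived during migration
    old.held.clear()
    old.forward_to = new         # anything later is forwarded
    old.migrating = False
```

Delivering two messages, starting the migration, delivering a third, finishing the migration, and then delivering a fourth to the old endpoint leaves all four messages in the new endpoint's inbox in their original order.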

    Efficient process migration for parallel processing on non-dedicated network of workstations

    This paper presents the design and preliminary implementation of MpPVM, a software system that supports process migration for PVM application programs in a non-dedicated heterogeneous computing environment. New concepts of migration point as well as migration point analysis and necessary data analysis are introduced. In MpPVM, process migrations occur only at previously inserted migration points. Migration point analysis determines appropriate locations to insert migration points, whereas necessary data analysis provides a minimum set of variables to be transferred at each migration point. A new methodology to perform reliable point-to-point data communications in a migration environment is also discussed. Finally, a preliminary implementation of MpPVM and its experimental results are presented, showing the correctness and promising performance of our process migration mechanism in a scalable non-dedicated heterogeneous computing environment. While MpPVM is developed on top of PVM, the process migration methodology introduced in this study is general and can be applied to any distributed software environment.
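The idea of a migration point with a minimal set of transferred variables can be sketched in a few lines. This is an illustration under stated assumptions, not MpPVM code: the `run` function, the `migrate_requested` callback, and the use of JSON as the machine-independent encoding are all hypothetical choices made for this example.

```python
import json


def run(state, steps, migrate_requested):
    """Sum the integers below `steps`, polling for migration once per iteration.

    `state` holds the only variables that are live at the migration point
    (`i` and `total`), so they are all that needs to be transferred.
    """
    i, total = state["i"], state["total"]
    while i < steps:
        total += i
        i += 1
        # Migration point: the process may leave only here.
        if migrate_requested(i):
            # Encode the necessary data in a machine-independent text form.
            return ("migrated", json.dumps({"i": i, "total": total}))
    return ("done", total)


# First host: a migration is requested after the fourth iteration.
status, payload = run({"i": 0, "total": 0}, 10, lambda i: i == 4)

# Second host: decode the transferred variables and resume the loop.
result = run(json.loads(payload), 10, lambda i: False)
```

Because the loop is re-entered from the saved values of `i` and `total`, the resumed run produces the same final sum as an uninterrupted one.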

    Memory Space Representation for Heterogeneous Network Process Migration

    A major difficulty of heterogeneous process migration is how to collect advanced dynamic data structures, transform them into machine-independent form, and restore them appropriately in a different hardware and software environment. In this study we introduce a data model, the Memory Space Representation (MSR) model, to recognize complex data structures in program address spaces. Supporting mechanisms of the MSR model are also developed for collecting program data structures and restoring them in a heterogeneous environment. The MSR design has been implemented under a prototype heterogeneous process migration environment. Pointer-intensive programs with function and recursive calls are tested. Experimental results confirm that the newly proposed design is feasible and effective for heterogeneous network process migration. 1. Introduction As network computing becomes an increasingly popular choice for computing, network process migration has received unprecedented attention recently. On..
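The general problem this abstract describes, collecting a pointer-linked structure into a machine-independent form and rebuilding it elsewhere, can be sketched as follows. This is not the MSR model itself, whose details are not given here. It is a minimal example that assumes a singly linked list (possibly cyclic), replaces each pointer with a logical index, and uses JSON as the transfer format; `Node`, `collect`, and `restore` are hypothetical names.

```python
import json


class Node:
    """One list cell; `next` plays the role of a machine-specific pointer."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def collect(root):
    """Flatten the list reachable from `root` into address-free records."""
    index = {}  # maps id(node) to a logical index that is valid on any machine
    order = []
    node = root
    while node is not None and id(node) not in index:
        index[id(node)] = len(order)
        order.append(node)
        node = node.next
    records = [
        {"value": n.value,
         "next": None if n.next is None else index[id(n.next)]}
        for n in order
    ]
    return json.dumps(records)


def restore(payload):
    """Rebuild the nodes first, then reconnect them from the logical indices."""
    records = json.loads(payload)
    nodes = [Node(r["value"]) for r in records]
    for n, r in zip(nodes, records):
        if r["next"] is not None:
            n.next = nodes[r["next"]]
    return nodes[0] if nodes else None
```

Restoring in two passes (allocate, then link) is what lets shared and cyclic references survive the round trip, since every target node already exists when its index is resolved.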