207 research outputs found

    Checkpoint-based Fault-tolerant Infrastructure for Virtualized Service Providers

    Get PDF
    Crash and omission failures are common in service providers: a disk can break down or a link can fail anytime. In addition, the probability of a node failure increases with the number of nodes. Apart from reducing the provider’s computation power and jeopardizing the fulfillment of his contracts, this can also lead to computation time wasting when the crash occurs before finishing the task execution. In order to avoid this problem, efficient checkpoint infrastructures are required, especially in virtualized environments where these infrastructures must deal with huge virtual machine images. This paper proposes a smart checkpoint infrastructure for virtualized service providers. It uses Another Union File System to differentiate read-only from read-write parts in the virtual machine image. In this way, read-only parts can be checkpointed only once, while the rest of checkpoints must only save the modifications in read-write parts, thus reducing the time needed to make a checkpoint. The checkpoints are stored in a Hadoop Distributed File System. This allows resuming a task execution faster after a node crash and increasing the fault tolerance of the system, since checkpoints are distributed and replicated in all the nodes of the provider. This paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and efficient task recovery after a node failure.Peer ReviewedPostprint (published version

    Usability of Scientific Workflow in Dynamically Changing Environment

    Get PDF
    Scientific workflow management systems are mainly data-flow oriented, which face several challenges due to the huge amount of data and the required computational capacity which cannot be predicted before enactment. Other problems may arise due to the dynamic access of the data storages or other data sources and the distributed nature of the scientific workflow computational infrastructures (cloud, cluster, grid, HPC), which status may change even during running of a single workflow instance. Many of these failures could be avoided with workflow management systems that provide provenance based dynamism and adaptivity to the unforeseen scenarios arising during enactment. In our work we summarize and categorize the failures that can arise in cloud environment during enactment and show the possibility of prediction and avoidance of failures with dynamic and provenance support

    Elastic management of tasks in virtualized environments

    Get PDF
    Nowadays, service providers in the Cloud offer complex services ready to be used as it was a commodity like water or electricity to their customers. A key technology for this approach is virtualization which facilitates provider's management and provides on-demand virtual environments, which are isolated and consolidated in order to achieve a better utilization of the provider's resources. However, dealing with some virtualization capabilities, such as the creation of virtual environments, implies an effort for the user in order to take benefit from them. In order to avoid this problem, we are contributing the research community with the EMOTIVE (Elastic Management of Tasks in Virtualized Environments) middleware, which allows executing tasks and providing virtualized environments to the users without any extra effort in an efficient way. This is a virtualized environment manager which aims to provide virtual machines that fulfils with the user requirements in terms of software and system capabilities. Furthermore, it supports fine-grained local resource management and provides facilities for developing scheduling policies such as migration and checkpointing.Postprint (published version

    Performance Evaluation of Scheduling Algorithms for Real Time Cloud Computing Systems

    Get PDF
    Cloud computing shares data and oers services transparently among its users. With the increase in number of users of cloud the tasks to be scheduled increases. The performance of cloud depends on the task scheduling algorithms used in the scheduling components or brokering components. Scheduling of tasks on cloud computing systems is one of the research problem, Where the matching of machines and completion time of the tasks are considered. Tasks matching of machines problem is that, assume number of active hosts are Y, number of VMs in each host are Z. Maximum number of possible Virtual Machines(VMs) to schedule a single task is (y*z). If we need to schedule X tasks, number of possibilities are (y *z)^x. So scheduling of tasks is NP Hard problem. NP Hard means this scheduling of tasks on VMs not having polynomial time complexity, but it may have algorithm for verifying solution. Fault-tolerance becomes an important key to establish dependability in cloud computing system. In task scheduling, if task not completed in it's deadline ,then it is one type of fault in scheduling of tasks. In this thesis this type of faults are taken and try to overcome it. In this thesis we present a non-preemptive scheduling algorithm, By inserting the ideal time for postponing the task by ensuring the other task will completes its execution with in the deadline. In simulation the proposed algorithm maximizes the prot of 25%, throughput of 25% and minimizes the penalty of 20% over EDF

    Performance Evaluation of Scheduling Algorithms for Real Time Cloud Computing Systems

    Get PDF
    Cloud computing shares data and oers services transparently among its users. With the increase in number of users of cloud the tasks to be scheduled increases. The performance of cloud depends on the task scheduling algorithms used in the scheduling components or brokering components. Scheduling of tasks on cloud computing systems is one of the research problem, Where the matching of machines and completion time of the tasks are considered. Tasks matching of machines problem is that, assume number of active hosts are Y, number of VMs in each host are Z. Maximum number of possible Virtual Machines(VMs) to schedule a single task is (y*z). If we need to schedule X tasks, number of possibilities are (y *z)^x. So scheduling of tasks is NP Hard problem. NP Hard means this scheduling of tasks on VMs not having polynomial time complexity, but it may have algorithm for verifying solution. Fault-tolerance becomes an important key to establish dependability in cloud computing system. In task scheduling, if task not completed in it's deadline ,then it is one type of fault in scheduling of tasks. In this thesis this type of faults are taken and try to overcome it. In this thesis we present a non-preemptive scheduling algorithm, By inserting the ideal time for postponing the task by ensuring the other task will completes its execution with in the deadline. In simulation the proposed algorithm maximizes the prot of 25%, throughput of 25% and minimizes the penalty of 20% over EDF

    Elastic Highly Available Cloud Computing

    Get PDF
    High availability and elasticity are two the cloud computing services technical features. Elasticity is a key feature of cloud computing where provisioning of resources is closely tied to the runtime demand. High availability assure that cloud applications are resilient to failures. Existing cloud solutions focus on providing both features at the level of the virtual resource through virtual machines by managing their restart, addition, and removal as needed. These existing solutions map applications to a specific design, which is not suitable for many applications especially virtualized telecommunication applications that are required to meet carrier grade standards. Carrier grade applications typically rely on the underlying platform to manage their availability by monitoring heartbeats, executing recoveries, and attempting repairs to bring the system back to normal. Migrating such applications to the cloud can be particularly challenging, especially if the elasticity policies target the application only, without considering the underlying platform contributing to its high availability (HA). In this thesis, a Network Function Virtualization (NFV) framework is introduced; the challenges and requirements of its use in mobile networks are discussed. In particular, an architecture for NFV framework entities in the virtual environment is proposed. In order to reduce signaling traffic congestion and achieve better performance, a criterion to bundle multiple functions of virtualized evolved packet-core in a single physical device or a group of adjacent devices is proposed. The analysis shows that the proposed grouping can reduce the network control traffic by 70 percent. Moreover, a comprehensive framework for the elasticity of highly available applications that considers the elastic deployment of the platform and the HA placement of the application’s components is proposed. The approach is applied to an internet protocol multimedia subsystem (IMS) application and demonstrate how, within a matter of seconds, the IMS application can be scaled up while maintaining its HA status

    A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds

    Full text link
    Cloud platforms have emerged as a prominent environment to execute high performance computing (HPC) applications providing on-demand resources as well as scalability. They usually offer different classes of Virtual Machines (VMs) which ensure different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in Amazon EC2 cloud, the user pays per hour for on-demand VMs while spot VMs are unused instances available for lower price. Despite the monetary advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any moment. Using both hibernation-prone spot VMs (for cost sake) and on-demand VMs, we propose in this paper a static scheduling for HPC applications which are composed by independent tasks (bag-of-task) with deadline constraints. However, if a spot VM hibernates and it does not resume within a time which guarantees the application's deadline, a temporal failure takes place. Our scheduling, thus, aims at minimizing monetary costs of bag-of-tasks applications in EC2 cloud, respecting its deadline and avoiding temporal failures. To this end, our algorithm statically creates two scheduling maps: (i) the first one contains, for each task, its starting time and on which VM (i.e., an available spot or on-demand VM with the current lowest price) the task should execute; (ii) the second one contains, for each task allocated on a VM spot in the first map, its starting time and on which on-demand VM it should be executed to meet the application deadline in order to avoid temporal failures. The latter will be used whenever the hibernation period of a spot VM exceeds a time limit. Performance results from simulation with task execution traces, configuration of Amazon EC2 VM classes, and VMs market history confirms the effectiveness of our scheduling and that it tolerates temporal failures
    corecore