49,227 research outputs found

    Fail Over Strategy for Fault Tolerance in Cloud Computing Environment

    Cloud fault tolerance is an important issue in cloud computing platforms and applications. In the event of an unexpected system failure or malfunction, a robust fault-tolerant design may allow the cloud to continue functioning correctly, possibly at a reduced level, instead of failing completely. To ensure high availability of critical cloud services, application execution and hardware performance, various fault-tolerant techniques exist for building self-autonomous cloud systems. In comparison to current approaches, this paper proposes a more robust and reliable architecture using an optimal checkpointing strategy to ensure high system availability and a reduced task service finish time. Using pass rates and virtualised mechanisms, the proposed Smart Failover Strategy (SFS) scheme uses components such as a Cloud fault manager, a Cloud controller, a Cloud load balancer and a selection mechanism, providing fault tolerance via redundancy, optimized selection and checkpointing. In our approach, the Cloud fault manager repairs faults generated before the task time deadline is reached, blocking unrecoverable faulty nodes as well as their virtual nodes. This scheme is also able to remove temporary software faults from recoverable faulty nodes, thereby making them available for future requests. We argue that the proposed SFS algorithm makes the system highly fault tolerant by considering forward and backward recovery using diverse software tools. Compared to existing approaches, preliminary experiments with the SFS algorithm indicate an increase in pass rates and a consequent decrease in failure rates, showing good overall performance in task allocation. We present these results using experimental validation tools with comparison to other techniques, laying a foundation for a fully fault-tolerant IaaS Cloud environment.
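The abstract above mentions checkpointing with backward recovery but gives no implementation detail. The following is a minimal sketch of that general technique only, not the SFS algorithm itself. The names `TransientFault`, `run_with_checkpoints` and `max_retries` are hypothetical and chosen for illustration.

```python
import copy


class TransientFault(Exception):
    """Hypothetical recoverable fault raised by a task step."""


def run_with_checkpoints(steps, state, max_retries=3):
    """Run steps in order; on a fault, roll back to the last checkpoint and retry."""
    checkpoint = copy.deepcopy(state)  # initial checkpoint
    for step in steps:
        retries = 0
        while True:
            try:
                step(state)  # a step mutates the state in place
                checkpoint = copy.deepcopy(state)  # commit a new checkpoint
                break
            except TransientFault:
                retries += 1
                if retries > max_retries:
                    raise  # treat the fault as unrecoverable
                # Backward recovery: discard partial work from the failed step.
                state = copy.deepcopy(checkpoint)
    return state


# Demonstration with a step that fails once after partially updating the state.
attempts = {"n": 0}


def add_one(s):
    s["total"] += 1


def flaky_double(s):
    s["total"] *= 2  # partial update happens before the simulated fault
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise TransientFault("simulated transient fault")


result = run_with_checkpoints([add_one, flaky_double], {"total": 1})
```

Without the rollback, the retried step would double an already doubled value; restoring the checkpoint keeps the retry idempotent.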

    Reliable Fault Tolerance System for Service Composition in Mobile Ad Hoc Network

    Due to the rapid development of smart mobile devices, mobile applications are exploring the use of web services in MANETs to satisfy user needs. Complex user needs are satisfied by service composition, in which a complex service is created by combining one or more atomic services. Service composition is a significant challenge in MANETs because of limited bandwidth, constrained energy sources, dynamic node movement and frequent node failures. These constraints increase the failure rate of service composition. To overcome them, we propose a Reliable Fault Tolerant System for Service Composition in MANETs (RFTSC), which makes use of the checkpointing technique for service composition in MANETs. We propose a fault policy for each fault that can occur in service composition. Failed services in the service composition process are recovered locally by making use of the checkpointing system and of discovered services that satisfy the QoS constraints. A Multi-Service Tree (MST) is proposed to recover failed services with O(1) time complexity. Simulation results show that the proposed approach is efficient when compared to existing approaches.

    A Reliable and Cost-Efficient Auto-Scaling System for Web Applications Using Heterogeneous Spot Instances

    Cloud providers sell their idle capacity on markets through an auction-like mechanism to increase their return on investment. The instances sold in this way are called spot instances. Although spot instances are usually 90% cheaper than on-demand instances, they can be terminated by the provider when their bidding prices are lower than market prices. Thus, they are largely used to provision fault-tolerant applications only. In this paper, we explore how to utilize spot instances to provision web applications, which are usually considered availability-critical. The idea is to take advantage of differences in price among various types of spot instances to reach both high availability and significant cost savings. We first propose a fault-tolerant model for web applications provisioned by spot instances. Based on that, we devise novel auto-scaling policies for hourly billed cloud markets. We implemented the proposed model and policies both on a simulation testbed for repeatable validation and on Amazon EC2. The experiments on the simulation testbed and the real platform against the benchmarks show that the proposed approach can greatly reduce resource cost and still achieve satisfactory Quality of Service (QoS) in terms of response time and availability.
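The termination rule described above (a spot instance is lost when its bid falls below the market price) and the idea of spreading load over several instance types can be illustrated with a small sketch. This is not the paper's fault-tolerant model or its auto-scaling policies. The class `SpotPool`, the function `surviving_capacity`, the type names and all prices and capacities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpotPool:
    instance_type: str  # hypothetical type label, not a real offering
    bid: float          # hourly price we are willing to pay
    count: int          # number of instances of this type
    capacity: int       # requests per second one instance can serve (assumed)


def surviving_capacity(pools, market_prices):
    """Sum the capacity of pools whose bid is not below the current market price."""
    total = 0
    for pool in pools:
        if pool.bid >= market_prices[pool.instance_type]:
            total += pool.count * pool.capacity
    return total


pools = [
    SpotPool("type-a", bid=0.05, count=4, capacity=100),
    SpotPool("type-b", bid=0.20, count=2, capacity=250),
]

# A price spike on type-a terminates that pool; type-b keeps serving traffic.
market_prices = {"type-a": 0.08, "type-b": 0.15}
remaining = surviving_capacity(pools, market_prices)
```

Because the two pools are priced on separate markets in this sketch, a spike in one market removes only part of the total capacity, which is the intuition behind using heterogeneous instance types.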

    Coordination-Free Byzantine Replication with Minimal Communication Costs

    State-of-the-art fault-tolerant and federated data management systems rely on fully-replicated designs in which all participants have equivalent roles. Consequently, these systems have only limited scalability and are ill-suited for high-performance data management. As an alternative, we propose a hierarchical design in which a Byzantine cluster manages data, while an arbitrary number of learners can reliably learn these updates and use the corresponding data. To realize our design, we propose the delayed-replication algorithm, an efficient solution to the Byzantine learner problem that is central to our design. The delayed-replication algorithm is coordination-free, scalable, and has minimal communication cost for all participants involved. In doing so, the delayed-replication algorithm opens the door to new high-performance fault-tolerant and federated data management systems. To illustrate this, we show that the delayed-replication algorithm is not only useful to support specialized learners, but can also be used to reduce the overall communication cost of permissioned blockchains and to improve their storage scalability.

    Implementing fault tolerant applications using reflective object-oriented programming

    Abstract: Shows how reflection and object-oriented programming can be used to ease the implementation of classical fault tolerance mechanisms in distributed applications. When the underlying runtime system does not provide fault tolerance transparently, classical approaches to implementing fault tolerance mechanisms often imply mixing functional programming with non-functional programming (e.g. error processing mechanisms). The use of reflection improves the transparency of fault tolerance mechanisms to the programmer and more generally provides a clearer separation between functional and non-functional programming. The implementations of some classical replication techniques using a reflective approach are presented in detail and illustrated by several examples, which have been prototyped on a network of Unix workstations. Lessons learnt from our experiments are drawn and future work is discussed.
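The separation of functional code from fault tolerance code that this abstract describes can be sketched with Python's own reflective hooks. This is an illustration of the general idea under stated assumptions, not the paper's implementation: `ReplicatingMetaobject`, `ReplicaFailure` and `Adder` are hypothetical names, and the failover shown is a simple passive scheme that tries replicas in order.

```python
class ReplicaFailure(Exception):
    """Hypothetical error signalling that one replica could not serve a call."""


class ReplicatingMetaobject:
    """Intercepts method calls reflectively and fails over between replicas."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def __getattr__(self, name):
        # Called only for attributes not found on the metaobject itself,
        # so every application-level method call is routed through here.
        def invoke(*args, **kwargs):
            last_error = None
            for replica in self._replicas:
                try:
                    return getattr(replica, name)(*args, **kwargs)
                except ReplicaFailure as err:
                    last_error = err  # try the next replica
            raise ReplicaFailure(f"all replicas failed for {name!r}") from last_error

        return invoke


class Adder:
    """Functional code only: it contains no failover logic."""

    def __init__(self, healthy=True):
        self.healthy = healthy

    def add(self, a, b):
        if not self.healthy:
            raise ReplicaFailure("replica down")
        return a + b


service = ReplicatingMetaobject([Adder(healthy=False), Adder()])
total = service.add(2, 3)  # first replica fails, second one answers
```

The application class stays free of error processing code; swapping the replication strategy only requires changing the metaobject.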

    Building Fault Tolerance within Clouds at Network Level

    Cloud computing technologies and infrastructure facilities are growing rapidly, making it cost-effective for users to implement IT-based solutions to run their businesses economically. Many intricate issues, however, have cropped up which must be addressed for clouds to serve the purpose for which they are designed and implemented. Among these, fault tolerance and securing the data stored on clouds are the most important. Continuous availability of services depends on many factors. Faults are bound to happen within the network, software, platform or infrastructure used to establish the cloud. The network that connects the various servers, devices, peripherals, etc. has to be fault tolerant to start with, so that the intended, uninterrupted services can be made available to the user. A novel network design method that achieves high availability of the network, and thereby of the cloud itself, is presented in this paper.

    Hosting Byzantine Fault Tolerant Services on a Chord Ring

    In this paper we demonstrate how stateful Byzantine Fault Tolerant services may be hosted on a Chord ring. The strategy presented is fourfold: firstly, a replication scheme that dissociates the maintenance of replicated service state from ring recovery is developed. Secondly, clients of the ring-based services are made replication aware. Thirdly, a consensus protocol is introduced that supports the serialization of updates. Finally, Byzantine fault tolerant replication protocols are developed that ensure the integrity of service data hosted on the ring. Comment: Submitted to the DSN 2007 Workshop on Architecting Dependable Systems.
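The abstract does not say how replicas are placed on the ring. As a rough sketch, one common arrangement is to store a key on the nodes that succeed it on the Chord ring, sized to the usual 3f+1 bound for tolerating f Byzantine replicas. Both choices are assumptions made here for illustration, as are the names `ring_id`, `replica_group` and `RING_BITS`.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # small identifier space for illustration (assumed)


def ring_id(name):
    """Map a name onto the identifier ring using SHA-1, as Chord does."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


def replica_group(node_ids, key_id, f):
    """Return the 3f+1 nodes that succeed key_id on the ring, in ring order."""
    ring = sorted(set(node_ids))
    needed = 3 * f + 1
    if len(ring) < needed:
        raise ValueError(f"need at least {needed} nodes to tolerate {f} faults")
    # The successor is the first node whose identifier is >= key_id,
    # wrapping around to the start of the ring when no such node exists.
    start = bisect_left(ring, key_id) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(needed)]


nodes = [10, 20, 30, 40, 50]
group = replica_group(nodes, 35, f=1)
wrapped = replica_group(nodes, 55, f=1)
```

In practice the key identifier would come from `ring_id("some-service-key")`; fixed integers are used above so the placement is easy to follow.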

    A metaobject architecture for fault-tolerant distributed systems : the FRIENDS approach

    The FRIENDS system developed at LAAS-CNRS is a metalevel architecture providing libraries of metaobjects for fault tolerance, secure communication, and group-based distributed applications. The use of metaobjects provides a clean separation of concerns between mechanisms and applications. Metaobjects can be used transparently by applications and can be composed according to the needs of a given application, a given architecture, and its underlying properties. In FRIENDS, metaobjects are used recursively to add new properties to applications. They are designed using an object-oriented design method and implemented on top of basic system services. This paper describes the FRIENDS software-based architecture, the object-oriented development of metaobjects, and the experiments that we have done, and summarizes the advantages and drawbacks of a metaobject approach for building fault-tolerant systems.