10 research outputs found

    Operative Merest-undertaking Impeccable Reclamation Line Accretion Ordering for Deterministic Mobile Distributed Computing Systems

    Impeccable-RL-accretion (Impeccable Reclamation Line accretion) is one of the commonly used approaches to providing failure resilience in a Distributed Computing setup (DCS), so that the setup can keep operating even if one or more components have failed. However, mobile DCSs are constrained by low transmission capacity, mobility, lack of stable storage, frequent disconnections, and limited battery life. Hence, Impeccable-RL-accretion orderings that take fewer reestablishment-dots are favored in mobile environments. In this paper, we consider a merest-undertaking synchronic ordering for Impeccable-RL-accretion in mobile DCSs. We eliminate useless reestablishment-dots, as well as stalling of undertakings while reestablishment-dots are taken, at the cost of registering contra-dispatches of very few dispatches during Impeccable-RL-accretion. We also attempt to reduce the loss of Impeccable-RL-accretion work when an undertaking fails to store its reestablishment-dot in an initiation. In this way, we handle frequent failures during Impeccable-RL-accretion.

    CHECKPOINTING AND RECOVERY IN DISTRIBUTED AND DATABASE SYSTEMS

    A transaction-consistent global checkpoint of a database records a state of the database which reflects the effect of only completed transactions and not the results of any partially executed transactions. This thesis establishes the necessary and sufficient conditions for a checkpoint of a data item (or the checkpoints of a set of data items) to be part of a transaction-consistent global checkpoint of the database. This result would be useful for constructing transaction-consistent global checkpoints incrementally from the checkpoints of each individual data item of a database. By applying this condition, we can start from any useful checkpoint of any data item and then incrementally add checkpoints of other data items until we get a transaction-consistent global checkpoint of the database. This result can also help in designing non-intrusive checkpointing protocols for database systems. Based on the intuition gained from the development of the necessary and sufficient conditions, we also developed a non-intrusive low-overhead checkpointing protocol for distributed database systems. Checkpointing and rollback recovery are also established techniques for achieving fault-tolerance in distributed systems. Communication-induced checkpointing algorithms allow processes involved in a distributed computation to take checkpoints independently while at the same time forcing processes to take additional checkpoints so that each checkpoint is part of a consistent global checkpoint. This thesis develops a low-overhead communication-induced checkpointing protocol and presents a performance evaluation of the protocol.

    A Recovery Scheme for Cluster Federations Using Sender-based Message Logging

    A cluster federation is a heterogeneous union of clusters. Each cluster contains a certain number of processes. An application running in such a computing environment is divided into communicating modules so that these modules can run on different clusters. To achieve fault-tolerance, different clusters may employ different checkpointing schemes. For example, some may use coordinated schemes, while others may use communication-induced schemes. This may complicate the recovery process. In this paper, we address the complex problem of recovery for cluster computing environments. Unlike existing work in this area, the proposed approach handles both inter-cluster orphan messages and inter-cluster lost messages. We first propose an algorithm to determine a recovery line such that no inter-cluster orphan message exists between any pair of the cluster-level checkpoints belonging to the recovery line. The main feature of the proposed algorithm is that it can be executed simultaneously by all clusters in the cluster federation. Next, we apply the sender-based message logging idea to effectively handle all inter-cluster lost messages and ensure correctness of the computation.
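
    The orphan-free recovery-line condition in this abstract can be illustrated with a small sketch. This is a generic rollback-propagation example, not the paper's algorithm: the event-count representation of checkpoints and the names `Message` and `find_recovery_line` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_event: int   # sender's local event count when the message was sent
    receiver: str
    recv_event: int   # receiver's local event count when it was received (>= 1)


def find_recovery_line(checkpoints, messages):
    """Return one checkpoint per process such that no message is an orphan.

    checkpoints: {process: sorted list of event counts}, each list starting
    with 0 (the initial state). A checkpoint at count k records events 1..k.
    """
    # Start from the latest checkpoint of every process.
    line = {p: len(c) - 1 for p, c in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for m in messages:
            sent = m.send_event <= checkpoints[m.sender][line[m.sender]]
            received = m.recv_event <= checkpoints[m.receiver][line[m.receiver]]
            if received and not sent:
                # Orphan: the receive is recorded but the send is not.
                # Roll the receiver back to a checkpoint taken before the receive.
                while checkpoints[m.receiver][line[m.receiver]] >= m.recv_event:
                    line[m.receiver] -= 1
                changed = True
    return {p: checkpoints[p][i] for p, i in line.items()}


example = find_recovery_line(
    {"A": [0, 5], "B": [0, 4, 8]},
    [Message("A", 6, "B", 7)],
)
print(example)  # {'A': 5, 'B': 4}
```

    In the example, B's latest checkpoint records the receipt of a message that A sent after its own latest checkpoint, so B is rolled back one checkpoint. Lost messages (sent before the line, received after it) are not handled here; the abstract addresses those with sender-based logging.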

    Design and analysis of an efficient energy algorithm in wireless social sensor networks

    Because mobile ad hoc networks have characteristics such as lack of center nodes, multi-hop routing and changeable topology, the existing checkpoint technologies for normal mobile networks cannot be applied well to mobile ad hoc networks. Considering the multi-frequency hierarchy structure of ad hoc networks, this paper proposes a hybrid checkpointing strategy which combines the techniques of synchronous checkpointing with asynchronous checkpointing, namely the checkpoints of mobile terminals in the same cluster remain synchronous, and the checkpoints in different clusters remain asynchronous. This strategy could not only avoid cascading rollback among the processes in the same cluster, but also avoid too many message transmissions among the processes in different clusters. What is more, it can reduce the communication delay. In order to assure the consistency of the global states, this paper discusses the correctness criteria of hybrid checkpointing, which includes the criteria of checkpoint taking, rollback recovery and indelibility. Based on the designed Intra-Cluster Checkpoint Dependence Graph and Inter-Cluster Checkpoint Dependence Graph, the elimination rules for different kinds of checkpoints are discussed, and the algorithms for the same cluster checkpoints, different cluster checkpoints, and rollback recovery are also given. Experimental results demonstrate the proposed hybrid checkpointing strategy is a preferable trade-off method, which not only synthetically takes all kinds of resource constraints of Ad hoc networks into account, but also outperforms the existing schemes in terms of the dependence to cluster heads, the recovery time compared to the pure synchronous, and the pure asynchronous checkpoint advantage. © 2017 by the authors. Licensee MDPI, Basel, Switzerland

    Locality-driven checkpoint and recovery

    Checkpoint and recovery are important fault-tolerance techniques for distributed systems. The two categories of existing strategies incur unacceptable performance cost either at run time or upon failure recovery, when applied to large-scale distributed systems. In particular, the large number of messages and processes in these systems causes either considerable checkpoint as well as logging overhead, or catastrophic global-wise recovery effect. This thesis proposes a locality-driven strategy for efficiently checkpointing and recovering such systems with both affordable runtime cost and controllable failure recoverability. Messages establish dependencies between distributed processes, which can be either preserved by coordinated checkpoints or removed via logging. Existing strategies enforce a uniform handling policy for all message dependencies, and hence gain an advantage at one end but bear a disadvantage at the other. In this thesis, a generic theory of Quasi-Atomic Recovery has been formulated to accommodate message handling requirements of both kinds, and to allow using different message handling methods together. Quasi-atomicity of recovery blocks implies proper confinement of recoveries, and thus enables localization of checkpointing and recovery around such a block and consequently a hybrid strategy with combined advantages from both ends. A strategy of group checkpointing with selective logging has been proposed, based on the observation of message localization around 'locality regions' in distributed systems. In essence, a group-wise coordinated checkpoint is created around such a region and only the few inter-region messages are logged subsequently. Runtime overhead is optimized due to largely reduced logging efforts, and recovery spread is as localized as region-wise. Various protocols have been developed to provide trade-offs between flexibility and performance. Also proposed is the idea of process clone that can be used to effectively remove program-order recovery dependencies among successive group checkpoints and thus to stop inter-group recovery spread. Distributed executions exhibit locality of message interactions. Such locality originates from resolving distributed dependency localization via message passing, and appears as a hierarchical 'region-transition' pattern. A bottom-up approach has been proposed to identify those regions, by detecting popular recurrence patterns from individual processes as 'locality intervals', and then composing them into 'locality regions' based on their tight message coupling relations between each other. Experiments conducted on real-life applications have shown the existence of hierarchical locality regions and have justified the feasibility of this approach. Performance optimization of group checkpoint strategies has to do with their uses of locality. An abstract performance measure has been proposed to properly integrate both runtime overhead and failure recoverability in a region-wise manner. Taking this measure as the optimization objective, a greedy heuristic has been introduced to decompose a given distributed execution into optimized regions. Analysis implies that an execution pattern with good locality leads to good optimized performance, and the locality pattern itself can serve as a good candidate for the optimal decomposition. Consequently, checkpoint protocols have been developed to efficiently identify optimized regions in such an execution, with assistance of either design-time or runtime knowledge.

    ALGORITHMS FOR FAULT TOLERANCE IN DISTRIBUTED SYSTEMS AND ROUTING IN AD HOC NETWORKS

    Checkpointing and rollback recovery are well-known techniques for coping with failures in distributed systems. Future generation supercomputers will be message-passing distributed systems consisting of millions of processors. As the number of processors grows, the failure rate also grows. Thus, designing efficient checkpointing and recovery algorithms for coping with failures in such large systems is important for these systems to be fully utilized. We presented a novel communication-induced checkpointing algorithm which helps in reducing contention for accessing stable storage to store checkpoints. Under our algorithm, a process involved in a distributed computation can independently initiate consistent global checkpointing by saving its current state, called a tentative checkpoint. Other processes involved in the computation come to know about the consistent global checkpoint initiation through information piggy-backed with the application messages or limited control messages if necessary. When a process comes to know about a new consistent global checkpoint initiation, it takes a tentative checkpoint after processing the message. The tentative checkpoints taken can be flushed to stable storage when there is no contention for accessing stable storage. The tentative checkpoints together with the message logs stored in the stable storage form a consistent global checkpoint. Ad hoc networks consist of a set of nodes that can form a network for communication with each other without the aid of any infrastructure or human intervention. Nodes are energy-constrained and hence routing algorithms designed for these networks should take this into consideration. We proposed two routing protocols for mobile ad hoc networks which prevent nodes from broadcasting route requests unnecessarily during the route discovery phase and hence conserve energy and prevent contention in the network. One is called Triangle Based Routing (TBR) protocol. The other routing protocol we designed is called Routing Protocol with Selective Forwarding (RPSF). Both of the routing protocols greatly reduce the number of control packets which are needed to establish routes between pairs of source nodes and destination nodes. As a result, they reduce the energy consumed for route discovery. Moreover, these protocols reduce congestion and collision of packets due to the limited number of nodes retransmitting the route requests.
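
    The piggy-backed checkpoint initiation described in the first half of this abstract can be loosely sketched as follows. This is only an illustration under stated assumptions, not the thesis's protocol: the `Process` class, the integer sequence number `csn`, the toy integer state, and the choice to record the triggering message next to the checkpoint are all hypothetical. The abstract does not say how its message logs are organized.

```python
class Process:
    """Toy process whose application state is the sum of received payloads."""

    def __init__(self, name):
        self.name = name
        self.state = 0
        self.csn = 0            # highest global-checkpoint sequence number seen
        self.tentative = {}     # csn -> state saved in memory (not yet flushed)
        self.message_log = {}   # csn -> message that triggered the checkpoint

    def initiate(self):
        # Any process may start a new global checkpoint on its own.
        self.csn += 1
        self.tentative[self.csn] = self.state

    def send(self, payload):
        # Piggy-back the current sequence number on every application message.
        return {"payload": payload, "csn": self.csn}

    def receive(self, msg):
        # Process the message first ...
        self.state += msg["payload"]
        # ... then, if it reveals a newer initiation, take a tentative checkpoint.
        if msg["csn"] > self.csn:
            self.csn = msg["csn"]
            self.tentative[self.csn] = self.state
            self.message_log[self.csn] = msg  # assumption: log the trigger


p, q = Process("P"), Process("Q")
p.initiate()               # P starts global checkpoint 1
q.receive(p.send(5))       # Q learns of it and checkpoints after processing
q.receive(p.send(2))       # same sequence number: no new checkpoint
print(q.csn, q.tentative)  # 1 {1: 5}
```

    Flushing `tentative` to stable storage when there is no contention, and the limited control messages mentioned in the abstract, are left out of this sketch.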

    Efficient Passive Clustering and Gateways selection MANETs

    Passive clustering does not employ control packets to collect topological information in ad hoc networks. In our proposal, we avoid making frequent changes in cluster architecture due to repeated election and re-election of cluster heads and gateways. Our primary objective has been to make Passive Clustering more practical by employing an optimal number of gateways and reducing the number of rebroadcast packets.

    Non-blocking Synchronous Checkpointing Based On Rollback-dependency Trackability

    This article proposes an original approach that applies the Rollback-Dependency Trackability (RDT) property to implement a new non-blocking synchronous checkpointing protocol, called RDT-NBS, that takes mutable checkpoints and efficiently supports concurrent initiators. Mutable checkpoints can be saved in non-stable storage and make it possible for non-blocking synchronous checkpointing protocols to save a minimal number of checkpoints in stable storage during the construction of a consistent global checkpoint. We prove that this minimality property does not hold in the presence of concurrent checkpointing initiations. Even so, RDT-NBS uses mutable checkpoints to reduce the use of stable memory while assuring the existence of a consistent global checkpoint in stable storage. We also present simulation results that compare RDT-NBS to quasi-synchronous RDT. © 2006 IEEE.