26 research outputs found

    Lazy Checkpoint Coordination for Bounding Rollback Propagation

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNational Aeronautics and Space Administration / NASA NAG 1-613Department of the Navy managed by the Office of the Chief of Naval Research / N00014-91-J-128

    Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems

    Get PDF
    Checkpointing and rollback recovery are techniques that can provide efficient recovery from transient process failures. In a message-passing system, the rollback of a message sender may cause the rollback of the corresponding receiver, and the system needs to roll back to a consistent set of checkpoints called recovery line. If the processes are allowed to take uncoordinated checkpoints, the above rollback propagation may result in the domino effect which prevents recovery line progression. Traditionally, only obsolete checkpoints before the global recovery line can be discarded, and the necessary and sufficient condition for identifying all garbage checkpoints has remained an open problem. A necessary and sufficient condition for achieving optimal garbage collection is derived and it is proved that the number of useful checkpoints is bounded by N(N+1)/2, where N is the number of processes. The approach is based on the maximum-sized antichain model of consistent global checkpoints and the technique of recovery line transformation and decomposition. It is also shown that, for systems requiring message logging to record in-transit messages, the same approach can be used to achieve optimal message log reclamation. As a final topic, a unifying framework is described by considering checkpoint coordination and exploiting piecewise determinism as mechanisms for bounding rollback propagation, and the applicability of the optimal garbage collection algorithm to domino-free recovery protocols is demonstrated

    A Survey of Checkpointing Algorithms in Mobile Ad Hoc Network

    Get PDF
    Checkpoint is defined as a fault tolerant technique that is a designated place in a program at which normal processing is interrupted specifically to preserve the status information necessary to allow resumption of processing at a later time. If there is a failure, computation may be restarted from the current checkpoint instead of repeating the computation from beginning. Checkpoint based rollback recovery is one of the widely used technique used in various areas like scientific computing, database, telecommunication and critical applications in distributed and mobile ad hoc network. The mobile ad hoc network architecture is one consisting of a set of self configure mobile hosts capable of communicating with each other without the assistance of base stations. The main problems of this environment are insufficient power and limited storage capacity, so the checkpointing is major challenge in mobile ad hoc network. This paper presents the review of the algorithms, which have been reported for checkpointing approaches in mobile ad hoc network

    Recoverable Distributed Shared Memory Under Sequential and Relaxed Consistency

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratoryOffice of Naval Research / N00014-90-J-1270 and N00014-91-J-1283National Aeronautics and Space Administration / NASA NAG 1-61

    An implementation and performance measurement of the progressive retry technique

    Get PDF
    This paper describes a recovery technique called progressive retry for bypassing software faults in message-passing applications. The technique is implemented as reusable modules to provide application-level software fault tolerance. The paper describes the implementation of the technique and presents results from the application of progressive retry to two telecommunications systems. the results presented show that the technique is helpful in reducing the total recovery time for message-passing applications

    Visões progressivas de computações distribuidas

    Get PDF
    Orientador : Luiz Eduardo BuzatoTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Um checkpoint é um estado selecionado por um processo durante a sua execução. Um checkpoint global é composto por um checkpoint de cada processo e é consistente se representa urna foto­grafia da computação que poderia ter sido capturada por um observador externo. Soluções para vários problemas em sistemas distribuídos necessitam de uma seqüência de checkpoints globais consistentes que descreva o progresso de urna computação distribuída. Corno primeira contri­buição desta tese, apresentamos um conjunto de algoritmos para a construção destas seqüências, denominadas visões progressivas. Outras contribuições provaram que certas suposições feitas na literatura eram falsas utilizando o argumento de que algumas propriedades precisam ser válidas ao longo de todo o progresso da computação. Durante algumas computações distribuídas, todas as dependências de retrocesso entre check­points podem ser rastreadas em tempo de execução. Esta propriedade é garantida através da indução de checkpoints imediatamente antes da formação de um padrão de mensagens que poderia dar origem a urna dependência de retrocesso não rastreável. Estudos teóricos e de simu­lação indicam que, na maioria das vezes, quanto mais restrito o padrão de mensagens, menor o número de checkpoints induzidos. Acreditava-se que a caracterização minimal para a obtenção desta propriedade estava estabelecida e que um protocolo baseado nesta caracterização precisa­ria da manutenção e propagação de informações de controle com complexidade O(n2), onde n é o número de processos na computação. A complexidade quadrática tornava o protocolo base­ado na caracterização mimimal menos interessante que protocolos baseados em caracterizações maiores, mas com complexidade linear.A segunda contribuição desta tese é uma prova de que a caracterização considerada minimal podia ser eduzida, embora a complexidade requerida por um protocolo baseado nesta nova caracterização minimal continuasse indicando ser quadrática. A terceira contribuição desta tese é a proposta de um pequeno relaxamento na caracterização minimal que propicia a implementação de um protocolo com complexidade linear e desempenho semelhante à solução quadrática. Como última contribuição, através de um estudo detalhado das variações da informação de controle durante o progresso de urna computação, propomos um protocolo que implementa exatamente a caracterização minimal, mas com complexidade linearAbstract: A checkpoint is a state selected by a process during its execution. A global checkpoint is composed of one checkpoint from each process and it is consistent if it represents a snapshot of the computation that could have been taken by an external observer. The solution to many problems in distributed systems requires a sequence of consistent global checkpoints that describes the progress of a distributed computation. As the first contribution of this thesis, we present a set of algorithms to the construction of these sequences, called progressive views. Additionally, the analysis of properties during the progress of a distributed computation allowed us to verify that some assumptions made in the literature were false. Some checkpoint patterns present only on-line trackable rollback-dependencies among check­points. This property is enforced by taking a checkpoint immediately before the formation of a message pattern that can produce a non-trackable rollback-dependency. Theoretical and simula­tion studies have shown that, most often, the more restricted the pattern, the more efficient the protocol. The minimal characterization was supposed to be known and its implementation was supposed to require the processes of the computation to maintain and propagate O(n2) control information, where n is the number of processes in the computation. The quadratic complexity makes the protocol based on the minimal characterization less interesting than protocols based on wider characterizations, but with a linear complexity. The second contribution of this thesis is a proof that the characterization that was supposed to be minimal could be reduced. However, the complexity required by a protocol based on the new minimal characterization seemed to be also quadratic. The third contribution of this thesis is a protocol based on a slightly weaker condition than the minimal characterization, but with linear complexity and performance similar to the quadratic solution. As the last contribution, through a detailed analysis of the control information computed and transmitted during the progress of distributed computations, we have proposed a protocol that implements exactly the minimal characterization, but with a linear complexityDoutoradoDoutor em Ciência da Computaçã

    Design and analysis of an efficient energy algorithm in wireless social sensor networks

    Get PDF
    Because mobile ad hoc networks have characteristics such as lack of center nodes, multi-hop routing and changeable topology, the existing checkpoint technologies for normal mobile networks cannot be applied well to mobile ad hoc networks. Considering the multi-frequency hierarchy structure of ad hoc networks, this paper proposes a hybrid checkpointing strategy which combines the techniques of synchronous checkpointing with asynchronous checkpointing, namely the checkpoints of mobile terminals in the same cluster remain synchronous, and the checkpoints in different clusters remain asynchronous. This strategy could not only avoid cascading rollback among the processes in the same cluster, but also avoid too many message transmissions among the processes in different clusters. What is more, it can reduce the communication delay. In order to assure the consistency of the global states, this paper discusses the correctness criteria of hybrid checkpointing, which includes the criteria of checkpoint taking, rollback recovery and indelibility. Based on the designed Intra-Cluster Checkpoint Dependence Graph and Inter-Cluster Checkpoint Dependence Graph, the elimination rules for different kinds of checkpoints are discussed, and the algorithms for the same cluster checkpoints, different cluster checkpoints, and rollback recovery are also given. Experimental results demonstrate the proposed hybrid checkpointing strategy is a preferable trade-off method, which not only synthetically takes all kinds of resource constraints of Ad hoc networks into account, but also outperforms the existing schemes in terms of the dependence to cluster heads, the recovery time compared to the pure synchronous, and the pure asynchronous checkpoint advantage. © 2017 by the authors. Licensee MDPI, Basel, Switzerland
    corecore