5 research outputs found

    CHECKPOINTING AND RECOVERY IN DISTRIBUTED AND DATABASE SYSTEMS

    A transaction-consistent global checkpoint of a database records a state of the database that reflects the effects of completed transactions only, not the results of any partially executed transactions. This thesis establishes the necessary and sufficient conditions for a checkpoint of a data item (or the checkpoints of a set of data items) to be part of a transaction-consistent global checkpoint of the database. This result is useful for constructing transaction-consistent global checkpoints incrementally from the checkpoints of the individual data items of a database. By applying the condition, we can start from any useful checkpoint of any data item and incrementally add checkpoints of other data items until we obtain a transaction-consistent global checkpoint of the database. The result can also help in designing non-intrusive checkpointing protocols for database systems. Based on the intuition gained from developing the necessary and sufficient conditions, we also developed a non-intrusive, low-overhead checkpointing protocol for distributed database systems. Checkpointing and rollback recovery are also established techniques for achieving fault tolerance in distributed systems. Communication-induced checkpointing algorithms allow the processes involved in a distributed computation to take checkpoints independently, while forcing them to take additional checkpoints so that each checkpoint is part of a consistent global checkpoint. This thesis develops a low-overhead communication-induced checkpointing protocol and presents a performance evaluation of the protocol.
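
As a rough illustration of the definition at the start of this abstract, the sketch below checks an all-or-nothing property only: every transaction is reflected either in the checkpoints of all data items it wrote or in none of them. It is not the necessary and sufficient condition established in the thesis, and it ignores ordering between transactions. The function name, the `checkpoint` and `write_sets` dictionaries, and the sample data are all hypothetical.

```python
from typing import Dict, Set


def is_transaction_consistent(
    checkpoint: Dict[str, Set[str]],
    write_sets: Dict[str, Set[str]],
) -> bool:
    """Simplified all-or-nothing check (an assumption, not the thesis's condition).

    checkpoint: data item -> transactions whose writes that item's checkpoint reflects.
    write_sets: transaction -> data items the transaction writes.
    """
    for txn, items in write_sets.items():
        # Items whose checkpoint already includes this transaction's write.
        reflected = {item for item in items if txn in checkpoint.get(item, set())}
        # A partially reflected transaction makes the global checkpoint inconsistent.
        if reflected and reflected != items:
            return False
    return True


# Hypothetical data: T1 writes x and y, T2 writes only y.
write_sets = {"T1": {"x", "y"}, "T2": {"y"}}
complete = {"x": {"T1"}, "y": {"T1", "T2"}}
partial = {"x": {"T1"}, "y": set()}

print(is_transaction_consistent(complete, write_sets))
print(is_transaction_consistent(partial, write_sets))
```

In the second case the checkpoint of `x` reflects T1 but the checkpoint of `y` does not, so the pair captures a partially executed transaction.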

    Visões progressivas de computações distribuidas [Progressive views of distributed computations]

    Advisor: Luiz Eduardo Buzato. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: A checkpoint is a state selected by a process during its execution. A global checkpoint is composed of one checkpoint from each process, and it is consistent if it represents a snapshot of the computation that could have been taken by an external observer. Solutions to many problems in distributed systems require a sequence of consistent global checkpoints that describes the progress of a distributed computation. As the first contribution of this thesis, we present a set of algorithms for constructing these sequences, called progressive views. In addition, analyzing properties that must hold throughout the progress of a distributed computation allowed us to show that some assumptions made in the literature were false. In some distributed computations, all rollback dependencies among checkpoints can be tracked at run time. This property is enforced by inducing a checkpoint immediately before the formation of a message pattern that could produce a non-trackable rollback dependency. Theoretical and simulation studies indicate that, most often, the more restricted the pattern, the fewer checkpoints are induced.
The minimal characterization for obtaining this property was believed to be known, and a protocol based on it was believed to require the processes of the computation to maintain and propagate O(n^2) control information, where n is the number of processes in the computation. The quadratic complexity made the protocol based on the minimal characterization less attractive than protocols based on wider characterizations with linear complexity. The second contribution of this thesis is a proof that the characterization believed to be minimal could be reduced, although a protocol based on the new minimal characterization still appeared to require quadratic complexity. The third contribution is a protocol based on a condition slightly weaker than the minimal characterization, with linear complexity and performance similar to that of the quadratic solution. As the last contribution, through a detailed analysis of how the control information varies during the progress of a distributed computation, we propose a protocol that implements exactly the minimal characterization with linear complexity. Degree: Doctorate (Doctor in Computer Science).
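
The notion of a consistent global checkpoint used in this abstract can be illustrated with the standard vector-clock test: a set of checkpoints, one per process, is consistent if no process has observed more events of process i than the checkpoint of process i itself records. This is a minimal sketch of that textbook test, not one of the progressive-view algorithms of the thesis. The function name and the sample clocks are hypothetical.

```python
from typing import List


def is_consistent_global_checkpoint(clocks: List[List[int]]) -> bool:
    """clocks[i] is the vector clock stored with the checkpoint of process i."""
    n = len(clocks)
    for i in range(n):
        for j in range(n):
            # If process j knows of more events on i than i's checkpoint records,
            # the cut contains a message received but not yet sent (an orphan).
            if clocks[j][i] > clocks[i][i]:
                return False
    return True


# Hypothetical checkpoints of three processes.
consistent_cut = [[2, 0, 0], [1, 3, 0], [0, 2, 1]]
orphan_cut = [[1, 0, 0], [2, 3, 0], [0, 0, 1]]

print(is_consistent_global_checkpoint(consistent_cut))
print(is_consistent_global_checkpoint(orphan_cut))
```

In `orphan_cut`, the checkpoint of process 1 has already seen two events of process 0, while the checkpoint of process 0 records only one, so an external observer could never have captured that state.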

    Locality-driven checkpoint and recovery

    Checkpoint and recovery are important fault-tolerance techniques for distributed systems. When applied to large-scale distributed systems, the two categories of existing strategies incur unacceptable performance costs either at run time or upon failure recovery. In particular, the large number of messages and processes in these systems causes either considerable checkpointing and logging overhead or a catastrophic system-wide recovery effect. This thesis proposes a locality-driven strategy for efficiently checkpointing and recovering such systems with both affordable runtime cost and controllable failure recoverability. Messages establish dependencies between distributed processes, which can be either preserved by coordinated checkpoints or removed via logging. Existing strategies enforce a uniform handling policy for all message dependencies, and hence gain an advantage at one end but bear a disadvantage at the other. In this thesis, a generic theory of Quasi-Atomic Recovery is formulated to accommodate message handling requirements of both kinds and to allow different message handling methods to be used together. Quasi-atomicity of recovery blocks implies proper confinement of recoveries, and thus enables checkpointing and recovery to be localized around such a block, yielding a hybrid strategy that combines the advantages of both ends. A strategy of group checkpointing with selective logging is proposed, based on the observation that messages are localized around 'locality regions' in distributed systems. In essence, a group-wise coordinated checkpoint is created around such a region, and only the few inter-region messages are logged subsequently. Runtime overhead is reduced because far fewer messages are logged, and recovery spread is confined to a region. Various protocols have been developed to provide trade-offs between flexibility and performance.
Also proposed is the idea of a process clone, which can be used to remove program-order recovery dependencies among successive group checkpoints and thus to stop inter-group recovery spread. Distributed executions exhibit locality of message interactions. Such locality originates from resolving distributed dependency localization via message passing, and appears as a hierarchical 'region-transition' pattern. A bottom-up approach is proposed to identify those regions by detecting popular recurrence patterns of individual processes as 'locality intervals', and then composing them into 'locality regions' based on the tight message coupling between them. Experiments conducted on real-life applications have shown the existence of hierarchical locality regions and have justified the feasibility of this approach. Performance optimization of group checkpoint strategies depends on how they use locality. An abstract performance measure is proposed that integrates both runtime overhead and failure recoverability in a region-wise manner. Taking this measure as the optimization objective, a greedy heuristic is introduced to decompose a given distributed execution into optimized regions. Analysis implies that an execution pattern with good locality leads to good optimized performance, and that the locality pattern itself can serve as a good candidate for the optimal decomposition. Consequently, checkpoint protocols have been developed to efficiently identify optimized regions in such an execution, with the assistance of either design-time or runtime knowledge.
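
The selective-logging idea described in this abstract (coordinate checkpoints inside a region, log only the messages that cross region boundaries) can be sketched as a simple filter. This is a minimal illustration under the assumption that the assignment of processes to regions is already known; it is not one of the protocols developed in the thesis. The names `select_messages_to_log` and `region_of`, and the sample processes and messages, are hypothetical.

```python
from typing import Dict, List, Tuple

# A message is (sender, receiver, payload); all identifiers are illustrative.
Message = Tuple[str, str, str]


def select_messages_to_log(
    messages: List[Message],
    region_of: Dict[str, str],
) -> List[Message]:
    """Keep only inter-region messages; intra-region dependencies are assumed
    to be preserved by the region's coordinated group checkpoint."""
    return [m for m in messages if region_of[m[0]] != region_of[m[1]]]


# Hypothetical assignment of four processes to two locality regions.
region_of = {"p0": "A", "p1": "A", "p2": "B", "p3": "B"}
messages = [
    ("p0", "p1", "m1"),  # inside region A
    ("p2", "p3", "m2"),  # inside region B
    ("p1", "p2", "m3"),  # crosses from A to B
]

logged = select_messages_to_log(messages, region_of)
print(logged)
```

With good locality most traffic stays inside a region, so the logged list stays short compared with the full message history.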

    Estudo comparativo de algoritmos para checkpointing [Comparative study of checkpointing algorithms]

    Advisor: Luiz Eduardo Buzato. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação. Abstract: This dissertation provides a comprehensive comparative study of the performance of quasi-synchronous checkpointing algorithms. To do so, we used simulation of distributed systems, which gives us the freedom to build system models easily. The comparative study assessed, for the first time in a uniform environment, the impact on the algorithms' performance of factors such as the system's scale, the basic checkpoint rate, and the relative speed of the application processes. By analyzing these data, we gained a deep understanding of the behavior of these algorithms and produced a valuable reference for system architects looking for checkpointing algorithms for their distributed applications. Degree: Master's (Master in Computer Science).
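
To give a feel for the kind of simulation such a study relies on, the sketch below counts forced checkpoints under a simple index-based rule in the style of the protocol by Briatico, Ciuffoletti and Simoncini: each process keeps a checkpoint index, piggybacks it on every message, and takes a forced checkpoint when it receives a larger index. This is a toy model and not the simulator or the set of algorithms used in the dissertation. Instant message delivery, the uniform choice of processes, and the names `simulate` and `basic_rate` are all assumptions.

```python
import random
from typing import Tuple


def simulate(
    n_processes: int,
    n_events: int,
    basic_rate: float,
    seed: int = 0,
) -> Tuple[int, int]:
    """Return (basic, forced) checkpoint counts for a toy index-based protocol."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    index = [0] * n_processes  # current checkpoint index of each process
    basic = forced = 0
    for _ in range(n_events):
        p = rng.randrange(n_processes)
        if rng.random() < basic_rate:
            # Basic checkpoint: taken independently, increments the index.
            index[p] += 1
            basic += 1
        elif n_processes > 1:
            # Otherwise p sends a message to another process q (delivered at once).
            q = rng.choice([r for r in range(n_processes) if r != p])
            if index[p] > index[q]:
                # The piggybacked index is larger: q takes a forced checkpoint.
                index[q] = index[p]
                forced += 1
    return basic, forced


for n in (2, 8, 32):
    print(n, simulate(n, n_events=10_000, basic_rate=0.05))
```

Varying `n_processes` and `basic_rate` mimics two of the factors named in the abstract (system scale and basic checkpoint rate); the relative speed of processes would need a non-uniform choice of `p`, which this sketch leaves out.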

    A VP-accordant checkpointing protocol preventing useless checkpoints
