4 research outputs found

    Visões progressivas de computações distribuídas (Progressive Views of Distributed Computations)

    Advisor: Luiz Eduardo Buzato. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: A checkpoint is a state selected by a process during its execution. A global checkpoint is composed of one checkpoint from each process, and it is consistent if it represents a snapshot of the computation that could have been taken by an external observer. The solution to many problems in distributed systems requires a sequence of consistent global checkpoints that describes the progress of a distributed computation. As the first contribution of this thesis, we present a set of algorithms for the construction of these sequences, called progressive views. Additionally, the analysis of properties that must hold throughout the progress of a distributed computation allowed us to show that some assumptions made in the literature were false.
    Some checkpoint patterns present only on-line trackable rollback-dependencies among checkpoints. This property is enforced by taking a checkpoint immediately before the formation of a message pattern that can produce a non-trackable rollback-dependency. Theoretical and simulation studies have shown that, most often, the more restricted the pattern, the fewer checkpoints are induced and the more efficient the protocol. The minimal characterization was believed to be known, and its implementation was believed to require the processes of the computation to maintain and propagate O(n^2) control information, where n is the number of processes in the computation. This quadratic complexity made the protocol based on the minimal characterization less attractive than protocols based on wider characterizations with linear complexity.
    The second contribution of this thesis is a proof that the characterization believed to be minimal could be reduced. However, the complexity required by a protocol based on the new minimal characterization also seemed to be quadratic. The third contribution is a protocol based on a slightly weaker condition than the minimal characterization, with linear complexity and performance similar to the quadratic solution. As the last contribution, through a detailed analysis of the control information computed and transmitted during the progress of distributed computations, we propose a protocol that implements exactly the minimal characterization with linear complexity.
    Degree: Doctorate (Doctor in Computer Science).
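The consistency condition stated in the abstract can be illustrated with a small Python sketch. It is not taken from the thesis. It assumes each checkpoint is tagged with a vector clock (one entry per process), and it uses the standard vector-clock test: a global checkpoint is consistent when no checkpoint has seen more of process i than process i's own checkpoint records. The function name `is_consistent` and the sample clocks are hypothetical.

```python
from typing import Sequence


def is_consistent(cut: Sequence[Sequence[int]]) -> bool:
    """Check a global checkpoint given one vector clock per process.

    cut[i] is the (assumed) vector clock of the checkpoint chosen for
    process i. The cut is consistent when, for every pair (i, j),
    process j's checkpoint knows of no event of process i that is
    later than process i's own checkpoint.
    """
    n = len(cut)
    return all(cut[j][i] <= cut[i][i] for i in range(n) for j in range(n))


# Process 1 has seen one event of process 0; process 0's checkpoint covers two.
consistent_cut = [[2, 0], [1, 3]]

# Process 1 has seen two events of process 0, but process 0's checkpoint
# covers only one: a message would be received without having been sent.
inconsistent_cut = [[1, 0], [2, 3]]

print(is_consistent(consistent_cut))
print(is_consistent(inconsistent_cut))
```

A sequence of such cuts, each one no earlier than the previous in every component, would correspond to what the abstract calls a progressive view.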

    On The Minimal Characterization Of The Rollback-dependency Trackability Property

    Checkpoint and communication patterns that enforce rollback-dependency trackability (RDT) have only on-line trackable checkpoint dependencies and allow efficient solutions to the determination of consistent global checkpoints. Baldoni, Hélary, and Raynal have explored RDT at the message level, in which checkpoint dependencies are represented by zigzag paths. They have presented many characterizations of RDT and conjectured that a certain communication pattern characterizes the minimal set of zigzag paths that must be tested on-line by a checkpointing protocol in order to enforce RDT. The contributions of this work are (i) a proof that their conjecture is false, (ii) a minimal characterization of RDT, and (iii) an original approach to analyzing RDT checkpointing protocols.

    Coleta de lixo para protocolos de checkpointing (Garbage Collection for Checkpointing Protocols)

    Advisors: Luiz Eduardo Buzato, Islene Calciolari Garcia. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica. Degree: Master's.

    Optimal Asynchronous Garbage Collection For RDT Checkpointing Protocols

    Communication-induced checkpointing protocols that ensure rollback-dependency trackability (RDT) guarantee important properties to the recovery system without explicit coordination. However, to the best of our knowledge, there was no garbage collection algorithm for them which did not use some type of process synchronization, like time assumptions or reliable control message exchanges. This paper addresses the problem of garbage collection for RDT checkpointing protocols and presents an optimal solution for the case where coordination is done only by means of timestamps piggybacked in application messages. Our algorithm uses the same timestamps as off-the-shelf RDT protocols and ensures the tight upper bound on the number of uncollected checkpoints for each process during all the system execution. © 2005 IEEE.
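As a rough illustration of timestamps piggybacked on application messages, the Python sketch below maintains a dependency vector per process and takes a forced checkpoint before certain deliveries. It is not the garbage collection algorithm of the paper, nor a protocol from any of the works listed here. It assumes the well-known Fixed-Dependency-After-Send (FDAS) rule as a stand-in for an RDT protocol, and the class and method names are hypothetical.

```python
from typing import List


class Process:
    """One process with a dependency vector (assumed FDAS-style rule)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.dv: List[int] = [0] * n   # dv[j]: latest checkpoint of j we depend on
        self.sent = False              # has a message been sent in this interval?
        self.forced = 0                # number of forced checkpoints taken

    def checkpoint(self) -> None:
        # A new checkpoint starts a new interval for this process.
        self.dv[self.pid] += 1
        self.sent = False

    def send(self) -> List[int]:
        # The dependency vector is piggybacked on the application message.
        self.sent = True
        return list(self.dv)

    def receive(self, msg_dv: List[int]) -> None:
        # FDAS rule (assumed): once a message has been sent in the current
        # interval, the vector must not change, so checkpoint first if the
        # incoming message carries a new dependency.
        brings_new = any(m > d for m, d in zip(msg_dv, self.dv))
        if self.sent and brings_new:
            self.checkpoint()
            self.forced += 1
        self.dv = [max(m, d) for m, d in zip(msg_dv, self.dv)]


p0, p1 = Process(0, 2), Process(1, 2)
p0.checkpoint()
p1.receive(p0.send())   # no send yet at p1, so no forced checkpoint
p1.send()               # p1 has now sent in its current interval
p0.checkpoint()
p1.receive(p0.send())   # new dependency after a send: forced checkpoint
print(p1.dv, p1.forced)
```

The vector has one entry per process, so the piggybacked control information in this sketch grows linearly with the number of processes.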