15 research outputs found

    Visões progressivas de computações distribuidas

    Advisor: Luiz Eduardo Buzato. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: A checkpoint is a state selected by a process during its execution. A global checkpoint is composed of one checkpoint from each process, and it is consistent if it represents a snapshot of the computation that could have been taken by an external observer. Solutions to many problems in distributed systems require a sequence of consistent global checkpoints that describes the progress of a distributed computation. As the first contribution of this thesis, we present a set of algorithms for the construction of these sequences, called progressive views. Additionally, the observation that some properties must hold throughout the progress of a computation allowed us to prove that some assumptions made in the literature were false. In some distributed computations, all rollback dependencies among checkpoints can be tracked on-line, that is, at run time. This property is enforced by inducing a checkpoint immediately before the formation of a message pattern that could produce a non-trackable rollback dependency. Theoretical and simulation studies indicate that, most often, the more restricted the message pattern, the fewer checkpoints are induced and the more efficient the protocol.
    The minimal characterization of this property was believed to be known, and a protocol based on it was believed to require the processes to maintain and propagate O(n^2) control information, where n is the number of processes in the computation. This quadratic complexity made the protocol based on the minimal characterization less attractive than protocols based on wider characterizations but with linear complexity. The second contribution of this thesis is a proof that the characterization that was believed to be minimal could be reduced. However, the complexity required by a protocol based on the new minimal characterization still appeared to be quadratic. The third contribution is a protocol based on a condition slightly weaker than the minimal characterization, but with linear complexity and performance similar to the quadratic solution. As the last contribution, through a detailed analysis of how the control information varies during the progress of a distributed computation, we propose a protocol that implements exactly the minimal characterization, but with linear complexity. Degree: Doctorate, Doctor in Computer Science (Doutor em Ciência da Computação).
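    The consistency condition stated in this abstract can be illustrated with vector clocks. The sketch below is not one of the thesis's algorithms. It assumes, for illustration only, that every process stamps each checkpoint with a standard vector clock, and the name `is_consistent` is hypothetical. Under that assumption, a global checkpoint is consistent exactly when no checkpoint in it knows more about a process than that process's own checkpoint does.

```python
from typing import List


def is_consistent(cut: List[List[int]]) -> bool:
    """Check a global checkpoint given as one vector clock per process.

    cut[i] is the (assumed) vector clock recorded by process i at its
    checkpoint. The cut is consistent iff, for every pair (i, j),
    cut[j][i] <= cut[i][i]: no checkpoint reflects an event of process i
    that happened after process i's own checkpoint.
    """
    n = len(cut)
    return all(cut[j][i] <= cut[i][i] for i in range(n) for j in range(n))


# Two processes, no message crosses the cut backwards: consistent.
ok_cut = [[2, 0], [0, 1]]

# Process 1's checkpoint has seen event 2 of process 0, but process 0
# checkpointed after event 1 only: an orphan message, so inconsistent.
bad_cut = [[1, 0], [2, 1]]

print(is_consistent(ok_cut), is_consistent(bad_cut))
```

    A progressive view, in the thesis's sense, would be a sequence of such consistent cuts; this sketch covers only the check for a single cut.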

    Optimal Asynchronous Garbage Collection for RDT Checkpointing Protocols

    Communication-induced checkpointing protocols that ensure rollback-dependency trackability (RDT) guarantee important properties to the recovery system without explicit coordination. However, to the best of our knowledge, there was no garbage collection algorithm for them that did not rely on some form of process synchronization, such as timing assumptions or reliable exchange of control messages. This paper addresses the problem of garbage collection for RDT checkpointing protocols and presents an optimal solution for the case where coordination is done only by means of timestamps piggybacked on application messages. Our algorithm uses the same timestamps as off-the-shelf RDT protocols and ensures a tight upper bound on the number of uncollected checkpoints for each process throughout the system execution.

    CHECKPOINTING AND RECOVERY IN DISTRIBUTED AND DATABASE SYSTEMS

    A transaction-consistent global checkpoint of a database records a state of the database that reflects the effects of completed transactions only, and not the results of any partially executed transactions. This thesis establishes the necessary and sufficient conditions for a checkpoint of a data item (or the checkpoints of a set of data items) to be part of a transaction-consistent global checkpoint of the database. This result is useful for constructing transaction-consistent global checkpoints incrementally from the checkpoints of each individual data item of a database. By applying this condition, we can start from any useful checkpoint of any data item and then incrementally add checkpoints of other data items until we obtain a transaction-consistent global checkpoint of the database. This result can also help in designing non-intrusive checkpointing protocols for database systems. Based on the intuition gained from the development of the necessary and sufficient conditions, we also developed a non-intrusive, low-overhead checkpointing protocol for distributed database systems. Checkpointing and rollback recovery are also established techniques for achieving fault tolerance in distributed systems. Communication-induced checkpointing algorithms allow the processes involved in a distributed computation to take checkpoints independently, while forcing them to take additional checkpoints so that each checkpoint is part of a consistent global checkpoint. This thesis develops a low-overhead communication-induced checkpointing protocol and presents a performance evaluation of the protocol.

    Software architecture for fault-recovery using quasi-synchronous checkpointing

    Advisor: Islene Calciolari Garcia. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: A fault-tolerant distributed system based on rollback recovery has to select which checkpoints of its processes are stored. Besides this selection, which is controlled by a checkpointing protocol, the system has to perform garbage collection in order to eliminate checkpoints that become obsolete while the application executes. Garbage collection is needed because checkpoints consume storage resources and storage has limited capacity. Thus, when a fault occurs, the whole distributed computation can be restored to a consistent global state previously stored. This dissertation discusses practical and theoretical aspects of a fault-tolerant distributed system that uses quasi-synchronous checkpointing protocols together with garbage collection and rollback-recovery algorithms. There are several checkpointing protocols proposed in the literature, and the quasi-synchronous ones were studied in this dissertation. These protocols piggyback control information on the application's messages and can induce forced checkpoints, but they do not need any synchronization or exchange of control messages among processes. Based on that study, a framework for quasi-synchronous checkpointing protocols was implemented in a message-passing library called LAM/MPI. Moreover, a software architecture for rollback recovery from faults, named Curupira, was also studied and implemented in that library. Curupira is the first software architecture that does not need the exchange of control messages or any synchronization among processes during the execution of the checkpointing and garbage collection protocols. Degree: Master's, Distributed Systems; Master in Computer Science (Mestre em Ciência da Computação).

    Coleta de lixo para protocolos de checkpointing

    Advisors: Luiz Eduardo Buzato, Islene Calciolari Garcia. Master's dissertation, Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica. Degree: Master's.

    Locality-driven checkpoint and recovery

    Checkpoint and recovery are important fault-tolerance techniques for distributed systems. The two categories of existing strategies incur unacceptable performance costs, either at run time or upon failure recovery, when applied to large-scale distributed systems. In particular, the large number of messages and processes in these systems causes either considerable checkpointing and logging overhead or catastrophic system-wide recovery effects. This thesis proposes a locality-driven strategy for efficiently checkpointing and recovering such systems with both affordable runtime cost and controllable failure recoverability. Messages establish dependencies between distributed processes, which can be either preserved by coordinated checkpoints or removed via logging. Existing strategies enforce a uniform handling policy for all message dependencies, and hence gain an advantage at one end but bear a disadvantage at the other. In this thesis, a generic theory of Quasi-Atomic Recovery has been formulated to accommodate message handling requirements of both kinds and to allow different message handling methods to be used together. Quasi-atomicity of recovery blocks implies proper confinement of recoveries, and thus enables localization of checkpointing and recovery around such a block and, consequently, a hybrid strategy that combines the advantages of both ends. A strategy of group checkpointing with selective logging has been proposed, based on the observation of message localization around 'locality regions' in distributed systems. In essence, a group-wise coordinated checkpoint is created around such a region and only the few inter-region messages are logged subsequently. Runtime overhead is optimized because logging effort is greatly reduced, and recovery spread is confined to a region. Various protocols have been developed to provide trade-offs between flexibility and performance.
    Also proposed is the idea of a process clone, which can be used to effectively remove program-order recovery dependencies among successive group checkpoints and thus to stop inter-group recovery spread. Distributed executions exhibit locality of message interactions. Such locality originates from resolving distributed dependency localization via message passing, and appears as a hierarchical 'region-transition' pattern. A bottom-up approach has been proposed to identify those regions, by detecting popular recurrence patterns from individual processes as 'locality intervals', and then composing them into 'locality regions' based on the tight message coupling relations between them. Experiments conducted on real-life applications have shown the existence of hierarchical locality regions and have justified the feasibility of this approach. Performance optimization of group checkpoint strategies depends on how they use locality. An abstract performance measure has been proposed to properly integrate both runtime overhead and failure recoverability in a region-wise manner. Taking this measure as the optimization objective, a greedy heuristic has been introduced to decompose a given distributed execution into optimized regions. Analysis implies that an execution pattern with good locality leads to good optimized performance, and that the locality pattern itself can serve as a good candidate for the optimal decomposition. Consequently, checkpoint protocols have been developed to efficiently identify optimized regions in such an execution, with the assistance of either design-time or runtime knowledge.

    Reinforcing Digital Trust for Cloud Manufacturing Through Data Provenance Using Ethereum Smart Contracts

    Cloud Manufacturing (CMfg) is an advanced manufacturing model that caters to fast-paced agile requirements (Putnik, 2012). For manufacturing complex products that require extensive resources, manufacturers explore advanced manufacturing techniques like CMfg because it becomes infeasible to achieve high standards through complete ownership of manufacturing artifacts (Kuan et al., 2011). CMfg, also known as Manufacturing as a Service (MaaS) and Cyber Manufacturing (NSF, 2020), addresses the shortcomings of traditional manufacturing by building a virtual cyber enterprise of geographically distributed entities that manufacture custom products through collaboration. With manufacturing venturing into cyberspace, Digital Trust issues concerning product quality, data, and intellectual property security become significant concerns (R. Li et al., 2019). This study establishes a trust mechanism based on data provenance to ensure digital trust among the various stakeholders involved in CMfg. A trust model with smart contracts built on the Ethereum blockchain implements data provenance in CMfg. The study covers three data provenance models that use Ethereum smart contracts to establish digital trust in CMfg: Product Provenance, Order Provenance, and Operational Provenance. Together, the provenance models address the most important questions regarding CMfg: what goes into the product, who manufactures the product, who transports the products, under what conditions the products are manufactured, and whether regulatory constraints and requisites are met.

    Efficient Passive Clustering and Gateways selection MANETs

    Passive clustering does not employ control packets to collect topological information in ad hoc networks. In our proposal, we avoid frequent changes to the cluster architecture caused by repeated election and re-election of cluster heads and gateways. Our primary objective has been to make Passive Clustering more practical by employing an optimal number of gateways and reducing the number of rebroadcast packets.

    On The Minimal Characterization Of The Rollback-dependency Trackability Property

    Checkpoint and communication patterns that enforce rollback-dependency trackability (RDT) have only on-line trackable checkpoint dependencies and allow efficient solutions to the determination of consistent global checkpoints. Baldoni, Helary, and Raynal have explored RDT at the message level, in which checkpoint dependencies are represented by zigzag paths. They have presented many characterizations of RDT and conjectured that a certain communication pattern characterizes the minimal set of zigzag paths that must be tested on-line by a checkpointing protocol in order to enforce RDT. The contributions of this work are (i) a proof that their conjecture is false, (ii) a minimal characterization of RDT, and (iii) introduction of an original approach to analyze RDT checkpointing protocols.
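    The zigzag paths mentioned in this abstract can be made concrete with a small search over a message log. The sketch below follows the usual definition of a zigzag path (the first message is sent after the source checkpoint, each following message is sent by the previous receiver in the same or a later checkpoint interval, and the last message is received before the destination checkpoint). It is an off-line illustration, not the paper's characterization or any on-line protocol. The names `Msg` and `has_zigzag_path`, and the convention that interval k of a process follows its k-th checkpoint (checkpoint 0 being the initial state), are assumptions made here.

```python
from collections import deque
from typing import List, NamedTuple, Tuple


class Msg(NamedTuple):
    sender: int
    send_interval: int   # interval of the sender in which the send occurs
    receiver: int
    recv_interval: int   # interval of the receiver in which delivery occurs


def has_zigzag_path(msgs: List[Msg],
                    src: Tuple[int, int],
                    dst: Tuple[int, int]) -> bool:
    """Return True if a zigzag path leads from checkpoint src to dst.

    src and dst are (process, checkpoint_index) pairs.
    """
    i, x = src
    j, y = dst
    # First hop: any message sent by process i after checkpoint x.
    start = [m for m in msgs if m.sender == i and m.send_interval >= x]
    seen = set(start)
    queue = deque(start)
    while queue:
        m = queue.popleft()
        # Last hop: received by process j before checkpoint y.
        if m.receiver == j and m.recv_interval < y:
            return True
        # Next hop: sent by m's receiver in the same or a later interval,
        # possibly before m itself is delivered (the non-causal case).
        for nxt in msgs:
            if (nxt not in seen and nxt.sender == m.receiver
                    and nxt.send_interval >= m.recv_interval):
                seen.add(nxt)
                queue.append(nxt)
    return False


log = [Msg(0, 0, 1, 0), Msg(1, 0, 2, 0)]
print(has_zigzag_path(log, (0, 0), (2, 1)))
```

    A zigzag path from a checkpoint to itself is a Z-cycle, which means that checkpoint cannot belong to any consistent global checkpoint. Calling the function with `src == dst` tests for that case.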

    An Efficient Checkpointing Protocol For The Minimal Characterization Of Operational Rollback-dependency Trackability

    A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of non-trackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of non-trackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate O(n^2) control information, where n is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only O(n) control information. © 2004 IEEE.
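    To show what O(n) control information and forced checkpoints look like in practice, the sketch below implements the simpler Fixed-Dependency-After-Send (FDAS) rule rather than the minimal-characterization protocol of this paper, whose details are not given in the abstract. FDAS also piggybacks a dependency vector of n integers on each message, but it tests a wider condition and so may force more checkpoints. The class and method names are hypothetical.

```python
from typing import List


class Process:
    """One process running an FDAS-style rule (illustrative sketch only)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        # dv[k] = highest interval of process k this process depends on.
        # The own entry starts at 1: the interval after the initial checkpoint.
        self.dv: List[int] = [0] * n
        self.dv[pid] = 1
        self.sent_in_interval = False
        self.forced_checkpoints = 0

    def checkpoint(self) -> None:
        # Start a new checkpoint interval.
        self.dv[self.pid] += 1
        self.sent_in_interval = False

    def send(self) -> List[int]:
        # Piggyback a copy of the O(n) dependency vector on the message.
        self.sent_in_interval = True
        return list(self.dv)

    def receive(self, msg_dv: List[int]) -> bool:
        """Deliver a message; return True if a forced checkpoint was taken."""
        brings_new_dependency = any(m > d for m, d in zip(msg_dv, self.dv))
        forced = self.sent_in_interval and brings_new_dependency
        if forced:
            # FDAS: the vector must not change after a send in this interval,
            # so checkpoint before delivering the message.
            self.forced_checkpoints += 1
            self.checkpoint()
        self.dv = [max(m, d) for m, d in zip(msg_dv, self.dv)]
        return forced


p0, p1, p2 = Process(0, 3), Process(1, 3), Process(2, 3)
to_p2 = p1.send()                 # p1 sends first ...
forced = p1.receive(p0.send())    # ... then learns a new dependency
print(forced, p1.dv)
```

    In this run p1 has already sent a message in its current interval when the message from p0 arrives with new dependency information, so the rule forces a checkpoint before delivery. A protocol based on the minimal characterization would test a narrower condition at the same point.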