8 research outputs found

    GORDA: an open architecture for database replication

    Get PDF
    Database replication has been a common feature in database management systems (DBMSs) for a long time. In particular, asynchronous or lazy propagation of updates provides a simple yet efficient way of increasing performance and data availability and is widely available across the DBMS product spectrum. High end systems additionally offer sophisticated conflict resolution and data propagation options as well as, synchronous replication based on distributed locking and two-phase commit protocols. This paper presents GORDA architecture and programming interface (GAPI), that enables different replication strategies to be implemented once and deployed in multiple DBMSs. This is achieved by proposing a reflective interface to transaction processing instead of relying on-client interfaces or ad-hoc server extensions. The proposed approach is thus cost-effective, in enabling reuse of replication protocols or components in multiple DBMSs, as well as potentially efficient, as it allows close coupling with DBMS internals.(undefined

    Optimizing recovery protocols for replicated database systems

    Full text link
    En la actualidad, el uso de tecnologías de informacíon y sistemas de cómputo tienen una gran influencia en la vida diaria. Dentro de los sistemas informáticos actualmente en uso, son de gran relevancia los sistemas distribuidos por la capacidad que pueden tener para escalar, proporcionar soporte para la tolerancia a fallos y mejorar el desempeño de aplicaciones y proporcionar alta disponibilidad. Los sistemas replicados son un caso especial de los sistemas distribuidos. Esta tesis está centrada en el área de las bases de datos replicadas debido al uso extendido que en el presente se hace de ellas, requiriendo características como: bajos tiempos de respuesta, alto rendimiento en los procesos, balanceo de carga entre las replicas, consistencia e integridad de datos y tolerancia a fallos. En este contexto, el desarrollo de aplicaciones utilizando bases de datos replicadas presenta dificultades que pueden verse atenuadas mediante el uso de servicios de soporte a mas bajo nivel tales como servicios de comunicacion y pertenencia. El uso de los servicios proporcionados por los sistemas de comunicación de grupos permiten ocultar los detalles de las comunicaciones y facilitan el diseño de protocolos de replicación y recuperación. En esta tesis, se presenta un estudio de las alternativas y estrategias empleadas en los protocolos de replicación y recuperación en las bases de datos replicadas. También se revisan diferentes conceptos sobre los sistemas de comunicación de grupos y sincronia virtual. Se caracterizan y clasifican diferentes tipos de protocolos de replicación con respecto a la interacción o soporte que pudieran dar a la recuperación, sin embargo el enfoque se dirige a los protocolos basados en sistemas de comunicación de grupos. Debido a que los sistemas comerciales actuales permiten a los programadores y administradores de sistemas de bases de datos renunciar en alguna medida a la consistencia con la finalidad de aumentar el rendimiento, es importante determinar el nivel de consistencia necesario. En el caso de las bases de datos replicadas la consistencia está muy relacionada con el nivel de aislamiento establecido entre las transacciones. Una de las propuestas centrales de esta tesis es un protocolo de recuperación para un protocolo de replicación basado en certificación. Los protocolos de replicación de base de datos basados en certificación proveen buenas bases para el desarrollo de sus respectivos protocolos de recuperación cuando se utiliza el nivel de aislamiento snapshot. Para tal nivel de aislamiento no se requiere que los readsets sean transferidos entre las réplicas ni revisados en la fase de cetificación y ya que estos protocolos mantienen un histórico de la lista de writesets que es utilizada para certificar las transacciones, este histórico provee la información necesaria para transferir el estado perdido por la réplica en recuperación. Se hace un estudio del rendimiento del protocolo de recuperación básico y de la versión optimizada en la que se compacta la información a transferir. Se presentan los resultados obtenidos en las pruebas de la implementación del protocolo de recuperación en el middleware de soporte. La segunda propuesta esta basada en aplicar el principio de compactación de la informacion de recuperación en un protocolo de recuperación para los protocolos de replicación basados en votación débil. El objetivo es minimizar el tiempo necesario para transfeir y aplicar la información perdida por la réplica en recuperación obteniendo con esto un protocolo de recuperación mas eficiente. Se ha verificado el buen desempeño de este algoritmo a través de una simulación. Para efectuar la simulación se ha hecho uso del entorno de simulación Omnet++. En los resultados de los experimentos puede apreciarse que este protocolo de recuperación tiene buenos resultados en múltiples escenarios. Finalmente, se presenta la verificación de la corrección de ambos algoritmos de recuperación en el Capítulo 5.Nowadays, information technology and computing systems have a great relevance on our lives. Among current computer systems, distributed systems are one of the most important because of their scalability, fault tolerance, performance improvements and high availability. Replicated systems are a specific case of distributed system. This Ph.D. thesis is centered in the replicated database field due to their extended usage, requiring among other properties: low response times, high throughput, load balancing among replicas, data consistency, data integrity and fault tolerance. In this scope, the development of applications that use replicated databases raises some problems that can be reduced using other fault-tolerant building blocks, as group communication and membership services. Thus, the usage of the services provided by group communication systems (GCS) hides several communication details, simplifying the design of replication and recovery protocols. This Ph.D. thesis surveys the alternatives and strategies being used in the replication and recovery protocols for database replication systems. It also summarizes different concepts about group communication systems and virtual synchrony. As a result, the thesis provides a classification of database replication protocols according to their support to (and interaction with) recovery protocols, always assuming that both kinds of protocol rely on a GCS. Since current commercial DBMSs allow that programmers and database administrators sacrifice consistency with the aim of improving performance, it is important to select the appropriate level of consistency. Regarding (replicated) databases, consistency is strongly related to the isolation levels being assigned to transactions. One of the main proposals of this thesis is a recovery protocol for a replication protocol based on certification. Certification-based database replication protocols provide a good basis for the development of their recovery strategies when a snapshot isolation level is assumed. In that level readsets are not needed in the validation step. As a result, they do not need to be transmitted to other replicas. Additionally, these protocols hold a writeset list that is used in the certification/validation step. That list maintains the set of writesets needed by the recovery protocol. This thesis evaluates the performance of a recovery protocol based on the writeset list tranfer (basic protocol) and of an optimized version that compacts the information to be transferred. The second proposal applies the compaction principle to a recovery protocol designed for weak-voting replication protocols. Its aim is to minimize the time needed for transferring and applying the writesets lost by the recovering replica, obtaining in this way an efficient recovery. The performance of this recovery algorithm has been checked implementing a simulator. To this end, the Omnet++ simulating framework has been used. The simulation results confirm that this recovery protocol provides good results in multiple scenarios. Finally, the correction of both recovery protocols is also justified and presented in Chapter 5.García Muñoz, LH. (2013). Optimizing recovery protocols for replicated database systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31632TESI

    An Indulgent Uniform Total Order Algorithm with Optimistic Delivery

    Get PDF
    A total order algorithm is a fundamental building block in the construction of distributed fault-tolerant applications. Unfortunately, the implementation of such a primitive can be expensive both in terms of communication steps and of number of messages exchanged. This problem is exacerbated in large-scale systems, where the performance of the algorithm may be limited by the presence of high-latency links. Typically, the most efficient total order algorithms do not provide uniform delivery and assume the availability of a perfect failure detector. Such algorithms may provide inconsistent results if the system assumptions do not hold. On the other hand, algorithms that assume an unreliable failure detector always provide consistent results but exhibit higher costs. This paper presents a new algorithm that combines the advantages of both approaches. On good periods, when the system is stable and processes are not suspected, the algorithm operates as if a perfect failure detector is assumed. Yet, the algorithm is indulgent, since it never violates consistency, even in runs where processes are suspecte

    FTRMI: plataforma transparente tolerante a faltas para invocações remotas

    Get PDF
    Tese de mestrado em Engenharia Informática, apresentada à Universidade de Lisboa, através da Faculdade de Ciências, 2012As Chamadas a Procedimentos Remotos (RPC) têm como objectivo facilitar a comunicação entre processos, mascarando-a com uma sintaxe próxima da invocação a procedimentos mas ocultando os detalhes de comunicação. Contudo, devido à evolução dos paradigmas de programação, foi necessário encontrar uma solução para a programação Orientada a Objectos (OO). As Plataformas de Objectos Distribuídos (DOF) disponibilizam as mesmas características de um RPC adaptando a tecnologia a este paradigma. A Invocação Remota de Métodos (RMI) cumpre os objectivos de um DOF. Ainda assim, a especificação desta tecnologia para Java (JRMI) é totalmente dependente do modelo cliente/servidor criando um ponto de falha no lado do servidor. As aplicações distribuídas devem apresentar uma qualidade de serviço, nomeadamente, tolerância a faltas e escalabilidade que satisfaça os utilizadores. Uma possibilidade para os sistemas computacionais cumprirem estes requisitos é a distribuição do serviço por vários servidores distintos, incentivando a tolerância a faltas e distribuição de carga. Contudo, uma sistema distribuído é mais complexo que um sistema centralizado, devido à maior diversidade de problemas a resolver. Uma abordagem clássica de tolerância a faltas é a replicação activa, onde todas as réplicas mantêm um estado coerente por executarem apenas operações deterministas e sempre pela mesma ordem. Recorrendo ao conceito de máquina de estados distribuída que concretiza a replicação activa, é possível a criação de aplicações tolerantes a faltas de forma transparente para os servidores. Este trabalho apresenta a plataforma Fault-Tolerante Remote Method Invocation (FTRMI), que proporciona ao JRMI a capacidade de replicação activa de objectos remotos. A plataforma é disponibilizada sob a forma de camada de código intermédio tornando-se totalmente transparente para o cliente e não sendo necessário qualquer alteração de código no lado servidor. O grande objectivo desta plataforma é manter transparência total para aplicações existentes implementadas em JRMI. O FTRMI foi comparado com uma solução não transparente, mas que fornece uma qualidade de serviço semelhante.In computer science, a Remote Procedure Call (RPC) is a communication mechanism that aims to hide the communication details from the programmer. Due to the evolution of programming paradigms, it was deemed necessary to apply this technique to Object Oriented (OO) programming. The solution was found on Distributed Object Frameworks (DOF), which offer the same benefits of the RPC technology, but for this paradigm. Java Remote Method Invocation (JRMI) meets the requirements of a DOF, but, by using a client/server communication model, suffers from a potential single point of failure on the server side. The distributed applications require higher quality of service and, in particular, fault tolerance and scalability. Computational systems are able to fulfil these requirements by employing multiple machines, encouraging fault tolerance and load distribution. A distributed system is, by necessity, more complex than a centralized one, to deal with problems such as heterogeneity and synchronism. However, the Group Communication Systems (GCS), whose main objective is to provide a simple interface that implements several concepts, can facilitate the implementation of replication and fault tolerance. A classical approach to achieve fault tolerance is active replication, wherein all replicas maintain a consistent state by executing deterministic operations in the same order. Using the concept of distributed state machine that implements active replication, it is possible to create fault tolerant applications transparently to the servers. This work presents Fault-Tolerant Remote Method Invocation (FTRMI) framework, that enables JRMI to support active replication on the remote objects. The platform is available in the form of middleware that becomes totally transparent to the client and doesn’t require any code changes on the server. The main objective of this platform is to maintain total transparency to existing applications already using JRMI. The FTRMI was compared to a different solution that provides a similar quality of service, but isn’t transparent

    Crash recovery with partial amnesia failure model issues

    Full text link
    Replicated systems are a kind of distributed systems whose main goal is to ensure that computer systems are highly available, fault tolerant and provide high performance. One of the last trends in replication techniques managed by replication protocols, make use of Group Communication Sys- tem, and more specifically of the communication primitive atomic broadcast for developing more eficient replication protocols. An important aspect in these systems consists in how they manage the disconnection of nodes {which degrades their service{ and the connec- tion/reconnection of nodes for maintaining their original support. This task is delegated in replicated systems to recovery protocols. How it works de- pends specially on the failure model adopted. A model commonly used for systems managing large state is the crash-recovery with partial amnesia be- cause it implies short recovery periods. But, assuming it implies arising several problems. Most of them have been already solved in the literature: view management, abort of local transactions started in crashed nodes { when referring to transactional environments{ or for example the reinclu- sion of new nodes to the replicated system. Anyway, there is one problem related to the assumption of this second failure model that has not been completely considered: the amnesia phenomenon. Phenomenon that can lead to inconsistencies if it is not correctly managed. This work presents this inconsistency problem due to the amnesia and formalizes it, de ning the properties that must be ful lled for avoiding it and de ning possible solutions. Besides, it also presents and formalizes an inconsistency problem {due to the amnesia{ which appears under a speci c sequence of events allowed by the majority partition progress condition that will imply to stop the system, proposing the properties for overcoming it and proposing di erent solutions. As a consequence it proposes a new majority partition progress condition. In the sequel there is deDe Juan Marín, R. (2008). Crash recovery with partial amnesia failure model issues [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/3302Palanci

    Flexible management of bandwidth and redundancy in fieldbuses

    Get PDF
    Doutoramento em Engenharia ElectrotécnicaOs sistemas distribuídos embarcados (Distributed Embedded Systems – DES) têm sido usados ao longo dos últimos anos em muitos domínios de aplicação, da robótica, ao controlo de processos industriais passando pela aviónica e pelas aplicações veiculares, esperando-se que esta tendência continue nos próximos anos. A confiança no funcionamento é uma propriedade importante nestes domínios de aplicação, visto que os serviços têm de ser executados em tempo útil e de forma previsível, caso contrário, podem ocorrer danos económicos ou a vida de seres humanos poderá ser posta em causa. Na fase de projecto destes sistemas é impossível prever todos os cenários de falhas devido ao não determinismo do ambiente envolvente, sendo necessária a inclusão de mecanismos de tolerância a falhas. Adicionalmente, algumas destas aplicações requerem muita largura de banda, que também poderá ser usada para a evolução dos sistemas, adicionandolhes novas funcionalidades. A flexibilidade de um sistema é uma propriedade importante, pois permite a sua adaptação às condições e requisitos envolventes, contribuindo também para a simplicidade de manutenção e reparação. Adicionalmente, nos sistemas embarcados, a flexibilidade também é importante por potenciar uma melhor utilização dos, muitas vezes escassos, recursos existentes. Uma forma evidente de aumentar a largura de banda e a tolerância a falhas dos sistemas embarcados distribuídos é a replicação dos barramentos do sistema. Algumas soluções existentes, quer comerciais quer académicas, propõem a replicação dos barramentos para aumento da largura de banda ou para aumento da tolerância a falhas. No entanto e quase invariavelmente, o propósito é apenas um, sendo raras as soluções que disponibilizam uma maior largura de banda e um aumento da tolerância a falhas. Um destes raros exemplos é o FlexRay, com a limitação de apenas ser permitido o uso de dois barramentos. Esta tese apresentada e discute uma proposta para usar a replicação de barramentos de uma forma flexível com o objectivo duplo de aumentar a largura de banda e a tolerância a falhas. A flexibilidade dos protocolos propostos também permite a gestão dinâmica da topologia da rede, sendo o número de barramentos apenas limitado pelo hardware/software. As propostas desta tese foram validadas recorrendo ao barramento de campo CAN – Controller Area Network, escolhido devido à sua grande implantação no mercado. Mais especificamente, as soluções propostas foram implementadas e validadas usando um paradigma que combina flexibilidade com comunicações event-triggered e time-triggered: o FTT – Flexible Time- Triggered. No entanto, uma generalização para CAN nativo é também apresentada e discutida. A inclusão de mecanismos de replicação do barramento impõe a alteração dos antigos protocolos de replicação e substituição do nó mestre, bem como a definição de novos protocolos para esta finalidade. Este trabalho tira partido da arquitectura centralizada e da replicação do nó mestre para suportar de forma eficiente e flexível a replicação de barramentos. Em caso de ocorrência de uma falta num barramento (ou barramentos) que poderia provocar uma falha no sistema, os protocolos e componentes propostos nesta tese fazem com que o sistema reaja, mudando para um modo de funcionamento degradado. As mensagens que estavam a ser transmitidas nos barramentos onde ocorreu a falta são reencaminhadas para os outros barramentos. A replicação do nó mestre baseia-se numa estratégia líder-seguidores (leaderfollowers), onde o líder (leader) controla todo o sistema enquanto os seguidores (followers) servem como nós de reserva. Se um erro ocorrer no nó líder, um dos nós seguidores passará a controlar o sistema de uma forma transparente e mantendo as mesmas funcionalidades. As propostas desta tese foram também generalizadas para CAN nativo, tendo sido para tal propostos dois componentes adicionais. É, desta forma possível ter as mesmas capacidades de tolerância a falhas ao nível dos barramentos juntamente com a gestão dinâmica da topologia de rede. Todas as propostas desta tese foram implementadas e avaliadas. Uma implementação inicial, apenas com um barramento foi avaliada recorrendo a uma aplicação real, uma equipa de futebol robótico onde o protocolo FTT-CAN foi usado no controlo de movimento e da odometria. A avaliação do sistema com múltiplos barramentos foi feita numa plataforma de teste em laboratório. Para tal foi desenvolvido um sistema de injecção de faltas que permite impor faltas nos barramentos e nos nós mestre, e um sistema de medida de atrasos destinado a medir o tempo de resposta após a ocorrência de uma falta.Distributed embedded systems (DES) have been widely used in the last few decades in several application domains, from robotics, industrial process control, avionics and automotive. In fact, it is expectable that this trend will continue in the next years. In some of these application fields the dependability requirements are very important since the fail to provide services in a timely and predictable manner may cause important economic losses or even put humans in risk. In the design phase it is impossible to predict all the possible scenarios of faults, due to the non deterministic behaviour of the surrounding environment. In that way, the fault tolerance mechanisms must be included in the distributed embedded system to prevent failures occurrence. Also, many application domains require a high available bandwidth to perform the desired functions, or to turn possible the scaling with the addition of new features. The flexibility of a system also plays an important role, since it improves the capability to adapt to the surrounding world, and to the simplicity of the repair and maintenance. The flexibility improves the efficiency of all the system by providing a way to efficiently manage the available resources. This is very important in embedded systems due to the limited resources often available. A natural way to improve the bandwidth and the fault tolerance in distributed systems is to use replicated buses. Commercial and academic solutions propose the use of replicated fieldbuses for a single purpose only, either to improve the fault tolerance or to improve the available bandwidth, being the first the most common. One illustrative exception is FlexRay where the bus replica can be used to improve the bandwidth of the overall system, besides enabling redundant communications. However, only one bus replica can be used. In this thesis, a flexible bus replication scheme to improve both the dependability and the throughput of fieldbuses is presented and studied. It can be applied to any number of replicated buses, provided the required hardware support is available. The flexible use of the replicated buses can achieve an also flexible management of the network topology. This claim has been validated using the Controller Area Network (CAN) fieldbus, which has been chosen because it is widely spread in millions of systems. In fact, the proposed solution uses a paradigm that combines flexibility, time and event triggered communication, that is the Flexible Time- Triggered over CAN network (FTT-CAN). However, a generalization to native CAN is also presented and studied. The inclusion of bus replication in FTT-CAN imposes not only new mechanisms but also changes of the mechanisms associated with the master replication, which has been already studied in previous research work. In this work, these mechanisms were combined and take advantage of the centralized architecture and of the redundant masters to support an efficient and flexible bus replication. When considering the system operation, if a fault in the bus (or buses) occurs, and the consequent error leads to a system failure, the system reacts, switching to a degraded mode, where the message flows that were transmitted in the faulty bus (or buses) change to the non-faulty ones. The central node replication uses a leader-follower strategy, where the leader controls the system while the followers serve as backups. If an error occurs in the leader, a backup will take the system control maintaining the system with the same functionalities. The system has been generalized for native CAN, using two additional components that provide the same fault tolerance capabilities at the bus level, and also enable the dynamic management of the network topology. All the referred proposals were implemented and assessed in the scope of this work. The single bus version of FTT-CAN was assessed using a real application, a robotic soccer team, which has obtained excellent results in international competitions. There, the FTT-CAN based embedded system has been applied in the low level control, where, mainly it is responsible for the motion control and odometry. For the case of the multiple buses system, the assessment was performed in a laboratory test bed. For this, a fault injector was developed in order to impose faults in the buses and in the central nodes. To measure the time reaction of the system, a special hardware has been developed: a delay measurement system. It is able to measure delays between two important time marks for posterior offline analysis of the obtained values

    Deferred-update database replication:theory and algorithms

    Get PDF
    This thesis is about the design of high-performance fault-tolerant computer systems. More specifically, it focuses on how to develop database systems that behave correctly and with good performance even in the event of failures. Both performance and dependability can be improved by means of the same technique, namely replication. If several database replicas are available, performance can be improved by distributing the load among them. Moreover, if one of the replicas cannot be accessed due to failures, users can still rely on the other ones. However, providing the interface of a single database system out of several replicas is not an easy task since one has to ensure they are always consistent with each other. Allowing replicas to diverge would easily break the illusion of having a single high-performance fault-tolerant database system. Although we would like to have replicas as independent of each other as possible for performance and dependability reasons, we must keep them synchronized if we want to provide a consistent interface to users. In this work, we study how we can balance this trade-off to provide good performance and fault-tolerance without compromising consistency. Our basis is a widely used technique for database replication known as the deferred update technique. In this technique, transactions are initially executed in a single replica. Passive transactions, which do not change the state of the database, can commit locally to the replica they execute. Active transactions, which change the database state, must be synchronized with the transactions running on other replicas. This thesis makes four major contributions. First, we introduce an abstract specification that generalizes the deferred update technique. This specification provides a strong model to prove lower bounds on replication algorithms, design new correct-by-construction protocols tailor-made for specific settings, and prove existing protocols correct more easily, in a standard way. Using this model, we show that the problem of termination of active transactions in deferred-update protocols is highly related to the problem of sequence agreement among a set of processes. In this context, we study the problem of implementing latency-optimal fault-tolerant solutions to sequence agreement and present a novel, highly-dynamic, algorithm that can quickly adapt to system changes in order to preserve its optimal latency. Our algorithm is based on a new agreement problem we introduce that seems to be more suitable to solve problems like sequence agreement than previously used abstractions. Our last two contributions are in the context of specific deferred-update algorithms, where we present two new fault-tolerant protocols derived from our general abstraction. The first algorithm uses no extra assumptions about database replicas. Yet, it has very little overhead associated with the termination of active transactions, propagating only strictly necessary information to replicas. Our second protocol uses strong assumptions about the concurrency control mechanism used by database replicas to reduce even more the latency and the burden associated with transaction termination. These algorithms are good examples of how our general abstraction can be extended to create new protocols and prove them correct

    The globdata fault-tolerant replicated distributed object database

    No full text
    GlobData is a project that aims to design and implement a middleware tool offering the abstraction of a global object database repository. This tool, called Copla, supports transactional access to geographically distributed persistent objects independent of their location. Additionally, it supports replication of data according to different consistency criteria. For this purpose, Copla implements a number of consistency protocols offering different tradeoffs between performance and fault-tolerance. This paper presents the work on strong consistency protocols for the Glob-Data system. Two protocols are presented: a voting protocol and a nonvoting protocol. Both these protocols rely on the use of atomic broadcast as a building block to serialize conflicting transactions. The paper also introduces the total order protocol being developed to support large-scale replication.
    corecore