Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200
Parallel Deferred Update Replication
Deferred update replication (DUR) is an established approach to implementing
highly efficient and available storage. While the throughput of read-only
transactions scales linearly with the number of deployed replicas in DUR, the
throughput of update transactions experiences limited improvements as replicas
are added. This paper presents Parallel Deferred Update Replication (P-DUR), a
variation of classical DUR that scales both read-only and update transactions
with the number of cores available in a replica. In addition to introducing the
new approach, we describe its full implementation and compare its performance
to classical DUR and to Berkeley DB, a well-known standalone database.
Efficient middleware for database replication
Master's dissertation in Computer Engineering (Engenharia Informática).
Database systems are used to store data in a wide variety of applications: Web applications, enterprise applications, scientific research, and even personal applications.
Given the widespread use of databases in systems that are fundamental to their users, database systems must be efficient and reliable. Additionally, for these systems to serve large numbers of users, databases must be scalable, able to process large numbers of transactions. Achieving this requires data replication. In a
replicated system, all nodes contain a copy of the database. Then, to guarantee that
replicas converge, write operations must be executed on all replicas. The way updates
are propagated leads to two different replication strategies. The first is known as
asynchronous or optimistic replication, in which updates are propagated asynchronously
after the conclusion of an update transaction. The second is known as synchronous or pessimistic replication, in which updates are broadcast synchronously during the transaction.
In pessimistic replication, contrary to optimistic replication, the replicas always remain
consistent. This approach simplifies application programming, since data
replication is transparent to the applications. However, it
presents scalability issues, caused by the number of messages exchanged during
synchronization, which delays the termination of the transaction. As a result,
the user experiences much higher latency under the pessimistic approach.
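The two propagation strategies above can be contrasted in a minimal sketch; all class and function names here are illustrative, not taken from any particular system:

```python
class Replica:
    """A toy replica holding a key-value copy of the database."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

def commit_pessimistic(replicas, updates):
    """Synchronous: every replica applies the writes before commit returns,
    so all copies are consistent at the cost of waiting for all of them."""
    for replica in replicas:
        for key, value in updates.items():
            replica.apply(key, value)
    return "committed"

def commit_optimistic(primary, others, updates, queue):
    """Asynchronous: only the primary is updated before commit returns;
    propagation to the other replicas is deferred (here, just queued)."""
    for key, value in updates.items():
        primary.apply(key, value)
    queue.append((others, updates))  # applied later; replicas may lag
    return "committed"

def drain(queue):
    """Background propagation step for the optimistic variant."""
    while queue:
        others, updates = queue.pop(0)
        for replica in others:
            for key, value in updates.items():
                replica.apply(key, value)
```

The sketch makes the trade-off visible: the pessimistic commit touches every replica before returning, while the optimistic commit returns after updating only the primary, leaving the others temporarily stale.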
This work presents the design and implementation of a database replication
system with snapshot isolation semantics, using a synchronous replication approach.
The system is composed of a primary replica and a set of secondary replicas that fully
replicate the database. The primary replica executes the read-write transactions, while
the remaining replicas execute the read-only transactions. After a read-write transaction concludes on the primary replica, its updates are propagated to the
remaining replicas. This approach suits a model where the fraction of read
operations is considerably higher than that of write operations, allowing the read load to be
distributed over the multiple replicas.
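The primary/secondary routing just described can be sketched as follows; the router class and the round-robin policy are illustrative assumptions, not details from the dissertation:

```python
import itertools

class TransactionRouter:
    """Routes read-write transactions to the primary and spreads read-only
    transactions over the secondaries (round-robin here, for load balancing)."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self._cycle = itertools.cycle(secondaries)  # endless rotation

    def route(self, read_only):
        # Writes must go to the single primary; reads can go anywhere.
        return next(self._cycle) if read_only else self.primary
```

With this scheme, adding secondaries increases read throughput without affecting where writes are executed.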
To improve the performance of the system, clients execute some operations
speculatively, in order to avoid blocking during the execution of a database operation.
Thus, the client may continue its execution while the operation executes on the
database. If the result returned to the client is found to be incorrect, the transaction is aborted, ensuring the correctness of transaction execution.
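The speculation-with-validation idea above can be sketched minimally; the class, the prediction callback, and the validation-at-commit shape are illustrative assumptions:

```python
class SpeculativeClient:
    """Proceeds with a predicted result instead of blocking on the database;
    if the real result later disagrees, the transaction is aborted."""

    def __init__(self):
        self.aborted = False
        self.speculations = []  # pairs of (predicted value, deferred db call)

    def execute(self, predict, run_on_db):
        """Return immediately with a prediction; remember how to validate it."""
        predicted = predict()
        self.speculations.append((predicted, run_on_db))
        return predicted

    def validate(self):
        """At commit time, compare each prediction with the database's answer."""
        for predicted, run_on_db in self.speculations:
            if run_on_db() != predicted:
                self.aborted = True  # mis-speculation: abort to stay correct
                return False
        return True
```

The client never blocks during `execute`; correctness is preserved because any mis-speculation is caught before commit and forces an abort.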
Multi-Master Replication for Snapshot Isolation Databases
Lazy replication with snapshot isolation (SI) has emerged as a popular choice for distributed databases. However, lazy replication requires the execution of update transactions at one (master) site so that it is relatively easy for a total SI order to be determined for consistent installation of updates in the lazily replicated system. We propose a set of techniques that support update transaction execution over multiple partitioned sites, thereby allowing the master to scale. Our techniques determine a total SI order for update transactions over multiple master sites without requiring global coordination in the distributed system, and ensure that updates are installed in this order at all sites to provide consistent and scalable replication with SI. We have built our techniques into PostgreSQL and demonstrate their effectiveness through experimental evaluation.
A formal characterization of SI-based ROWA replication protocols
Snapshot isolation (SI) is commonly used in some commercial DBMSs with a multiversion
concurrency control mechanism since it never blocks read-only transactions. Recent database
replication protocols have been designed using SI replicas where transactions are firstly
executed in a delegate replica and their updates (if any) are propagated to the rest of the
replicas at commit time; i.e. they follow the Read One Write All (ROWA) approach. This paper
provides a formalization that shows the correctness of abstract protocols which cover these
replication proposals. These abstract protocols differ in the properties demanded for achieving
a global SI level and those needed for its generalized SI (GSI) variant, allowing reads from old
snapshots. Additionally, we propose two more relaxed properties that also ensure a global GSI
level. Thus, some applications can further optimize their performance in a replicated system
while obtaining GSI.
© 2010 Elsevier B.V. All rights reserved.
The authors wish to thank the reviewers for their valuable comments, which helped greatly improve the quality and readability of this paper. This work has been supported by the Spanish Government under research grant TIN2009-14460-C03.
Armendáriz-Iñigo, J.; Juárez-Rodríguez, J.; González de Mendívil, J.; Garitagoitia, J.; Irún Briz, L.; Muñoz Escoí, F.D. (2011). A formal characterization of SI-based ROWA replication protocols. Data and Knowledge Engineering, 70(1):21-34. doi:10.1016/j.datak.2010.07.012
Partial replication with strong consistency
In response to the increasing expectations of their clients, cloud services exploit
geo-replication to provide fault-tolerance, availability and low latency when executing
requests. However, cloud platforms tend to adopt weak consistency semantics, in which
replicas may diverge in state independently. These systems offer good response times
but at the disadvantage of allowing potential data inconsistencies that may affect user
experience.
Some systems propose to adopt solutions with strong consistency, which are not as
efficient but simplify the development of correct applications by guaranteeing that all
replicas in the system maintain the same database state. It is therefore interesting to explore
a system that offers strong consistency while minimizing its main disadvantage:
the impact on performance that results from coordinating every replica in the system. A
possible way to reduce the cost of replica coordination is to support partial replication.
Partially replicating a database allows each server to be responsible for only a
subset of the data - a partition - which means that when updating the database only some
of the replicas have to be synchronized, improving response times.
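The partial-replication idea above can be sketched in a few lines; the partitioning function and replica model are illustrative stand-ins (a real system would use consistent hashing or a partition directory):

```python
def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic toy partitioner: map a key to one of the partitions.
    return sum(key.encode()) % num_partitions

class PartialReplica:
    """A server responsible for only a subset of the partitions."""
    def __init__(self, partitions):
        self.partitions = set(partitions)
        self.data = {}

def write(replicas, key, value, num_partitions):
    """Synchronize only the replicas that hold the key's partition,
    instead of coordinating every replica in the system."""
    p = partition_of(key, num_partitions)
    touched = [r for r in replicas if p in r.partitions]
    for r in touched:
        r.data[key] = value
    return len(touched)  # how many replicas had to coordinate
```

The return value makes the benefit concrete: a write coordinates only the replicas responsible for the affected partition, not the full replica set.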
In this dissertation, we propose an algorithm that implements a distributed replicated
database that offers strong consistency with support for partial replication. To achieve
strong consistency in a partially replicated scenario, our algorithm is partly based on
Clock-SI [10], which presents an algorithm that implements a multi-versioned
database for strong consistency (snapshot-isolation) and performs the Two-Phase Commit
protocol when coordinating replicas during updates. The algorithm is supported by
an architecture that simplifies distributing partitions among datacenters and efficiently
propagating operations across nodes in the same partition, thanks to the ChainPaxos[27]
algorithm.
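The coordination step mentioned above (Two-Phase Commit across the replicas touched by an update) can be sketched as follows; the participant model and vote logic are illustrative, not the dissertation's actual protocol:

```python
class Participant:
    """A replica taking part in the commit of a distributed update."""
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local transaction can still commit.
        self.state = "prepared" if self.can_commit else "abort-voted"
        return self.can_commit

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = decision

def two_phase_commit(participants):
    """Commit only if every participant votes yes; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]      # phase 1: prepare
    decision = "commit" if all(votes) else "abort"   # coordinator decides
    for p in participants:                           # phase 2: broadcast
        p.finish(decision)
    return decision
```

A single "no" vote in phase 1 forces a global abort, which is exactly why coordinating fewer replicas (partial replication) shortens the commit path.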
Distributed replicated macro-components
Dissertation for the degree of Master in Computer Engineering (Engenharia Informática).
In recent years, several approaches have been proposed for improving application
performance on multi-core machines. However, exploiting the power of multi-core processors
remains complex for most programmers. A Macro-component is an abstraction
that tries to tackle this problem by allowing programs to exploit the power of multi-core machines
without requiring changes to them. A Macro-component encapsulates several
diverse implementations of the same specification. This makes it possible to obtain the best performance
for all operations and/or to distribute load among replicas, while keeping contention
and synchronization overhead to a minimum.
In real-world applications, relying on only one server to provide a service leads to
limited fault-tolerance and scalability. To address this problem, it is common to replicate
services on multiple machines. This work addresses the problem of supporting such a
replication solution, while exploiting the power of multi-core machines.
To this end, we propose to support the replication of Macro-components in a cluster of
machines. In this dissertation we present the design of a middleware solution for achieving
such goal. Using the implemented replication middleware we have successfully deployed
a replicated Macro-component of in-memory databases which are known to have scalability
problems in multi-core machines. The proposed solution combines multi-master
replication across nodes with primary-secondary replication within a node, where several
instances of the database are running on a single machine. This approach deals with
the lack of scalability of databases on multi-core systems while minimizing communication
costs, which ultimately results in an overall improvement of the services. Results show
that the proposed solution is able to scale as the number of nodes and clients increases.
It also shows that the solution is able to take advantage of multi-core architectures.
RepComp project (PTDC/EIAEIA/108963/2008)
Optimizing recovery protocols for replicated database systems
Nowadays, information technology and computing systems have a great relevance
in our lives. Among current computer systems, distributed systems are
among the most important because of their scalability, fault tolerance, performance
improvements and high availability.
Replicated systems are a specific case of distributed systems. This Ph.D. thesis is
centered in the replicated database field due to their extended usage, requiring
among other properties: low response times, high throughput, load balancing
among replicas, data consistency, data integrity and fault tolerance.
In this scope, the development of applications that use replicated databases
raises some problems that can be reduced by using fault-tolerant building
blocks such as group communication and membership services. Thus, the usage
of the services provided by group communication systems (GCS) hides several
communication details, simplifying the design of replication and recovery protocols.
This Ph.D. thesis surveys the alternatives and strategies being used in the replication
and recovery protocols for database replication systems. It also summarizes
different concepts about group communication systems and virtual synchrony.
As a result, the thesis provides a classification of database replication
protocols according to their support to (and interaction with) recovery protocols,
always assuming that both kinds of protocol rely on a GCS.
Since current commercial DBMSs allow programmers and database administrators
to sacrifice consistency with the aim of improving performance, it is
important to select the appropriate level of consistency. Regarding (replicated)
databases, consistency is strongly related to the isolation levels being assigned
to transactions.
One of the main proposals of this thesis is a recovery protocol for a replication
protocol based on certification. Certification-based database replication protocols
provide a good basis for the development of their recovery strategies when
a snapshot isolation level is assumed. In that level readsets are not needed in
the validation step. As a result, they do not need to be transmitted to other
replicas. Additionally, these protocols hold a writeset list that is used in the
certification/validation step. That list maintains the set of writesets needed by the recovery protocol. This thesis evaluates the performance of a recovery
protocol based on the writeset list transfer (basic protocol) and of an optimized
version that compacts the information to be transferred.
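The compaction idea above can be sketched minimally: before transferring the missed writesets to a recovering replica, collapse the ordered list so each item carries only its most recent value. The function name and data shapes are illustrative:

```python
def compact(writesets):
    """writesets: list of dicts (item -> value), ordered oldest to newest.
    Returns a single dict holding only the last value written per item,
    shrinking what must be transferred to a recovering replica."""
    merged = {}
    for ws in writesets:
        merged.update(ws)  # later writesets overwrite earlier values
    return merged
```

When an item is updated many times while a replica is down, the recovering replica receives (and applies) one value per item instead of the full history, which is what shortens recovery.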
The second proposal applies the compaction principle to a recovery protocol
designed for weak-voting replication protocols. Its aim is to minimize the time
needed for transferring and applying the writesets lost by the recovering replica,
obtaining in this way an efficient recovery. The performance of this recovery
algorithm has been checked implementing a simulator. To this end, the Omnet++
simulating framework has been used. The simulation results confirm
that this recovery protocol provides good results in multiple scenarios.
Finally, the correction of both recovery protocols is also justified and presented
in Chapter 5.
García Muñoz, L.H. (2013). Optimizing recovery protocols for replicated database systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31632