
    Middleware-based Database Replication: The Gaps between Theory and Practice

    The need for high availability and performance in data management systems has been fueling a long-running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time, this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. In this way, we hope to both motivate and help researchers make the theory and practice of middleware-based database replication more relevant to each other. Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    Parallel Deferred Update Replication

    Deferred update replication (DUR) is an established approach to implementing highly efficient and available storage. While the throughput of read-only transactions scales linearly with the number of deployed replicas in DUR, the throughput of update transactions experiences limited improvements as replicas are added. This paper presents Parallel Deferred Update Replication (P-DUR), a variation of classical DUR that scales both read-only and update transactions with the number of cores available in a replica. In addition to introducing the new approach, we describe its full implementation and compare its performance to classical DUR and to Berkeley DB, a well-known standalone database

    Scalable transactions in the cloud: partitioning revisited

    Lecture Notes in Computer Science, 6427. Cloud computing is becoming one of the most widely used paradigms for deploying highly available and scalable systems. These systems usually demand the management of huge amounts of data, which cannot be handled by either traditional or replicated database systems as we know them. Recent solutions store data in special key-value structures, an approach that commonly lacks the consistency provided by transactional guarantees, since it is traded for high scalability and availability. Ensuring consistent access to the information requires transactions. However, it is well known that traditional replication protocols do not scale well in a cloud environment. Here we review current proposals for deploying transactional systems in the cloud and propose a new system intended as a step towards this goal. We then focus on data partitioning and describe the key role it plays in achieving high scalability. This work has been partially supported by the Spanish Government under grant TIN2009-14460-C03-02, by the Spanish MEC under grant BES-2007-17362, and by project ReD Resilient Database Clusters (PDTC/EIA-EIA/109044/2008).

    Partial replication with strong consistency

    In response to the increasing expectations of their clients, cloud services exploit geo-replication to provide fault tolerance, availability, and low latency when executing requests. However, cloud platforms tend to adopt weak consistency semantics, under which the state of replicas may diverge independently. These systems offer good response times, but at the cost of allowing data inconsistencies that may affect user experience. Some systems instead adopt strong consistency, which is less efficient but simplifies the development of correct applications by guaranteeing that all replicas in the system maintain the same database state. It is therefore interesting to explore a system that offers strong consistency while minimizing its main disadvantage: the impact on performance that results from coordinating every replica in the system. One way to reduce the cost of replica coordination is to support partial replication. Partially replicating a database makes each server responsible for only a subset of the data (a partition), so an update needs to synchronize only some of the replicas, which improves response times. In this dissertation, we propose an algorithm that implements a distributed replicated database offering strong consistency with support for partial replication. To achieve strong consistency in a partially replicated scenario, our algorithm is based in part on Clock-SI[10], which implements a multi-versioned database for strong consistency (snapshot isolation) and runs the Two-Phase Commit protocol to coordinate replicas during updates. The algorithm is supported by an architecture that simplifies distributing partitions among datacenters and efficiently propagates operations across the nodes of a partition by means of the ChainPaxos[27] algorithm.
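The point that a partially replicated update only has to coordinate the partitions owning the written keys can be illustrated with a minimal Two-Phase Commit sketch. This is not the dissertation's algorithm: the `Partition` class, the `partition_of` placement rule, and the lock-based voting are all hypothetical, and the sketch leaves out snapshot isolation, Clock-SI timestamps, and ChainPaxos propagation inside a partition.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One partition of the key space (hypothetical in-memory stand-in)."""
    name: str
    data: dict = field(default_factory=dict)
    locked: set = field(default_factory=set)
    staged: dict = field(default_factory=dict)

    def prepare(self, tx_id, writes):
        # Phase 1: vote "no" if another prepared transaction holds a key.
        if any(key in self.locked for key in writes):
            return False
        self.locked.update(writes)
        self.staged[tx_id] = dict(writes)
        return True

    def commit(self, tx_id):
        # Phase 2 (commit): install the staged writes and release the locks.
        writes = self.staged.pop(tx_id)
        self.data.update(writes)
        self.locked.difference_update(writes)

    def abort(self, tx_id):
        # Phase 2 (abort): discard the staged writes and release the locks.
        writes = self.staged.pop(tx_id, {})
        self.locked.difference_update(writes)


def partition_of(key, partitions):
    # Assumed placement rule: byte sum of the key modulo the partition count.
    return partitions[sum(key.encode()) % len(partitions)]


def two_phase_commit(tx_id, writes, partitions):
    """Commit `writes`, involving only the partitions that own the keys."""
    by_partition = {}
    for key, value in writes.items():
        p = partition_of(key, partitions)
        by_partition.setdefault(p.name, (p, {}))[1][key] = value

    prepared = []
    for p, part_writes in by_partition.values():
        if p.prepare(tx_id, part_writes):
            prepared.append(p)
        else:
            # One "no" vote aborts the transaction at every prepared partition.
            for q in prepared:
                q.abort(tx_id)
            return False, sorted(by_partition)
    for p in prepared:
        p.commit(tx_id)
    return True, sorted(by_partition)
```

With three partitions, a transaction writing keys `"a"` and `"b"` involves only two of them under the assumed placement rule; the third partition takes no part in the commit.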

    A formal characterization of SI-based ROWA replication protocols

    Snapshot isolation (SI) is commonly used in commercial DBMSs with a multiversion concurrency control mechanism, since it never blocks read-only transactions. Recent database replication protocols have been designed using SI replicas, where transactions are first executed in a delegate replica and their updates (if any) are propagated to the rest of the replicas at commit time; i.e., they follow the Read One Write All (ROWA) approach. This paper provides a formalization that shows the correctness of abstract protocols covering these replication proposals. These abstract protocols differ in the properties demanded for achieving a global SI level and those needed for its generalized SI (GSI) variant, which allows reads from old snapshots. Additionally, we propose two more relaxed properties that also ensure a global GSI level. Thus, some applications can further optimize their performance in a replicated system while obtaining GSI. © 2010 Elsevier B.V. All rights reserved. The authors wish to thank the reviewers for their valuable comments that helped us to greatly improve the quality and readability of this paper. This work has been supported by the Spanish Government under research grant TIN2009-14460-C03. Armendáriz-Iñigo, J.; Juárez-Rodríguez, J.; González De Mendívil, J.; Garitagoitia, J.; Irún Briz, L.; Muñoz Escoí, FD. (2011). A formal characterization of SI-based ROWA replication protocols. Data and Knowledge Engineering. 70(1):21-34. doi:10.1016/j.datak.2010.07.012

    On the Scalability of Snapshot Isolation

    Many distributed applications require transactions. However, transactional protocols that require strong synchronization are costly in large scale environments. Two properties help with scalability of a transactional system: genuine partial replication (GPR), which leverages the intrinsic parallelism of a workload, and snapshot isolation (SI), which decreases the need for synchronization. We show that under standard assumptions (data store accesses are not known in advance, and transactions may access arbitrary objects in the data store), it is impossible to have both SI and GPR. Our impossibility result is based on a novel decomposition of SI which proves that, like serializability, SI is expressible on plain histories.

    High performance deferred update replication

    Replication is a well-known approach to implementing storage systems that can tolerate failures. Replicated storage systems are designed such that the state of the system is kept at several replicas. A replication protocol ensures that the failure of a replica is masked by the rest of the system, in a way that is transparent to its users. Replicated storage systems are among the most important building blocks in the design of large scale applications. Applications at scale are often deployed on top of commodity hardware, store a vast amount of data, and serve a large number of users. The larger the system, the higher its vulnerability to failures. The ability to tolerate failures is not the only desirable feature in a replicated system. Storage systems need to be efficient in order to accommodate requests from a large user base while achieving low response times. In that respect, replication can leverage multiple replicas to parallelize the execution of user requests. This thesis focuses on Deferred Update Replication (DUR), a well-established database replication approach. It provides high availability in that every replica can execute client transactions. In terms of performance, it is better than other replication techniques in that only one replica executes a given transaction while the other replicas only apply state changes. However, DUR suffers from the following drawback: each replica stores a full copy of the database, which has consequences in terms of performance. The first consequence is that DUR cannot take advantage of the aggregated memory available to the replicas. Our first contribution is a distributed caching mechanism that addresses the problem. It makes efficient use of the main memory of an entire cluster of machines, while guaranteeing strong consistency. The second consequence is that DUR cannot scale with the number of replicas. 
The throughput of a fully replicated system is inherently limited by the number of transactions that a single replica can apply to its local storage. We propose a scalable version of the DUR approach where the system state is partitioned in smaller replica sets. Transactions that access disjoint partitions are parallelized. The last part of the thesis focuses on latency. We show that the scalable DUR-based approach may have detrimental effects on response time, especially when replicas are geographically distributed. The thesis considers different deployments and their implications on latency. We propose optimizations that provide substantial gains in geographically distributed environments
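The description of DUR above (one replica executes a transaction, the others only apply its state changes) can be sketched as follows. This is a toy model under stated assumptions rather than the thesis's implementation: the `Replica`, `Transaction`, and `commit` names are hypothetical, versions are simple per-key counters, and a plain loop over the replicas stands in for the atomic broadcast that a real deployment would use to deliver transactions in the same order everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Full copy of the database, mapping key -> (value, version)."""
    store: dict = field(default_factory=dict)

    def read(self, key):
        return self.store.get(key, (None, 0))

    def certify_and_apply(self, read_versions, writes):
        # Certification: abort if any item that was read has since changed.
        for key, version in read_versions.items():
            if self.read(key)[1] != version:
                return False
        # Apply only the state changes; the transaction is not re-executed.
        for key, value in writes.items():
            self.store[key] = (value, self.read(key)[1] + 1)
        return True


@dataclass
class Transaction:
    """Executes at a single replica, buffering writes until commit."""
    replica: Replica
    read_versions: dict = field(default_factory=dict)
    writes: dict = field(default_factory=dict)

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.replica.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value


def commit(tx, replicas):
    """Deliver the transaction to every replica (assumed total order)."""
    outcomes = [r.certify_and_apply(tx.read_versions, tx.writes) for r in replicas]
    # Deterministic certification on identical state gives identical outcomes.
    assert len(set(outcomes)) == 1
    return outcomes[0]
```

Because every replica certifies and applies every update transaction, update throughput in this model is bounded by what one replica can apply, which is the limitation that the partitioned variant described in the abstract targets.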

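    Desarrollo de un sistema de replicación de bases de datos en entornos dinámicos: particionado y protocolos de replicación asociados [Development of a database replication system for dynamic environments: partitioning and associated replication protocols]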

    This project takes the Master's thesis work of I. Arrieta-Salinas and M. Louis Rodríguez as its starting point. We deploy a distributed database for use in a cloud environment, as a specific case of Platform-as-a-Service. We assume that data is partitioned and that several replicas store a copy of a given partition. Clients issue transactions through a standard library such as JDBC. To do so, they need information about data placement, which is managed by a Metadata Manager (MM). The Metadata Manager handles partitioning and replica placement, building a replica cluster for each partition. In each replica cluster, a few core replicas run a replication protocol to provide strong consistency, and the rest receive updates lazily. These remaining replicas are logically arranged as onion layers around the core replicas. The previous implementation of this system had several drawbacks that we try to fix in this work. First, clients and the MM had to be on the same physical machine, which causes a performance penalty in heavily loaded scenarios. Second, the system was optimized for YCSB, whose transactions consist of a single operation, and ran them over two replication protocols, primary copy and active replication, which are known to perform badly in update-intensive scenarios. Moreover, there was no load balancing based on replica performance, only a round-robin policy among all replicas at the core level. We discuss these limitations (described in more detail in Section 2.1) and then present the system implementation in the rest of this work. The main goals of this project concern different parts of the system. In the Client Module, the client was originally the OLTP-Benchmark, a module that sends specific types of transactions to the system over a JDBC connection. In the current version, this module has been modified so that a transaction can contain more than one operation, and several parameters have been added to transactions so that the system can treat them differently. For the Metadata Manager, one of the main goals is to physically decouple the Client and Metadata Manager modules. The other modifications are a new structure that lets the Metadata Manager know the architecture of the replica cluster, and a new ReplicaChooser function based on CPU load that enables proper load balancing. Finally, new protocols have been implemented in the replica cluster that allow different replication protocols to run in different partitions simultaneously, without the Client or the Metadata Manager being aware of it. Ingeniería en Informática (Computer Engineering).
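The difference between the original round-robin policy and a CPU-load-based ReplicaChooser can be sketched as below. The function names, the replica identifiers, and the `cpu_load` mapping (replica name to a load fraction between 0 and 1) are hypothetical; the abstract does not describe how the actual ReplicaChooser obtains or weighs load figures.

```python
import itertools


def round_robin_chooser(replicas):
    """Original policy: cycle through the core replicas, ignoring load."""
    cycle = itertools.cycle(replicas)

    def choose(cpu_load):
        return next(cycle)

    return choose


def least_loaded_chooser(replicas):
    """Load-aware policy: pick the replica reporting the lowest CPU load."""

    def choose(cpu_load):
        # Replicas with no reported load are treated as fully loaded.
        return min(replicas, key=lambda r: cpu_load.get(r, 1.0))

    return choose
```

Given loads of 0.9, 0.2, and 0.5 for three replicas, the round-robin chooser still sends the first request to the busiest replica, while the load-aware chooser picks the one at 0.2.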

    From cluster databases to cloud storage: Providing transactional support on the cloud

    Over the past three decades, technology constraints (e.g., capacity of storage devices, bandwidth of communication networks) and an ever-increasing set of user demands (e.g., information structures, data volumes) have driven the evolution of distributed databases. Since the flat-file data repositories developed in the early eighties, there have been important advances in concurrency control algorithms, replication protocols, and transaction management. However, modern data storage concerns posed by Big Data and cloud computing, which aim to overcome the scalability and elasticity limitations of classic databases, are pushing practitioners to relax some important properties of transactions. This excludes several applications that cannot fit this strategy because of their intrinsically transactional nature. The purpose of this thesis is to address two important challenges still latent in distributed databases: (1) the scalability limitations of transactional databases and (2) providing transactional support on cloud-based storage repositories. Analyzing the traditional concurrency control and replication techniques used by classic databases to support transactions is critical to identifying why the throughput of these systems degrades as the number of nodes and/or the amount of data grows. This analysis also serves to justify the design rationale behind cloud repositories, in which transactions have generally been neglected. Furthermore, enabling applications that depend strongly on transactions to take advantage of the cloud storage paradigm is crucial for their adaptation to current data demands and business models. This dissertation starts by proposing a custom protocol simulator for static distributed databases, which serves as a basis for reviewing and comparing the performance of existing concurrency control protocols and replication techniques. As this thesis is especially concerned with transactions, it studies the effects of different transaction profiles on database scalability under different conditions. This analysis is followed by a review of existing cloud storage repositories (which claim to be highly dynamic, scalable, and available), leading to an evaluation of the parameters and features that these systems have sacrificed in order to meet current large-scale data storage demands. To further explore the possibilities of the cloud computing paradigm in a real-world scenario, a cloud-inspired approach to storing data from Smart Grids is presented. More specifically, the proposed architecture combines classic database replication techniques and epidemic update propagation with the design principles of cloud-based storage. The key insights collected when prototyping the replication and concurrency control protocols in the database simulator, together with the experience gained from building a large-scale storage repository for Smart Grids, are brought together in what we have named Epidemia: a storage infrastructure conceived to provide transactional support on the cloud. In addition to inheriting the benefits of highly scalable cloud repositories, Epidemia includes a transaction management layer that forwards client transactions to a hierarchical set of data partitions, which allows the system to offer different consistency levels and to elastically adapt its configuration to incoming workloads. Finally, experimental results highlight the feasibility of our contribution and encourage practitioners to pursue further research in this area.

    Partial replication in distributed software transactional memory

    Dissertation submitted for the degree of Master in Computer Engineering. Distributed software transactional memory (DSTM) is emerging as an interesting alternative for distributed concurrency control. Usually, DSTM systems resort to data distribution and full replication techniques in order to provide scalability and fault tolerance. Nevertheless, distribution does not provide support for fault tolerance, and full replication limits the system's total storage capacity. In this context, partial data replication arises as an intermediate solution that combines the best of the previous two while trying to mitigate their disadvantages. This strategy has been explored in the distributed databases research field, but has received little attention in the context of transactional memory and, to the best of our knowledge, has never before been incorporated into a DSTM system for a general-purpose programming language. Thus, we defend the claim that it is possible to combine both full and partial data replication in such systems. Accordingly, we developed a prototype of a DSTM system combining full and partial data replication for Java programs. We built on an existing DSTM framework and extended it with support for partial data replication. With the proposed framework, we implemented a partially replicated DSTM. We evaluated the proposed system using known benchmarks, and the evaluation shows that there are scenarios where partial data replication can be advantageous, e.g., scenarios in which few transactions modify fully replicated data. The results of this thesis show that we were able to sustain our claim by implementing a prototype that effectively combines full and partial data replication in a DSTM system. The modularity of the presented framework allows the easy implementation of its various components, and it provides a non-intrusive interface to applications. Fundação para a Ciência e Tecnologia - (FCT/MCTES) in the scope of the research project PTDC/EIA-EIA/113613/2009 (Synergy-VM