
    High-performance state-machine replication

    Replication, a common approach to protecting applications against failures, refers to maintaining several copies of a service on independent machines (replicas). Unlike a stand-alone service, a replicated service remains available to its clients despite the failure of some of its copies. Consistency among replicas is an immediate concern raised by replication. In effect, an important factor for providing the illusion of an uninterrupted service to clients is to preserve consistency among the multiple copies. State-machine replication is a popular replication technique that ensures consistency by ordering client requests and making all the replicas execute them deterministically and sequentially. The overhead of ordering the requests, and the sequentiality of request execution, the two essential requirements in realizing state-machine replication, are also the two major obstacles that prevent the performance of state-machine replication from scaling. In this thesis we concentrate on the performance of state-machine replication and enhance it by overcoming the two aforementioned bottlenecks, the overhead of ordering and the overhead of sequentially executing commands. To realize a truly scalable system, one must iteratively examine and analyze all the layers and components of a system and avoid or eliminate potential performance obstructions and congestion points. In this dissertation, we iterate between optimizing the ordering of requests and the strategies of replicas at request execution, in order to stretch the performance boundaries of state-machine replication. To eliminate the negative implications of the ordering layer on performance, we devise and implement several novel and highly efficient ordering protocols. Our proposals are based on practical observations we make after closely assessing and identifying the shortcomings of existing approaches. 
Communication is one of the most important components of any distributed system and thus selecting efficient communication patterns is a must in designing scalable systems. We base our protocols on the most suitable communication patterns and extend their design with additional features that altogether realize our protocol's high efficiency. The outcome of this phase is the design and implementation of the Ring Paxos family of protocols. According to our evaluations these protocols are highly scalable and efficient. We then assess the performance ramifications of sequential execution of requests on the replicas of state-machine replication. We use some known techniques such as state-partitioning and speculative execution, and thoroughly examine their advantages when combined with our ordering protocols. We then exploit the features of multicore hardware and propose our final solution as a parallelized form of state-machine replication, built on top of Ring Paxos protocols, that is capable of accomplishing significantly high performance. Given the popularity of state-machine replication in designing fault-tolerant systems, we hope this thesis provides useful and practical guidelines for the enhancement of the existing and the design of future fault-tolerant systems that share similar performance goals
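The core idea of state-machine replication described above can be illustrated with a minimal sketch. It assumes that an ordering layer (such as the Ring Paxos protocols mentioned in the abstract) has already produced a single totally ordered log; the `Replica` class, the command format, and the `replicate` helper are hypothetical names chosen for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic key-value state machine (illustrative only)."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        # Each command is a (operation, key, value) tuple.
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {op}")


def replicate(ordered_log, replica_count=3):
    """Feed the same ordered log to every replica, one command at a time."""
    replicas = [Replica() for _ in range(replica_count)]
    for command in ordered_log:
        for replica in replicas:
            replica.apply(command)
    return replicas


# The log stands in for the output of the (assumed) ordering protocol.
log = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7)]
replicas = replicate(log)
```

Because every replica applies the same commands in the same order and `apply` is deterministic, all replicas end in the same state. The sketch also makes the two bottlenecks visible: producing `log` is the ordering cost, and the loop over commands is the sequential execution cost.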

    Proceedings of the real-time database workshop, Eindhoven, 23 February 1995


    High performance deferred update replication

    Replication is a well-known approach to implementing storage systems that can tolerate failures. Replicated storage systems are designed such that the state of the system is kept at several replicas. A replication protocol ensures that the failure of a replica is masked by the rest of the system, in a way that is transparent to its users. Replicated storage systems are among the most important building blocks in the design of large scale applications. Applications at scale are often deployed on top of commodity hardware, store a vast amount of data, and serve a large number of users. The larger the system, the higher its vulnerability to failures. The ability to tolerate failures is not the only desirable feature in a replicated system. Storage systems need to be efficient in order to accommodate requests from a large user base while achieving low response times. In that respect, replication can leverage multiple replicas to parallelize the execution of user requests. This thesis focuses on Deferred Update Replication (DUR), a well-established database replication approach. It provides high availability in that every replica can execute client transactions. In terms of performance, it is better than other replication techniques in that only one replica executes a given transaction while the other replicas only apply state changes. However, DUR suffers from the following drawback: each replica stores a full copy of the database, which has consequences in terms of performance. The first consequence is that DUR cannot take advantage of the aggregated memory available to the replicas. Our first contribution is a distributed caching mechanism that addresses the problem. It makes efficient use of the main memory of an entire cluster of machines, while guaranteeing strong consistency. The second consequence is that DUR cannot scale with the number of replicas. 
The throughput of a fully replicated system is inherently limited by the number of transactions that a single replica can apply to its local storage. We propose a scalable version of the DUR approach where the system state is partitioned in smaller replica sets. Transactions that access disjoint partitions are parallelized. The last part of the thesis focuses on latency. We show that the scalable DUR-based approach may have detrimental effects on response time, especially when replicas are geographically distributed. The thesis considers different deployments and their implications on latency. We propose optimizations that provide substantial gains in geographically distributed environments
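The division of work in deferred update replication (one replica executes a transaction, all replicas apply its state changes) can be sketched as follows. This is a simplified illustration under stated assumptions: the version-based certification check is a common way to decide commit or abort and is assumed here rather than taken from the abstract, updates are assumed to reach all replicas in the same order, and the names `DurReplica` and `run_transaction` are hypothetical.

```python
class DurReplica:
    """Holds a full copy of the database as key -> (value, version)."""

    def __init__(self):
        self.store = {}

    def read(self, key):
        # Missing keys read as (None, 0).
        return self.store.get(key, (None, 0))

    def certify_and_apply(self, read_versions, write_set):
        # Assumed certification rule: abort if anything the transaction
        # read has changed since it was read.
        for key, version in read_versions.items():
            if self.read(key)[1] != version:
                return False
        # Apply only the state changes; the transaction logic is not re-run.
        for key, value in write_set.items():
            _, version = self.read(key)
            self.store[key] = (value, version + 1)
        return True


def run_transaction(executor, replicas, logic):
    """Execute `logic` at one replica, then certify/apply at every replica."""
    read_versions, write_set = {}, {}

    def read(key):
        value, version = executor.read(key)
        read_versions.setdefault(key, version)
        return write_set.get(key, value)  # a transaction sees its own writes

    def write(key, value):
        write_set[key] = value

    logic(read, write)
    outcomes = [r.certify_and_apply(read_versions, write_set) for r in replicas]
    return all(outcomes)


def deposit(read, write):
    balance = read("acct") or 0
    write("acct", balance + 10)


replicas = [DurReplica() for _ in range(3)]
committed = run_transaction(replicas[0], replicas, deposit)
```

The sketch also shows the limitation the abstract points out: every replica stores the whole database and applies every write set, so apply throughput does not grow with the number of replicas.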

    Practical database replication

    Doctoral thesis in Informatics. Software-based replication is a cost-effective approach to fault tolerance when combined with commodity hardware. In particular, shared-nothing database clusters built upon commodity machines and synchronized through eager software-based replication protocols have been driven by the distributed systems community in the last decade. The efforts on eager database replication, however, date back to the late 1970s, with initial proposals designed by the database community. From that time, we have the distributed locking and atomic commitment (i.e., two-phase commit) protocols. Briefly, before a data item is updated, all copies are locked through a distributed lock, and upon commit, an atomic commitment protocol is responsible for guaranteeing that the transaction’s changes are written to non-volatile storage at all replicas before committing it. Both of these processes contributed to poor performance. The distributed systems community improved these processes by reducing the number of interactions among replicas through the use of group communication and by relaxing the durability requirements imposed by the atomic commitment protocol. The approach requires at most two interactions among replicas and disseminates updates without necessarily applying them before committing a transaction. This relies on a high number of machines to reduce the likelihood of failures and ensure data resilience. Clearly, the availability of commodity machines and their increasing processing power make this feasible. Proving the feasibility of this approach requires us to build several prototypes and evaluate them with different workloads and scenarios. Although simulation environments are a good starting point, mainly those that allow us to combine real code (e.g., replication protocols, group communication) and simulated code (e.g., database, network), full-fledged implementations should be developed and tested. 
Unfortunately, database vendors usually do not provide native support for the development of third-party replication protocols, thus forcing protocol developers either to change the database engine, when the source code is available, or otherwise to build middleware wrappers that intercept client requests. The former solution is hard to maintain as new database releases are constantly being produced, whereas the latter represents a strenuous development effort as it requires us to rebuild several database features in the middleware. Moreover, the group-based replication protocols proposed so far, whether optimistic or conservative, have drawbacks that present a major hurdle to their practicability. The optimistic protocols make it difficult to commit transactions in the presence of hot-spots, whereas the conservative protocols perform poorly due to concurrency issues. In this thesis, we propose using a generic architecture and programming interface, titled GAPI, to facilitate the development of different replication strategies. The idea consists of providing key extensions to multiple DBMSs (Database Management Systems), thus enabling a replication strategy to be developed once and tested on several databases that have such extensions, i.e., those that are replication-friendly. To tackle the aforementioned problems in group-based replication protocols, we propose using a novel protocol, titled AKARA. AKARA guarantees fairness, so that all transactions have a chance to commit, and ensures great performance while exploiting the parallelism provided by local database engines. Finally, we outline a simple but comprehensive set of components to build group-based replication protocols and discuss key points in its design and implementation.

    Database replication for enterprise applications

    The MAP-i Doctoral Programme in Informatics, of the Universities of Minho, Aveiro and Porto. A common pattern for enterprise applications, particularly in small and medium businesses, is the reliance on an integrated traditional relational database system that provides persistence and where the relational aspect underlies the core logic of the application. While several solutions have been proposed for scaling out such applications, database replication is key if the relational aspect is to be preserved. However, it is worrisome that, because proposed solutions for database replication have been evaluated using simple synthetic benchmarks, their applicability to enterprise applications is not straightforward: the performance of conservative solutions hinges on the ability to conveniently partition applications, while optimistic solutions may experience unacceptable abort rates, compromising fairness, particularly for long-running transactions. In this thesis, we address these challenges. First, we perform a detailed evaluation of the applicability of database replication protocols based on conservative concurrency control to enterprise applications. The results invalidate the common assumption that real-world databases can be easily partitioned. Then, we tackle the issue of unacceptable abort rates in optimistic solutions by proposing a novel transaction scheduler, AJITTS, which uses an adaptive mechanism that, by reaching and maintaining the optimal level of concurrency in the system, minimizes aborts and improves throughput.

    Invariant preservation in geo-replicated data stores

    The Internet has enabled people from all around the globe to communicate with each other in a matter of milliseconds. This possibility has a great impact on the way we work, behave and communicate, while the full extent of the possibilities is yet to be known. The more dependent we become on Internet services, the more important it is to ensure that these systems operate correctly, with low latency and high availability for millions of clients scattered all around the globe. To be able to provide service to a large number of clients, and low access latency for clients in different geographical locations, Internet services typically rely on geo-replicated storage systems. Replication comes with costs that may affect service quality. To propagate updates between replicas, systems either give up consistency in favor of better availability and latency (weak consistency), or maintain consistency at the risk of becoming unavailable during network partitions (strong consistency). In practice, many production systems rely on weak consistency storage systems to enhance user experience, overlooking that applications can become incorrect due to the weaker consistency assumptions. In this thesis, we study how to exploit application semantics to build correct applications without affecting the availability and latency of operations. We propose a new consistency model that breaks with the traditional view that application consistency depends on coordinating the execution of operations across replicas. We show that it is possible to execute most operations with low latency and in a highly available way, while preserving application correctness. Our approach consists of specifying the fundamental properties that define the correctness of applications, i.e., the application invariants, and identifying and preventing concurrent executions that could make the state of the database inconsistent, i.e., that may violate some invariant. 
We explore different, complementary, approaches to implement this model. The Indigo approach consists in preventing conflicting operations from executing concurrently, by restricting the operations that each replica can execute at each moment to maintain application’s correctness. The IPA approach does not preclude the execution of any operation, ensuring high availability. To maintain application correctness, operations are modified to prevent invariant violations during replica reconciliation, or, if modifying operations provides an unsatisfactory semantics, it is possible to correct any invariant violations before a client can read an inconsistent state, by executing compensations. Evaluation shows that our approaches can ensure both low latency and high availability for most operations in common Internet application workloads, with small execution overhead in comparison to unmodified weak consistency systems, while enforcing application invariants, as in strong consistency systems
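The idea of restricting what each replica may execute so that an invariant cannot be violated can be illustrated with a small sketch. It assumes an escrow-style scheme in which the right to decrement a counter is split among replicas ahead of time, so that the invariant "stock never drops below zero" holds without coordination on the common path. This is one possible realization chosen for illustration, not a description of how Indigo itself is implemented, and `ReservationReplica` and its methods are hypothetical names.

```python
class ReservationReplica:
    """Replica holding a local share of 'decrement rights' for one counter."""

    def __init__(self, rights):
        self.rights = rights   # decrements allowed without coordination
        self.applied = 0       # decrements executed locally so far

    def try_decrement(self, amount=1):
        # Refuse locally if the replica does not hold enough rights;
        # the caller would then have to obtain rights from another replica.
        if amount > self.rights:
            return False
        self.rights -= amount
        self.applied += amount
        return True

    def give_rights(self, other, amount):
        # Moving rights between replicas is the only step needing coordination.
        if amount > self.rights:
            raise ValueError("cannot give away more rights than held")
        self.rights -= amount
        other.rights += amount


initial_stock = 10
a, b = ReservationReplica(6), ReservationReplica(4)  # rights sum to the stock

accepted = [a.try_decrement() for _ in range(7)]  # the 7th exceeds a's share
b.give_rights(a, 2)
retried = a.try_decrement()
remaining_stock = initial_stock - (a.applied + b.applied)
```

Since the rights held by all replicas never exceed the remaining stock, no combination of concurrent local decrements can drive the stock negative, which is the invariant being preserved.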

    Speculation in Parallel and Distributed Event Processing Systems

    Event stream processing (ESP) applications enable the real-time processing of continuous flows of data. Algorithmic trading, network monitoring, and processing data from sensor networks are good examples of applications that traditionally rely upon ESP systems. In addition, technological advances are resulting in an increasing number of devices that are network enabled, producing information that can be automatically collected and processed. This increasing availability of on-line data motivates the development of new and more sophisticated applications that require low-latency processing of large volumes of data. ESP applications are composed of an acyclic graph of operators that is traversed by the data. Inside each operator, the events can be transformed, aggregated, enriched, or filtered out. Some of these operations depend only on the current input events; such operations are called stateless. Other operations, however, depend not only on the current event but also on a state built during the processing of previous events. Such operations are, therefore, named stateful. As the number of ESP applications grows, their requirements become increasingly demanding and are often difficult to satisfy. In this dissertation, we address two challenges created by the use of stateful operations in an ESP application: (i) stateful operators can be bottlenecks because they are sensitive to the order of events and cannot be trivially parallelized by replication; and (ii) if failures are to be tolerated, the accumulated state of a stateful operator needs to be saved, and saving this state traditionally imposes considerable performance costs. Our approach is to evaluate the use of speculation to address these two issues. 
For handling ordering and parallelization issues in a stateful operator, we propose a speculative approach that both reduces latency when the operator must wait for the correct ordering of the events and improves throughput when the operation at hand is parallelizable. In addition, our approach does not require users to understand concurrent programming or to consider out-of-order execution when writing the operations. For fault-tolerant applications, traditional approaches have imposed prohibitive performance costs due to pessimistic schemes. We extend such approaches, using speculation to mask the cost of fault tolerance.

Table of contents:
1 Introduction
1.1 Event stream processing systems
1.2 Running example
1.3 Challenges and contributions
1.4 Outline
2 Background
2.1 Event stream processing
2.1.1 State in operators: Windows and synopses
2.1.2 Types of operators
2.1.3 Our prototype system
2.2 Software transactional memory
2.2.1 Overview
2.2.2 Memory operations
2.3 Fault tolerance in distributed systems
2.3.1 Failure model and failure detection
2.3.2 Recovery semantics
2.3.3 Active and passive replication
2.4 Summary
3 Extending event stream processing systems with speculation
3.1 Motivation
3.2 Goals
3.3 Local versus distributed speculation
3.4 Models and assumptions
3.4.1 Operators
3.4.2 Events
3.4.3 Failures
4 Local speculation
4.1 Overview
4.2 Requirements
4.2.1 Order
4.2.2 Aborts
4.2.3 Optimism control
4.2.4 Notifications
4.3 Applications
4.3.1 Out-of-order processing
4.3.2 Optimistic parallelization
4.4 Extensions
4.4.1 Avoiding unnecessary aborts
4.4.2 Making aborts unnecessary
4.5 Evaluation
4.5.1 Overhead of speculation
4.5.2 Cost of misspeculation
4.5.3 Out-of-order and parallel processing micro benchmarks
4.5.4 Behavior with example operators
4.6 Summary
5 Distributed speculation
5.1 Overview
5.2 Requirements
5.2.1 Speculative events
5.2.2 Speculative accesses
5.2.3 Reliable ordered broadcast with optimistic delivery
5.3 Applications
5.3.1 Passive replication and rollback recovery
5.3.2 Active replication
5.4 Extensions
5.4.1 Active replication and software bugs
5.4.2 Enabling operators to output multiple events
5.5 Evaluation
5.5.1 Passive replication
5.5.2 Active replication
5.6 Summary
6 Related work
6.1 Event stream processing engines
6.2 Parallelization and optimistic computing
6.2.1 Speculation
6.2.2 Optimistic parallelization
6.2.3 Parallelization in event processing
6.2.4 Speculation in event processing
6.3 Fault tolerance
6.3.1 Passive replication and rollback recovery
6.3.2 Active replication
6.3.3 Fault tolerance in event stream processing systems
7 Conclusions
7.1 Summary of contributions
7.2 Challenges and future work
Appendices
Publications
Pseudocode for the consensus protocol

    From cluster databases to cloud storage: Providing transactional support on the cloud

    Durant les últimes tres dècades, les limitacions tecnològiques (com per exemple la capacitat dels dispositius d'emmagatzematge o l'ample de banda de les xarxes de comunicació) i les creixents demandes dels usuaris (estructures d'informació, volums de dades) han conduït l'evolució de les bases de dades distribuïdes. Des dels primers repositoris de dades per arxius plans que es van desenvolupar en la dècada dels vuitanta, s'han produït importants avenços en els algoritmes de control de concurrència, protocols de replicació i en la gestió de transaccions. No obstant això, els reptes moderns d'emmagatzematge de dades que plantegen el Big Data i el cloud computing—orientats a millorar la limitacions pel que fa a escalabilitat i elasticitat de les bases de dades estàtiques—estan empenyent als professionals a relaxar algunes propietats importants dels sistemes transaccionals clàssics, cosa que exclou a diverses aplicacions les quals no poden encaixar en aquesta estratègia degut a la seva alta dependència transaccional. El propòsit d'aquesta tesi és abordar dos reptes importants encara latents en el camp de les bases de dades distribuïdes: (1) les limitacions pel que fa a escalabilitat dels sistemes transaccionals i (2) el suport transaccional en repositoris d'emmagatzematge en el núvol. Analitzar les tècniques tradicionals de control de concurrència i de replicació, utilitzades per les bases de dades clàssiques per suportar transaccions, és fonamental per identificar les raons que fan que aquests sistemes degradin el seu rendiment quan el nombre de nodes i / o quantitat de dades creix. A més, aquest anàlisi està orientat a justificar el disseny dels repositoris en el núvol que deliberadament han deixat de banda el suport transaccional. 
Efectivament, apropar el paradigma de l'emmagatzematge en el núvol a les aplicacions que tenen una forta dependència en les transaccions és fonamental per a la seva adaptació als requeriments actuals pel que fa a volums de dades i models de negoci. Aquesta tesi comença amb la proposta d'un simulador de protocols per a bases de dades distribuïdes estàtiques, el qual serveix com a base per a la revisió i comparativa de rendiment dels protocols de control de concurrència i les tècniques de replicació existents. Pel que fa a la escalabilitat de les bases de dades i les transaccions, s'estudien els efectes que té executar diferents perfils de transacció sota diferents condicions. Aquesta anàlisi contínua amb una revisió dels repositoris d'emmagatzematge de dades en el núvol existents—que prometen encaixar en entorns dinàmics que requereixen alta escalabilitat i disponibilitat—, el qual permet avaluar els paràmetres i característiques que aquests sistemes han sacrificat per tal de complir les necessitats actuals pel que fa a emmagatzematge de dades a gran escala. Per explorar les possibilitats que ofereix el paradigma del cloud computing en un escenari real, es presenta el desenvolupament d'una arquitectura d'emmagatzematge de dades inspirada en el cloud computing la qual s’utilitza per emmagatzemar la informació generada en les Smart Grids. Concretament, es combinen les tècniques de replicació en bases de dades transaccionals i la propagació epidèmica amb els principis de disseny usats per construir els repositoris de dades en el núvol. Les lliçons recollides en l'estudi dels protocols de replicació i control de concurrència en el simulador de base de dades, juntament amb les experiències derivades del desenvolupament del repositori de dades per a les Smart Grids, desemboquen en el que hem batejat com Epidemia: una infraestructura d'emmagatzematge per Big Data concebuda per proporcionar suport transaccional en el núvol. 
A més d'heretar els beneficis dels repositoris en el núvol en quant a escalabilitat, Epidemia inclou una capa de gestió de transaccions que reenvia les transaccions dels clients a un conjunt jeràrquic de particions de dades, cosa que permet al sistema oferir diferents nivells de consistència i adaptar elàsticament la seva configuració a noves demandes de càrrega de treball. Finalment, els resultats experimentals posen de manifest la viabilitat de la nostra contribució i encoratgen als professionals a continuar treballant en aquesta àrea.Durante las últimas tres décadas, las limitaciones tecnológicas (por ejemplo la capacidad de los dispositivos de almacenamiento o el ancho de banda de las redes de comunicación) y las crecientes demandas de los usuarios (estructuras de información, volúmenes de datos) han conducido la evolución de las bases de datos distribuidas. Desde los primeros repositorios de datos para archivos planos que se desarrollaron en la década de los ochenta, se han producido importantes avances en los algoritmos de control de concurrencia, protocolos de replicación y en la gestión de transacciones. Sin embargo, los retos modernos de almacenamiento de datos que plantean el Big Data y el cloud computing—orientados a mejorar la limitaciones en cuanto a escalabilidad y elasticidad de las bases de datos estáticas—están empujando a los profesionales a relajar algunas propiedades importantes de los sistemas transaccionales clásicos, lo que excluye a varias aplicaciones las cuales no pueden encajar en esta estrategia debido a su alta dependencia transaccional. El propósito de esta tesis es abordar dos retos importantes todavía latentes en el campo de las bases de datos distribuidas: (1) las limitaciones en cuanto a escalabilidad de los sistemas transaccionales y (2) el soporte transaccional en repositorios de almacenamiento en la nube. 
Analyzing the traditional concurrency control and replication techniques used by classic databases to support transactions is essential to identify the reasons why the performance of these systems degrades as the number of nodes and/or the amount of data grows. In addition, this analysis is aimed at justifying the design of cloud repositories that have deliberately set transactional support aside. Indeed, bringing the cloud storage paradigm closer to applications that depend heavily on transactions is crucial for adapting them to current requirements in terms of data volumes and business models. This thesis begins by proposing a protocol simulator for static distributed databases, which serves as a basis for reviewing and comparing the performance of existing concurrency control protocols and replication techniques. Regarding the scalability of databases and transactions, the effects of executing different transaction profiles under different conditions are studied. This analysis continues with a review of existing cloud storage repositories, which promise to fit dynamic environments that require high scalability and availability; the review makes it possible to evaluate the parameters and features that these systems have sacrificed in order to meet current needs for large-scale data storage. To explore the possibilities offered by the cloud computing paradigm in a real scenario, the development of a data storage architecture inspired by cloud computing is presented for storing the information generated in Smart Grids.
Specifically, replication techniques from transactional databases and epidemic propagation are combined with the design principles used to build cloud data repositories. The lessons gathered from studying the replication and concurrency control protocols in the database simulator, together with the experiences derived from developing the data repository for Smart Grids, lead to what we have named Epidemia: a storage infrastructure for Big Data conceived to provide transactional support in the cloud. In addition to inheriting the scalability benefits of cloud repositories, Epidemia includes a transaction management layer that forwards client transactions to a hierarchical set of data partitions, which allows the system to offer different consistency levels and elastically adapt its configuration to new workload demands. Finally, the experimental results demonstrate the feasibility of our contribution and encourage practitioners to continue working in this area.
Over the past three decades, technology constraints (e.g., capacity of storage devices, bandwidth of communication networks) and an ever-increasing set of user demands (e.g., information structures, data volumes) have driven the evolution of distributed databases. Since the flat-file data repositories developed in the early eighties, there have been important advances in concurrency control algorithms, replication protocols, and transaction management. However, modern concerns in data storage posed by Big Data and cloud computing, which aim to overcome the scalability and elasticity limitations of classic databases, are pushing practitioners to relax some important properties of transactions, which excludes several applications that cannot fit this strategy because of their intrinsically transactional nature.
The purpose of this thesis is to address two important challenges still latent in distributed databases: (1) the scalability limitations of transactional databases and (2) providing transactional support on cloud-based storage repositories. Analyzing the traditional concurrency control and replication techniques used by classic databases to support transactions is critical to identifying why the throughput of these systems degrades when the number of nodes and/or the amount of data grows sharply. This analysis also serves to justify the design rationale behind cloud repositories, in which transactions have generally been neglected. Furthermore, enabling applications that depend strongly on transactions to take advantage of the cloud storage paradigm is crucial for their adaptation to current data demands and business models. This dissertation starts by proposing a custom protocol simulator for static distributed databases, which serves as a basis for reviewing and comparing the performance of existing concurrency control protocols and replication techniques. As this thesis is especially concerned with transactions, it studies how different transaction profiles affect database scalability under different conditions. This analysis is followed by a review of existing cloud storage repositories, which claim to be highly dynamic, scalable, and available; the review leads to an evaluation of the parameters and features that these systems have sacrificed in order to meet current large-scale data storage demands. To further explore the possibilities of the cloud computing paradigm in a real-world scenario, a cloud-inspired approach to storing data from Smart Grids is presented. More specifically, the proposed architecture combines classic database replication techniques and epidemic update propagation with the design principles of cloud-based storage.
The key insights collected when prototyping the replication and concurrency control protocols in the database simulator, together with the experiences derived from building a large-scale storage repository for Smart Grids, are brought together in what we have named Epidemia: a storage infrastructure conceived to provide transactional support on the cloud. In addition to inheriting the benefits of highly scalable cloud repositories, Epidemia includes a transaction management layer that forwards client transactions to a hierarchical set of data partitions, which allows the system to offer different consistency levels and elastically adapt its configuration to incoming workloads. Finally, experimental results highlight the feasibility of our contribution and encourage practitioners to pursue further research in this area.