864 research outputs found
GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database
Multinational enterprises conduct global business that has a demand for
geo-distributed transactional databases. Existing state-of-the-art databases
adopt a sharded master-follower replication architecture. However, the
single-master serving mode incurs massive cross-region writes from clients, and
the sharded architecture requires multiple round-trip acknowledgments (e.g.,
2PC) to ensure atomicity for cross-shard transactions. These limitations drive
us to seek yet another design choice. In this paper, we propose a strongly
consistent OLTP database GeoGauss with full replica multi-master architecture.
To efficiently merge the updates from different master nodes, we propose a
multi-master OCC that unifies data replication and concurrent transaction
processing. By leveraging an epoch-based delta state merge rule and the
optimistic asynchronous execution, GeoGauss ensures strong consistency with
light-coordinated protocol and allows more concurrency with weak isolation,
which are sufficient to meet our needs. Our geo-distributed experimental
results show that GeoGauss achieves 7.06X higher throughput and 17.41X lower
latency than the state-of-the-art geo-distributed database CockroachDB on the
TPC-C benchmark
Towards Transaction as a Service
This paper argues for decoupling transaction processing from existing
two-layer cloud-native databases and making transaction processing as an
independent service. By building a transaction as a service (TaaS) layer, the
transaction processing can be independently scaled for high resource
utilization and can be independently upgraded for development agility.
Accordingly, we architect an execution-transaction-storage three-layer
cloud-native database. By connecting to TaaS, 1) the AP engines can be
empowered with ACID TP capability, 2) multiple standalone TP engine instances
can be incorporated to support multi-master distributed TP for horizontal
scalability, 3) multiple execution engines with different data models can be
integrated to support multi-model transactions, and 4) high performance TP is
achieved through extensive TaaS optimizations and consistent evolution.
Cloud-native databases deserve better architecture: we believe that TaaS
provides a path forward to better cloud-native databases
Byzantine state machine replication for the masses
Tese de doutoramento, Informática (Ciência da Computação), Universidade de Lisboa, Faculdade de Ciências, 2018The state machine replication technique is a popular approach for building Byzantine fault-tolerant services. However, despite the widespread adoption of this paradigm for crash fault-tolerant systems, there are still few examples of this paradigm for real Byzantine fault-tolerant systems. Our view of this situation is that there is a lack of robust implementations of Byzantine fault-tolerant state machine replication middleware, and that the performance penalty is too high, specially for geo-replication. These hindrances are tightly coupled to the distributed protocols used for enforcing such resilience. This thesis has the objective of finding methodologies for enhancing robustness and performance of state machine replication systems. The first contribution is Mod-SMaRt, a modular protocol that preserves optimal latency in terms of the communications steps exchanged among processes. By being a modular protocol, it becomes simpler to validate and implement, thus resulting in greater robustness; by also preserving optimal message-exchanges among processes, the protocol is capable of delivering desirable performance. The second contribution is concerned with implementing Mod-SMaRt into BFTSMART, a reliable and high-performance codebase that was maintained and improved over the entire course of the PhD that offers multicore-awareness, reconfiguration support, and a flexible API. The third contribution presents WHEAT, a protocol derived from Mod-SMaRt that uses optimizations shown to be effective in reducing latency via a practical evaluation conducted in a geo distributed environment. We additionally conducted an evaluation of both BFT-SMART and WHEAT applied to a relational database middleware and an ordering service for a permissioned blockchain platform. These evaluations revealed encouraging results for both systems and validated our work conducted in the geo-distributed context.A técnica de replicação máquina de estados é um paradigma popular usado em vários sistemas distribuídos modernos. No entanto, apesar da adoção deste paradigma em sistemas reais tolerantes a faltas por paragem, ainda existem poucos exemplos de sistemas reais tolerantes a faltas bizantinas. Segundo a nossa experiência nesta área de investigação, isto deve-se ao fato de existirem poucas concretizações robustas para replicação máquina de estados tolerante a faltas bizantinas, assim como uma perda de desempenho demasiado elevada, especialmente em ambientes geo-replicados. A razão fundamental para a existência destes obstáculos vem dos protocolos distribuídos necessários para assegurar este tipo de resiliência. Esta tese tem como objetivo explorar metodologias para a robustez e eficiência da replicação máquina de estados. A primeira contribuição da tese é o algoritmo Mod-SMaRt, um protocolo modular que preserva latência ótima em termos de passos de comunicação executados pelos processos. Sendo um protocolo modular, torna-se mais simples de validar e concretizar, o que resulta em maior robustez; ao preservar troca de mensagens ótima entre processos, também é capaz de entregar um desempenho desejável. A segunda contribuição consiste em concretizar o protocolo Mod SMaRt na ferramenta BFT-SMART, uma biblioteca fiável de alto desempenho, mantida e melhorada ao longo de todo o período correspondente ao doutoramento, capaz de suportar arquiteturas multi-núcleo, reconfiguração do grupo de réplicas, e uma API de programação flexível. A terceira contribuição consiste em um protocolo derivado do Mod-SMaRt designado WHEAT, que usa otimizações que demostraram serem eficientes na redução da latência segundo uma avaliação prática em ambiente geo-replicado. Adicionalmente, foram também realizadas avaliações de ambos os protocolos quando aplicados num middleware para base de dados relacionais, e num serviço de ordenação para uma plataforma blockchain. Ambas as avaliações revelam resultados encorajadores para ambos os sistemas e validam o trabalho realizado em contexto geo-distribuído.Projeto IRCoC (PTDC/EEI-SCR/6970/2014); Comissão Europeia, FP7 (Seventh Framework Programme for Research and Technological Development), projetos FP7/2007-2013, ICT-25724
Recommended from our members
From Controlled Data-Center Environments to Open Distributed Environments: Scalable, Efficient, and Robust Systems with Extended Functionality
The past two decades have witnessed several paradigm shifts in computing environments. Starting from cloud computing which offers on-demand allocation of storage, network, compute, and memory resources, as well as other services, in a pay-as-you-go billingmodel. Ending with the rise of permissionless blockchain technology, a decentralized computing paradigm with lower trust assumptions and limitless number of participants. Unlike in the cloud, where all the computing resources are owned by some trusted cloud provider, permissionless blockchains allow computing resources owned by possibly malicious parties to join and leave their network without obtaining permission from some centralized trusted authority. Still, in the presence of malicious parties, permissionlessblockchain networks can perform general computations and make progress. Cloud computing is powered by geographically distributed data-centers controlled and managed by trusted cloud service providers and promises theoretically infinite computing resources. On the other hand, permissionless blockchains are powered by open networks of geographically distributed computing nodes owned by entities that are not necessarily known or trusted. This paradigm shift requires a reconsideration of distributed data management protocols and distributed system designs that assume low latency across system components, inelastic computing resources, or fully trusted computing resources.In this dissertation, we propose new system designs and optimizations that address scalability and efficiency of distributed data management systems in cloud environments. We also propose several protocols and new programming paradigms to extend the functionality and enhance the robustness of permissionless blockchains. The work presented spans global-scale transaction processing, large-scale stream processing, atomic transaction processing across permissionless blockchains, and extending the functionality and the use-cases of permissionless blockchains. In all these directions, the focus is on rethinking system and protocol designs to account for novel cloud and permissionless blockchain assumptions. For global-scale transaction processing, we propose GPlacer, a placement optimization framework that decides replica placement of fully and partial geo-replicated databases. For large-scale stream processing, we propose Cache-on-Track (CoT) an adaptive and elastic client-side cache that addresses server-side load-imbalances that occur in large-scale distributed storage layers. In permissionless blockchain transaction processing, we propose AC3WN, the first correct cross-chain commitment protocol that guarantees atomicity of cross-chain transactions. Also, we propose TXSC, a transactional smart contract programming framework. TXSC provides smart contract developers with transaction primitives. These primitives allow developers to write smart contracts without the need to reason about the anomalies that can arise due to concurrent smart contract function executions. In addition, we propose a forward-looking architecture that unifies both permissioned and permissionless blockchains and exploits the running infrastructure of permissionless blockchains to build global asset management systems
The OTree: multidimensional indexing with efficient data sampling for HPC
Spatial big data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, many authors have pointed out that the lack of specialized frameworks for multidimensional Big Data is limiting possible applications and precluding many scientific breakthroughs. Paramount in achieving High-Performance Data Analytics is to optimize and reduce the I/O operations required to analyze large data sets. To do so, we need to organize and index the data according to its multidimensional attributes. At the same time, to enable fast and interactive exploratory analysis, it is vital to generate approximate representations of large datasets efficiently. In this paper, we propose the Outlook Tree (or OTree), a novel Multidimensional Indexing with efficient data Sampling (MIS) algorithm. The OTree enables exploratory analysis of large multidimensional datasets with arbitrary precision, a vital missing feature in current distributed data management solutions. Our algorithm reduces the indexing overhead and achieves high performance even for write-intensive HPC applications. Indeed, we use the OTree to store the scientific results of a study on the efficiency of drug inhalers. Then we compare the OTree implementation on Apache Cassandra, named Qbeast, with PostgreSQL and plain storage. Lastly, we demonstrate that our proposal delivers better performance and scalability.Peer ReviewedPostprint (author's final draft
- …