    PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication

    Geo-replicated data platforms form the backbone of several large-scale online services. Transactional Causal Consistency (TCC) is an attractive consistency level for building such platforms. TCC avoids many anomalies of eventual consistency, eschews the synchronization costs of strong consistency, and supports interactive read-write transactions. Partial replication is another attractive design choice for building geo-replicated platforms, as it increases storage capacity and reduces update propagation costs. This paper presents PaRiS, the first TCC system that supports partial replication and implements non-blocking parallel read operations, whose latency is paramount for the performance of read-intensive applications. PaRiS relies on a novel dependency-tracking protocol called Universal Stable Time (UST). By means of a lightweight background gossip process, UST identifies a snapshot of the data that has been installed by every DC in the system. Hence, transactions can consistently read from such a snapshot on any server in any replication site without blocking. Moreover, PaRiS requires only one timestamp to track dependencies and define transactional snapshots, thereby achieving resource efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment comprising up to 10 replication sites. We show that PaRiS scales well with the number of DCs and partitions, while handling larger datasets than existing solutions that assume full replication. We also demonstrate the performance gain of non-blocking reads over a blocking alternative (up to 1.47x higher throughput with 5.91x lower latency for read-dominated workloads, and up to 1.46x higher throughput with 20.56x lower latency for write-heavy workloads).
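
    The UST mechanism the abstract describes can be pictured as a minimum over the installation timestamps that data centers exchange by gossip: any version at or below that minimum is installed everywhere, so it can be read on any server without blocking. The sketch below is an illustrative reconstruction under that assumption; the class and method names are ours, not from the PaRiS codebase.

    class UstTracker:
        """Tracks the latest installed timestamp gossiped by each data center."""

        def __init__(self, dc_ids):
            self.installed = {dc: 0 for dc in dc_ids}

        def on_gossip(self, dc_id, installed_ts):
            # Gossip messages may arrive out of order; keep the max seen per DC.
            self.installed[dc_id] = max(self.installed[dc_id], installed_ts)

        def universal_stable_time(self):
            # Any version <= the minimum is installed in every DC, so any
            # server in any replication site can serve it without blocking.
            return min(self.installed.values())

    tracker = UstTracker(["us-east", "eu-west", "ap-south"])
    tracker.on_gossip("us-east", 120)
    tracker.on_gossip("eu-west", 110)
    tracker.on_gossip("ap-south", 115)
    snapshot_ts = tracker.universal_stable_time()  # 110: safe for non-blocking reads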

    MDCC: Multi-Data Center Consistency

    Replicating data across multiple data centers not only moves the data closer to the user, thus reducing latency for applications, but also increases availability in the event of a data center failure. Therefore, it is not surprising that companies like Google, Yahoo, and Netflix already replicate user data across geographically different regions. However, replication across data centers is expensive. Inter-data center network delays are in the hundreds of milliseconds and vary significantly. Synchronous wide-area replication is therefore considered infeasible with strong consistency, and current solutions either settle for asynchronous replication, which implies the risk of losing data in the event of failures; restrict consistency to small partitions; or give up consistency entirely. With MDCC (Multi-Data Center Consistency), we describe the first optimistic commit protocol that does not require a master or partitioning and is strongly consistent at a cost similar to eventually consistent protocols. MDCC can commit transactions in a single round-trip across data centers in the normal operational case. We further propose a new programming model which empowers the application developer to handle longer and unpredictable latencies caused by inter-data center communication. Our evaluation using the TPC-W benchmark with MDCC deployed across 5 geographically diverse data centers shows that MDCC achieves throughput and latency similar to eventually consistent quorum protocols, and that MDCC sustains a data center outage without a significant impact on response times while guaranteeing strong consistency.
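
    The single round-trip commit can be pictured as sending a transaction's updates to acceptors in all data centers at once and committing as soon as a fast quorum accepts, falling back to slower classic rounds on conflict. The sketch below uses the standard Fast Paxos quorum size of ceil(3n/4); it is an illustrative simplification, not MDCC's actual implementation, and the names and the conflict check are ours.

    import math
    from concurrent.futures import ThreadPoolExecutor

    class Acceptor:
        """Toy acceptor: votes yes unless the transaction's writes conflict
        with a previously accepted, still-pending transaction."""

        def __init__(self):
            self.pending_keys = set()

        def accept(self, txn):
            writes = set(txn["write_set"])
            if writes & self.pending_keys:
                return False
            self.pending_keys |= writes
            return True

    def try_fast_commit(acceptors, txn):
        # Contact every acceptor in parallel: one wide-area round trip.
        fast_quorum = math.ceil(3 * len(acceptors) / 4)
        with ThreadPoolExecutor(max_workers=len(acceptors)) as pool:
            votes = list(pool.map(lambda a: a.accept(txn), acceptors))
        if sum(votes) >= fast_quorum:
            return "COMMIT"    # learned without a master and without a second round
        return "FALLBACK"      # conflict: resolve with classic (multi-round) consensus

    acceptors = [Acceptor() for _ in range(5)]
    print(try_fast_commit(acceptors, {"id": 1, "write_set": ["x", "y"]}))  # COMMIT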

    CloudTPS: Scalable Transactions for Web Applications in the Cloud

    NoSQL cloud data services provide scalability and high availability for web applications, but at the same time they sacrifice data consistency. However, many applications cannot afford any data inconsistency. CloudTPS is a scalable transaction manager that allows cloud database services to execute the ACID transactions of web applications, even in the presence of server failures and network partitions. We implement this approach on top of the two main families of scalable data layers: Bigtable and SimpleDB. Performance evaluation on top of HBase (an open-source implementation of Bigtable) in our local cluster and Amazon SimpleDB in the Amazon cloud shows that our system scales linearly at least up to 40 nodes in our local cluster and 80 nodes in the Amazon cloud.
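
    A transaction manager layered over a key-value store in this style is typically split into local transaction managers (LTMs) that each own a shard of the data and agree on a transaction's outcome with two-phase commit. The sketch below is a minimal rendering under that assumption; the LTM split follows the paper's general approach, but the code and names are illustrative.

    class LocalTxnManager:
        """One shard of the transaction manager layer; data is held in memory
        and checkpointed to the underlying cloud store (not shown)."""

        def __init__(self):
            self.store = {}
            self.locks = set()
            self.prepared = {}

        def prepare(self, txn_id, writes):
            # Phase 1: lock the keys this LTM owns; vote no on any conflict.
            keys = set(writes)
            if keys & self.locks:
                return False
            self.locks |= keys
            self.prepared[txn_id] = keys
            return True

        def commit(self, txn_id, writes):
            # Phase 2: apply the writes and release this transaction's locks.
            self.store.update(writes)
            self.locks -= self.prepared.pop(txn_id)

        def abort(self, txn_id):
            # Release only the locks this transaction actually acquired.
            self.locks -= self.prepared.pop(txn_id, set())

    def run_transaction(txn_id, writes_by_ltm):
        # Coordinator-side two-phase commit across all LTMs the txn touches.
        if all(ltm.prepare(txn_id, w) for ltm, w in writes_by_ltm.items()):
            for ltm, w in writes_by_ltm.items():
                ltm.commit(txn_id, w)
            return True
        for ltm in writes_by_ltm:
            ltm.abort(txn_id)
        return False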

    ElasTraS: An Elastic Transactional Data Store in the Cloud

    Over the last couple of years, "Cloud Computing" or "Elastic Computing" has emerged as a compelling and successful paradigm for internet-scale computing. One of the major contributing factors to this success is the elasticity of resources. In spite of the elasticity provided by the infrastructure and the scalable design of the applications, the elephant (the underlying database) that drives most of these web-based applications is not very elastic or scalable, and hence limits scalability. In this paper, we propose ElasTraS, which addresses this issue of scalability and elasticity of the data store in a cloud computing environment, leveraging the elastic nature of the underlying infrastructure while providing scalable transactional data access. This paper aims at providing the design of a system in progress, highlighting the major design choices, analyzing the different guarantees provided by the system, and identifying several important challenges for the research community striving for computing in the cloud.
    Comment: 5 pages, in Proc. of USENIX HotCloud 2009.

    Transaction Processing over Geo-Partitioned Data

    Databases are a fundamental component of any web service, storing and managing all the service data. In large-scale web services, it is essential that the data storage systems used consider techniques such as partial replication, geo-replication, and weaker consistency models, so that expectations regarding availability and latency can be met as well as possible. In this dissertation, we address the problem of executing transactions on data that is partially replicated. We adopt transactional causal consistency semantics, the consistency model in which a transaction accesses a causally consistent snapshot of the database. However, implementing this consistency model in a partially replicated setting raises several challenges in handling transactions that access data items replicated on different nodes. Our work aims to design and implement a novel algorithm for executing transactions over geo-partitioned data with transactional causal consistency semantics. We discuss the problems and design choices for executing transactions over partially replicated data, and present a design that implements the proposed algorithm by extending a weakly consistent geo-replicated key-value store with partial replication, adding support for executing transactions involving geo-partitioned data items. In this context, we also address the problem of deciding the best strategy for searching data in replicas that hold only part of the total data of a service and whose states might diverge. We evaluate our solution using microbenchmarks based on the TPC-H database. Our results show that the overhead of the system is low for the expected scenario of a low ratio of remote transactions.
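
    The replica-selection problem the abstract raises can be made concrete with vector clocks: a replica can serve a transaction's read only if it has applied everything the transaction's causal snapshot depends on. The check below is a generic way to express that condition, not the dissertation's actual algorithm; the data layout and names are ours.

    def covers(replica_clock, snapshot_clock):
        # A replica covers the snapshot if its clock dominates it entrywise,
        # i.e. it has applied every update the snapshot causally depends on.
        return all(replica_clock.get(dc, 0) >= ts
                   for dc, ts in snapshot_clock.items())

    def choose_replica(replicas, key, snapshot_clock):
        # Among replicas storing this key (partial replication), pick one that
        # already covers the snapshot; otherwise the read must wait or retry.
        for replica in replicas:
            if key in replica["keys"] and covers(replica["clock"], snapshot_clock):
                return replica
        return None  # no replica is caught up yet: block, retry, or forward

    replicas = [
        {"keys": {"x"}, "clock": {"dc1": 5, "dc2": 3}},
        {"keys": {"x"}, "clock": {"dc1": 4, "dc2": 1}},
    ]
    r = choose_replica(replicas, "x", {"dc1": 5, "dc2": 2})  # first replica qualifies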

    Robust data storage in a network of computer systems

    Robustness of data in this thesis is taken to mean reliable storage of data and also high availability of data objects in spite of the occurrence of faults. Algorithms and data structures which can be used to provide such robustness in the presence of various disk, processor, and communication network failures are described. Reliable storage of data at individual nodes in a network of computer systems is based on the use of a stable storage mechanism, combined with strategies which help ensure crash resistance of file operations in spite of the use of buffering mechanisms by operating systems. High availability of data in the network is maintained by replicating data on different computers, and mutual consistency between replicas is ensured in spite of network partitioning. A stable storage system which provides atomicity for more complex data structures, instead of the usual fixed-size page, has been designed and implemented and its performance evaluated. A crash-resistant file system has also been implemented and evaluated. Many of the techniques presented here are used in the design of what we call CRES (Crash-resistant, Replicated and Stable) storage. CRES storage provides fault tolerance facilities for various disk and processor faults. It also provides fault tolerance for network partitioning through an algorithm for the update and merge of a partitioned data storage system.
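
    A stable storage mechanism of the kind the thesis builds on is, in the classic formulation, a pair of disk copies with checksums, written in a fixed order and repaired on recovery, so that a crash corrupts at most one copy. The sketch below renders that textbook scheme as an assumption about the mechanism, not the thesis's exact on-disk layout.

    import hashlib
    import os

    def _write_copy(path, data):
        # Prefix the payload with its checksum and force it to disk.
        with open(path, "wb") as f:
            f.write(hashlib.sha256(data).digest() + data)
            f.flush()
            os.fsync(f.fileno())

    def stable_write(base, data):
        # Write copy A, then copy B: a crash in between corrupts at most one.
        _write_copy(base + ".a", data)
        _write_copy(base + ".b", data)

    def stable_read(base):
        # On recovery, return any copy whose checksum verifies.
        for suffix in (".a", ".b"):
            try:
                with open(base + suffix, "rb") as f:
                    blob = f.read()
            except OSError:
                continue
            digest, data = blob[:32], blob[32:]
            if hashlib.sha256(data).digest() == digest:
                return data
        raise OSError("both copies are corrupted or missing")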