
    Ensuring Serializable Executions with Snapshot Isolation DBMS

    Snapshot Isolation (SI) is a multiversion concurrency control mechanism that has been implemented by open-source and commercial database systems such as PostgreSQL and Oracle. The main feature of SI is that a read operation does not block a write operation and vice versa, which allows a higher degree of concurrency than traditional two-phase locking. SI prevents many anomalies that appear at other isolation levels, but it can still result in non-serializable executions, in which database integrity constraints can be violated. Several techniques have been proposed to ensure serializable execution with engines running SI; these techniques are based on modifying the applications by introducing conflicting SQL statements. However, with each of these techniques the DBA has to make a difficult choice among the possible transactions to modify. This thesis helps DBAs choose between these different techniques and choices by understanding how the choices affect system performance. It also proposes a novel technique called 'External Lock Manager' (ELM), which introduces conflicts in a separate lock-manager object so that every execution is serializable. We build a prototype system for ELM and run experiments to demonstrate the robustness of the new technique compared to the previous techniques. Experiments show that modifying the application code for some transactions has a high impact on performance for some choices, which makes it very hard for DBAs to choose wisely. ELM, however, has peak performance similar to SI, no matter which transactions are chosen for modification. Thus ELM is a robust technique for ensuring serializable execution.
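The non-serializable executions the abstract refers to include the well-known "write skew" anomaly. A minimal in-memory sketch (the on-call doctors scenario is a standard textbook illustration, not taken from the thesis) shows how two SI transactions with disjoint write sets can both commit and jointly break an invariant:

```python
# Hypothetical sketch of SI's classic "write skew" anomaly.
# The on-call invariant and transaction names are illustrative.

# Shared database: two doctors; invariant = at least one is on call.
db = {"alice_on_call": True, "bob_on_call": True}

def run_under_si(txn1, txn2):
    """Run two transactions against the same snapshot, as SI permits
    when their write sets are disjoint (no write-write conflict)."""
    snapshot = dict(db)        # both transactions read this snapshot
    w1 = txn1(snapshot)        # each transaction returns its write set
    w2 = txn2(snapshot)
    if set(w1) & set(w2):      # SI's "first committer wins" rule
        raise RuntimeError("write-write conflict: one txn aborts")
    db.update(w1)              # disjoint writes: both commit
    db.update(w2)

def alice_goes_home(s):
    # Snapshot says Bob is still on call, so Alice may leave.
    return {"alice_on_call": False} if s["bob_on_call"] else {}

def bob_goes_home(s):
    return {"bob_on_call": False} if s["alice_on_call"] else {}

run_under_si(alice_goes_home, bob_goes_home)
# Both committed, yet "someone is on call" now fails:
print(db)  # {'alice_on_call': False, 'bob_on_call': False}
```

Run serially, either transaction would see the other's write and keep the invariant; under SI both read the same stale snapshot, which is exactly the kind of execution the techniques surveyed in the thesis aim to rule out.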

    Integrated approach to recovery and high availability in an updatable, distributed data warehouse

    Thesis (M. Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 99-105). Any highly available data warehouse will use some form of data replication to ensure that it can continue to service queries despite machine failures. In this thesis, I demonstrate that it is possible to leverage the data replication available in these environments to build a simple yet efficient crash recovery mechanism that revives a crashed site by querying remote replicas for missing updates. My new integrated approach to recovery and high availability, called HARBOR (High Availability and Replication-Based Online Recovery), targets updatable data warehouses and offers an attractive alternative to the widely used log-based crash recovery algorithms found in existing database systems. Aside from its simplicity over log-based approaches, HARBOR also avoids the runtime overhead of maintaining an on-disk log, accomplishes recovery without quiescing the system, allows replicated data to be stored in non-identical formats, and supports the parallel recovery of multiple sites and database objects. To evaluate HARBOR's feasibility, I compare HARBOR's runtime overhead and recovery performance with those of two-phase commit and ARIES, the gold standard for log-based recovery, on a four-node distributed database system that I have implemented. My experiments show that HARBOR incurs lower runtime overhead because it does not require log writes to be forced to disk during transaction commit. Furthermore, they indicate that HARBOR's recovery performance is comparable to ARIES's performance on many workloads and even surpasses it on characteristic warehouse workloads with few updates to historical data. The results are highly encouraging and suggest that my integrated approach is quite tenable. By Edmond Lau, M.Eng.
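The core recovery idea — revive a crashed site by querying a live replica for the updates it missed, rather than replaying a local log — can be sketched in a few lines. This is an illustrative toy, not HARBOR's actual interface; the dictionaries, keys, and timestamps are invented for the example:

```python
# Illustrative sketch of log-free recovery: a crashed site catches up
# by fetching from a live replica every committed update newer than
# the last commit it made durable before failing. All names here are
# hypothetical stand-ins for real catalog/replica structures.

def recover(crashed, live):
    """Apply to `crashed` all updates on `live` newer than the
    crashed site's last durable commit timestamp."""
    checkpoint = crashed["last_durable_ts"]
    missing = [u for u in live["updates"] if u["ts"] > checkpoint]
    for u in missing:                          # apply in timestamp order
        crashed["table"][u["key"]] = u["value"]
        crashed["last_durable_ts"] = u["ts"]
    return len(missing)

live = {"updates": [
    {"ts": 1, "key": "a", "value": 10},
    {"ts": 2, "key": "b", "value": 20},
    {"ts": 3, "key": "a", "value": 30},
]}
crashed = {"last_durable_ts": 1, "table": {"a": 10}}

n = recover(crashed, live)
print(n, crashed["table"])  # 2 {'a': 30, 'b': 20}
```

Because the replica query is an ordinary read, a real system built on this idea can run it concurrently with live traffic, which is what lets HARBOR recover without quiescing the system.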

    Serializable Isolation for Snapshot Databases

    Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that would maintain consistency if run serially. Until now, the only way to prevent these anomalies was to modify the applications by introducing explicit locking or artificial update conflicts, following careful analysis of conflicts between all pairs of transactions. This thesis describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation of the algorithm in a relational database management system is described, along with a benchmark and performance study, showing that the throughput approaches that of snapshot isolation in most cases.
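The runtime detection described above rests on a known result: every snapshot isolation anomaly contains a transaction with both an incoming and an outgoing read-write antidependency. A deliberately minimal sketch of that check (the bookkeeping and abort policy are simplified; a real engine tracks versions and locks, not booleans):

```python
# Simplified sketch of runtime anomaly detection under SI: record
# rw-antidependency edges between concurrent transactions and abort
# any transaction that acquires both an incoming and an outgoing
# rw-edge (the "dangerous structure"). This over-approximates real
# cycles, so some aborts are false positives by design.

class Txn:
    def __init__(self, name):
        self.name = name
        self.in_rw = False    # someone overwrote a version we read... no:
                              # someone read a version we overwrote
        self.out_rw = False   # we read a version someone overwrote
        self.aborted = False

def record_rw_edge(reader, writer):
    """`reader` read a version that `writer` concurrently overwrote."""
    reader.out_rw = True
    writer.in_rw = True
    # A transaction with both edge kinds may be the pivot of a cycle:
    for t in (reader, writer):
        if t.in_rw and t.out_rw and not t.aborted:
            t.aborted = True

t1, t2, t3 = Txn("T1"), Txn("T2"), Txn("T3")
record_rw_edge(t1, t2)   # T1 -rw-> T2
record_rw_edge(t2, t3)   # T2 -rw-> T3: T2 now has both edges, aborts
print([t.aborted for t in (t1, t2, t3)])  # [False, True, False]
```

Aborting only the pivot is what lets the algorithm keep snapshot isolation's read/write non-blocking behaviour while still guaranteeing serializable executions.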

    Consistency Models in Distributed Systems with Physical Clocks

    Most existing distributed systems use logical clocks to order events in the implementation of various consistency models. Although logical clocks are straightforward to implement and maintain, they may affect the scalability, availability, and latency of the system when used to totally order events in strong consistency models. They can also incur considerable overhead when used to track and check the causal relationships among events in some weak consistency models. In this thesis we explore how to efficiently implement different consistency models using loosely synchronized physical clocks. Compared with logical clocks, physical clocks move forward at approximately the same speed and can be loosely synchronized with well-known standard protocols. Hence a group of physical clocks located at different servers can be used to order events in a distributed system at very low cost. We first describe Clock-SI, a fully distributed implementation of snapshot isolation for partitioned data stores. It uses the local physical clock at each partition to assign snapshot and commit timestamps to transactions. By avoiding a centralized service for timestamp management, Clock-SI improves the throughput, latency, and availability of the system. We then introduce Clock-RSM, a low-latency state machine replication protocol that provides linearizability. It totally orders state machine commands by assigning them physical timestamps obtained from the local replica. By eliminating the message step for command ordering in existing solutions, Clock-RSM reduces the latency of consistent geo-replication across multiple data centers. Finally, we present Orbe, which provides an efficient and scalable implementation of causal consistency for both partitioned and replicated data stores. Orbe builds an explicit total order, consistent with causality, among all operations using physical timestamps. It reduces the number of dependencies that have to be carried in update replication messages and checked on installation of replicated updates. As a result, Orbe improves the throughput of the system.
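A key subtlety when snapshot timestamps come from local physical clocks, as in Clock-SI, is clock skew: a partition whose clock lags a transaction's snapshot timestamp must delay the read until its clock catches up, or the snapshot could miss commits. A toy sketch of that delay rule (clock values are invented; real clocks tick in microseconds, not integers):

```python
# Toy sketch of the skew-handling rule in a Clock-SI-style design:
# reads at a partition whose local physical clock is behind the
# transaction's snapshot timestamp wait out the difference, keeping
# the snapshot consistent without any centralized timestamp service.

def read_delay(partition_clock, snapshot_ts):
    """Ticks a read must wait at this partition before it is safe."""
    if partition_clock >= snapshot_ts:
        return 0                          # clock already past snapshot
    return snapshot_ts - partition_clock  # wait out the clock skew

# Transaction takes its snapshot timestamp from its local clock: 105.
snapshot_ts = 105
print(read_delay(107, snapshot_ts))  # 0  (remote clock ahead: serve now)
print(read_delay(103, snapshot_ts))  # 2  (lagging clock: brief wait)
```

Because clocks are loosely synchronized, these waits are bounded by the skew, which is what makes the fully distributed design cheap compared with a centralized timestamp manager.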

    Detecting and Tolerating Byzantine Faults in Database Systems

    This thesis describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas, which are off-the-shelf database systems, to provide a single database that is Byzantine fault tolerant. The scheme works when the replicas are homogeneous, but it also allows heterogeneous replication, in which replicas come from different vendors. Heterogeneous replicas reduce the impact of bugs and security compromises because they are implemented independently and are thus less likely to suffer correlated failures. A final component of the scheme is a repair mechanism that can correct the state of a faulty replica, ensuring the longevity of the scheme. The main challenge in designing a replication scheme for transaction processing systems is ensuring that the replicas' states do not diverge while allowing a high degree of concurrency. We have developed two novel concurrency control protocols, commit barrier scheduling (CBS) and snapshot epoch scheduling (SES), that provide strong consistency and good performance. The two protocols provide different types of consistency: CBS provides single-copy serializability and SES provides single-copy snapshot isolation. We have implemented both protocols in the context of a replicated SQL database. Our implementation has been tested with production versions of several commercial and open source databases as replicas. Our experiments show that a configuration that can tolerate one faulty replica has only a modest performance overhead (about 10-20% for the TPC-C benchmark). Our implementation successfully masks several Byzantine faults observed in practice, and we have used it to find a new bug in MySQL.
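The answer-comparison step at the heart of such a scheme can be sketched simply: send the same query to every replica and accept a result only when at least f+1 replicas agree, which masks up to f Byzantine replicas. This is a hedged illustration of the general voting idea, not the thesis's actual protocol; the dict "replicas" stand in for real database engines:

```python
# Sketch of Byzantine-fault-tolerant query voting: with 2f+1 replicas,
# an answer reported by >= f+1 of them cannot have been fabricated by
# the (at most f) faulty replicas. Replicas here are plain dicts.

from collections import Counter

def byzantine_query(replicas, key, f):
    """Return the answer agreed on by at least f+1 replicas, else None."""
    answers = [r.get(key) for r in replicas]
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes >= f + 1 else None

good1 = {"balance": 100}
good2 = {"balance": 100}
faulty = {"balance": 999}       # Byzantine replica returns garbage

print(byzantine_query([good1, good2, faulty], "balance", f=1))  # 100
```

The hard part the thesis addresses, which this sketch omits, is making heterogeneous replicas produce comparable answers at all: CBS and SES exist to keep concurrent transactions ordered the same way on every replica so that honest replicas agree.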

    Dynamic Content Web Applications: Crash, Failover, And Recovery Analysis

    This work assesses how crashes and recoveries affect the performance of a replicated dynamic content web application. RobustStore is the result of retrofitting TPC-W's on-line bookstore with Treplica, a middleware for building dependable applications. Implementations of Paxos and Fast Paxos are at the core of Treplica's efficient and programmer-friendly support for replication and recovery. The TPC-W benchmark, augmented with faultloads and dependability measures, is used to evaluate the behaviour of RobustStore. Experiments apply faultloads that cause sequential and concurrent replica crashes. RobustStore's performance drops by less than 13% during the recovery from two simultaneous replica crashes. When subject to an identical faultload and a shopping workload, a five-replica RobustStore maintains an accuracy of 99.999%. Our results display not only good performance, total autonomy, and uninterrupted availability; they also show that it is simple to develop efficient recovery-oriented applications using Treplica. ©2009 IEEE.