14 research outputs found

    Fast Algorithms for Maintaining Replica Consistency in Lazy Master Replicated Databases

    Get PDF
    Projet RODIN, Projet REFLECSIn a lazy master replicated database, a transaction can commit after updating one replica copy (primary copy) at some master node. After the transaction commits, the updates are propagated towards the other replicas (secondary copies), which are updated in separate refresh transactions. A central problem is the design of algorithms that maintain replica's consistency while at the same time minimizing the performance degradation due to the synchronization of refresh transactions. In this paper, we propose a simple and general refreshment algorithm that solves this problem and we prove its correctness. The principle of the algorithm is to let refresh transactions wait for a certain «deliver time» before being executed at a node having secondary copies. We then present two main optimizations to this algorithm. One is based on specific properties of the topology of replica distribution across nodes. In particular, we characterize the nodes for which the deliver time can be null. The other improves the refreshment algorithm by using an immediate update propagation strategy. Our performance evaluation demonstrate the effectiveness of this optimization

    Updates in Highly Unreliable, Replicated Peer-to-Peer Systems

    Get PDF
    In this paper we study the problem of updates in truly decentralised and self-organising systems such as pure P2P systems. We assume low online probabilities of the peers (Full Document</a

    RAIDb: Redundant Array of Inexpensive Databases

    Get PDF
    Clusters of workstations become more and more popular to power data server applications such as large scale Web sites or e-Commerce applications. There has been much research on scaling the front tiers (web servers and application servers) using clusters, but databases usually remain on large dedicated SMP machines. In this paper, we address database performance scalability and high availability using clusters of commodity hardware. Our approach consists of studying different replication and partitioning strategies to achieve various degree of performance and fault tolerance. We propose the concept of Redundant Array of Inexpensive Databases (RAIDb). RAIDb is to databases what RAID is to disks. RAIDb aims at providing better performance and fault tolerance than a single database, at low cost, by combining multiple database instances into an array of databases. Like RAID, we define different RAIDb levels that provide various cost/performance/fault tolerance tradeoffs. RAIDb-0 features full partitioning, RAIDb-1 offers full replication and RAIDb-2 introduces an intermediate solution called partial replication, in which the user can define the degree of replication of each database table. We present a Java implementation of RAIDb called Clustered JDBC or C-JDBC. C-JDBC achieves both database performance scalability and high availability at the middleware level without changing existing applications. We show, using the TPC-W benchmark, that RAIDb-2 can offer better performance scalability (up to 25%) than traditional approaches by allowing fine-grain control on replication. Distributing and restricting the replication of frequently written tables to a small set of backends reduces I/O usage and improves CPU utilization of each cluster node

    Elasca: Workload-Aware Elastic Scalability for Partition Based Database Systems

    Get PDF
    Providing the ability to increase or decrease allocated resources on demand as the transactional load varies is essential for database management systems (DBMS) deployed on today's computing platforms, such as the cloud. The need to maintain consistency of the database, at very large scales, while providing high performance and reliability makes elasticity particularly challenging. In this thesis, we exploit data partitioning as a way to provide elastic DBMS scalability. We assert that the flexibility provided by a partitioned, shared-nothing parallel DBMS can be used to implement elasticity. Our idea is to start with a small number of servers that manage all the partitions, and to elastically scale out by dynamically adding new servers and redistributing database partitions among these servers as the load varies. Implementing this approach requires (a) efficient mechanisms for addition/removal of servers and migration of partitions, and (b) policies to efficiently determine the optimal placement of partitions on the given servers as well as plans for partition migration. This thesis presents Elasca, a system that implements both these features in an existing shared-nothing DBMS (namely VoltDB) to provide automatic elastic scalability. Elasca consists of a mechanism for enabling elastic scalability, and a workload-aware optimizer for determining optimal partition placement and migration plans. Our optimizer minimizes computing resources required and balances load effectively without compromising system performance, even in the presence of variations in intensity and skew of the load. The results of our experiments show that Elasca is able to achieve performance close to a fully provisioned system while saving 35% resources on average. Furthermore, Elasca's workload-aware optimizer performs up to 79% less data movement than a greedy approach to resource minimization, and also balance load much more effectively

    EdgeX: Edge Replication for Web Applications

    Get PDF
    Global web applications face the problem of high network latency due to their need to communicate with distant data centers. Many applications use edge networks for caching images, CSS, javascript, and other static content in order to avoid some of this network latency. However, for updates and for anything other than static content, communication with the data center is still required, and can dominate application request latencies. One way to address this problem is to push more of the web application, as well the database on which it depends, from the remote data center towards the edge of the network. This thesis presents preliminary work in this direction. Speci cally, it presents an edge-aware dynamic data replication architecture for relational database systems supporting web applications. The objective is to allow dynamic content to be served from the edge of the network, with low latency

    Workload Interleaving with Performance Guarantees in Data Centers

    Get PDF
    In the era of global, large scale data centers residing in clouds, many applications and users share the same pool of resources for the purposes of reducing energy and operating costs, and of improving availability and reliability. Along with the above benefits, resource sharing also introduces performance challenges: when multiple workloads access the same resources concurrently, contention may occur and introduce delays in the performance of individual workloads. Providing performance isolation to individual workloads needs effective management methodologies. The challenges of deriving effective management methodologies lie in finding accurate, robust, compact metrics and models to drive algorithms that can meet different performance objectives while achieving efficient utilization of resources. This dissertation proposes a set of methodologies aiming at solving the challenging performance isolation problem in workload interleaving in data centers, focusing on both storage components and computing components. at the storage node level, we focus on methodologies for better interleaving user traffic with background workloads, such as tasks for improving reliability, availability, and power savings. More specifically, a scheduling policy for background workload based on the statistical characteristics of the system busy periods and a methodology that quantitatively estimates the performance impact of power savings are developed. at the storage cluster level, we consider methodologies on how to efficiently conduct work consolidation and schedule asynchronous updates without violating user performance targets. More specifically, we develop a framework that can estimate beforehand the benefits and overheads of each option in order to automate the process of reaching intelligent consolidation decisions while achieving faster eventual consistency. at the computing node level, we focus on improving workload interleaving at off-the-shelf servers as they are the basic building blocks of large-scale data centers. We develop priority scheduling middleware that employs different policies to schedule background tasks based on the instantaneous resource requirements of the high priority applications running on the server node. Finally, at the computing cluster level, we investigate popular computing frameworks for large-scale data intensive distributed processing, such as MapReduce and its Hadoop implementation. We develop a new Hadoop scheduler called DyScale to exploit capabilities offered by heterogeneous cores in order to achieve a variety of performance objectives

    Fast algorithms for maintaining replica consistency in lazy master replicated databases

    No full text
    Themes 3 et 1 - Interaction homme-machine, images, donnees, connaissances - Reseaux et systemes. Projets RODIN et REFLECSSIGLEAvailable from INIST (FR), Document Supply Service, under shelf-number : 14802 E, issue : a.1999 n.3654 / INIST-CNRS - Institut de l'Information Scientifique et TechniqueFRFranc
    corecore