10,608 research outputs found

    Middleware-based Database Replication: The Gaps between Theory and Practice

    The need for high availability and performance in data management systems has been fueling a long-running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time, this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other. Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 2008.

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments to archive data without enough redundancy. Most redundancy schemes are not completely effective for providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault-tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault-tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability, durability and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality; hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios. Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014. Appears in the 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
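    To make the parameter description above concrete, the following sketch shows, under heavily simplified assumptions, how XOR entanglement propagates redundancy: each new data block is folded into a running parity on every one of alpha strands, so storage overhead grows linearly with alpha while each strand contributes an independent repair path. (In the actual codes, the two other parameters shape horizontal and helical strands and make the number of recovery paths grow much faster than shown here.) The class and helper names below are illustrative, not taken from the paper's implementation.

    ```python
    # Heavily simplified XOR-entanglement sketch: `alpha` independent parity
    # strands, each a running XOR chain over the data blocks. Illustrative only.

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    class EntangledStore:
        def __init__(self, alpha: int, block_size: int = 4):
            self.alpha = alpha
            self.blocks = []                            # data blocks (may be lost)
            # one running parity chain ("strand") per alpha, seeded with zeros
            self.strands = [[bytes(block_size)] for _ in range(alpha)]

        def put(self, data: bytes) -> int:
            """Append a data block and entangle it with every strand."""
            self.blocks.append(data)
            for strand in self.strands:
                strand.append(xor(strand[-1], data))    # p_i = p_{i-1} XOR d_i
            return len(self.blocks) - 1                 # alpha parities stored per block

        def recover(self, index: int, strand_id: int = 0) -> bytes:
            """Rebuild a lost block from any one strand: d_i = p_{i-1} XOR p_i."""
            strand = self.strands[strand_id]            # alpha alternative repair paths
            return xor(strand[index], strand[index + 1])

    store = EntangledStore(alpha=3)
    i = store.put(b"DATA")
    store.blocks[i] = None                              # simulate a lost block
    assert store.recover(i, strand_id=1) == b"DATA"
    ```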

    Instant restore after a media failure

    Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume. Previous work introduced a technique called single-pass restore, which increases restore bandwidth and thus substantially decreases time to repair. Instant restore goes further as it permits read/write access to any data on a device undergoing restore, even data not yet restored, by restoring individual data segments on demand. Thus, the restore process is guided primarily by the needs of applications, and the observed mean time to repair is effectively reduced from several hours to a few seconds. This paper presents an implementation and evaluation of instant restore. The technique is incrementally implemented on a system starting with the traditional ARIES design for logging and recovery. Experiments show that the transaction latency perceived after a media failure can be cut down to less than a second and that the overhead imposed by the technique on normal processing is minimal. The net effect is that a few "nines" of availability are added to the system using simple and low-overhead software techniques.
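    The on-demand mechanism described above can be pictured with a short sketch. This is not the paper's implementation, which builds on ARIES logging and single-pass restore with indexed backup and log structures; it only assumes that the backup image and the relevant redo records have already been partitioned by segment. A page read during recovery then pays only for restoring the segment that contains it.

    ```python
    # Simplified on-demand segment restore during media recovery. All names and
    # data layouts are illustrative, not those of the paper's implementation.

    SEGMENT_SIZE = 2   # pages per restore segment (toy value)

    class InstantRestoreDevice:
        def __init__(self, backup, log_by_segment):
            self.backup = backup        # segment_id -> list of backup page images
            self.log = log_by_segment   # segment_id -> ordered (offset, after_image) redo records
            self.restored = {}          # segment_id -> restored, up-to-date pages

        def _ensure_segment(self, segment_id):
            """Restore one segment on demand: load its backup image, replay its log."""
            if segment_id in self.restored:
                return                                       # already restored
            pages = list(self.backup[segment_id])            # 1. fetch backup image
            for offset, after_image in self.log.get(segment_id, []):
                pages[offset] = after_image                  # 2. replay redo records
            self.restored[segment_id] = pages                # 3. segment is now live

        def read_page(self, page_id):
            """A read during restore waits only for its own segment."""
            segment_id, offset = divmod(page_id, SEGMENT_SIZE)
            self._ensure_segment(segment_id)
            return self.restored[segment_id][offset]

    device = InstantRestoreDevice(backup={0: ["old-A", "old-B"]},
                                  log_by_segment={0: [(1, "new-B")]})
    print(device.read_page(1))   # restores segment 0 on demand, prints "new-B"
    ```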

    Building and Protecting vSphere Data Centers Using Site Recovery Manager (SRM)

    With the evolution of cloud computing technology, companies like Amazon, Microsoft, Google, Softlayer, and Rackspace have started providing Infrastructure as a Service, Software as a Service, and Platform as a Service offerings to their customers. For these companies, providing a high degree of availability is as important as providing an overall great hosting service. Disasters are unpredictable, and the destruction they cause is often worse than expected; sometimes it results in the loss of information, data, and records. A disaster can also make services inaccessible for a very long time if disaster recovery has not been planned properly. This paper focuses on protecting a vSphere virtual datacenter using Site Recovery Manager (SRM). One study reports that 23% of companies close within one year of a disaster. This paper also discusses how SRM can be a cost-effective disaster recovery solution compared to the other recovery solutions available, and covers Recovery Point Objective (RPO) and Recovery Time Objective (RTO). SRM works with two different replication methodologies, vSphere Replication and array-based replication, which it uses to protect Tier-1, 2, and 3 applications. A recent study explains that traditional DR solutions often fail to meet business requirements because they are too expensive, complex, and unreliable. Organizations using Site Recovery Manager achieve highly predictable RTOs at a much lower cost and level of complexity. Site Recovery Manager can reduce operating overhead by 50% by replacing complex manual runbooks with simple, automated recovery plans that can be tested without disruption. For organizations with an RPO of 15 minutes or higher, vSphere Replication can eliminate up to USD 10,000 per TB of protected data compared with storage-based technologies. The combined solution can save over USD 1,100 per protected virtual machine per year. These calculations were validated by a third-party global research firm. Integration with Virtual SAN reduces the DR footprint through hyper-converged, software-defined storage that runs on any standard x86 platform. Virtual SAN can decrease the total cost of ownership for recovery storage by 50 percent.
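    The cost claims above lend themselves to a quick back-of-the-envelope check. The sketch below plugs the per-TB and per-VM figures quoted in the abstract into a hypothetical fleet; the dollar amounts are the vendor-reported numbers repeated from the text, and the fleet size is invented purely for illustration.

    ```python
    # Back-of-the-envelope DR savings estimate using the figures quoted above.
    # The per-TB and per-VM constants are the vendor-reported numbers from the
    # abstract; the fleet size in the example is hypothetical.

    PER_TB_SAVINGS_USD = 10_000   # vSphere Replication vs. storage-based replication (RPO >= 15 min)
    PER_VM_SAVINGS_USD = 1_100    # combined solution, per protected VM per year

    def dr_savings(protected_tb: float, protected_vms: int) -> tuple:
        """Return (avoided replication-storage cost, recurring annual savings)."""
        return protected_tb * PER_TB_SAVINGS_USD, protected_vms * PER_VM_SAVINGS_USD

    # Hypothetical site: 40 TB of protected data and 200 protected virtual machines.
    storage_savings, yearly_savings = dr_savings(40, 200)
    print(f"Avoided replication-storage cost: USD {storage_savings:,.0f}")   # USD 400,000
    print(f"Recurring savings per year:       USD {yearly_savings:,.0f}")    # USD 220,000
    ```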