10,608 research outputs found
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments
Data centres that use consumer-grade disks drives and distributed
peer-to-peer systems are unreliable environments to archive data without enough
redundancy. Most redundancy schemes are not completely effective for providing
high availability, durability and integrity in the long-term. We propose alpha
entanglement codes, a mechanism that creates a virtual layer of highly
interconnected storage devices to propagate redundant information across a
large scale storage system. Our motivation is to design flexible and practical
erasure codes with high fault-tolerance to improve data durability and
availability even in catastrophic scenarios. By flexible and practical, we mean
code settings that can be adapted to future requirements and practical
implementations with reasonable trade-offs between security, resource usage and
performance. The codes have three parameters. Alpha increases storage overhead
linearly but increases the possible paths to recover data exponentially. Two
other parameters increase fault-tolerance even further without the need of
additional storage. As a result, an entangled storage system can provide high
availability, durability and offer additional integrity: it is more difficult
to modify data undetectably. We evaluate how several redundancy schemes perform
in unreliable environments and show that alpha entanglement codes are flexible
and practical codes. Remarkably, they excel at code locality, hence, they
reduce repair costs and become less dependent on storage locations with poor
availability. Our solution outperforms Reed-Solomon codes in many disaster
recovery scenarios.Comment: The publication has 12 pages and 13 figures. This work was partially
supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018
48th Annual IEEE/IFIP International Conference on Dependable Systems and
Networks (DSN
Instant restore after a media failure
Media failures usually leave database systems unavailable for several hours
until recovery is complete, especially in applications with large devices and
high transaction volume. Previous work introduced a technique called
single-pass restore, which increases restore bandwidth and thus substantially
decreases time to repair. Instant restore goes further as it permits read/write
access to any data on a device undergoing restore--even data not yet
restored--by restoring individual data segments on demand. Thus, the restore
process is guided primarily by the needs of applications, and the observed mean
time to repair is effectively reduced from several hours to a few seconds.
This paper presents an implementation and evaluation of instant restore. The
technique is incrementally implemented on a system starting with the
traditional ARIES design for logging and recovery. Experiments show that the
transaction latency perceived after a media failure can be cut down to less
than a second and that the overhead imposed by the technique on normal
processing is minimal. The net effect is that a few "nines" of availability are
added to the system using simple and low-overhead software techniques
Building and Protecting vSphere Data Centers Using Site Recovery Manager (SRM)
With the evolution of cloud computing technology, companies like Amazon, Microsoft, Google, Softlayer, and Rackspace have started providing Infrastructure as a Service, Software as a Service, and Platform as a Service offering to their customers. For these companies, providing a high degree of availability is as important as providing an overall great hosting service. Disaster is always being unpredictable, the destruction caused by it is always worse than expected. Sometimes it ends up with the loose of information, data and records. Disaster can also make services inaccessible for very long time if disaster recovery was not planned properly. This paper focuses on protecting a vSphere virtual datacenter using Site Recovery Manager (SRM). A study says 23% of companies close within one year after the disaster struck. This paper also discusses on how SRM can be a cost effective disaster recovery solution compared to all the recovery solutions available. It will also cover Recovery Point Objective and Recovery Time Objective. The SRM works on two different replication methodologies that is vSphere replication and Array based replications. These technologies used by Site Recovery Manager to protect Tier-1, 2, and 3 applications. The recent study explains that Traditional DR solutions often fail to meet business requirements because they are too expensive, complex and unreliable. Organizations using Site Recovery Manager ensure highly predictable RTOs at a much lower cost and level of complexity. Lower cost for DR. Site Recovery Manager can reduce the operating overhead by 50% by replacing complex manual run books with simple, automated recovery plans that can be tested without disruption. For organizations with an RPO of 15 minutes or higher, vSphere Replication can eliminate up to 1,100 per protected virtual machine per year. These calculations were validated by a third-party global research firm. Integration with Virtual SAN reduces the DR footprint through hyper-converged, software-defined storage that runs on any standard x86 platform. Virtual SAN can decrease the total cost of ownership for recovery storage by 50 percent
- …