Search CORE

9,014 research outputs found

Instant restore after a media failure

Author: C Mohan
C Mohan
E Levy
G Graefe
G Graefe
J Gray
JN Gray
PM Chen
T Härder
Publication venue
Publication date: 26/02/2017
Field of study

Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume. Previous work introduced a technique called single-pass restore, which increases restore bandwidth and thus substantially decreases time to repair. Instant restore goes further as it permits read/write access to any data on a device undergoing restore--even data not yet restored--by restoring individual data segments on demand. Thus, the restore process is guided primarily by the needs of applications, and the observed mean time to repair is effectively reduced from several hours to a few seconds. This paper presents an implementation and evaluation of instant restore. The technique is incrementally implemented on a system starting with the traditional ARIES design for logging and recovery. Experiments show that the transaction latency perceived after a media failure can be cut down to less than a second and that the overhead imposed by the technique on normal processing is minimal. The net effect is that a few "nines" of availability are added to the system using simple and low-overhead software techniques

arXiv.org e-Print Archive

Crossref

CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

Author: Arad Cosmin
Haridi Seif
Shafaat Tallat M.
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2012
Field of study

Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

The End of a Myth: Distributed Transactions Can Scale

Author: Binnig Carsten
Harris Tim
Kraska Tim
Zamanian Erfan
Publication venue
Publication date: 21/11/2016
Field of study

The common wisdom is that distributed transactions do not scale. But what if distributed transactions could be made scalable using the next generation of networks and a redesign of distributed databases? There would be no need for developers anymore to worry about co-partitioning schemes to achieve decent performance. Application development would become easier as data placement would no longer determine how scalable an application is. Hardware provisioning would be simplified as the system administrator can expect a linear scale-out when adding more machines rather than some complex sub-linear function, which is highly application specific. In this paper, we present the design of our novel scalable database system NAM-DB and show that distributed transactions with the very common Snapshot Isolation guarantee can indeed scale using the next generation of RDMA-enabled network technology without any inherent bottlenecks. Our experiments with the TPC-C benchmark show that our system scales linearly to over 6.5 million new-order (14.5 million total) distributed transactions per second on 56 machines.Comment: 12 page

arXiv.org e-Print Archive

TUbiblio