329 research outputs found
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments
Data centres that use consumer-grade disks drives and distributed
peer-to-peer systems are unreliable environments to archive data without enough
redundancy. Most redundancy schemes are not completely effective for providing
high availability, durability and integrity in the long-term. We propose alpha
entanglement codes, a mechanism that creates a virtual layer of highly
interconnected storage devices to propagate redundant information across a
large scale storage system. Our motivation is to design flexible and practical
erasure codes with high fault-tolerance to improve data durability and
availability even in catastrophic scenarios. By flexible and practical, we mean
code settings that can be adapted to future requirements and practical
implementations with reasonable trade-offs between security, resource usage and
performance. The codes have three parameters. Alpha increases storage overhead
linearly but increases the possible paths to recover data exponentially. Two
other parameters increase fault-tolerance even further without the need of
additional storage. As a result, an entangled storage system can provide high
availability, durability and offer additional integrity: it is more difficult
to modify data undetectably. We evaluate how several redundancy schemes perform
in unreliable environments and show that alpha entanglement codes are flexible
and practical codes. Remarkably, they excel at code locality, hence, they
reduce repair costs and become less dependent on storage locations with poor
availability. Our solution outperforms Reed-Solomon codes in many disaster
recovery scenarios.Comment: The publication has 12 pages and 13 figures. This work was partially
supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018
48th Annual IEEE/IFIP International Conference on Dependable Systems and
Networks (DSN
Exploring heterogeneity of unreliable machines for p2p backup
P2P architecture is a viable option for enterprise backup. In contrast to
dedicated backup servers, nowadays a standard solution, making backups directly
on organization's workstations should be cheaper (as existing hardware is
used), more efficient (as there is no single bottleneck server) and more
reliable (as the machines are geographically dispersed).
We present the architecture of a p2p backup system that uses pairwise
replication contracts between a data owner and a replicator. In contrast to
standard p2p storage systems using directly a DHT, the contracts allow our
system to optimize replicas' placement depending on a specific optimization
strategy, and so to take advantage of the heterogeneity of the machines and the
network. Such optimization is particularly appealing in the context of backup:
replicas can be geographically dispersed, the load sent over the network can be
minimized, or the optimization goal can be to minimize the backup/restore time.
However, managing the contracts, keeping them consistent and adjusting them in
response to dynamically changing environment is challenging.
We built a scientific prototype and ran the experiments on 150 workstations
in the university's computer laboratories and, separately, on 50 PlanetLab
nodes. We found out that the main factor affecting the quality of the system is
the availability of the machines. Yet, our main conclusion is that it is
possible to build an efficient and reliable backup system on highly unreliable
machines (our computers had just 13% average availability)
In-Network Redundancy Generation for Opportunistic Speedup of Backup
Erasure coding is a storage-efficient alternative to replication for
achieving reliable data backup in distributed storage systems. During the
storage process, traditional erasure codes require a unique source node to
create and upload all the redundant data to the different storage nodes.
However, such a source node may have limited communication and computation
capabilities, which constrain the storage process throughput. Moreover, the
source node and the different storage nodes might not be able to send and
receive data simultaneously -- e.g., nodes might be busy in a datacenter
setting, or simply be offline in a peer-to-peer setting -- which can further
threaten the efficacy of the overall storage process. In this paper we propose
an "in-network" redundancy generation process which distributes the data
insertion load among the source and storage nodes by allowing the storage nodes
to generate new redundant data by exchanging partial information among
themselves, improving the throughput of the storage process. The process is
carried out asynchronously, utilizing spare bandwidth and computing resources
from the storage nodes. The proposed approach leverages on the local
repairability property of newly proposed erasure codes tailor made for the
needs of distributed storage systems. We analytically show that the performance
of this technique relies on an efficient usage of the spare node resources, and
we derive a set of scheduling algorithms to maximize the same. We
experimentally show, using availability traces from real peer-to-peer
applications as well as Google data center availability and workload traces,
that our algorithms can, depending on the environment characteristics, increase
the throughput of the storage process significantly (up to 90% in data centers,
and 60% in peer-to-peer settings) with respect to the classical naive data
insertion approach
Fast Lean Erasure-Coded Atomic Memory Object
In this work, we propose FLECKS, an algorithm which implements atomic memory objects in a multi-writer multi-reader (MWMR) setting in asynchronous networks and server failures. FLECKS substantially reduces storage and communication costs over its replication-based counterparts by employing erasure-codes. FLECKS outperforms the previously proposed algorithms in terms of the metrics that to deliver good performance such as storage cost per object, communication cost a high fault-tolerance of clients and servers, guaranteed liveness of operation, and a given number of communication rounds per operation, etc. We provide proofs for liveness and atomicity properties of FLECKS and derive worst-case latency bounds for the operations. We implemented and deployed FLECKS in cloud-based clusters and demonstrate that FLECKS has substantially lower storage and bandwidth costs, and significantly lower latency of operations than the replication-based mechanisms
- …