96 research outputs found
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments
Data centres that use consumer-grade disks drives and distributed
peer-to-peer systems are unreliable environments to archive data without enough
redundancy. Most redundancy schemes are not completely effective for providing
high availability, durability and integrity in the long-term. We propose alpha
entanglement codes, a mechanism that creates a virtual layer of highly
interconnected storage devices to propagate redundant information across a
large scale storage system. Our motivation is to design flexible and practical
erasure codes with high fault-tolerance to improve data durability and
availability even in catastrophic scenarios. By flexible and practical, we mean
code settings that can be adapted to future requirements and practical
implementations with reasonable trade-offs between security, resource usage and
performance. The codes have three parameters. Alpha increases storage overhead
linearly but increases the possible paths to recover data exponentially. Two
other parameters increase fault-tolerance even further without the need of
additional storage. As a result, an entangled storage system can provide high
availability, durability and offer additional integrity: it is more difficult
to modify data undetectably. We evaluate how several redundancy schemes perform
in unreliable environments and show that alpha entanglement codes are flexible
and practical codes. Remarkably, they excel at code locality, hence, they
reduce repair costs and become less dependent on storage locations with poor
availability. Our solution outperforms Reed-Solomon codes in many disaster
recovery scenarios.Comment: The publication has 12 pages and 13 figures. This work was partially
supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018
48th Annual IEEE/IFIP International Conference on Dependable Systems and
Networks (DSN
Durability and Availability of Erasure-Coded Storage Systems with Concurrent Maintenance
This initial version of this document was written back in 2014 for the sole
purpose of providing fundamentals of reliability theory as well as to identify
the theoretical types of machinery for the prediction of
durability/availability of erasure-coded storage systems. Since the definition
of a "system" is too broad, we specifically focus on warm/cold storage systems
where the data is stored in a distributed fashion across different storage
units with or without continuous operation. The contents of this document are
dedicated to a review of fundamentals, a few major improved stochastic models,
and several contributions of my work relevant to the field. One of the
contributions of this document is the introduction of the most general form of
Markov models for the estimation of mean time to failure. This work was
partially later published in IEEE Transactions on Reliability. Very good
approximations for the closed-form solutions for this general model are also
investigated. Various storage configurations under different policies are
compared using such advanced models. Later in a subsequent chapter, we have
also considered multi-dimensional Markov models to address detached
drive-medium combinations such as those found in optical disk and tape storage
systems. It is not hard to anticipate such a system structure would most likely
be part of future DNA storage libraries. This work is partially published in
Elsevier Reliability and System Safety. Topics that include simulation
modelings for more accurate estimations are included towards the end of the
document by noting the deficiencies of the simplified canonical as well as more
complex Markov models, due mainly to the stationary and static nature of
Markovinity. Throughout the document, we shall focus on concurrently maintained
systems although the discussions will only slightly change for the systems
repaired one device at a time.Comment: 58 pages, 20 figures, 9 tables. arXiv admin note: substantial text
overlap with arXiv:1911.0032
Improve the Performance and Scalability of RAID-6 Systems Using Erasure Codes
RAID-6 is widely used to tolerate concurrent failures of any two disks to provide a higher level of reliability with the support of erasure codes. Among many implementations, one class of codes called Maximum Distance Separable (MDS) codes aims to offer data protection against disk failures with optimal storage efficiency. Typical MDS codes contain horizontal and vertical codes. However, because of the limitation of horizontal parity or diagonal/anti-diagonal parities used in MDS codes, existing RAID-6 systems suffer several important problems on performance and scalability, such as low write performance, unbalanced I/O, and high migration cost in the scaling process. To address these problems, in this dissertation, we design techniques for high performance and scalable RAID-6 systems. It includes high performance and load balancing erasure codes (H-Code and HDP Code), and Stripe-based Data Migration (SDM) scheme. We also propose a flexible MDS Scaling Framework (MDS-Frame), which can integrate H-Code, HDP Code and SDM scheme together. Detailed evaluation results are also given in this dissertation
Community computation
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Materials Science and Engineering, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 171-186).In this thesis we lay the foundations for a distributed, community-based computing environment to tap the resources of a community to better perform some tasks, either computationally hard or economically prohibitive, or physically inconvenient, that one individual is unable to accomplish efficiently. We introduce community coding, where information systems meet social networks, to tackle some of the challenges in this new paradigm of community computation. We design algorithms, protocols and build system prototypes to demonstrate the power of community computation to better deal with reliability, scalability and security issues, which are the main challenges in many emerging community-computing environments, in several application scenarios such as community storage, community sensing and community security. For example, we develop a community storage system that is based upon a distributed P2P (peer-to-peer) storage paradigm, where we take an array of small, periodically accessible, individual computers/peer nodes and create a secure, reliable and large distributed storage system. The goal is for each one of them to act as if they have immediate access to a pool of information that is larger than they could hold themselves, and into which they can contribute new stuff in a both open and secure manner. Such a contributory and self-scaling community storage system is particularly useful where reliable infrastructure is not readily available in that such a system facilitates easy ad-hoc construction and easy portability. In another application scenario, we develop a novel framework of community sensing with a group of image sensors. The goal is to present a set of novel tools in which software, rather than humans, examines the collection of images sensed by a group of image sensors to determine what is happening in the field of view. We also present several design principles in the aspects of community security. In one application example, we present community-based email spain detection approach to deal with email spams more efficiently.by Fulu Li.Ph.D
On feedback-based rateless codes for data collection in vehicular networks
The ability to transfer data reliably and with low delay over an unreliable service is intrinsic to a number of emerging technologies, including digital video broadcasting, over-the-air software updates, public/private cloud storage, and, recently, wireless vehicular networks. In particular, modern vehicles incorporate tens of sensors to provide vital sensor information to electronic control units (ECUs). In the current architecture, vehicle sensors are connected to ECUs via physical wires, which increase the cost, weight and maintenance effort of the car, especially as the number of electronic components keeps increasing. To mitigate the issues with physical wires, wireless sensor networks (WSN) have been contemplated for replacing the current wires with wireless links, making modern cars cheaper, lighter, and more efficient. However, the ability to reliably communicate with the ECUs is complicated by the dynamic channel properties that the car experiences as it travels through areas with different radio interference patterns, such as urban versus highway driving, or even different road quality, which may physically perturb the wireless sensors.
This thesis develops a suite of reliable and efficient communication schemes built upon feedback-based rateless codes, and with a target application of vehicular networks. In particular, we first investigate the feasibility of multi-hop networking for intra-car WSN, and illustrate the potential gains of using the Collection Tree Protocol (CTP), the current state of the art in multi-hop data aggregation. Our results demonstrate, for example, that the packet delivery rate of a node using a single-hop topology protocol can be below 80% in practical scenarios, whereas CTP improves reliability performance beyond 95% across all nodes while simultaneously reducing radio energy consumption. Next, in order to migrate from a wired intra-car network to a wireless system, we consider an intermediate step to deploy a hybrid communication structure, wherein wired and wireless networks coexist. Towards this goal, we design a hybrid link scheduling algorithm that guarantees reliability and robustness under harsh vehicular environments. We further enhance the hybrid link scheduler with the rateless codes such that information leakage to an eavesdropper is almost zero for finite block lengths.
In addition to reliability, one key requirement for coded communication schemes is to achieve a fast decoding rate. This feature is vital in a wide spectrum of communication systems, including multimedia and streaming applications (possibly inside vehicles) with real-time playback requirements, and delay-sensitive services, where the receiver needs to recover some data symbols before the recovery of entire frame. To address this issue, we develop feedback-based rateless codes with dynamically-adjusted nonuniform symbol selection distributions. Our simulation results, backed by analysis, show that feedback information paired with a nonuniform distribution significantly improves the decoding rate compared with the state of the art algorithms. We further demonstrate that amount of feedback sent can be tuned to the specific transmission properties of a given feedback channel
On the use of erasure codes in unreliable data networks
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 62-64).Modern data networks are approaching the state where a large number of independent and heterogeneous paths are available between a source node and destination node. In this work, we explore the case where each path has an independent level of reliability characterized by a probability of path failure. Instead of simply repeating the message across all the paths, we use the path diversity to achieve reliable transmission of messages by using a coding technique known as an erasure correcting code. We develop a model of the network and present an analysis of the system that invokes the Central Limit Theorem to approximate the total number of bits received from all the paths. We then optimize the number of bits to send over each path in order to maximize the probability of receiving a sufficient number of bits at the destination to reconstruct the message using the erasure correcting code. Three cases are investigated: when the paths are very reliable, when the paths are very unreliable, and when the paths have a probability of failure within an interval surrounding 0.5. We present an overview of the mechanics of an erasure coding process applicable to packet-based transactions. Finally, as avenues for further research, we discuss several applications of erasure coding in networks that have only a single path between source and destination: for latency reduction in interactive web sessions; as a transport layer for critical messaging; and an application layer protocol for high-bandwidth networks.by Salil Parikh.S.M
- …