Practical Functional Regenerating Codes for Broadcast Repair of Multiple Nodes

Authors: Gunduz Deniz; Kralevska Katina; Ling Cong; Mital Nitish
Publication date: 15/04/2019

A code construction and repair scheme for optimal functional regeneration of multiple node failures is presented, which is based on stitching together short MDS codes on carefully chosen sets of points lying on a linearized polynomial. The nodes are connected wirelessly, hence all transmissions by helper nodes during a repair round are available to all the nodes being repaired. The scheme is simple and practical because of low subpacketization, low I/O cost and low computational cost. Achievability of the minimum-bandwidth regenerating (MBR) point, as well as an interior point, on the optimal storage-repair bandwidth tradeoff curve is shown. The subspace properties derived in the paper provide insight into the general properties of functional regenerating codes. Comment: 5 pages, ISIT 201

arXiv.org e-Print Archive

Repair Strategies for Storage on Mobile Clouds

Authors: Calis Gokhan; Koyluoglu O. Ozan; Lazos Loukas; Shivaramaiah Swetha
Publication date: 01/03/2017

We study the data reliability problem for a community of devices forming a mobile cloud storage system. We consider the application of regenerating codes for file maintenance within a geographically limited area. Such codes require lower bandwidth to regenerate lost data fragments compared to file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost due to node mobility. We show that in a low departure-to-repair rate regime, a lazy repair strategy, in which repairs are initiated after several nodes have left the system, outperforms eager repair, in which repairs are initiated after a single departure. This ordering is reversed when nodes are highly mobile. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time, as a function of the underlying code parameters. In addition, we examine cooperative repair strategies and show performance improvements compared to non-cooperative codes. We investigate several models for the time needed for node repair, including a simple fixed-time model that allows for the computation of closed-form expressions and a more realistic model that takes into account the number of repaired nodes. We derive the conditions under which the former model approximates the latter. Finally, an extended model where additional failures are allowed during the repair process is investigated. Overall, our results establish the joint effect of code design and repair algorithms on the maintenance cost of distributed storage systems. Comment: 23 pages, 11 figures

arXiv.org e-Print Archive

Decentralized Minimum-Cost Repair for Distributed Storage Systems

Authors: Fischione Carlo; Gerami Majid; Skoglund Mikael; Xiao Ming
Publication date: 30/01/2013

Many applications of distributed storage systems have been emerging, e.g., in wireless sensor networks or cloud storage. Since storage nodes in wireless sensor networks have limited battery capacity, it is valuable to find a repair scheme with optimal transmission costs (e.g., energy). Optimal-cost repair has recently been investigated in a centralized way. However, a centralized control mechanism may not be available or may be very expensive. For such scenarios, it is interesting to study optimal-cost repair in a decentralized setup. We formulate optimal-cost repair as convex optimization problems for networks with convex transmission costs. Then we use primal and dual decomposition approaches to decouple the problem into subproblems to be solved locally. Thus, each surviving node, collaborating with other nodes, can minimize its transmission cost such that the global cost is minimized. We further study the optimality and convergence of the algorithms. Finally, we discuss the code construction and determine the field size for finding feasible network codes in our approaches.

arXiv.org e-Print Archive

Determinant Codes with Helper-Independent Repair for Single and Multiple Failures

Authors: Elyasi Mehran; Mohajer Soheil
Publication date: 07/03/2019

Determinant codes are a class of exact-repair regenerating codes for distributed storage systems with parameters (n, k = d, d). These codes cover the entire trade-off between per-node storage and repair bandwidth. In an earlier work of the authors, the repair data of the determinant code sent by a helper node to repair a failed node depends on the identity of the other helper nodes participating in the process, which is undesirable in practice. In this work, a new repair mechanism is proposed for determinant codes, which relaxes this dependency while preserving all other properties of the code. Moreover, it is shown that the determinant codes are capable of repairing multiple failures, with a per-node repair bandwidth which scales sub-linearly with the number of failures.

arXiv.org e-Print Archive

Repairing Multiple Failures for Scalar MDS Codes

Authors: Bartan Burak; Mardia Jay; Wootters Mary
Publication date: 19/04/2018

In distributed storage, erasure codes, like Reed-Solomon codes, are often employed to provide reliability. In this setting, it is desirable to be able to repair one or more failed nodes while minimizing the repair bandwidth. In this work, motivated by Reed-Solomon codes, we study the problem of repairing multiple failed nodes in a scalar MDS code. We extend the framework of (Guruswami and Wootters, 2017) to give a framework for constructing repair schemes for multiple failures in general scalar MDS codes, in the centralized repair model. We then specialize our framework to Reed-Solomon codes, and extend and improve upon recent results of (Dau et al., 2017).

arXiv.org e-Print Archive

The Storage vs Repair Bandwidth Trade-off for Multiple Failures in Clustered Storage Networks

Authors: Abdrashitov Vitaly; Médard Muriel; Prakash N.
Publication date: 17/08/2017

We study the trade-off between storage overhead and inter-cluster repair bandwidth in clustered storage systems, while recovering from multiple node failures within a cluster. A cluster is a collection of $m$ nodes, and there are $n$ clusters. For data collection, we download the entire content from any $k$ clusters. For repair of $t \geq 2$ nodes within a cluster, we take help from $\ell$ local nodes, as well as $d$ helper clusters. We characterize the optimal trade-off under functional repair, and also under exact repair for the minimum storage and minimum inter-cluster bandwidth (MBR) operating points. Our bounds show the following interesting facts: 1) When $t \mid (m-\ell)$, the trade-off is the same as that under $t=1$, and thus there is no advantage in jointly repairing multiple nodes. 2) When $t \nmid (m-\ell)$, the optimal file-size at the MBR point under exact repair can be strictly less than that under functional repair. 3) Unlike the case of $t=1$, increasing the number of local helper nodes does not necessarily increase the system capacity under functional repair. Comment: Accepted to IEEE Information Theory Workshop (ITW) 201

arXiv.org e-Print Archive

Capacity of Wireless Distributed Storage Systems with Broadcast Repair

Authors: Chan Terence H.; Hu Ping; Sung Chi Wan
Publication date: 06/07/2017

In wireless distributed storage systems, storage nodes are connected by wireless channels, which are broadcast in nature. This paper exploits this unique feature to design an efficient repair mechanism, called broadcast repair, for wireless distributed storage systems in the presence of multiple-node failures. Due to the broadcast nature of wireless transmission, we advocate a new measure on repair performance called repair-transmission bandwidth. In contrast to repair bandwidth, which measures the average number of packets downloaded by a newcomer to replace a failed node, repair-transmission bandwidth measures the average number of packets transmitted by helper nodes per failed node. A fundamental study on the storage capacity of wireless distributed storage systems with broadcast repair is conducted by modeling the storage system as a multicast network and analyzing the minimum cut of the corresponding information flow graph. The fundamental tradeoff between storage efficiency and repair-transmission bandwidth is also obtained for functional repair. The performance of broadcast repair is compared both analytically and numerically with that of cooperative repair, the basic repair method for wired distributed storage systems with multiple-node failures. While cooperative repair is based on the idea of allowing newcomers to exchange packets, broadcast repair is based on the idea of allowing a helper to broadcast packets to all newcomers simultaneously. We show that broadcast repair outperforms cooperative repair, offering a better tradeoff between storage efficiency and repair-transmission bandwidth. Comment: 28 pages, 7 figures
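
The distinction between the two measures in this abstract can be illustrated with a small accounting sketch. This is not the paper's model: it assumes a hypothetical idealized round in which each of `num_helpers` helpers broadcasts `packets_per_helper` packets once and every newcomer receives every broadcast without loss. The function and parameter names are illustrative only.

```python
from fractions import Fraction


def repair_costs(num_helpers, packets_per_helper, num_failed):
    """Return (repair bandwidth, repair-transmission bandwidth) for one round.

    Assumption (not from the paper): every newcomer hears every broadcast,
    so each newcomer downloads all packets sent in the round.
    """
    if num_failed < 1:
        raise ValueError("at least one failed node is required")

    # Packets put on the air by all helpers in this repair round.
    total_transmitted = num_helpers * packets_per_helper

    # Repair bandwidth: average number of packets downloaded per newcomer.
    total_downloaded = total_transmitted * num_failed
    repair_bandwidth = Fraction(total_downloaded, num_failed)

    # Repair-transmission bandwidth: packets transmitted per failed node.
    repair_transmission_bandwidth = Fraction(total_transmitted, num_failed)

    return repair_bandwidth, repair_transmission_bandwidth


if __name__ == "__main__":
    rb, rtb = repair_costs(num_helpers=4, packets_per_helper=3, num_failed=2)
    print(f"repair bandwidth: {rb}, repair-transmission bandwidth: {rtb}")
```

Under this assumption the download count per newcomer does not change with the number of failures, while the transmission count per failed node shrinks as more newcomers share each broadcast, which is why the two measures diverge for multiple-node failures.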

arXiv.org e-Print Archive

Storage-Repair Bandwidth Trade-off for Wireless Caching with Partial Failure and Broadcast Repair

Authors: Gunduz Deniz; Kralevska Katina; Ling Cong; Mital Nitish
Publication date: 30/06/2018

Repair of multiple partially failed cache nodes is studied in a distributed wireless content caching system, where $r$ out of a total of $n$ cache nodes lose part of their cached data. Broadcast repair of failed cache contents at the network edge is studied; that is, the surviving cache nodes transmit broadcast messages to the failed ones, which are then used, together with the surviving data in their local cache memories, to recover the lost content. The trade-off between the storage capacity and the repair bandwidth is derived. It is shown that utilizing the broadcast nature of the wireless medium and the surviving cache contents at partially failed nodes significantly reduces the required repair bandwidth per node. Comment: Conference version of this paper has been submitted for review in ITW 2018. This submission includes the proof of theorem

arXiv.org e-Print Archive

Distributed and Optimal Resilient Planning of Large-Scale Interdependent Critical Infrastructures

Authors: Chen Juntao; Huang Linan; Zhu Quanyan
Publication date: 06/09/2018

The complex interconnections between heterogeneous critical infrastructure sectors make the system of systems (SoS) vulnerable to natural or human-made disasters and lead to cascading failures both within and across sectors. Hence, the robustness and resilience of the interdependent critical infrastructures (ICIs) against extreme events are essential for delivering reliable and efficient services to our society. To this end, we first establish a holistic probabilistic network model to capture the interdependencies between infrastructure components. To capture the underlying failure and recovery dynamics of ICIs, we further propose a Markov decision process (MDP) model in which the repair policy determines the long-term performance of the ICIs. To address the challenges that arise from the curse of dimensionality of the MDP, we reformulate the problem as an approximate linear program and then simplify it using factored graphs. We further obtain the distributed optimal control for ICIs under mild assumptions. Finally, we use a case study of the interdependent power and subway systems to corroborate the results and show that the optimal resilience resource planning and allocation can reduce the failure probability and mitigate the impact of failures caused by natural or artificial disasters.

arXiv.org e-Print Archive

A Taxonomy of Peer-to-Peer Based Complex Queries: a Grid perspective

Authors: Buyya Rajkumar; Harwood Aaron; Ranjan Rajiv
Publication date: 01/01/2006

Grid superscheduling requires support for efficient and scalable discovery of resources. Resource discovery activities involve searching for the appropriate resource types that match the user's job requirements. To accomplish this goal, a resource discovery system that supports the desired look-up operation is mandatory. Various kinds of solutions to this problem have been suggested, including the centralised and hierarchical information server approaches. However, both of these approaches have serious limitations with regard to scalability, fault-tolerance and network congestion. To overcome these limitations, organising resource information using a Peer-to-Peer (P2P) network model has been proposed. Existing approaches advocate an extension to structured P2P protocols to support the Grid resource information system (GRIS). In this paper, we identify issues related to the design of such an efficient, scalable, fault-tolerant, consistent and practical GRIS system using a P2P network model. We compile these issues into various taxonomies in sections III and IV. Further, we look into existing works that apply P2P-based network protocols to GRIS. We think that this taxonomy and its mapping to relevant systems would be useful for academic and industry-based researchers who are engaged in the design of scalable Grid systems.

arXiv.org e-Print Archive

CiteSeerX