9,120 research outputs found
Practical Functional Regenerating Codes for Broadcast Repair of Multiple Nodes
A code construction and repair scheme for optimal functional regeneration of
multiple node failures is presented, which is based on stitching together short
MDS codes on carefully chosen sets of points lying on a linearized polynomial.
The nodes are connected wirelessly, hence all transmissions by helper nodes
during a repair round are available to all the nodes being repaired. The scheme
is simple and practical because of low subpacketization, low I/O cost and low
computational cost. Achievability of the minimum-bandwidth regenerating (MBR)
point, as well as an interior point, on the optimal storage-repair bandwidth
tradeoff curve is shown. The subspace properties derived in the paper provide
insight into the general properties of functional regenerating codes.
Comment: 5 pages, ISIT 201
Repair Strategies for Storage on Mobile Clouds
We study the data reliability problem for a community of devices forming a
mobile cloud storage system. We consider the application of regenerating codes
for file maintenance within a geographically-limited area. Such codes require
lower bandwidth to regenerate lost data fragments compared to file replication
or reconstruction. We investigate threshold-based repair strategies where data
repair is initiated after a threshold number of data fragments have been lost
due to node mobility. We show that at a low departure-to-repair rate regime, a
lazy repair strategy in which repairs are initiated after several nodes have
left the system outperforms eager repair in which repairs are initiated after a
single departure. This optimality is reversed when nodes are highly mobile. We
further compare distributed and centralized repair strategies and derive the
optimal repair threshold for minimizing the average repair cost per unit of
time, as a function of underlying code parameters. In addition, we examine
cooperative repair strategies and show performance improvements compared to
non-cooperative codes. We investigate several models for the time needed for
node repair including a simple fixed time model that allows for the computation
of closed-form expressions and a more realistic model that takes into account
the number of repaired nodes. We derive the conditions under which the former
model approximates the latter. Finally, an extended model where additional
failures are allowed during the repair process is investigated. Overall, our
results establish the joint effect of code design and repair algorithms on the
maintenance cost of distributed storage systems.
Comment: 23 pages, 11 figures
Decentralized Minimum-Cost Repair for Distributed Storage Systems
Many applications of distributed storage systems have emerged, e.g., in
wireless sensor networks and cloud storage. Since storage nodes in wireless
sensor networks have limited battery capacity, it is valuable to find a
repair scheme with optimal transmission costs (e.g., energy). The optimal-cost
repair has been recently investigated in a centralized way. However, a
centralized control mechanism may not be available or may be very expensive.
For such scenarios, it is interesting to study optimal-cost repair in a
decentralized setup. We formulate the optimal-cost repair as convex
optimization problems for the network with convex transmission costs. Then we
use primal and dual decomposition approaches to decouple the problem into
subproblems to be solved locally. Thus, each surviving node, collaborating with
other nodes, can minimize its transmission cost such that the global cost is
minimized. We further study the optimality and convergence of the algorithms.
Finally, we discuss the code construction and determine the field size for
finding feasible network codes in our approaches.
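The dual decomposition idea mentioned in the abstract above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's formulation: the quadratic cost `c_i * x_i**2`, the single coupling constraint `sum(x_i) >= demand`, and the names `local_update` and `dual_decomposition` are all hypothetical. Each node solves its own subproblem using only a shared price, and the price is adjusted until the constraint is met.

```python
# Toy dual decomposition (illustrative assumptions throughout).
# Node i chooses a transmission amount x_i >= 0 with convex cost c_i * x_i**2.
# The nodes must jointly deliver at least `demand` units: sum(x_i) >= demand.

def local_update(c_i, lam):
    """Node i minimises c_i*x**2 - lam*x over x >= 0; the minimiser is lam/(2*c_i)."""
    return max(0.0, lam / (2.0 * c_i))

def dual_decomposition(costs, demand, step=0.5, iters=200):
    """Subgradient ascent on the price lam; each node only needs lam and its own c_i.

    For this quadratic toy problem the iteration converges when
    0 < step < 2 / sum(1 / (2*c) for c in costs).
    """
    lam = 0.0
    for _ in range(iters):
        x = [local_update(c, lam) for c in costs]
        # Raise the price while demand is unmet, lower it otherwise.
        lam = max(0.0, lam + step * (demand - sum(x)))
    return [local_update(c, lam) for c in costs], lam

if __name__ == "__main__":
    x, lam = dual_decomposition([1.0, 2.0, 4.0], demand=10.0)
    print(x, lam)
```

In this sketch cheaper nodes (smaller `c_i`) end up transmitting more, and the only information exchanged between nodes is the scalar price.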
Determinant Codes with Helper-Independent Repair for Single and Multiple Failures
Determinant codes are a class of exact-repair regenerating codes for
distributed storage systems with parameters (n, k = d, d). These codes cover
the entire trade-off between per-node storage and repair-bandwidth. In an
earlier work of the authors, the repair data of the determinant code sent by a
helper node to repair a failed node depends on the identity of the other helper
nodes participating in the process, which is undesirable in practice. In this
work, a new repair mechanism is proposed for determinant codes, which relaxes
this dependency, while preserving all other properties of the code. Moreover,
it is shown that the determinant codes are capable of repairing multiple
failures, with a per-node repair-bandwidth which scales sub-linearly with the
number of failures.
Repairing Multiple Failures for Scalar MDS Codes
In distributed storage, erasure codes -- like Reed-Solomon Codes -- are often
employed to provide reliability. In this setting, it is desirable to be able to
repair one or more failed nodes while minimizing the repair bandwidth. In this
work, motivated by Reed-Solomon codes, we study the problem of repairing
multiple failed nodes in a scalar MDS code. We extend the framework of
(Guruswami and Wootters, 2017) to give a framework for constructing repair
schemes for multiple failures in general scalar MDS codes, in the centralized
repair model. We then specialize our framework to Reed-Solomon codes, and
extend and improve upon recent results of (Dau et al., 2017).
The Storage vs Repair Bandwidth Trade-off for Multiple Failures in Clustered Storage Networks
We study the trade-off between storage overhead and inter-cluster repair
bandwidth in clustered storage systems, while recovering from multiple node
failures within a cluster. A cluster is a collection of nodes, and there
are clusters. For data collection, we download the entire content from any
clusters. For repair of nodes within a cluster, we take help
from local nodes, as well as helper clusters. We characterize the
optimal trade-off under functional repair, and also under exact repair for the
minimum storage and minimum inter-cluster bandwidth (MBR) operating points. Our
bounds show the following interesting facts: When the
trade-off is the same as that under , and thus there is no advantage in
jointly repairing multiple nodes, When , the optimal
file-size at the MBR point under exact repair can be strictly less than that
under functional repair. Unlike the case of , increasing the number
of local helper nodes does not necessarily increase the system capacity under
functional repair.
Comment: Accepted to IEEE Information Theory Workshop (ITW) 201
Capacity of Wireless Distributed Storage Systems with Broadcast Repair
In wireless distributed storage systems, storage nodes are connected by
wireless channels, which are broadcast in nature. This paper exploits this
unique feature to design an efficient repair mechanism, called broadcast
repair, for wireless distributed storage systems in the presence of
multiple-node failures. Due to the broadcast nature of wireless transmission,
we advocate a new measure on repair performance called repair-transmission
bandwidth. In contrast to repair bandwidth, which measures the average number
of packets downloaded by a newcomer to replace a failed node,
repair-transmission bandwidth measures the average number of packets
transmitted by helper nodes per failed node. A fundamental study on the storage
capacity of wireless distributed storage systems with broadcast repair is
conducted by modeling the storage system as a multicast network and analyzing
the minimum cut of the corresponding information flow graph. The fundamental
tradeoff between storage efficiency and repair-transmission bandwidth is also
obtained for functional repair. The performance of broadcast repair is compared
both analytically and numerically with that of cooperative repair, the basic
repair method for wired distributed storage systems with multiple-node
failures. While cooperative repair is based on the idea of allowing newcomers
to exchange packets, broadcast repair is based on the idea of allowing a helper
to broadcast packets to all newcomers simultaneously. We show that broadcast
repair outperforms cooperative repair, offering a better tradeoff between
storage efficiency and repair-transmission bandwidth.
Comment: 28 pages, 7 figures
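The distinction between repair bandwidth and repair-transmission bandwidth drawn in the abstract above can be illustrated with simple packet counting. This is a toy accounting sketch, not the paper's capacity analysis: it assumes every helper sends the same number of packets, and that in the broadcast case the same packets are useful to every newcomer. The function names are hypothetical.

```python
# Toy packet accounting (illustrative assumptions, not the paper's model).

def unicast_transmissions(num_failed, num_helpers, packets_per_helper):
    """Each helper sends a separate copy of its packets to every newcomer."""
    return num_failed * num_helpers * packets_per_helper

def broadcast_transmissions(num_failed, num_helpers, packets_per_helper):
    """Each helper transmits once; all newcomers overhear the same packets."""
    return num_helpers * packets_per_helper

def per_failed_node(total_transmissions, num_failed):
    """Average number of transmitted packets per failed node."""
    return total_transmissions / num_failed

if __name__ == "__main__":
    failed, helpers, beta = 3, 4, 2
    print(per_failed_node(unicast_transmissions(failed, helpers, beta), failed))
    print(per_failed_node(broadcast_transmissions(failed, helpers, beta), failed))
```

Under these assumptions each newcomer still receives `num_helpers * packets_per_helper` packets in both cases; only the number of transmissions charged per failed node differs.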
Storage-Repair Bandwidth Trade-off for Wireless Caching with Partial Failure and Broadcast Repair
Repair of multiple partially failed cache nodes is studied in a distributed
wireless content caching system, where out of a total of cache nodes
lose part of their cached data. Broadcast repair of failed cache contents at
the network edge is studied; that is, the surviving cache nodes transmit
broadcast messages to the failed ones, which are then used, together with the
surviving data in their local cache memories, to recover the lost content. The
trade-off between the storage capacity and the repair bandwidth is derived. It
is shown that utilizing the broadcast nature of the wireless medium and the
surviving cache contents at partially failed nodes significantly reduces the
required repair bandwidth per node.
Comment: Conference version of this paper has been submitted for review in ITW
2018. This submission includes the proof of theorem
Distributed and Optimal Resilient Planning of Large-Scale Interdependent Critical Infrastructures
The complex interconnections between heterogeneous critical infrastructure
sectors make the system of systems (SoS) vulnerable to natural or human-made
disasters and lead to cascading failures both within and across sectors. Hence,
the robustness and resilience of the interdependent critical infrastructures
(ICIs) against extreme events are essential for delivering reliable and
efficient services to our society. To this end, we first establish a holistic
probabilistic network model to model the interdependencies between
infrastructure components. To capture the underlying failure and recovery
dynamics of ICIs, we further propose a Markov decision process (MDP) model in
which the repair policy determines the long-term performance of the ICIs. To
address the challenges that arise from the curse of dimensionality of the MDP,
we reformulate the problem as an approximate linear program and then simplify
it using factored graphs. We further obtain the distributed optimal control for
ICIs under mild assumptions. Finally, we use a case study of the interdependent
power and subway systems to corroborate the results and show that the optimal
resilience resource planning and allocation can reduce the failure probability
and mitigate the impact of failures caused by natural or artificial disasters.
A Taxonomy of Peer-to-Peer Based Complex Queries: a Grid perspective
Grid superscheduling requires support for efficient and scalable discovery of
resources. Resource discovery activities involve searching for the appropriate
resource types that match the user's job requirements. To accomplish this goal,
a resource discovery system that supports the desired look-up operation is
mandatory. Various kinds of solutions to this problem have been suggested,
including the centralised and hierarchical information server approach.
However, both of these approaches have serious limitations with regard to
scalability, fault-tolerance and network congestion. To overcome these
limitations, organising resource information using a Peer-to-Peer (P2P) network
model has been proposed. Existing approaches advocate an extension to
structured P2P protocols, to support the Grid resource information system
(GRIS). In this paper, we identify issues related to the design of such an
efficient, scalable, fault-tolerant, consistent and practical GRIS system using
a P2P network model. We compile these issues into various taxonomies in
sections III and IV. Further, we look into existing works that apply P2P based
network protocols to GRIS. We think that this taxonomy and its mapping to
relevant systems would be useful for academic and industry based researchers
who are engaged in the design of scalable Grid systems.