Production networks and failure avalanches
Although standard economics textbooks are seldom interested in production
networks, modern economies are increasingly based on supplier/customer
interactions. One can consider entire sectors of the economy as generalised
supply chains. We take this view in the present paper and study under
which conditions local failures to produce, or simply to deliver, can result in
avalanches of shortage and bankruptcies across the network. We show that a
large class of models exhibits scale-free distributions of production and
wealth among firms and that metastable regions of high production are highly
localised.
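A minimal sketch of how such an avalanche process can be simulated (an illustrative toy, not the paper's model; the network generator, the failure threshold, and all parameter values are assumptions):

    # Toy failure-avalanche simulation on a random supplier network.
    import random
    import networkx as nx

    def avalanche_size(n=1000, k=4, threshold=0.5, seed=None):
        """Fail one random firm, then propagate: a firm fails once more than
        `threshold` of its suppliers have failed. Returns the cascade size."""
        rng = random.Random(seed)
        # Directed edge u -> v means firm u supplies firm v.
        g = nx.gnm_random_graph(n, k * n, seed=seed, directed=True)
        failed = {rng.randrange(n)}
        newly_failed = set(failed)
        while newly_failed:
            newly_failed = set()
            for v in g.nodes:
                if v in failed:
                    continue
                suppliers = list(g.predecessors(v))
                if suppliers and sum(s in failed for s in suppliers) / len(suppliers) > threshold:
                    newly_failed.add(v)
            failed |= newly_failed
        return len(failed)

    # A histogram of avalanche sizes over many runs probes the heavy-tailed
    # (scale-free) distributions discussed in the abstract.
    sizes = [avalanche_size(seed=s) for s in range(200)]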
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments
Data centres that use consumer-grade disk drives and distributed
peer-to-peer systems are unreliable environments to archive data without enough
redundancy. Most redundancy schemes are not completely effective for providing
high availability, durability and integrity in the long-term. We propose alpha
entanglement codes, a mechanism that creates a virtual layer of highly
interconnected storage devices to propagate redundant information across a
large scale storage system. Our motivation is to design flexible and practical
erasure codes with high fault-tolerance to improve data durability and
availability even in catastrophic scenarios. By flexible and practical, we mean
code settings that can be adapted to future requirements and practical
implementations with reasonable trade-offs between security, resource usage and
performance. The codes have three parameters. Alpha increases storage overhead
linearly but increases the possible paths to recover data exponentially. Two
other parameters increase fault-tolerance even further without the need for
additional storage. As a result, an entangled storage system can provide high
availability, durability and offer additional integrity: it is more difficult
to modify data undetectably. We evaluate how several redundancy schemes perform
in unreliable environments and show that alpha entanglement codes are flexible
and practical codes. Remarkably, they excel at code locality; hence, they
reduce repair costs and become less dependent on storage locations with poor
availability. Our solution outperforms Reed-Solomon codes in many disaster
recovery scenarios.
Comment: The publication has 12 pages and 13 figures. This work was partially
supported by Swiss National Science Foundation SNSF Doc.Mobility 162014, 2018
48th Annual IEEE/IFIP International Conference on Dependable Systems and
Networks (DSN).
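To make the entanglement idea concrete, here is a toy single-chain entanglement in Python (an illustrative sketch only: the actual alpha entanglement codes use alpha parallel parity strands plus the two further parameters, which this toy omits):

    # Toy entanglement chain: each parity is the XOR of the incoming data block
    # with the previous parity, so every parity transitively "entangles" all
    # earlier blocks, creating multiple paths to recover a lost block.
    def xor_blocks(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))  # blocks assumed equal length

    def entangle(blocks, block_size=4096):
        parities, prev = [], bytes(block_size)  # all-zero bootstrap parity
        for d in blocks:
            prev = xor_blocks(d, prev)          # p_i = d_i XOR p_{i-1}
            parities.append(prev)
        return parities

    def recover(i, parities):
        # A lost data block is rebuilt from its neighbouring parities:
        # d_i = p_i XOR p_{i-1}. With alpha > 1 strands there would be
        # exponentially many such recovery paths, as the abstract notes.
        left = parities[i - 1] if i > 0 else bytes(len(parities[i]))
        return xor_blocks(parities[i], left)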
Convergence Rate Analysis of Distributed Gossip (Linear Parameter) Estimation: Fundamental Limits and Tradeoffs
The paper considers gossip distributed estimation of a (static) distributed
random field (a.k.a. a large-scale unknown parameter vector) observed by
sparsely interconnected sensors, each of which only observes a small fraction
of the field. We consider linear distributed estimators whose structure
combines the information \emph{flow} among sensors (the \emph{consensus} term
resulting from the local gossiping exchange among sensors when they are able to
communicate) and the information \emph{gathering} measured by the sensors (the
\emph{sensing} or \emph{innovations} term). This leads to mixed time-scale
algorithms--one time scale associated with the consensus and the other with the
innovations. The paper establishes a distributed observability condition
(global observability plus mean connectedness) under which the distributed
estimates are consistent and asymptotically normal. We introduce a
distributed counterpart of the (centralized) Fisher information rate,
which is a bound on the mean square error reduction rate of any distributed
estimator; we show that under the appropriate modeling and structural network
communication conditions (gossip protocol) the distributed gossip estimator
attains this distributed Fisher information rate, asymptotically achieving the
performance of the optimal centralized estimator. Finally, we study the
behavior of the distributed gossip estimator when the measurements fade (the
noise variance grows) with time; in particular, we characterize the maximum
rate at which the noise variance can grow while the distributed estimator
remains consistent, showing that, as long as the centralized estimator is
consistent, so is the distributed estimator.
Comment: Submitted for publication, 30 pages.
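The mixed time-scale structure described above can be written down compactly. Below is a NumPy sketch of a generic consensus + innovations iteration (the weight sequences and their decay rates are illustrative assumptions; the paper's analysis prescribes the admissible choices):

    # Consensus + innovations update: each sensor mixes neighbours' estimates
    # (consensus, faster time scale) with its own new measurement (innovations,
    # slower time scale).
    import numpy as np

    def gossip_estimate(A, H, y_fn, theta_dim, T=10000):
        """A: n x n 0/1 adjacency (who can gossip with whom); H[i]: local
        observation matrix of sensor i; y_fn(i, t): its measurement at time t."""
        n = A.shape[0]
        x = np.zeros((n, theta_dim))
        for t in range(1, T + 1):
            beta = 1.0 / t ** 0.6   # consensus weight decays more slowly ...
            alpha = 1.0 / t         # ... than the innovations weight
            x_new = x.copy()
            for i in range(n):
                neighbors = np.nonzero(A[i])[0]
                consensus = sum(x[i] - x[j] for j in neighbors)
                innovation = H[i].T @ (y_fn(i, t) - H[i] @ x[i])
                x_new[i] = x[i] - beta * consensus + alpha * innovation
            x = x_new
        return x  # row i: sensor i's estimate of the global parameter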
Cascading failures in spatially-embedded random networks
Cascading failures constitute an important vulnerability of interconnected
systems. Here we focus on the study of such failures on networks in which the
connectivity of nodes is constrained by geographical distance. Specifically, we
use random geometric graphs as representative examples of such spatial
networks, and study the properties of cascading failures on them in the
presence of distributed flow. The key finding of this study is that the process
of cascading failures is non-self-averaging on spatial networks, and thus,
aggregate inferences made from analyzing an ensemble of such networks lead to
incorrect conclusions when applied to a single network, no matter how large the
network is. We demonstrate that this lack of self-averaging disappears with the
introduction of a small fraction of long-range links into the network. We
simulate the well-studied preemptive node removal strategy for cascade
mitigation and show that it is largely ineffective in the case of spatial
networks. We introduce an altruistic strategy designed to limit the loss of
network nodes in the event of a cascade triggering failure and show that it
performs better than the preemptive strategy. Finally, we consider a
real-world spatial network, namely a European power transmission network, and
validate that our findings from the study of random geometric graphs are also
borne out by simulations of cascading failures on the empirical network.
Comment: 13 pages, 15 figures.
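For intuition, a cascade of this kind can be simulated with a standard load-capacity model on a random geometric graph (a hedged sketch: the trigger choice, tolerance parameter, and betweenness-as-load proxy are common modeling assumptions, not necessarily the paper's exact setup):

    # Load-capacity cascade on a random geometric graph.
    import networkx as nx

    def cascade_survivors(n=500, radius=0.08, tolerance=0.2, seed=1):
        g = nx.random_geometric_graph(n, radius, seed=seed)
        g = g.subgraph(max(nx.connected_components(g), key=len)).copy()
        load = nx.betweenness_centrality(g)              # initial flow proxy
        capacity = {v: (1 + tolerance) * load[v] for v in g}
        g.remove_node(next(iter(g.nodes)))               # trigger failure
        while True:
            load = nx.betweenness_centrality(g)          # flows redistribute
            overloaded = [v for v in g if load[v] > capacity[v]]
            if not overloaded:
                return g.number_of_nodes()               # nodes surviving the cascade
            g.remove_nodes_from(overloaded)

Because the process is non-self-averaging on such graphs, per-seed outcomes of this simulation can differ persistently from the ensemble average and should be inspected individually rather than pooled.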
Disaster-Resilient Control Plane Design and Mapping in Software-Defined Networks
Communication networks, such as core optical networks, heavily depend on
their physical infrastructure, and hence they are vulnerable to man-made
disasters, such as Electromagnetic Pulse (EMP) or Weapons of Mass Destruction
(WMD) attacks, as well as to natural disasters. Large-scale disasters may cause
huge data loss and connectivity disruption in these networks. As our dependence
on network services increases, the need for novel survivability methods to
mitigate the effects of disasters on communication networks becomes a major
concern. Software-Defined Networking (SDN), by centralizing control logic and
separating it from physical equipment, facilitates network programmability and
opens up new ways to design disaster-resilient networks. On the other hand, to
fully exploit the potential of SDN, along with data-plane survivability, we
also need to design the control plane to be resilient enough to survive network
failures caused by disasters. Several distributed SDN controller architectures
have been proposed to mitigate the risks of overload and failure, but they are
optimized for limited faults without addressing the extent of large-scale
disaster failures. For disaster resiliency of the control plane, we propose to
design it as a virtual network, which can be solved using Virtual Network
Mapping techniques. We select appropriate mapping of the controllers over the
physical network such that the connectivity among the controllers
(controller-to-controller) and between the switches and the controllers
(switch-to-controller) is not compromised by physical infrastructure failures
caused by disasters. We formally model this disaster-aware control-plane design
and mapping problem, and demonstrate a significant reduction in the disruption
of controller-to-controller and switch-to-controller communication channels
using our approach.
Comment: 6 pages.
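A heavily simplified stand-in for the mapping step (the paper formulates it as a virtual network mapping optimization; this greedy sketch only illustrates the disaster-zone-disjointness idea, and the names here are hypothetical):

    # Greedy disaster-aware controller placement: put each controller replica in
    # a distinct disaster zone so no single regional failure removes them all.
    def map_controllers(num_controllers, nodes, zone_of):
        """nodes: physical node ids; zone_of[n]: disaster zone containing node n."""
        placement, used_zones = {}, set()
        for c in range(num_controllers):
            candidates = [n for n in nodes
                          if zone_of[n] not in used_zones
                          and n not in placement.values()]
            if not candidates:
                raise ValueError("not enough disaster-disjoint zones")
            chosen = candidates[0]  # a real design would also weigh latency/load
            placement[c] = chosen
            used_zones.add(zone_of[chosen])
        return placement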
Tolerating Correlated Failures in Massively Parallel Stream Processing Engines
Fault-tolerance techniques for stream processing engines can be categorized
into passive and active approaches. A typical passive approach periodically
checkpoints a processing task's runtime states and can recover a failed task by
restoring its runtime state using its latest checkpoint. On the other hand, an
active approach usually employs backup nodes to run replicated tasks. Upon
failure, the active replica can take over the processing of the failed task
with minimal latency. However, both approaches have their own inadequacies in
Massively Parallel Stream Processing Engines (MPSPE). The passive approach
incurs a long recovery latency, especially when a number of correlated nodes
fail simultaneously, while the active approach requires extra replication
resources. In this paper, we propose a new fault-tolerance framework called
Passive and Partially Active (PPA). In a PPA scheme, the passive approach is
applied to all tasks while only a selected set of tasks will be actively
replicated. The number of actively replicated tasks depends on the available
resources. If tasks without active replicas fail, tentative outputs will be
generated before the completion of the recovery process. We also propose
effective and efficient algorithms to optimize a partially active replication
plan to maximize the quality of tentative outputs. We implemented PPA on top of
Storm, an open-source MPSPE, and conducted extensive experiments using both
real and synthetic datasets to verify the effectiveness of our approach.
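A toy illustration of the replica-selection step (a greedy stand-in under assumed inputs; the paper proposes dedicated optimization algorithms for this plan):

    # Choose which tasks get active replicas under a fixed resource budget;
    # all other tasks fall back to passive checkpoints with tentative outputs.
    def select_active_replicas(tasks, budget):
        """tasks: list of (task_id, replication_cost, quality_gain) tuples."""
        ranked = sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)
        chosen, used = [], 0
        for task_id, cost, _gain in ranked:
            if used + cost <= budget:   # greedily pick best gain per unit cost
                chosen.append(task_id)
                used += cost
        return chosen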