93,691 research outputs found

    Production networks and failure avalanches

    Although standard economics textbooks are seldom interested in production networks, modern economies rely more and more on supplier/customer interactions. Entire sectors of the economy can be viewed as generalised supply chains. We take this view in the present paper and study under which conditions local failures to produce, or simply to deliver, can result in avalanches of shortages and bankruptcies across the network. We show that a large class of models exhibits scale-free distributions of production and wealth among firms, and that metastable regions of high production are highly localised.
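
    The abstract does not spell out a specific algorithm, so the sketch below is only a toy illustration of the kind of dynamics studied: firms sit on a random supplier/customer network, a firm fails once too many of its suppliers have failed, and we record the resulting avalanche size. The network model, threshold rule and parameters are our own assumptions, not the authors' model.

```python
# Toy failure-avalanche simulation on a random supplier/customer network
# (illustrative only; not the authors' model or parameters).
import random
from collections import deque

def avalanche_size(n=1000, k=4, threshold=0.25, seed=0):
    rng = random.Random(seed)
    # suppliers[i] = the k distinct firms that firm i buys inputs from
    suppliers = []
    for i in range(n):
        sup = set()
        while len(sup) < k:
            j = rng.randrange(n)
            if j != i:
                sup.add(j)
        suppliers.append(list(sup))
    # customers[j] = firms that depend on j, used to propagate failures downstream
    customers = [[] for _ in range(n)]
    for i, sup in enumerate(suppliers):
        for j in sup:
            customers[j].append(i)

    failed = [False] * n
    start = rng.randrange(n)            # one initial local failure
    failed[start] = True
    queue = deque([start])
    while queue:
        j = queue.popleft()
        for i in customers[j]:
            if failed[i]:
                continue
            lost = sum(failed[s] for s in suppliers[i]) / len(suppliers[i])
            if lost >= threshold:       # too many suppliers lost: firm i fails too
                failed[i] = True
                queue.append(i)
    return sum(failed)

if __name__ == "__main__":
    # With a low threshold a single failed supplier suffices and avalanches can
    # span most of the network; raising the threshold changes the picture entirely.
    sizes = [avalanche_size(seed=s) for s in range(20)]
    print("avalanche sizes over 20 trials:", sorted(sizes))
```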

    Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments

    Data centres that use consumer-grade disk drives and distributed peer-to-peer systems are unreliable environments in which to archive data without enough redundancy. Most redundancy schemes are not completely effective at providing high availability, durability and integrity in the long term. We propose alpha entanglement codes, a mechanism that creates a virtual layer of highly interconnected storage devices to propagate redundant information across a large-scale storage system. Our motivation is to design flexible and practical erasure codes with high fault tolerance to improve data durability and availability even in catastrophic scenarios. By flexible and practical, we mean code settings that can be adapted to future requirements and practical implementations with reasonable trade-offs between security, resource usage and performance. The codes have three parameters. Alpha increases storage overhead linearly but increases the possible paths to recover data exponentially. Two other parameters increase fault tolerance even further without the need for additional storage. As a result, an entangled storage system can provide high availability and durability, and offer additional integrity: it is more difficult to modify data undetectably. We evaluate how several redundancy schemes perform in unreliable environments and show that alpha entanglement codes are flexible and practical codes. Remarkably, they excel at code locality; hence, they reduce repair costs and become less dependent on storage locations with poor availability. Our solution outperforms Reed-Solomon codes in many disaster recovery scenarios.
    Comment: The publication has 12 pages and 13 figures. This work was partially supported by Swiss National Science Foundation SNSF Doc.Mobility 162014. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
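
    As a rough, non-authoritative illustration of the trade-off mentioned above (storage overhead linear in alpha, several repair chains per block), here is a toy XOR entanglement in Python. It deliberately ignores the lattice geometry that makes the strands genuinely distinct in the paper's three-parameter construction; the class and variable names are our own.

```python
# Toy XOR "entanglement" (our own simplification; not the paper's construction).
# Each new data block is XOR-ed into the running tail parity of alpha strands, so
# storage overhead grows linearly in alpha and every block joins alpha repair
# chains (identical in this toy; distinct in the real codes via the lattice).
from typing import List, Tuple

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class ToyEntangler:
    def __init__(self, alpha: int, block_size: int = 16):
        self.alpha = alpha
        self.tails = [bytes(block_size) for _ in range(alpha)]   # one tail per strand
        self.parities: List[Tuple[int, bytes]] = []              # (strand, parity)

    def add(self, data: bytes) -> None:
        for s in range(self.alpha):
            p = xor(self.tails[s], data)    # entangle the block with strand s
            self.tails[s] = p
            self.parities.append((s, p))

if __name__ == "__main__":
    alpha = 3
    enc = ToyEntangler(alpha)
    blocks = [bytes([i]) * 16 for i in range(5)]
    for b in blocks:
        enc.add(b)
    # Repair: a lost data block equals the XOR of the parity written for it on a
    # strand and the previous parity on that same strand (any strand works here).
    lost, strand = 4, 1
    prev_parity = enc.parities[(lost - 1) * alpha + strand][1]
    cur_parity = enc.parities[lost * alpha + strand][1]
    print(xor(prev_parity, cur_parity) == blocks[lost])   # True
```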

    Convergence Rate Analysis of Distributed Gossip (Linear Parameter) Estimation: Fundamental Limits and Tradeoffs

    The paper considers gossip-based distributed estimation of a (static) distributed random field (a.k.a. a large-scale unknown parameter vector) observed by sparsely interconnected sensors, each of which observes only a small fraction of the field. We consider linear distributed estimators whose structure combines the information \emph{flow} among sensors (the \emph{consensus} term resulting from the local gossiping exchange among sensors when they are able to communicate) and the information \emph{gathering} measured by the sensors (the \emph{sensing} or \emph{innovations} term). This leads to mixed time-scale algorithms: one time scale is associated with the consensus and the other with the innovations. The paper establishes a distributed observability condition (global observability plus mean connectedness) under which the distributed estimates are consistent and asymptotically normal. We introduce the distributed notion equivalent to the (centralized) Fisher information rate, which is a bound on the mean square error reduction rate of any distributed estimator; we show that, under appropriate modeling and structural network communication conditions (gossip protocol), the distributed gossip estimator attains this distributed Fisher information rate, asymptotically achieving the performance of the optimal centralized estimator. Finally, we study the behavior of the distributed gossip estimator when the measurements fade (the noise variance grows) with time; in particular, we characterize the maximum rate at which the noise variance can grow while the distributed estimator remains consistent, showing that, as long as the centralized estimator is consistent, the distributed estimator is consistent as well.
    Comment: Submitted for publication, 30 pages.
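
    The abstract describes a consensus + innovations structure; a minimal simulation sketch of that generic form follows. The ring topology, per-sensor observation matrices, noise level and step-size schedules (a faster-decaying consensus gain and a 1/t innovations gain) are our own illustrative choices, not the paper's exact algorithm or conditions.

```python
# Sketch of a generic consensus + innovations estimator (illustrative choices of
# topology, observation model, noise and gains; not the paper's exact algorithm).
import numpy as np

rng = np.random.default_rng(0)
n_sensors, dim, T = 10, 4, 10000
theta = rng.normal(size=dim)                     # unknown static parameter vector

# Each sensor observes a single coordinate: locally unobservable, globally observable.
H = [np.eye(dim)[[i % dim]] for i in range(n_sensors)]

# Ring graph standing in for the sparse gossip/communication topology.
neighbors = [((i - 1) % n_sensors, (i + 1) % n_sensors) for i in range(n_sensors)]

x = np.zeros((n_sensors, dim))                   # local estimates
for t in range(1, T + 1):
    a_t = 1.0 / t                                # innovations gain (slow time scale)
    b_t = 0.2 / t**0.6                           # consensus gain (faster time scale)
    z = [Hi @ theta + 0.5 * rng.normal(size=1) for Hi in H]
    x_new = np.empty_like(x)
    for i in range(n_sensors):
        consensus = sum(x[i] - x[j] for j in neighbors[i])
        innovation = H[i].T @ (z[i] - H[i] @ x[i])
        x_new[i] = x[i] - b_t * consensus + a_t * innovation
    x = x_new

print("mean estimation error:", np.linalg.norm(x - theta, axis=1).mean())
```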

    Cascading failures in spatially-embedded random networks

    Cascading failures constitute an important vulnerability of interconnected systems. Here we focus on the study of such failures on networks in which the connectivity of nodes is constrained by geographical distance. Specifically, we use random geometric graphs as representative examples of such spatial networks, and study the properties of cascading failures on them in the presence of distributed flow. The key finding of this study is that the process of cascading failures is non-self-averaging on spatial networks, and thus aggregate inferences made from analyzing an ensemble of such networks lead to incorrect conclusions when applied to a single network, no matter how large the network is. We demonstrate that this lack of self-averaging disappears with the introduction of a small fraction of long-range links into the network. We simulate the well-studied preemptive node removal strategy for cascade mitigation and show that it is largely ineffective in the case of spatial networks. We introduce an altruistic strategy designed to limit the loss of network nodes in the event of a cascade-triggering failure and show that it performs better than the preemptive strategy. Finally, we consider a real-world spatial network, viz. a European power transmission network, and validate that our findings from the study of random geometric graphs are also borne out by simulations of cascading failures on the empirical network.
    Comment: 13 pages, 15 figures.
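
    To make the cascade mechanism concrete, a small hedged sketch follows: it builds a random geometric graph with networkx and runs a load/capacity cascade in which node load is approximated by shortest-path betweenness, a common proxy for distributed flow. The load model, tolerance parameter and trigger rule are our assumptions, not necessarily the flow model used in the paper.

```python
# Illustrative cascade simulation on a random geometric graph. The load model
# (betweenness-based, Motter-Lai style) and the tolerance value are our own
# simplifying assumptions.
import networkx as nx

def cascade(n=200, radius=0.12, tolerance=0.3, seed=1):
    G = nx.random_geometric_graph(n, radius, seed=seed)
    load = nx.betweenness_centrality(G)
    capacity = {v: (1 + tolerance) * load[v] for v in G}      # fixed at build time

    # Trigger: remove the most loaded node, then propagate overloads.
    G.remove_node(max(load, key=load.get))
    while True:
        load = nx.betweenness_centrality(G)
        overloaded = [v for v in G if load[v] > capacity[v]]
        if not overloaded:
            break
        G.remove_nodes_from(overloaded)
    return n - G.number_of_nodes()                            # nodes lost in the cascade

if __name__ == "__main__":
    # Run-to-run spread across seeds hints at the non-self-averaging behaviour
    # discussed in the abstract (this toy makes no claim beyond illustration).
    print("failed nodes per run:", [cascade(seed=s) for s in range(5)])
```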

    Disaster-Resilient Control Plane Design and Mapping in Software-Defined Networks

    Communication networks, such as core optical networks, heavily depend on their physical infrastructure, and hence they are vulnerable to man-made disasters, such as Electromagnetic Pulse (EMP) or Weapons of Mass Destruction (WMD) attacks, as well as to natural disasters. Large-scale disasters may cause huge data loss and connectivity disruption in these networks. As our dependence on network services increases, the need for novel survivability methods to mitigate the effects of disasters on communication networks becomes a major concern. Software-Defined Networking (SDN), by centralizing control logic and separating it from physical equipment, facilitates network programmability and opens up new ways to design disaster-resilient networks. On the other hand, to fully exploit the potential of SDN, along with data-plane survivability we also need to design the control plane to be resilient enough to survive network failures caused by disasters. Several distributed SDN controller architectures have been proposed to mitigate the risks of overload and failure, but they are optimized for limited faults and do not address the extent of large-scale disaster failures. For disaster resiliency of the control plane, we propose to design it as a virtual network, so that the problem can be solved using Virtual Network Mapping techniques. We select an appropriate mapping of the controllers over the physical network such that the connectivity among the controllers (controller-to-controller) and between the switches and the controllers (switch-to-controller) is not compromised by physical infrastructure failures caused by disasters. We formally model this disaster-aware control-plane design and mapping problem, and demonstrate a significant reduction in the disruption of controller-to-controller and switch-to-controller communication channels using our approach.
    Comment: 6 pages.
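
    The paper formulates this as a virtual-network design and mapping problem; the snippet below is only a brute-force toy that captures the underlying feasibility check: after any single disaster zone is removed, all surviving controllers should remain mutually reachable and every surviving switch should still reach at least one controller. The grid topology, disaster zones and exhaustive search are hypothetical choices of ours.

```python
# Toy disaster-aware controller placement check (illustrative; the paper uses a
# virtual-network mapping formulation, not this brute-force search).
from itertools import combinations
import networkx as nx

G = nx.grid_2d_graph(4, 4)                        # hypothetical physical topology
disaster_zones = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},             # e.g. a regional disaster footprint
    {(2, 2), (2, 3), (3, 2), (3, 3)},
]

def survives_all_disasters(controllers):
    for zone in disaster_zones:
        H = G.subgraph(set(G) - zone)
        alive = [c for c in controllers if c not in zone]
        if not alive:
            return False                          # every controller wiped out
        comps = [nx.node_connected_component(H, c) for c in alive]
        if any(alive[0] not in comp for comp in comps):
            return False                          # controller-to-controller path lost
        if set(H) - set().union(*comps):
            return False                          # a switch cut off from all controllers
    return True

k = 2
placements = [p for p in combinations(G.nodes, k) if survives_all_disasters(p)]
print(f"{len(placements)} resilient placements of {k} controllers, e.g. {placements[0]}")
```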

    Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime state and can recover a failed task by restoring its runtime state from its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs a long recovery latency, especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework, which is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks while only a selected set of tasks is actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs will be generated before the completion of the recovery process. We also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE, and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach.
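
    The paper's optimization algorithms are not described in the abstract, so the sketch below is only a generic stand-in for the selection step: given a replication-resource budget, greedily pick the tasks whose active replication is expected to improve tentative-output quality the most per unit of resource. The Task fields, scores and knapsack-style ratio rule are our own illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch of a partially-active replication plan chooser (illustrative only).
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    quality_gain: float   # assumed impact on tentative-output quality if replicated
    cost: float           # resources an active replica of this task would consume

def choose_active_replicas(tasks, budget):
    chosen, used = [], 0.0
    # Greedy by quality gain per unit cost (a knapsack-style heuristic).
    for t in sorted(tasks, key=lambda t: t.quality_gain / t.cost, reverse=True):
        if used + t.cost <= budget:
            chosen.append(t.name)
            used += t.cost
    return chosen          # the remaining tasks rely on passive checkpoints only

if __name__ == "__main__":
    tasks = [Task("join-1", 0.9, 3), Task("agg-2", 0.7, 1),
             Task("filter-3", 0.2, 1), Task("sink-4", 0.5, 2)]
    print(choose_active_replicas(tasks, budget=4))   # -> ['agg-2', 'join-1']
```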