
    To Transmit Now or Not to Transmit Now

    Performance Trade-Offs in Cyber–Physical Control Applications With Multi-Connectivity

    Modern communication devices are often equipped with multiple wireless communication interfaces with diverse characteristics. This enables a form of multi-connectivity known as interface diversity, which provides path diversity by using multiple communication interfaces. Interface diversity helps to combat the error bursts that single-interface systems suffer on the link as a consequence of temporal correlation in the wireless channel. The length of an error burst is an essential performance indicator for cyber–physical control applications with periodic traffic, as it defines the period during which the control link is unavailable. However, the available interfaces must be orchestrated correctly to achieve an adequate trade-off between latency, reliability, and energy consumption. This work investigates how the packet error statistics of the different interfaces affect the overall latency–reliability characteristics and explores mechanisms to derive adequate interface diversity policies. To this end, we model the optimization problem as a partially observable Markov decision process (POMDP), in which the state of each interface is determined by a Gilbert–Elliott model whose parameters are estimated from experimental measurement traces from LTE and Wi-Fi. Our results show that the POMDP approach provides an all-round adaptable solution whose performance is only 0.1% below the absolute upper bound, given by the optimal policy under the impractical assumption of full observability.
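    To make the error-burst argument concrete, the following Python sketch simulates two interfaces whose losses follow independent Gilbert–Elliott models and compares a single interface with the simplest diversity policy, duplicating every packet on all interfaces. It also includes a one-step Bayes (belief) update, which shows why the problem is only partially observable: the sender sees losses and acknowledgements, never the Good/Bad state itself. The parameter values are made up for illustration and are not estimated from the paper's LTE and Wi-Fi traces; channel independence is an assumption; and full duplication is a baseline, not the POMDP-derived policy.

```python
import random
from dataclasses import dataclass


@dataclass
class GilbertElliott:
    """Two-state (Good/Bad) Markov packet-error model for one interface."""
    p_gb: float        # P(Good -> Bad) per packet slot
    p_bg: float        # P(Bad -> Good) per packet slot
    err_good: float    # packet error probability while in Good
    err_bad: float     # packet error probability while in Bad
    bad: bool = False  # hidden channel state (not observable by the sender)

    def step(self, rng: random.Random) -> bool:
        """Advance one slot, then return True if the packet sent in it is lost."""
        if self.bad:
            self.bad = rng.random() >= self.p_bg
        else:
            self.bad = rng.random() < self.p_gb
        err = self.err_bad if self.bad else self.err_good
        return rng.random() < err

    def belief_update(self, b_bad: float, lost: bool) -> float:
        """Bayes filter: P(Bad) after one slot, given only the loss/ACK outcome."""
        prior = b_bad * (1 - self.p_bg) + (1 - b_bad) * self.p_gb  # predict step
        like_bad = self.err_bad if lost else 1 - self.err_bad
        like_good = self.err_good if lost else 1 - self.err_good
        norm = prior * like_bad + (1 - prior) * like_good
        return prior if norm == 0 else prior * like_bad / norm     # correct step


def longest_burst(losses) -> int:
    """Length of the longest run of consecutive lost packets."""
    longest = current = 0
    for lost in losses:
        current = current + 1 if lost else 0
        longest = max(longest, current)
    return longest


def simulate(channels, slots: int, seed: int = 0):
    """Compare the first interface alone with duplicating on all interfaces."""
    rng = random.Random(seed)
    single, diverse = [], []
    for _ in range(slots):
        lost = [ch.step(rng) for ch in channels]  # independent channels assumed
        single.append(lost[0])                    # first interface only
        diverse.append(all(lost))                 # lost only if every copy is lost
    return single, diverse


if __name__ == "__main__":
    lte = GilbertElliott(p_gb=0.01, p_bg=0.2, err_good=0.001, err_bad=0.5)   # made-up
    wifi = GilbertElliott(p_gb=0.02, p_bg=0.1, err_good=0.005, err_bad=0.7)  # made-up
    single, diverse = simulate([lte, wifi], slots=100_000)
    print("LTE only:    loss", sum(single) / len(single), "burst", longest_burst(single))
    print("LTE + Wi-Fi: loss", sum(diverse) / len(diverse), "burst", longest_burst(diverse))
```

    Because duplication declares a packet lost only when every copy is lost, its longest burst can never exceed that of the first interface alone; the cost, which a POMDP policy is meant to trade off, is the extra transmission energy.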

    Right On Time Distributed Shared Memory

    The demand for real-time data storage in distributed control systems (DCSs) is growing. Yet providing real-time DCS guarantees is challenging, especially as more and more sensor and actuator devices are connected to industrial plants and message loss must be taken into account. In this paper, we investigate how to build a shared memory abstraction for DCSs as a first step towards implementing different shared storage systems in a DCS context. We first prove that, in the presence of host crashes and message losses, the necessary guarantees of such an abstraction cannot be implemented using a traditional approach that has no access to the internals of existing DCS services, e.g., a modular approach in which algorithms are built on top of existing software blocks such as failure detectors. We propose a white-box approach that uses the messages of existing services in any DCS as the sole means of communication. More precisely, we present TapeWorm, an algorithm that attaches itself to the heartbeat messages of the failure detector component in DCSs. We prove that TapeWorm implements the desired shared memory guarantees for applications running on a DCS. We also analyze the performance of TapeWorm and show ways of adapting it to various application needs and workloads.
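    The white-box idea of letting the failure detector's heartbeats carry the shared-memory state can be sketched as follows. This is a hypothetical illustration, not TapeWorm: it uses a last-writer-wins register with Lamport-style timestamps and provides only eventual convergence, not the timed guarantees proved in the paper. The names `Node`, `Heartbeat`, and `Stamp` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Stamp:
    """Write timestamp; ties on seq are broken by writer id."""
    seq: int
    writer: int


@dataclass
class Heartbeat:
    sender: int
    payload: dict  # piggybacked register state: key -> (Stamp, value)


class Node:
    """One DCS host whose failure-detector heartbeats also carry register state."""

    def __init__(self, node_id: int):
        self.id = node_id
        self.seq = 0
        self.store = {}      # key -> (Stamp, value)
        self.last_seen = {}  # peer id -> time of last heartbeat (failure-detector input)

    def write(self, key, value) -> None:
        self.seq += 1
        self.store[key] = (Stamp(self.seq, self.id), value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def make_heartbeat(self) -> Heartbeat:
        # No extra message is created: the state rides on the periodic heartbeat.
        return Heartbeat(self.id, dict(self.store))

    def on_heartbeat(self, hb: Heartbeat, now: float) -> None:
        self.last_seen[hb.sender] = now           # usual failure-detector bookkeeping
        for key, (stamp, value) in hb.payload.items():
            self.seq = max(self.seq, stamp.seq)   # Lamport-style clock advance
            current = self.store.get(key)
            if current is None or stamp > current[0]:
                self.store[key] = (stamp, value)  # keep the newest write
```

    Since every heartbeat carries the full current state, a lost heartbeat is simply superseded by the next one: each loss delays propagation by one heartbeat period instead of requiring a retransmission.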

    Never Say Never: Probabilistic & Temporal Failure Detectors (Extended)

    The failure detector approach for solving distributed computing problems has been celebrated for its modularity. This approach allows the construction of algorithms using abstract failure detection mechanisms, defined by axiomatic properties, as building blocks. The minimal synchrony assumptions on communication that make it possible to implement the failure detection mechanism are studied separately. Such synchrony assumptions are typically expressed as eventual guarantees that need to hold, after some point in time, forever and deterministically. In practice, however, they never do: synchrony assumptions may hold only probabilistically and temporarily. In this paper, we study failure detectors in a realistic distributed system N, in which asynchrony is inflicted by probabilistic synchronous communication. We address the following paradox about ◇S, the weakest failure detector to solve the consensus problem (and many equivalent problems): an implementation of “consensus with probability 1” is possible in N without using randomness in the algorithm itself, while an implementation of “◇S with probability 1” is impossible to achieve in N. We circumvent this paradox by introducing a new failure detector ◇S*, a variant of ◇S with probabilistic and temporal accuracy. We prove that ◇S* is implementable in N and we provide an optimal ◇S* implementation. Interestingly, we show that ◇S* can replace ◇S in several existing deterministic consensus algorithms that use ◇S, yielding an algorithm that solves “consensus with probability 1”. In fact, we show that this result holds for all decisive problems (not only consensus) and also for the failure detector ◇P (not only ◇S). The resulting algorithms combine the modularity of distributed computing practices with the practicality of networking ones.
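    For intuition about why synchrony holds only probabilistically, the sketch below implements the textbook heartbeat detector with increasing timeouts, commonly used to implement eventually perfect detectors under partial synchrony, and feeds it exponentially distributed message delays as a stand-in for probabilistic synchronous communication. It is not the paper's optimal ◇S* implementation; the timeout, increment, and delay distribution are assumptions chosen for illustration. Because the delays are unbounded, any finite timeout is exceeded with positive probability, so a suspicion is only probably correct.

```python
import random


class TimeoutDetector:
    """Heartbeat failure detector with per-peer timeouts that grow after mistakes."""

    def __init__(self, peers, timeout: float, delta: float):
        self.timeout = {p: timeout for p in peers}
        self.delta = delta
        self.last = {p: 0.0 for p in peers}
        self.suspected = set()
        self.false_suspicions = 0

    def on_heartbeat(self, peer, now: float) -> None:
        self.last[peer] = max(self.last[peer], now)
        if peer in self.suspected:
            # The peer was alive after all: retract and become more patient.
            self.suspected.discard(peer)
            self.timeout[peer] += self.delta
            self.false_suspicions += 1

    def check(self, now: float) -> set:
        for peer, seen in self.last.items():
            if now - seen > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


def run(n_heartbeats=2000, period=1.0, mean_delay=0.3, seed=1):
    """Correct peer 'q' sends a heartbeat every `period`; delays are exponential."""
    rng = random.Random(seed)
    fd = TimeoutDetector(["q"], timeout=1.2, delta=0.1)
    arrivals = sorted(k * period + rng.expovariate(1 / mean_delay)
                      for k in range(1, n_heartbeats + 1))
    for t in arrivals:
        fd.check(t)              # any timeout that expired before this delivery
        fd.on_heartbeat("q", t)  # q never crashes, so every suspicion was false
    return fd


if __name__ == "__main__":
    fd = run()
    print("false suspicions:", fd.false_suspicions, "final timeout:", round(fd.timeout["q"], 2))
    crashed = TimeoutDetector(["p"], timeout=1.2, delta=0.1)
    print(crashed.check(now=5.0))  # a silent peer is eventually suspected
```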

    Reliability Mechanisms for Controllers in Real-Time Cyber-Physical Systems

    Cyber-physical systems (CPSs) are real-world processes that are controlled by computer algorithms. We consider CPSs in which a centralized, software-based controller maintains the process in a desired state by exchanging measurements and setpoints with process agents (PAs). Because CPSs control low-inertia processes, e.g., electric grids and autonomous cars, the controller must satisfy stringent real-time constraints. However, controllers are susceptible to delay and crash faults, and the communication network might drop, delay, or reorder messages. This degrades the quality of control of the physical process, the failure of which can result in damage to life or property. Existing reliability solutions are either ill-suited to real-time CPSs or impose serious restrictions on the controllers. In this thesis, we design, implement, and evaluate reliability mechanisms for real-time CPS controllers that require minimal modifications to the controller itself. We begin by abstracting the execution of a CPS using events, together with the two inherent relations among those events, namely the network and computation relations. We use these relations to introduce the intentionality relation, which uses the events to capture the state of the physical process. Based on the intentionality relation, we define three correctness properties, namely state safety, optimal selection, and consistency, which together provide linearizability (one-copy equivalence) for CPS controllers. We propose intentionality clocks and Quarts, and prove that they provide linearizability. To provide consistency, Quarts ensures agreement among controller replicas, which is typically achieved using consensus; however, consensus can add an unbounded-latency overhead. Quarts instead leverages properties specific to CPSs to perform agreement using pre-computed priorities among sets of received measurements, resulting in a bounded-latency overhead with high availability. Using simulation, we show that the availability of Quarts with two replicas is more than an order of magnitude higher than that of consensus. We also propose Axo, a fault-tolerance protocol that uses active replication to detect and recover faulty replicas and to provide timeliness, which requires that delayed setpoints be masked from the PAs. We study the effect of delay faults and the impact of fault tolerance with Axo by deploying Axo in two real-world CPSs. We then observe that the proposed reliability mechanisms also apply to unconventional CPSs such as software-defined networking (SDN), where the controlled process is the routing fabric of the network. We show that, in SDN, violating consistency can cause incorrect routing policies to be implemented. We therefore use Quarts and intentionality clocks to design and implement QCL, a coordination layer for SDN controllers that guarantees control-plane consistency. QCL also drastically reduces the response time of SDN controllers compared with consensus-based techniques. In the last part of the thesis, we address the problem of reliable communication between the software agents in a wide-area network that can drop, delay, or reorder messages. For this, we propose iPRP, an IP-friendly parallel redundancy protocol for 0 ms repair of packet losses. Because iPRP requires fail-independent paths for high reliability, we study the fail-independence of Wi-Fi links using real-life measurements, as a first step towards using Wi-Fi for real-time communication in CPSs.
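    The principle behind parallel redundancy with 0 ms repair can be sketched as sender-side duplication over two paths plus receiver-side duplicate discard: the redundant copy is already in flight, so a loss on one path is repaired without waiting for a retransmission. The sliding-window filter below is a generic, hypothetical design, not iPRP's actual discard algorithm or packet format, and independent losses on the two paths stand in for the fail-independence that iPRP requires.

```python
import random


class DuplicateDiscard:
    """Receiver-side filter: deliver only the first copy of each sequence number."""

    def __init__(self, window: int = 1024):
        self.window = window  # how far back sequence numbers are remembered
        self.highest = -1
        self.seen = set()

    def accept(self, seq: int) -> bool:
        if seq <= self.highest - self.window or seq in self.seen:
            return False      # duplicate, or too old to tell apart from one
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            floor = self.highest - self.window
            self.seen = {s for s in self.seen if s > floor}  # forget old entries
        return True


def deliver(n_packets: int, loss_a: float, loss_b: float, lag: int = 3, seed: int = 0) -> int:
    """Duplicate each packet on paths A and B (B lags by `lag` slots); count deliveries."""
    rng = random.Random(seed)
    arrivals = []
    for seq in range(n_packets):
        if rng.random() >= loss_a:
            arrivals.append((seq, seq))        # (arrival slot, sequence number) via A
        if rng.random() >= loss_b:
            arrivals.append((seq + lag, seq))  # the same packet, later, via B
    arrivals.sort()
    rx = DuplicateDiscard()
    return sum(rx.accept(seq) for _, seq in arrivals)


if __name__ == "__main__":
    print("path A only:", deliver(10_000, 0.05, 1.0))
    print("A and B:    ", deliver(10_000, 0.05, 0.05))
```

    A packet is now lost only if both copies are lost, which is why the paths must fail independently: with correlated failures, the second copy adds little.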