622 research outputs found

    An Autonomous Distributed Fault-Tolerant Local Positioning System

    We describe a fault-tolerant, GPS-independent (Global Positioning System) distributed autonomous positioning system for static/mobile objects and present solutions for providing highly accurate geo-location data for those objects in dynamic environments. The reliability and accuracy of a positioning system fundamentally depend on two factors: its timeliness in broadcasting signals and the knowledge of its geometry, i.e., the locations and distances of the beacons. Existing distributed positioning systems either synchronize to a common external source such as GPS or establish their own time synchrony with a master-slave-like scheme, designating a particular beacon as the master and having the other beacons synchronize to it, which introduces a single point of failure. Another drawback of existing positioning systems is that they do not address various fault manifestations, in particular communication link failures, which in wireless networks increasingly dominate process failures and are typically transient and mobile, in the sense that they affect different messages to/from different processes over time.
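    The abstract's point that accuracy hinges on beacon geometry can be illustrated with a generic multilateration computation. The sketch below is not the paper's system; it is a minimal Python example, with invented beacon coordinates and ranges, showing how known beacon positions and measured distances yield a least-squares position fix.

```python
import numpy as np

def position_fix(beacons, distances, iters=50):
    """Gauss-Newton least-squares fix from beacon positions and range estimates."""
    x = beacons.mean(axis=0)                 # start at the beacon centroid
    for _ in range(iters):
        diff = x - beacons                   # vectors from each beacon to the estimate
        ranges = np.linalg.norm(diff, axis=1)
        residual = ranges - distances        # range error at the current estimate
        jacobian = diff / ranges[:, None]    # d(range_i)/d(x)
        step, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
        x = x - step
    return x

# Invented geometry: three beacons at known positions, noiseless ranges to (2, 1).
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_position = np.array([2.0, 1.0])
distances = np.linalg.norm(beacons - true_position, axis=1)
print(position_fix(beacons, distances))      # approximately [2. 1.]
```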

    Distributed Synthesis in Continuous Time

    We introduce a formalism modelling communication of distributed agents strictly in continuous time. Within this framework, we study the problem of synthesising local strategies for individual agents such that a specified set of goal states is reached, or reached with at least a given probability. The flow of time is modelled explicitly based on continuous-time randomness, with two natural implications: first, the non-determinism stemming from interleaving disappears; second, when we restrict to a subclass of non-urgent models, the quantitative value problem for two players can be solved in EXPTIME. Indeed, the explicit continuous time enables players to communicate their states by delaying synchronisation (which is unrestricted for non-urgent models). In general, the problems are undecidable already for two players in the quantitative case and for three players in the qualitative case. The qualitative undecidability is shown by a reduction to decentralized POMDPs, for which we provide the strongest (and rather surprising) undecidability result so far.
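    The first implication, that interleaving non-determinism disappears under continuous-time randomness, can be illustrated with a toy simulation. The sketch below is not from the paper; it merely samples exponentially distributed delays for two agents and shows that exact ties, the situations where a scheduler would have to pick an interleaving, essentially never occur.

```python
import random

random.seed(0)
ties = 0
for _ in range(100_000):
    # Each agent's next event fires after an exponentially distributed delay.
    delay_a = random.expovariate(1.0)
    delay_b = random.expovariate(1.5)
    # With continuous distributions the delays are almost surely distinct, so the
    # order of events is resolved by the randomness itself rather than by a
    # nondeterministic interleaving choice.
    ties += (delay_a == delay_b)
print("exact ties observed:", ties)   # expected output: 0
```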

    Foundations for Safety-Critical on-Demand Medical Systems

    In current medical practice, therapy is delivered in critical care environments (e.g., the ICU) by clinicians who manually coordinate sets of medical devices: the clinicians monitor patient vital signs and then reconfigure devices (e.g., infusion pumps) as needed. Unfortunately, the current state of practice is both burdensome on clinicians and error prone. Recently, clinicians have been speculating whether medical devices supporting "plug & play interoperability" would make it easier to automate current medical workflows and thereby reduce medical errors, reduce costs, and reduce the burden on overworked clinicians. This type of plug & play interoperability would allow clinicians to attach devices to a local network and then run software applications to create a new medical system "on-demand" which automates clinical workflows by automatically coordinating those devices via the network. Plug & play devices would let clinicians build new medical systems compositionally. Unfortunately, safety is not considered a compositional property in general. For example, two independently "safe" devices may interact in unsafe ways. Indeed, even the definition of "safe" may differ between two device types. In this dissertation we propose a framework and define some conditions that permit reasoning about the safety of plug & play medical systems. The framework includes a logical formalism that permits formal reasoning about the safety of many device combinations at once, as well as a platform that actively prevents unintended timing interactions between devices or applications via a shared resource such as a network or CPU. We describe the various pieces of the framework, report some experimental results, and show how the pieces work together to enable the safety assessment of plug & play medical systems via two case studies.
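    A hypothetical illustration of the kind of on-demand coordination app described above: a supervisory script that watches a networked pulse oximeter and pauses an infusion pump when oxygen saturation drops below a threshold. The device classes, method names, and threshold are invented for this sketch and are not the dissertation's formalism or platform.

```python
from dataclasses import dataclass

# Hypothetical device proxies; in a plug & play setting these would be discovered
# on the local network, here they are plain in-memory stubs.
@dataclass
class PulseOximeter:
    spo2: float                  # latest oxygen-saturation reading, in percent

@dataclass
class InfusionPump:
    running: bool = True

    def pause(self):
        self.running = False

def supervise(oximeter: PulseOximeter, pump: InfusionPump, threshold: float = 90.0):
    """One step of a hypothetical safety interlock: stop infusion if SpO2 is low."""
    if pump.running and oximeter.spo2 < threshold:
        pump.pause()
        return "pump paused: low SpO2"
    return "ok"

# Illustrative run: the reading drops below the threshold, so the app intervenes.
oximeter, pump = PulseOximeter(spo2=88.0), InfusionPump()
print(supervise(oximeter, pump))     # pump paused: low SpO2
print(pump.running)                  # False
```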

    Event-Triggered Decentralized Federated Learning over Resource-Constrained Edge Devices

    Federated learning (FL) is a technique for distributed machine learning (ML), in which edge devices carry out local model training on their individual datasets. In traditional FL algorithms, trained models at the edge are periodically sent to a central server for aggregation, utilizing a star topology as the underlying communication graph. However, assuming access to a central coordinator is not always practical, e.g., in ad hoc wireless network settings. In this paper, we develop a novel methodology for fully decentralized FL, where in addition to local training, devices conduct model aggregation via cooperative consensus formation with their one-hop neighbors over the decentralized underlying physical network. We further eliminate the need for a timing coordinator by introducing asynchronous, event-triggered communications among the devices. In doing so, to account for the inherent resource heterogeneity challenges in FL, we define personalized communication triggering conditions at each device that weigh the change in local model parameters against the available local resources. We theoretically demonstrate that our methodology converges to the globally optimal learning model at an $O(\frac{\ln k}{\sqrt{k}})$ rate under standard assumptions in the distributed learning and consensus literature. Our subsequent numerical evaluations demonstrate that our methodology obtains substantial improvements in convergence speed and/or communication savings compared with existing decentralized FL baselines.
    Comment: 23 pages. arXiv admin note: text overlap with arXiv:2204.0372
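    A simplified sketch of the two ingredients described above, local training plus event-triggered consensus with one-hop neighbors, on a toy problem. It is not the paper's algorithm: the graph, losses, step-size schedule, and resource-scaled triggering thresholds are all illustrative.

```python
import numpy as np

# Toy problem: four devices on a ring, device i holds the local loss (x - target_i)^2,
# so the global optimum of the summed loss is the mean of the targets, 2.5.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
targets = np.array([1.0, 2.0, 3.0, 4.0])
x = np.zeros(4)                                    # local models
last_sent = x.copy()                               # value each device last broadcast
resource_weight = np.array([0.2, 0.5, 0.2, 1.0])   # larger = scarcer local resources
broadcasts = 0

for k in range(1, 501):
    step = 0.5 / np.sqrt(k)                        # diminishing step size
    x = x - step * 2.0 * (x - targets)             # local gradient step on own data
    # Event-triggered communication: broadcast only when the local model has
    # drifted from the last broadcast value by more than a resource-scaled threshold.
    for i in range(4):
        if abs(x[i] - last_sent[i]) > resource_weight[i] * step:
            last_sent[i] = x[i]
            broadcasts += 1
    # Consensus step with one-hop neighbors, using their last broadcast values.
    x = x + 0.3 * np.array([sum(last_sent[j] - x[i] for j in neighbors[i])
                            for i in range(4)])

print(np.round(x, 2), "broadcasts:", broadcasts)   # models end up near 2.5
```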

    Fault Tolerant Gradient Clock Synchronization

    Synchronizing clocks in distributed systems is well-understood, both in terms of fault-tolerance in fully connected systems and the dependence of local and global worst-case skews (i.e., maximum clock difference between neighbors and arbitrary pairs of nodes, respectively) on the diameter of fault-free systems. However, so far nothing non-trivial is known about the local skew that can be achieved in topologies that are not fully connected even under a single Byzantine fault. Put simply, in this work we show that the most powerful known techniques for fault-tolerant and gradient clock synchronization are compatible, in the sense that the best of both worlds can be achieved simultaneously. Concretely, we combine the Lynch-Welch algorithm [Welch1988] for synchronizing a clique of $n$ nodes despite up to $f < n/3$ Byzantine faults with the gradient clock synchronization (GCS) algorithm by Lenzen et al. [Lenzen2010] in order to render the latter resilient to faults. As this is not possible on general graphs, we augment an input graph $\mathcal{G}$ by replacing each node by $3f+1$ fully connected copies, which execute an instance of the Lynch-Welch algorithm. We then interpret these clusters as supernodes executing the GCS algorithm, where for each cluster its correct nodes' Lynch-Welch clocks provide estimates of the logical clock of the supernode in the GCS algorithm. By connecting clusters corresponding to neighbors in $\mathcal{G}$ in a fully bipartite manner, supernodes can inform each other about (estimates of) their logical clock values. This way, we achieve asymptotically optimal local skew, granted that no cluster contains more than $f$ faulty nodes, at factor $O(f)$ and $O(f^2)$ overheads in terms of nodes and edges, respectively. Note that tolerating $f$ faulty neighbors trivially requires degree larger than $f$, so this is asymptotically optimal as well.
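    The cluster construction described above can be sketched as a small graph transformation. The code below only illustrates that construction, not the synchronization algorithms themselves: each node of the input graph becomes a clique of $3f+1$ copies (which would run Lynch-Welch internally), and cliques of adjacent nodes are joined by complete bipartite edge sets (so supernodes can exchange clock estimates for GCS).

```python
from itertools import combinations, product

def augment(graph, f):
    """Replace each node of `graph` (adjacency as a dict of sets) by a cluster of
    3f+1 copies; connect each cluster internally as a clique and connect clusters
    of adjacent nodes completely bipartitely."""
    size = 3 * f + 1
    cluster = {v: [(v, i) for i in range(size)] for v in graph}
    edges = set()
    for v in graph:                                   # intra-cluster cliques
        edges.update(combinations(cluster[v], 2))
    for v, nbrs in graph.items():                     # inter-cluster bipartite links
        for u in nbrs:
            if v < u:                                 # handle each edge of G once
                edges.update(product(cluster[v], cluster[u]))
    return cluster, edges

# Illustrative input graph G: a path a - b - c, tolerating f = 1 fault per cluster.
G = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
clusters, edges = augment(G, f=1)
print(len(clusters["a"]), "nodes per cluster")        # 4  (= 3f+1)
print(len(edges), "edges")                            # 3*C(4,2) + 2*4*4 = 50
```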