34 research outputs found

    A More Faithful Formal Definition of the Desired Property for Distributed Snapshot Algorithms to Model Check the Property

    Get PDF
    The first distributed snapshot algorithm was invented by Chandy and Lamport: the Chandy-Lamport distributed snapshot algorithm (CLDSA). Distributed snapshot algorithms are crucial components for making distributed systems fault tolerant. Such algorithms are extremely important because many modern key software systems take the form of distributed systems and should be fault tolerant. There are at least two desired properties such algorithms should satisfy: 1) the distributed snapshot reachability property (called the DSR property) and 2) the ability to run concurrently with, but not alter, an underlying distributed system (UDS). This paper identifies subtle errors in a paper on the formalization of the DSR property and shows how to correct them. We give a more faithful formal definition of the DSR property; the definition involves two state machines: M_UDS, which formalizes a UDS, and M_CLDSA, which formalizes the UDS on which CLDSA is superimposed (UDS-CLDSA). It can be used for more precise model checking of the DSR property for CLDSA. We also prove a theorem on the equivalence of our new definition and an existing one that only involves M_CLDSA, to guarantee the validity of the existing model checking approach. Moreover, we prove the second property, namely that CLDSA does not alter the behaviors of the UDS.
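
    The marker rule at the heart of CLDSA is compact enough to sketch. Below is a minimal, hypothetical Python simulation of marker handling; all names (Process, receive, and so on) are ours, and the paper itself works with the state machines M_UDS and M_CLDSA rather than with executable code.

        # Minimal sketch of Chandy-Lamport marker handling (hypothetical names).
        # Channels are FIFO queues shared between a sender and a receiver.
        MARKER = "MARKER"

        class Process:
            def __init__(self, pid, state, out_channels, incoming_pids):
                self.pid = pid
                self.state = state              # local UDS state
                self.out = out_channels         # receiver pid -> FIFO channel
                self.incoming = incoming_pids
                self.recorded_state = None
                self.recording = {}             # sender pid -> in-flight messages

            def record(self):
                """Record own state, then send a marker on every outgoing channel."""
                self.recorded_state = self.state
                self.recording = {p: [] for p in self.incoming}
                for channel in self.out.values():
                    channel.append(MARKER)

            def receive(self, sender, msg):
                if msg == MARKER:
                    if self.recorded_state is None:
                        self.record()           # first marker seen: snapshot now
                    # the channel from `sender` is fully recorded once its
                    # marker arrives, so stop recording it
                    self.recording.pop(sender, None)
                elif self.recorded_state is not None and sender in self.recording:
                    # the message was in flight when the snapshot started,
                    # so it belongs to the recorded channel state
                    self.recording[sender].append(msg)

    Roughly, the DSR property then states that the assembled global snapshot (the recorded local states plus the recorded channel contents) is a global state reachable from the state in which the snapshot was initiated, and from which the state at termination is reachable.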

    Revisiting Snapshot Algorithms by Refinement-based Techniques

    Get PDF
    The snapshot problem addresses a collection of important algorithmic issues related to distributed computations, which are used for debugging or recovering distributed programs. Among the existing solutions, Chandy and Lamport propose a simple distributed algorithm. In this paper, we explore the correct-by-construction process to formalize snapshot algorithms in distributed systems. The formalization process is based on the modeling language Event-B, which supports refinement-based incremental development using the RODIN platform. These refinement-based techniques help to derive a correct distributed algorithm. Moreover, we demonstrate how other distributed algorithms of this class can be revisited. A consequence is to provide a fully mechanized proof of the distributed algorithms.
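
    Event-B refinement rests on proof obligations: every concrete event must simulate an abstract event under a gluing invariant. As a rough illustration only, here is a toy finite-state version of that obligation in Python; Event-B and RODIN discharge such obligations by proof rather than by enumeration, and every name below is ours.

        # Toy trace-refinement check in the spirit of an Event-B proof
        # obligation (hypothetical; the gluing invariant is simplified
        # to a function `glue` from concrete to abstract states).
        def refines(c_init, c_next, a_init, a_next, glue):
            """Check, over a finite model, that every reachable concrete
            transition is matched by an abstract transition or a stutter."""
            if glue(c_init) != a_init:
                return False
            frontier, seen = [c_init], {c_init}
            while frontier:
                c = frontier.pop()
                for c2 in c_next(c):
                    a, a2 = glue(c), glue(c2)
                    if a2 != a and a2 not in a_next(a):
                        return False    # concrete step has no abstract counterpart
                    if c2 not in seen:
                        seen.add(c2)
                        frontier.append(c2)
            return True

        # Usage: a mod-4 counter refines a mod-2 counter under glue = c % 2.
        assert refines(0, lambda c: {(c + 1) % 4},
                       0, lambda a: {(a + 1) % 2},
                       lambda c: c % 2)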

    Model Checking of Robot Gathering

    Get PDF
    Recent advances in distributed computing highlight models and algorithms for autonomous mobile robots that self-organize and cooperate in order to solve a global objective, and as a result a large number of algorithms have been proposed. These algorithms are given together with proofs to assess their correctness; however, those proofs are informal and therefore error prone. This paper presents our study on the formal verification of mobile robot algorithms. We first propose a formal model for mobile robot algorithms on anonymous ring-shaped networks under multiplicity and asynchrony assumptions. We specify this formal model in Maude, a specification and programming language based on rewriting logic. We then use its model checker to formally verify that an algorithm for the robot gathering problem on rings enjoys some desired properties. As a result of the model checking, counterexamples were found. We detect the sources of some unforeseen design errors and, furthermore, give our interpretations of these errors.
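
    The kind of check such a model checker performs can be imitated at toy scale. The sketch below is ours and drastically simplified (the paper's Maude model treats multiplicity and asynchrony far more faithfully): it enumerates robot configurations on a ring and reports reachable configurations from which gathering has become impossible, which is one shape of counterexample.

        # Toy explicit-state exploration of robot configurations on an
        # anonymous ring (our simplification). A configuration is a tuple
        # giving the number of robots on each node; asynchrony is modeled
        # by moving exactly one robot per step.
        def successors(config, move_rule):
            for node, count in enumerate(config):
                if count == 0:
                    continue
                for target in move_rule(config, node):
                    nxt = list(config)
                    nxt[node] -= 1
                    nxt[target] += 1
                    yield tuple(nxt)

        def gathered(config):
            return sum(1 for c in config if c > 0) == 1   # all on one node

        def stuck_states(init, move_rule):
            """Reachable configurations from which no gathered configuration
            is reachable: one shape of model-checking counterexample."""
            edges, frontier = {}, [init]
            while frontier:                               # forward reachability
                c = frontier.pop()
                if c in edges:
                    continue
                edges[c] = set(successors(c, move_rule))
                frontier.extend(edges[c])
            good = {c for c in edges if gathered(c)}
            changed = True
            while changed:                                # backward closure
                changed = False
                for c, succs in edges.items():
                    if c not in good and succs & good:
                        good.add(c)
                        changed = True
            return [c for c in edges if c not in good]

        # Example rule: each robot may move to either neighbour; gathering
        # then always remains possible, so no stuck states are reported.
        either = lambda cfg, n: [(n - 1) % len(cfg), (n + 1) % len(cfg)]
        print(stuck_states((1, 0, 1, 0), either))         # -> []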

    Formal Model-Driven Analysis of Resilience of GossipSub to Attacks from Misbehaving Peers

    Full text link
    GossipSub is a new peer-to-peer communication protocol designed to counter attacks from misbehaving peers by carefully controlling what information is disseminated and to whom, via a score function computed by each peer that captures positive and negative behaviors of its neighbors. The score function depends on several parameters (weights, caps, thresholds, etc.) that can be configured by applications using GossipSub. The specification for GossipSub is written in English and its resilience to attacks from misbehaving peers is supported empirically by emulation testing using an implementation in Golang. In this work we take a foundational approach to understanding the resilience of GossipSub to attacks from misbehaving peers. We build the first formal model of GossipSub, using the ACL2s theorem prover. Our model is officially endorsed by GossipSub developers. It can simulate GossipSub networks of arbitrary size and topology, with arbitrarily configured peers, and can be used to prove and disprove theorems about the protocol. We formalize fundamental security properties stating that the score function is fair, penalizes bad behavior and rewards good behavior. We prove that the score function is always fair, but can be configured in ways that either penalize good behavior or ignore bad behavior. Using our model, we run GossipSub with the specific configurations for two popular real-world applications: the FileCoin and Eth2.0 blockchains. We show that all properties hold for FileCoin. However, given any Eth2.0 network (of any topology and size) with any number of potentially misbehaving peers, we can synthesize attacks where these peers are able to continuously misbehave by never forwarding topic messages, while maintaining positive scores so that they are never pruned from the network by GossipSub.
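
    The structure of such a score function (weighted, capped behaviour counters compared against thresholds) is easy to sketch. The Python below is illustrative only: the weights, caps, and counter names are ours, not the real libp2p GossipSub parameters, and the real function aggregates many more behaviours per topic.

        # Illustrative GossipSub-style peer score (parameter names and
        # values are ours, not the protocol's).
        from dataclasses import dataclass

        @dataclass
        class Params:
            w_mesh_time: float = 1.0        # reward time spent in the mesh
            cap_mesh_time: float = 10.0
            w_deliveries: float = 2.0       # reward first message deliveries
            cap_deliveries: float = 100.0
            w_invalid: float = -50.0        # penalize invalid messages (uncapped)
            graylist_threshold: float = -20.0

        def score(p, mesh_time, deliveries, invalid_msgs):
            s = p.w_mesh_time * min(mesh_time, p.cap_mesh_time)
            s += p.w_deliveries * min(deliveries, p.cap_deliveries)
            s += p.w_invalid * invalid_msgs
            return s

        # A peer that never forwards topic messages keeps deliveries = 0 and
        # invalid_msgs = 0, yet still earns a positive score from mesh time
        # alone: the shape of the Eth2.0 attacks the abstract synthesizes.
        assert score(Params(), mesh_time=10, deliveries=0, invalid_msgs=0) > 0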

    Network-Wide Monitoring And Debugging

    Get PDF
    Modern networks can encompass over 100,000 servers. Managing such an extensive network with a diverse set of network policies has become more complicated with the introduction of programmable hardware and distributed network functions. Furthermore, service level agreements (SLAs) require operators to maintain high performance and availability with low latencies. Therefore, it is crucial for operators to resolve any issues in networks quickly. Problems can occur at any layer of the stack: the network (load imbalance), the data plane (incorrect packet processing), the control plane (bugs in configuration), and the coordination among them. Unfortunately, existing debugging tools are not sufficient to monitor, analyze, or debug modern networks; they either lack visibility into the network, require manual analysis, or cannot check for some properties. These limitations arise from an outdated view of networks, namely that we can look at a single component in isolation. In this thesis, we describe a new approach that measures, understands, and debugs the network across devices and time. We also target modern stateful packet-processing devices, programmable data planes and distributed network functions, as these are becoming an increasingly common part of the network. Our key insight is to leverage both in-network packet processing (to collect precise measurements) and out-of-network processing (to coordinate measurements and scale analytics). The resulting systems we design based on this approach can support testing and monitoring at data center scale and can handle stateful data in the network. We automate the collection and analysis of measurement data to save operator time and take a step towards self-driving networks.
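
    The in-network/out-of-network split can be pictured with a toy example. In the sketch below (all names ours, and a drastic simplification of the systems in the thesis), switches keep cheap per-flow counters in the data plane, while an out-of-network collector correlates exported snapshots from adjacent devices to localize loss.

        # Toy illustration of in-network counting plus out-of-network analysis.
        from collections import defaultdict

        class SwitchCounter:
            """In-network: per-flow packet counts, cheap enough for the
            data plane (in practice P4 registers, not Python)."""
            def __init__(self):
                self.counts = defaultdict(int)

            def on_packet(self, flow_id):
                self.counts[flow_id] += 1

            def export_and_reset(self):
                snapshot, self.counts = dict(self.counts), defaultdict(int)
                return snapshot

        def detect_loss(upstream, downstream):
            """Out-of-network: flows counted upstream but short downstream
            indicate drops between the two devices."""
            return {f: n - downstream.get(f, 0)
                    for f, n in upstream.items()
                    if n > downstream.get(f, 0)}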

    On the needs for specification and verification of collaborative and concurrent robots, agents and processes

    Get PDF
    This report summarises and integrates two different tracks of research for the purpose of envisioning and preparing a joint research project proposal. Software and hardware systems have become increasingly complex and act "concurrently", both with respect to memory access (i.e. information flow) and computational resources (i.e. "services"). The software development metaphor of cloud storage, cloud computing and service-oriented design was anticipated by artificial intelligence (AI) research at least 30 years ago (parallel and distributed computation already dates back to the 1950s and 1970s). What is known as a "service" today is what in AI is known as the capability of an agent, and the problem of information flow and consistency has been a cornerstone of information processing ever since. Based on a real-world robotics application, we demonstrate how an increasingly abstract description of collaborating or competing agents corresponds to a set of concurrent processes. In the second part we review several approaches to the theory of concurrent systems. Based on the different kinds of program semantics, we present corresponding logical and algebraic means for the description of parallel processes and memory access. It turns out that Concurrent Kleene Algebra (CKA) and its related graphlet metaphor appears to deliver a one-to-one formal description of the module structures developed in the first part. The problem of snapshotting system states in order to obtain (partial) traces of a running system seems to be well describable by the Temporal Logic of Actions (TLA). Finally, the different types of subsystems and their mutual requirements, such as exclusiveness, seem to be best describable in a separation-logic-like approach. We conclude with a list of research questions detailing some of the many promising issues raised in the report.
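
    CKA's two compositions, and the exchange law relating them, can be modelled concretely on finite trace sets. The following toy Python model is ours, not the report's: a program is a set of event sequences, sequential composition is concatenation, and parallel composition is interleaving.

        # Toy trace-set model of Concurrent Kleene Algebra (model ours).
        def seq(P, Q):
            """Sequential composition: a trace of P followed by a trace of Q."""
            return {p + q for p in P for q in Q}

        def par(P, Q):
            """Parallel composition: all interleavings of a P-trace and a Q-trace."""
            def weave(p, q):
                if not p:
                    yield q
                elif not q:
                    yield p
                else:
                    for rest in weave(p[1:], q):
                        yield p[:1] + rest
                    for rest in weave(p, q[1:]):
                        yield q[:1] + rest
            return {r for p in P for q in Q for r in weave(p, q)}

        # The CKA exchange law (P || Q) ; (R || S) <= (P ; R) || (Q ; S),
        # checked on a tiny example; <= is inclusion of trace sets.
        P, Q, R, S = {("a",)}, {("b",)}, {("c",)}, {("d",)}
        assert seq(par(P, Q), par(R, S)) <= par(seq(P, R), seq(Q, S))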

    Performance modelling and the representation of large scale distributed system functions

    Get PDF
    This thesis presents a resource-based approach to model generation for the performance characterization and correctness checking of large-scale telecommunications networks. A notion called the timed automaton is proposed and then developed to encapsulate the behaviours of networking equipment, system control policies and non-deterministic user behaviours. The states of pooled network resources and the behaviours of resource consumers are represented as continually varying geometric patterns; these patterns form part of the data operated upon by the timed automata. Such a representation technique allows for great flexibility regarding the level of abstraction that can be chosen in the modelling of telecommunications systems. Nonetheless, the notion of system functions is proposed to serve as a constraining framework for specifying bounded behaviours and features of telecommunications systems. Operational concepts are developed for the timed automata; these concepts are based on limit-preserving relations. Relations over system states represent the evolution of system properties observable at various locations within the network under study. The declarative nature of such permutative state relations provides a direct framework for generating highly expressive models suitable for carrying out optimization experiments. The usefulness of the developed procedure is demonstrated by tackling a large-scale case study, in particular the problem of congestion avoidance in networks; it is shown that there can be global coupling among local behaviours within a telecommunications network. The uncovering of such a phenomenon through function-oriented simulation is a contribution to the area of network modelling. The direct and faithful derivation of performance metrics for loss in networks from resource-utilization patterns is also a new contribution to this area of work.
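
    For readers unfamiliar with the construct, a timed automaton couples discrete locations with clocks that constrain when transitions may fire. The skeleton below is a generic, minimal illustration of that idea (one clock, interval guards, optional resets); it is not the thesis's richer formulation, in which resource patterns form part of the automaton's data.

        # Minimal generic timed-automaton skeleton (our illustration).
        class TimedAutomaton:
            def __init__(self, location):
                self.location, self.clock = location, 0.0

            def delay(self, d):
                """Let time pass in the current location."""
                self.clock += d

            def fire(self, transitions):
                """transitions: {location: [(guard_lo, guard_hi, target, reset)]};
                fire the first transition whose clock guard is satisfied."""
                for lo, hi, target, reset in transitions.get(self.location, []):
                    if lo <= self.clock <= hi:
                        self.location = target
                        if reset:
                            self.clock = 0.0
                        return True
                return False

        # Usage: a call that must be answered within 5 time units.
        ta = TimedAutomaton("ringing")
        ta.delay(3.0)
        assert ta.fire({"ringing": [(0.0, 5.0, "connected", True)]})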

    Decentralized Runtime Verification of LTL Specifications in Distributed Systems

    Get PDF
    Runtime verification is a lightweight automated formal method for specification-based runtime monitoring, as well as testing, of large real-world systems. While numerous techniques exist for the runtime verification of sequential programs, there has been very little work on the specification-based monitoring of distributed systems. In this work, we propose the first sound and complete method for runtime verification of asynchronous distributed programs for the 3-valued semantics of LTL specifications defined over the global state of the program. Our technique for evaluating LTL properties is inspired by distributed computation slicing, an approach for abstracting distributed computations with respect to a given predicate. Our monitoring technique is fully decentralized in that each process in the distributed program under inspection maintains a replica of the monitor automaton. Each monitor may maintain a set of possible verification verdicts based upon the existence of concurrent events. Our experiments on runtime monitoring of a set of iOS devices running a distributed program show that, due to the design of our algorithm, the monitoring overhead grows only linearly in the number of processes and events that need to be monitored.
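
    The core data structure is easy to picture: a 3-valued LTL monitor is a finite automaton whose states map to the verdicts true, false, or inconclusive, and concurrency forces each replica to track a set of possible monitor states rather than a single one. The sketch below is ours, with a toy monitor for "eventually p" and a deliberately crude merge; it is not the paper's algorithm.

        # Sketch of decentralized 3-valued monitoring (names and merge ours).
        # Toy monitor for "eventually p": 'waiting' is inconclusive ('?'),
        # 'done' means the property is satisfied ('T').
        DELTA = {("waiting", "p"): "done", ("waiting", "q"): "waiting",
                 ("done", "p"): "done", ("done", "q"): "done"}
        VERDICT = {"waiting": "?", "done": "T"}

        class MonitorReplica:
            def __init__(self):
                self.possible = {"waiting"}     # possible monitor states

            def local_event(self, event):
                self.possible = {DELTA[(s, event)] for s in self.possible}

            def merge(self, other):
                """On communication, learn the other replica's possibilities
                (the paper's merge is more precise than this union)."""
                self.possible |= other.possible

            def verdicts(self):
                return {VERDICT[s] for s in self.possible}

        m1, m2 = MonitorReplica(), MonitorReplica()
        m1.local_event("q"); m2.local_event("p")
        m1.merge(m2)
        print(m1.verdicts())                    # {'?', 'T'}: not yet settled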

    The 5th Conference of PhD Students in Computer Science

    Get PDF

    Automated Extraction of Behaviour Model of Applications

    Get PDF
    Highly replicated cloud applications are deployed only when they are deemed to be functional. That is, they generally perform their task and their failure rate is relatively low. However, even though failure is rare, it does occur and is very difficult to diagnose. We devise a tool for failure diagnosis which learns the normal behaviour of an application in terms of the statistical properties of variables used throughout its execution, and then monitors it for deviation from these statistical properties. Our study reveals that many variables have unique statistical characteristics that amount to an invariant of the program. Therefore, any significant deviation from these characteristics reflects an abnormal behaviour of the application which may be caused by a program error. It is difficult to obtain the invariant from static analysis of the application’s code alone. For example, the name of a person usually does not include a semicolon; however, an intruder may attempt an SQL injection (which will include a semicolon) through the ‘name’ field while entering his information, and be successful if there is no checking for this case. This scenario can only be captured at runtime and may not be tested by the application developer. The character range of the ‘name’ variable is one of its statistical properties; by learning this range from the execution of the application it is possible to detect the abnormal input described above. Hence, monitoring the statistics of values taken by the different variables of an application is an effective way to detect anomalies that can help to diagnose the failure of the application. We build a tool that collects frequent snapshots of the application’s heap and builds a statistical model solely from extensional knowledge of the application. The extensional knowledge is obtainable only from runtime data of the application, without any description or explanation of the application’s execution flow. The model characterizes the application’s normal behaviour. Collecting snapshots in the form of memory dumps and determining the application’s behaviour model from them without code instrumentation makes our tool applicable in cases where instrumentation is computationally expensive. Our approach allows a behaviour model to be built automatically and efficiently using the monitoring data alone. We evaluate the utility of our approach by applying it to an e-commerce application and an online bidding system, and then derive different statistical properties of variables from their runtime-exhibited values. Our experimental results demonstrate 96% accuracy in the generated statistical model with a maximum 1% performance overhead. This accuracy is measured on the basis of generating fewer false-positive alerts when the application is running without any anomaly. The high accuracy and low performance overhead indicate that our tool can successfully determine the application’s normal behaviour without affecting its performance, and can be used to monitor it in production. Moreover, our tool correctly detected two anomalous conditions while monitoring the application with a small number of injected faults. In addition to anomaly detection, our tool logs all the variables of the application that violate the learned model. The log file can help to diagnose any failure caused by these variables and gives our tool source-code granularity in fault localization.
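
    The semicolon example translates directly into code. The sketch below is our own minimal illustration of the idea (the tool itself learns from heap dumps without instrumentation and tracks many more statistical properties): learn a per-variable character set and length range from normal runs, then flag values that fall outside them.

        # Minimal per-variable statistical invariant (our illustration).
        class VariableProfile:
            """Learned invariant for one string variable, e.g. 'name'."""
            def __init__(self):
                self.chars = set()
                self.min_len = float("inf")
                self.max_len = 0

            def learn(self, value):
                self.chars |= set(value)
                self.min_len = min(self.min_len, len(value))
                self.max_len = max(self.max_len, len(value))

            def is_anomalous(self, value):
                # Deviation from the learned character set or length range.
                return (not set(value) <= self.chars
                        or not self.min_len <= len(value) <= self.max_len)

        # Learn from values observed in snapshots of normal runs, then monitor:
        name = VariableProfile()
        for observed in ("Alice", "Bob", "Charlotte"):
            name.learn(observed)
        print(name.is_anomalous("Robert'; DROP TABLE users;--"))   # True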