3,964 research outputs found
Uncovering Bugs in Distributed Storage Systems during Testing (not in Production!)
Testing distributed systems is challenging due to multiple sources of nondeterminism. Conventional testing techniques, such as unit, integration and stress testing, are ineffective in preventing serious but subtle bugs from reaching production. Formal techniques, such as TLA+, can only verify high-level specifications of systems at the level of logic-based models, and fall short of checking the actual executable code. In this paper, we present a new methodology for testing distributed systems. Our approach applies advanced systematic testing techniques to thoroughly check that the executable code adheres to its high-level specifications, which significantly improves coverage of important system behaviors. Our methodology has been applied to three distributed storage systems in the Microsoft Azure cloud computing platform. In the process, numerous bugs were identified, reproduced, confirmed and fixed. These bugs required a subtle combination of concurrency and failures, making them extremely difficult to find with conventional testing techniques. An important advantage of our approach is that a bug is uncovered in a small setting and witnessed by a full system trace, which dramatically increases the productivity of debugging
Smart technologies for effective reconfiguration: the FASTER approach
Current and future computing systems increasingly require that their functionality stays flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, i.e. changing protocols and data-coding standards, evolving demands for support of different user applications, and newly emerging applications in communication, computing and consumer electronics. Therefore, extending the functionality and the lifetime of products requires the addition of new functionality to track and satisfy the customers needs and market and technology trends. Many contemporary products along with the software part incorporate hardware accelerators for reasons of performance and power efficiency. While adaptivity of software is straightforward, adaptation of the hardware to changing requirements constitutes a challenging problem requiring delicate solutions. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a complete methodology to allow designers to easily implement a system specification on a platform which includes a general purpose processor combined with multiple accelerators running on an FPGA, taking as input a high-level description and fully exploiting, both at design time and at run time, the capabilities of partial dynamic reconfiguration. The goal is that for selected application domains, the FASTER toolchain will be able to reduce the design and verification time of complex reconfigurable systems providing additional novel verification features that are not available in existing tool flows
Services for safety-critical applications on dual-scheduled TDMA networks
Tese de doutoramento. Engenharia Electrotécnica e de Computadores. Faculdade de Engenharia. Universidade do Porto. 200
Modelling and Verification of a Cluster-tree Formation Protocol Implementation for the IEEE 802.15.4 TSCH MAC Operation Mode
Correct and efficient initialization of wireless sensor networks can be
challenging in the face of many uncertainties present in ad hoc wireless
networks. In this paper we examine an implementation for the formation of a
cluster-tree topology in a network which operates on top of the TSCH MAC
operation mode of the IEEE 802.15.4 standard, and investigate it using formal
methods. We show how both the mCRL2 language and toolset help us in identifying
scenarios where the implementation does not form a proper topology. More
importantly, our analysis leads to the conclusion that the cluster-tree
formation algorithm has a super linear time complexity. So, it does not scale
to large networks.Comment: In Proceedings MARS 2017, arXiv:1703.0581
Towards the Formal Verification of a Distributed Real-Time Automotive System
We present the status of a project which aims at building, formally and pervasively verifying a distributed automotive system. The target system is a gate-level model which consists of several interconnected electronic control units with independent clocks. This model is verified against the specification as seen by a system programmer. The automotive system is implemented on several FPGA boards. The pervasive verification is carried out using combination of interactive theorem proving (Isabelle/HOL) and model checking (LTL)
SDN4CoRE: A Simulation Model for Software-Defined Networking for Communication over Real-Time Ethernet
Ethernet has become the next standard for automotive and industrial
automation networks. Standard extensions such as IEEE 802.1Q Time-Sensitive
Networking (TSN) have been proven to meet the real-time and robustness
requirements of these environments. Augmenting the TSN switching by
Software-Defined Networking functions promises additional benefits: A
programming option for TSN devices can add much value to the resilience,
security, and adaptivity of the environment. Network simulation allows to model
highly complex networks before assembly and is an essential process for the
design and validation of future networks. Still, a simulation environment that
supports programmable real-time networks is missing. This paper fills the gap
by sharing our simulation model for Software-Defined Networking for
Communication over Real-Time Ethernet (SDN4CoRE) and present initial results in
modeling programmable real-time networks. In a case study, we show that
SDN4CoRE can simulate complex programmable real-time networks and allows for
testing and verifying the programming of real-time devices.Comment: If you cite this paper, please use the original reference: T.
H\"ackel, P. Meyer, F. Korf, and T. C. Schmidt. SDN4CoRE: A Simulation Model
for Software-Defined Networking for Communication over Real-Time Ethernet.
In: Proceedings of the 6th International OMNeT++ Community Summit. September,
2019, Easychai
Recommended from our members
Modular and Safe Event-Driven Programming
Asynchronous event-driven systems are ubiquitous across domains such as device drivers, distributed systems, and robotics. These systems are notoriously hard to get right as the programmer needs to reason about numerous control paths resulting from the complex interleaving of events (or messages) and failures. Unsurprisingly, it is easy to introduce subtle errors while attempting to fill in gaps between high-level system specifications and their concrete implementations.This dissertation proposes new methods for programming safe event-driven asynchronous systems.In the first part of the thesis, we present ModP, a modular programming framework for compositional programming and testing of event-driven asynchronous systems.The ModP module system supports a novel theory of compositional refinement for assume-guarantee reasoning of dynamic event-driven asynchronous systems. We build a complex distributed systems software stack using ModP.Our results demonstrate that compositional reasoning can help scale model-checking (both explicit and symbolic) to large distributed systems.ModP is transforming the way asynchronous software is built at Microsoft and Amazon Web Services (AWS). Microsoft uses ModP for implementing safe device drivers and other software in the Windows kernel.AWS uses ModP for compositional model checking of complex distributed systems. While ModP simplifies analysis of such systems, the state space of industrial-scale systems remains extremely large.In the second part of this thesis, we present scalable verification and systematic testing approaches to further mitigate this state-space explosion problem.First, we introduce the concept of a delaying explorer to perform prioritized exploration of the behaviors of an asynchronous reactive program. A delaying explorer stratifies the search space using a custom strategy (tailored towards finding bugs faster), and a delay operation that allows deviation from that strategy. We show that prioritized search with a delaying explorer performs significantly better than existing approaches for finding bugs in asynchronous programs.Next, we consider the challenge of verifying time-synchronized systems; these are almost-synchronous systems as they are neither completely asynchronous nor synchronous.We introduce approximate synchrony, a sound and tunable abstraction for verification of almost-synchronous systems. We show how approximate synchrony can be used for verification of both time-synchronization protocols and applications running on top of them.Moreover, we show how approximate synchrony also provides a useful strategy to guide state-space exploration during model-checking.Using approximate synchrony and implementing it as a delaying explorer, we were able to verify the correctness of the IEEE 1588 distributed time-synchronization protocol and, in the process, uncovered a bug in the protocol that was well appreciated by the standards committee.In the final part of this thesis, we consider the challenge of programming a special class of event-driven asynchronous systems -- safe autonomous robotics systems.Our approach towards achieving assured autonomy for robotics systems consists of two parts: (1) a high-level programming language for implementing and validating the reactive robotics software stack; and (2) an integrated runtime assurance system to ensure that the assumptions used during design-time validation of the high-level software hold at runtime.Combining high-level programming language and model-checking with runtime assurance helps us bridge the gap between design-time software validation that makes assumptions about the untrusted components (e.g., low-level controllers), and the physical world, and the actual execution of the software on a real robotic platform in the physical world. We implemented our approach as DRONA, a programming framework for building safe robotics systems.We used DRONA for building a distributed mobile robotics system and deployed it on real drone platforms. Our results demonstrate that DRONA (with the runtime-assurance capabilities) enables programmers to build an autonomous robotics software stack with formal safety guarantees.To summarize, this thesis contributes new theory and tools to the areas of programming languages, verification, systematic testing, and runtime assurance for programming safe asynchronous event-driven across the domains of fault-tolerant distributed systems and safe autonomous robotics systems
- …