Grove: a Separation-Logic Library for Verifying Distributed Systems (Extended Version)
Grove is a concurrent separation logic library for verifying distributed
systems. Grove is the first to handle time-based leases, including their
interaction with reconfiguration, crash recovery, thread-level concurrency, and
unreliable networks. This paper uses Grove to verify several distributed system
components written in Go, including GroveKV, a realistic distributed
multi-threaded key-value store. GroveKV supports reconfiguration,
primary/backup replication, and crash recovery, and uses leases to execute
read-only requests on any replica. GroveKV achieves high performance (67-73% of
Redis on a single core), scales with more cores and more backup replicas
(achieving about 2x the throughput when going from 1 to 3 servers), and can
safely execute reads while reconfiguring.
Comment: Extended version of paper appearing at SOSP 202
Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants
Geo-replicated databases often operate under the principle of eventual
consistency to offer high-availability with low latency on a simple key/value
store abstraction. Recently, some have adopted commutative data types to
provide seamless reconciliation for special purpose data types, such as
counters. Despite this, the inability to enforce numeric invariants across all
replicas still remains a key shortcoming of relying on the limited guarantees
of eventual consistency storage. We present a new replicated data type, called
bounded counter, which adds support for numeric invariants to eventually
consistent geo-replicated databases. We describe how this can be implemented on
top of existing cloud stores without modifying them, using Riak as an example.
Our approach adapts ideas from escrow transactions to devise a solution that is
decentralized, fault-tolerant and fast. Our evaluation shows much lower latency
and better scalability than the traditional approach of using strong
consistency to enforce numeric invariants, thus alleviating the tension between
consistency and availability.
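The escrow idea behind the bounded counter can be illustrated with a minimal sketch. It assumes the invariant is "value never drops below zero" and that each replica holds a local share of decrement rights. The class name `BoundedCounter`, its methods, and the state layout are hypothetical and chosen for illustration; this is not the authors' implementation or a Riak API.

```python
class BoundedCounter:
    """Illustrative escrow-style counter whose value must stay >= 0.

    Assumed state layout (hypothetical, for illustration only):
      rights[i][i] -- rights replica i created by incrementing (or was given initially)
      rights[i][j] -- rights replica i has transferred to replica j (i != j)
      used[i]      -- rights replica i has consumed by decrementing
    Each entry only grows and is written only by replica i, so replicas can
    reconcile by taking the entry-wise maximum.
    """

    def __init__(self, replica_id, replicas, initial_rights=None):
        self.id = replica_id
        self.replicas = list(replicas)
        self.rights = {i: {j: 0 for j in self.replicas} for i in self.replicas}
        self.used = {i: 0 for i in self.replicas}
        for i, n in (initial_rights or {}).items():
            self.rights[i][i] = n

    def local_rights(self):
        """Decrements this replica may still perform without coordination."""
        i = self.id
        received = sum(self.rights[j][i] for j in self.replicas if j != i)
        sent = sum(self.rights[i][j] for j in self.replicas if j != i)
        return self.rights[i][i] + received - sent - self.used[i]

    def value(self):
        """Counter value as known locally."""
        created = sum(self.rights[i][i] for i in self.replicas)
        return created - sum(self.used.values())

    def increment(self, n=1):
        # Increments always succeed and create new local rights.
        self.rights[self.id][self.id] += n

    def decrement(self, n=1):
        # Refuse locally when rights are insufficient; no remote call needed.
        if self.local_rights() < n:
            return False
        self.used[self.id] += n
        return True

    def transfer(self, to, n):
        # Hand some local rights to another replica.
        if to == self.id or self.local_rights() < n:
            return False
        self.rights[self.id][to] += n
        return True

    def merge(self, other):
        # Entry-wise maximum of the two replica states.
        for i in self.replicas:
            self.used[i] = max(self.used[i], other.used[i])
            for j in self.replicas:
                self.rights[i][j] = max(self.rights[i][j], other.rights[i][j])


# Usage: two replicas share five rights; "a" runs out and receives one from "b".
a = BoundedCounter("a", ["a", "b"], {"a": 3, "b": 2})
b = BoundedCounter("b", ["a", "b"], {"a": 3, "b": 2})
a.decrement(3)       # succeeds locally
b.transfer("a", 1)   # b gives up one right
a.merge(b)           # a learns about the transfer
a.decrement(1)       # succeeds using the transferred right
```

Because a replica can only spend rights it holds, the sum of all decrements can never exceed the rights created, so the lower bound holds even when replicas are out of sync.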
Verifying and Enforcing Application Constraints in Antidote SQL
Geo-replicated storage systems are currently a fundamental piece in the development of large-scale
applications where users are distributed across the world. To meet the high requirements regarding
latency and availability of these applications, these database systems are forced to use weak consistency
mechanisms. However, under these consistency models, there is no guarantee that the invariants are
preserved, which can jeopardise the correctness of applications. The most obvious alternative to solve
this problem would be to use strong consistency, but this would place a large burden on the system.
Since neither of these options was feasible, many systems have been developed to preserve the
invariants of the applications without sacrificing low latency and high availability. These systems,
based on the analysis of operations, make it possible to increase the guarantees of weak consistency
by introducing consistency at the level of operations that are potentially dangerous to the invariant.
Antidote SQL is a database system that, by combining strong with weak consistency mechanisms,
attempts to guarantee the preservation of invariants at the data level. In this way, and after defining
the concurrency semantics for the application, any operation can be performed without coordination
and without the risk of violating the invariant. However, this approach has some limitations, namely
the fact that it is not trivial for developers to define appropriate concurrency semantics.
In this document, we propose a methodology for the verification and validation of defined
properties, such as invariants, for applications using Antidote SQL. The proposed methodology uses a
high-level programming language with automatic verification features called VeriFx and provides
guidelines for programmers who wish to implement and verify their own systems and specifications
using this tool.
Replicating multithreaded services
For the last 40 years, the systems community has invested a lot of effort in designing techniques for building fault tolerant distributed systems and services. This effort has produced a massive list of results: the literature describes how to design replication protocols that tolerate a wide range of failures (from simple crashes to malicious "Byzantine" failures) in a wide range of settings (e.g. synchronous or asynchronous communication, with or without stable storage), optimizing various metrics (e.g. number of messages, latency, throughput). These techniques have their roots in ideas, such as the abstraction of State Machine Replication and the Paxos protocol, that were conceived when computing was very different than it is today: computers had a single core; all processing was done using a single thread of control, handling requests sequentially; and a collection of 20 nodes was considered a large distributed system. In the last decade, however, computing has gone through some major paradigm shifts, with the advent of multicore architectures and large cloud infrastructures. This dissertation explains how these profound changes impact the practical usefulness of traditional fault tolerant techniques and proposes new ways to architect these solutions to fit the new paradigms.
Computer Science
IoTSan: Fortifying the Safety of IoT Systems
Today's IoT systems include event-driven smart applications (apps) that
interact with sensors and actuators. A problem specific to IoT systems is that
buggy apps, unforeseen bad app interactions, or device/communication failures
can cause unsafe and dangerous physical states. Detecting flaws that lead to
such states requires a holistic view of installed apps, component devices,
their configurations, and, more importantly, how they interact. In this paper,
we design IoTSan, a novel practical system that uses model checking as a
building block to reveal "interaction-level" flaws by identifying events that
can lead the system to unsafe states. In building IoTSan, we design novel
techniques tailored to IoT systems, to alleviate the state explosion associated
with model checking. IoTSan also automatically translates IoT apps into a
format amenable to model checking. Finally, to understand the root cause of a
detected vulnerability, we design an attribution mechanism to identify
problematic and potentially malicious apps. We evaluate IoTSan on the Samsung
SmartThings platform. From 76 manually configured systems, IoTSan detects 147
vulnerabilities. We also evaluate IoTSan with malicious SmartThings apps from a
previous effort. IoTSan detects the potential safety violations and also
effectively attributes these apps as malicious.
Comment: Proc. of the 14th ACM CoNEXT, 201
Modular and Safe Event-Driven Programming
Asynchronous event-driven systems are ubiquitous across domains such as device drivers, distributed systems, and robotics. These systems are notoriously hard to get right as the programmer needs to reason about numerous control paths resulting from the complex interleaving of events (or messages) and failures. Unsurprisingly, it is easy to introduce subtle errors while attempting to fill in gaps between high-level system specifications and their concrete implementations. This dissertation proposes new methods for programming safe event-driven asynchronous systems.
In the first part of the thesis, we present ModP, a modular programming framework for compositional programming and testing of event-driven asynchronous systems. The ModP module system supports a novel theory of compositional refinement for assume-guarantee reasoning of dynamic event-driven asynchronous systems. We build a complex distributed systems software stack using ModP. Our results demonstrate that compositional reasoning can help scale model checking (both explicit and symbolic) to large distributed systems. ModP is transforming the way asynchronous software is built at Microsoft and Amazon Web Services (AWS). Microsoft uses ModP for implementing safe device drivers and other software in the Windows kernel. AWS uses ModP for compositional model checking of complex distributed systems. While ModP simplifies analysis of such systems, the state space of industrial-scale systems remains extremely large.
In the second part of this thesis, we present scalable verification and systematic testing approaches to further mitigate this state-space explosion problem. First, we introduce the concept of a delaying explorer to perform prioritized exploration of the behaviors of an asynchronous reactive program. A delaying explorer stratifies the search space using a custom strategy (tailored towards finding bugs faster), and a delay operation that allows deviation from that strategy. We show that prioritized search with a delaying explorer performs significantly better than existing approaches for finding bugs in asynchronous programs. Next, we consider the challenge of verifying time-synchronized systems; these are almost-synchronous systems as they are neither completely asynchronous nor synchronous. We introduce approximate synchrony, a sound and tunable abstraction for verification of almost-synchronous systems. We show how approximate synchrony can be used for verification of both time-synchronization protocols and applications running on top of them. Moreover, we show how approximate synchrony also provides a useful strategy to guide state-space exploration during model checking. Using approximate synchrony and implementing it as a delaying explorer, we were able to verify the correctness of the IEEE 1588 distributed time-synchronization protocol and, in the process, uncovered a bug in the protocol that was well appreciated by the standards committee.
In the final part of this thesis, we consider the challenge of programming a special class of event-driven asynchronous systems: safe autonomous robotics systems. Our approach towards achieving assured autonomy for robotics systems consists of two parts: (1) a high-level programming language for implementing and validating the reactive robotics software stack; and (2) an integrated runtime assurance system to ensure that the assumptions used during design-time validation of the high-level software hold at runtime. Combining a high-level programming language and model checking with runtime assurance helps us bridge the gap between design-time software validation, which makes assumptions about the untrusted components (e.g., low-level controllers) and the physical world, and the actual execution of the software on a real robotic platform in the physical world. We implemented our approach as DRONA, a programming framework for building safe robotics systems. We used DRONA for building a distributed mobile robotics system and deployed it on real drone platforms. Our results demonstrate that DRONA (with the runtime-assurance capabilities) enables programmers to build an autonomous robotics software stack with formal safety guarantees.
To summarize, this thesis contributes new theory and tools to the areas of programming languages, verification, systematic testing, and runtime assurance for programming safe asynchronous event-driven systems across the domains of fault-tolerant distributed systems and safe autonomous robotics systems.
- …