Grove: a Separation-Logic Library for Verifying Distributed Systems (Extended Version)

Author: Jung Ralf
Kaashoek M. Frans
Sharma Upamanyu
Tassarotti Joseph
Zeldovich Nickolai
Publication date: 06/09/2023

Grove is a concurrent separation logic library for verifying distributed systems. Grove is the first to handle time-based leases, including their interaction with reconfiguration, crash recovery, thread-level concurrency, and unreliable networks. This paper uses Grove to verify several distributed system components written in Go, including GroveKV, a realistic distributed multi-threaded key-value store. GroveKV supports reconfiguration, primary/backup replication, and crash recovery, and uses leases to execute read-only requests on any replica. GroveKV achieves high performance (67-73% of Redis on a single core), scales with more cores and more backup replicas (achieving about 2x the throughput when going from 1 to 3 servers), and can safely execute reads while reconfiguring. Comment: Extended version of paper appearing at SOSP 202
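The abstract above mentions leases that let any replica serve read-only requests. As a rough illustration of that general idea only, here is a minimal Python sketch; it is not GroveKV's protocol or Grove's API. The class `LeasedReplica`, the exception `LeaseExpired`, and all method names are hypothetical, and the sketch ignores clock skew, reconfiguration, and crash recovery, which are exactly the interactions the paper says it verifies.

```python
import time


class LeaseExpired(Exception):
    """Raised when a replica can no longer show that its lease is valid."""


class LeasedReplica:
    """Hypothetical replica that serves local reads only under a valid lease."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock        # injectable clock so the sketch can be tested
        self.lease_expiry = 0.0   # no lease is held initially
        self.store = {}

    def grant_lease(self, expiry):
        # Leases only ever extend: an older grant never shortens a newer one.
        self.lease_expiry = max(self.lease_expiry, expiry)

    def apply_write(self, key, value):
        # Stand-in for a write replicated from the primary.
        self.store[key] = value

    def read(self, key):
        # Serve locally only while the lease is valid by the local clock.
        if self.clock() >= self.lease_expiry:
            raise LeaseExpired("lease expired; retry after renewal")
        return self.store.get(key)
```

The design point is that the read path checks only local state and the local clock, so no round trip to other replicas is needed while the lease holds.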

arXiv.org e-Print Archive

Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants

Author: Balegas Valter
Duarte Sérgio
Ferreira Carla
Najafzadeh Mahsa
Preguiça Nuno
Rodrigues Rodrigo
Serra Diogo
Shapiro Marc
Publication date: 31/03/2015

Geo-replicated databases often operate under the principle of eventual consistency to offer high availability with low latency on a simple key/value store abstraction. Recently, some have adopted commutative data types to provide seamless reconciliation for special-purpose data types, such as counters. Despite this, the inability to enforce numeric invariants across all replicas remains a key shortcoming of relying on the limited guarantees of eventually consistent storage. We present a new replicated data type, called bounded counter, which adds support for numeric invariants to eventually consistent geo-replicated databases. We describe how this can be implemented on top of existing cloud stores without modifying them, using Riak as an example. Our approach adapts ideas from escrow transactions to devise a solution that is decentralized, fault-tolerant and fast. Our evaluation shows much lower latency and better scalability than the traditional approach of using strong consistency to enforce numeric invariants, thus alleviating the tension between consistency and availability.
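The escrow idea behind the bounded counter in the abstract above can be illustrated with a small Python sketch. This is a simplified, single-process model under stated assumptions, not the paper's data type or its Riak implementation: the names `BoundedCounterReplica`, `rights`, `transfer`, and `counter_value` are hypothetical, and the rights transfer here is a direct in-memory update, whereas a real deployment would propagate it through replicated state.

```python
class BoundedCounterReplica:
    """Hypothetical replica enforcing `value >= lower_bound` without coordination.

    Each replica holds a share of "rights": the number of decrements it may
    apply locally while the global invariant is guaranteed to hold.
    """

    def __init__(self, replica_id, rights):
        self.replica_id = replica_id
        self.rights = rights

    def increment(self, n=1):
        # An increment cannot violate a lower bound; it creates new local rights.
        self.rights += n

    def decrement(self, n=1):
        # Refuse locally, with no remote coordination, if rights are insufficient.
        if n > self.rights:
            return False
        self.rights -= n
        return True

    def transfer(self, other, n):
        # Hand unused rights to another replica that has run short.
        if n > self.rights:
            return False
        self.rights -= n
        other.rights += n
        return True


def counter_value(lower_bound, replicas):
    # The counter sits above the bound by exactly the total unused rights.
    return lower_bound + sum(r.rights for r in replicas)
```

For example, with a lower bound of 0 and rights split 3 and 2 between two replicas, the counter value is 5. The first replica can decrement three times on its own and must then obtain rights from the second before decrementing again.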

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Verifying and Enforcing Application Constraints in Antidote SQL

Author: Borrego Dina dos Santos
Publication date: 01/11/2022

Geo-replicated storage systems are currently a fundamental piece in the development of large-scale applications where users are distributed across the world. To meet the high requirements regarding latency and availability of these applications, these database systems are forced to use weak consistency mechanisms. However, under these consistency models, there is no guarantee that the invariants are preserved, which can jeopardise the correctness of applications. The most obvious alternative to solve this problem would be to use strong consistency, but this would place a large burden on the system. Since neither of these options is feasible, many systems have been developed to preserve the invariants of the applications without sacrificing low latency and high availability. These systems, based on the analysis of operations, make it possible to increase the guarantees of weak consistency by introducing consistency at the level of operations that are potentially dangerous to the invariant. Antidote SQL is a database system that, by combining strong with weak consistency mechanisms, attempts to guarantee the preservation of invariants at the data level. In this way, and after defining the concurrency semantics for the application, any operation can be performed without coordination and without the risk of violating the invariant. However, this approach has some limitations, namely the fact that it is not trivial for developers to define appropriate concurrency semantics. In this document, we propose a methodology for the verification and validation of defined properties, such as invariants, for applications using Antidote SQL. The proposed methodology uses a high-level programming language with automatic verification features called VeriFx and provides guidelines for programmers who wish to implement and verify their own systems and specifications using this tool.

Repositório da Universidade Nova de Lisboa

Replicating multithreaded services

Author: Kapritsos Emmanouil
Publication date: 09/02/2015

For the last 40 years, the systems community has invested a lot of effort in designing techniques for building fault tolerant distributed systems and services. This effort has produced a massive list of results: the literature describes how to design replication protocols that tolerate a wide range of failures (from simple crashes to malicious "Byzantine" failures) in a wide range of settings (e.g. synchronous or asynchronous communication, with or without stable storage), optimizing various metrics (e.g. number of messages, latency, throughput). These techniques have their roots in ideas, such as the abstraction of State Machine Replication and the Paxos protocol, that were conceived when computing was very different than it is today: computers had a single core; all processing was done using a single thread of control, handling requests sequentially; and a collection of 20 nodes was considered a large distributed system. In the last decade, however, computing has gone through some major paradigm shifts, with the advent of multicore architectures and large cloud infrastructures. This dissertation explains how these profound changes impact the practical usefulness of traditional fault tolerant techniques and proposes new ways to architect these solutions to fit the new paradigms.

Computer Science

Texas ScholarWorks

IoTSan: Fortifying the Safety of IoT Systems

Author: Beyer Dirk
Busold Christoph
Cattel T.
Celik Z. Berkay
Chaves J.
Costin A.
Croft Jason
Fernandes E.
Holzmann G. J.
Holzmann G.J.
Jia Y. J.
Memon M. U.
Ronen E.
Shin Hocheol
Tian Yuan
Wang Qi
Xiao F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/10/2018

Today's IoT systems include event-driven smart applications (apps) that interact with sensors and actuators. A problem specific to IoT systems is that buggy apps, unforeseen bad app interactions, or device/communication failures can cause unsafe and dangerous physical states. Detecting flaws that lead to such states requires a holistic view of installed apps, component devices, their configurations, and, more importantly, how they interact. In this paper, we design IoTSan, a novel practical system that uses model checking as a building block to reveal "interaction-level" flaws by identifying events that can lead the system to unsafe states. In building IoTSan, we design novel techniques tailored to IoT systems to alleviate the state explosion associated with model checking. IoTSan also automatically translates IoT apps into a format amenable to model checking. Finally, to understand the root cause of a detected vulnerability, we design an attribution mechanism to identify problematic and potentially malicious apps. We evaluate IoTSan on the Samsung SmartThings platform. From 76 manually configured systems, IoTSan detects 147 vulnerabilities. We also evaluate IoTSan with malicious SmartThings apps from a previous effort. IoTSan detects the potential safety violations and also effectively attributes these apps as malicious. Comment: Proc. of the 14th ACM CoNEXT, 201

arXiv.org e-Print Archive

Crossref

Modular and Safe Event-Driven Programming

Author: Desai Ankush Pankaj
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Asynchronous event-driven systems are ubiquitous across domains such as device drivers, distributed systems, and robotics. These systems are notoriously hard to get right as the programmer needs to reason about numerous control paths resulting from the complex interleaving of events (or messages) and failures. Unsurprisingly, it is easy to introduce subtle errors while attempting to fill in gaps between high-level system specifications and their concrete implementations. This dissertation proposes new methods for programming safe event-driven asynchronous systems. In the first part of the thesis, we present ModP, a modular programming framework for compositional programming and testing of event-driven asynchronous systems. The ModP module system supports a novel theory of compositional refinement for assume-guarantee reasoning of dynamic event-driven asynchronous systems. We build a complex distributed systems software stack using ModP. Our results demonstrate that compositional reasoning can help scale model-checking (both explicit and symbolic) to large distributed systems. ModP is transforming the way asynchronous software is built at Microsoft and Amazon Web Services (AWS). Microsoft uses ModP for implementing safe device drivers and other software in the Windows kernel. AWS uses ModP for compositional model checking of complex distributed systems. While ModP simplifies analysis of such systems, the state space of industrial-scale systems remains extremely large. In the second part of this thesis, we present scalable verification and systematic testing approaches to further mitigate this state-space explosion problem. First, we introduce the concept of a delaying explorer to perform prioritized exploration of the behaviors of an asynchronous reactive program. A delaying explorer stratifies the search space using a custom strategy (tailored towards finding bugs faster), and a delay operation that allows deviation from that strategy.
We show that prioritized search with a delaying explorer performs significantly better than existing approaches for finding bugs in asynchronous programs. Next, we consider the challenge of verifying time-synchronized systems; these are almost-synchronous systems as they are neither completely asynchronous nor synchronous. We introduce approximate synchrony, a sound and tunable abstraction for verification of almost-synchronous systems. We show how approximate synchrony can be used for verification of both time-synchronization protocols and applications running on top of them. Moreover, we show how approximate synchrony also provides a useful strategy to guide state-space exploration during model-checking. Using approximate synchrony and implementing it as a delaying explorer, we were able to verify the correctness of the IEEE 1588 distributed time-synchronization protocol and, in the process, uncovered a bug in the protocol that was well appreciated by the standards committee. In the final part of this thesis, we consider the challenge of programming a special class of event-driven asynchronous systems: safe autonomous robotics systems. Our approach towards achieving assured autonomy for robotics systems consists of two parts: (1) a high-level programming language for implementing and validating the reactive robotics software stack; and (2) an integrated runtime assurance system to ensure that the assumptions used during design-time validation of the high-level software hold at runtime. Combining a high-level programming language and model-checking with runtime assurance helps us bridge the gap between design-time software validation, which makes assumptions about the untrusted components (e.g., low-level controllers) and the physical world, and the actual execution of the software on a real robotic platform in the physical world.
We implemented our approach as DRONA, a programming framework for building safe robotics systems. We used DRONA for building a distributed mobile robotics system and deployed it on real drone platforms. Our results demonstrate that DRONA (with the runtime-assurance capabilities) enables programmers to build an autonomous robotics software stack with formal safety guarantees. To summarize, this thesis contributes new theory and tools to the areas of programming languages, verification, systematic testing, and runtime assurance for programming safe asynchronous event-driven systems across the domains of fault-tolerant distributed systems and safe autonomous robotics systems.
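The delaying explorer described in the abstract above pairs a deterministic scheduling strategy with a delay operation that deviates from it. The following Python sketch illustrates that general idea under strong simplifying assumptions; it is not the thesis's implementation. Tasks are treated as atomic, the default strategy is assumed to be "run the task at the head of a FIFO queue", and the function name `schedules` and its parameters are hypothetical.

```python
def schedules(tasks, budget):
    """Yield every execution order reachable with at most `budget` delays.

    With budget 0 only the default (FIFO) schedule is produced; each extra
    unit of budget allows one more deviation from the default strategy.
    The same order may be yielded more than once for larger budgets.
    """

    def explore(queue, remaining, prefix):
        if not queue:
            yield tuple(prefix)
            return
        head, rest = queue[0], queue[1:]
        # Default strategy: run the task at the head of the queue.
        yield from explore(rest, remaining, prefix + [head])
        # Delay: defer the head to the back of the queue, spending one unit.
        if remaining > 0 and len(queue) > 1:
            yield from explore(rest + [head], remaining - 1, prefix)

    yield from explore(list(tasks), budget, [])
```

Exploring budgets 0, 1, 2, and so on in increasing order gives the prioritized search: schedules closest to the default strategy are examined first, and the budget bounds how much of the state space is visited.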

eScholarship - University of California

ProQuest OAI Repository