Optimistic replication
Data replication is a key technology in distributed data sharing systems, enabling higher availability and performance. This paper surveys optimistic replication algorithms that allow replica contents to diverge in the short term, in order to support concurrent work practices and to tolerate failures in low-quality communication links. The importance of such techniques is increasing as collaboration through wide-area and mobile networks becomes popular. Optimistic replication techniques are different from traditional "pessimistic" ones. Instead of synchronous replica coordination, an optimistic algorithm propagates changes in the background, discovers conflicts after they happen and reaches agreement on the final contents incrementally. We explore the solution space for optimistic replication algorithms. This paper identifies key challenges facing optimistic replication systems — ordering operations, detecting and resolving conflicts, propagating changes efficiently, and bounding replica divergence — and provides a comprehensive survey of techniques developed for addressing these challenges.
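One of the conflict-detection techniques commonly surveyed in this space is the version vector. As an illustrative sketch only (the survey covers many mechanisms; this specific code is not taken from it), two replicas' version vectors can be compared to decide whether one update causally precedes the other or whether they are concurrent and must be reconciled:

```python
# Sketch: classifying two updates by comparing version vectors, each a
# dict mapping replica id -> per-replica update counter.
def compare(vv_a, vv_b):
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(vv_a) | set(vv_b)
    a_le_b = all(vv_a.get(k, 0) <= vv_b.get(k, 0) for k in keys)
    b_le_a = all(vv_b.get(k, 0) <= vv_a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"     # a's update is causally dominated by b's
    if b_le_a:
        return "after"
    return "concurrent"     # neither dominates: a conflict to resolve
```

For example, `compare({"r1": 2, "r2": 1}, {"r1": 1, "r2": 2})` reports the two updates as concurrent, which is exactly the situation an optimistic replication system must detect and then resolve.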
Multi-value distributed key-value stores
Doctoral thesis in Informatics. Many large-scale distributed data stores rely on optimistic replication to
scale and remain highly available in the face of network partitions. Managing
data without strong coordination results in eventually consistent
data stores that allow for concurrent data updates. To allow writing applications
in the absence of linearizability or transactions, the seminal
Dynamo data store proposed a multi-value API in which a get returns
the set of concurrently written values. In this scenario, it is important to
be able to accurately and efficiently identify updates executed concurrently.
Logical clocks are often used to track data causality, necessary
to distinguish concurrent from causally related writes on the same key.
However, traditional mechanisms incur a non-negligible per-key metadata
overhead, which keeps growing over time in proportion to
the node churn rate. Another challenge is deleting keys while respecting
causality: while the values can be deleted, per-key metadata cannot
be permanently removed in current data stores.
These systems often use anti-entropy mechanisms (like Merkle Trees)
to detect and repair divergent data versions across nodes. However,
in practice such hash-based data structures are ill-suited to a store using
consistent hashing and produce too many false positives.
Also, highly available systems usually provide eventual consistency,
which is the weakest form of consistency. This results in a programming
model that is difficult to use and reason about. It has been proven that
causal consistency is the strongest consistency model achievable if we
want highly available services. It provides better programming semantics
such as session guarantees. However, classical causal consistency
is a memory model that is problematic for concurrent updates in
the absence of concurrency control primitives. Used in eventually consistent
data stores, it forces arbitration between concurrent updates,
which results in data loss. We propose three novel techniques in this thesis. The first is Dotted
Version Vectors: a solution that combines a new logical clock mechanism
and a request handling workflow that together support the traditional
Dynamo key-value store API while capturing causality in an
accurate and scalable way, avoiding false conflicts. It maintains concise
information per version, linear only in the number of replicas, and includes
a container data structure that allows sets of concurrent versions
to be merged efficiently, with time complexity linear in the number of
replicas plus versions.
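The merge of concurrent version sets can be sketched as follows. This is an illustrative simplification, not the thesis's exact algorithm: each version carries a "dot" (a replica id and counter uniquely identifying the write) plus a causal context summarizing the versions it supersedes; here a context is a dict of per-replica maximum counters that excludes the version's own dot.

```python
# Sketch: merging two sets of versions, each a (dot, context, value)
# triple, where dot = (replica_id, counter) and context maps
# replica_id -> highest counter causally seen (excluding the own dot).
def sync(versions_a, versions_b):
    def covered(dot, others):
        # A dot is obsolete if some version on the other side has
        # causally seen it (its context covers the dot's counter).
        replica, counter = dot
        return any(counter <= ctx.get(replica, 0) for _dot, ctx, _val in others)

    merged = {}
    for v in versions_a:
        if not covered(v[0], versions_b):
            merged[v[0]] = v
    for v in versions_b:
        if v[0] not in merged and not covered(v[0], versions_a):
            merged[v[0]] = v
    return list(merged.values())
```

With this sketch, two writes made without seeing each other both survive the merge, while a later write whose context covers both of their dots replaces them, which is the multi-value behavior the API exposes.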
The second is DottedDB: a Dynamo-like key-value store, which uses
a novel node-wide logical clock framework, overcoming three fundamental
limitations of the state of the art: (1) it minimizes the metadata per
key necessary to track causality, avoiding its growth even in the face
of node churn; (2) it correctly and durably deletes keys, with no need for
tombstones; (3) it offers a lightweight anti-entropy mechanism to converge
replicated data, avoiding the need for Merkle Trees.
The third and final contribution is Causal Multi-Value Consistency: a
novel consistency model that respects the causality of client operations
while properly supporting concurrent updates without arbitration, by
having the same Dynamo-like multi-value nature. In addition, we extend
this model to provide the same semantics with read and write
transactions. For both models, we define an efficient implementation
on top of a distributed key-value store.
Fundação para a Ciência e Tecnologia (FCT) - with the research grant SFRH/BD/86735/201
Reliability Mechanisms for Controllers in Real-Time Cyber-Physical Systems
Cyber-physical systems (CPSs) are real-world processes that are controlled by computer algorithms. We consider CPSs where a centralized, software-based controller maintains the process in a desired state by exchanging measurements and setpoints with process agents (PAs). Because CPSs control low-inertia processes, e.g., electric grids and autonomous cars, the controller needs to satisfy stringent real-time constraints.
However, the controllers are susceptible to delay and crash faults, and the communication network might drop, delay or reorder messages. This degrades the quality of control of the physical process, failure of which can result in damage to life or property. Existing reliability solutions are either not well-suited for real-time CPSs or impose serious restrictions on the controllers. In this thesis, we design, implement and evaluate reliability mechanisms for real-time CPS controllers that require minimal modifications to the controller itself.
We begin by abstracting the execution of a CPS using events in the CPS and two inherent relations among those events, namely the network and computation relations. We build on these relations to introduce the intentionality relation, which captures the state of the physical process. Based on the intentionality relation, we define three correctness properties, namely state safety, optimal selection and consistency, that together provide linearizability (one-copy equivalence) for CPS controllers.
We propose intentionality clocks and Quarts, and prove that they provide linearizability. To provide consistency, Quarts ensures agreement among controller replicas, which is typically achieved using consensus. Consensus can add an unbounded-latency overhead. Quarts leverages the properties specific to CPSs to perform agreement using pre-computed priorities among sets of received measurements, resulting in a bounded-latency overhead with high availability. Using simulation, we show that availability of Quarts, with two replicas, is more than an order of magnitude higher than consensus.
We also propose Axo, a fault-tolerance protocol that uses active replication to detect and recover faulty replicas, and provides timeliness, which requires that delayed setpoints be masked from the PAs. We study the effect of delay faults and the impact of fault-tolerance with Axo by deploying Axo in two real-world CPSs.
Then, we realize that the proposed reliability mechanisms also apply to unconventional CPSs such as software defined networking (SDN), where the controlled process is the routing fabric of the network. We show that, in SDN, violating consistency can cause implementation of incorrect routing policies. Thus, we use Quarts and intentionality clocks, to design and implement QCL, a coordination layer for SDN controllers that guarantees control-plane consistency. QCL also drastically reduces the response time of SDN controllers when compared to consensus-based techniques.
In the last part of the thesis, we address the problem of reliable communication between the software agents in a wide-area network that can drop, delay or reorder messages. For this, we propose iPRP, an IP-friendly parallel redundancy protocol for 0 ms repair of packet losses. iPRP requires fail-independent paths for high reliability, so we study the fail-independence of Wi-Fi links using real-life measurements, as a first step towards using Wi-Fi for real-time communication in CPSs.
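The core parallel-redundancy idea behind a protocol like iPRP can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the actual protocol: every packet is sent over several fail-independent paths with the same sequence number, and the receiver delivers the first copy of each and silently drops the rest, so a loss on one path is repaired with no added repair delay.

```python
# Sketch: receiver-side duplicate discarding for packets replicated
# over parallel, fail-independent paths.
class Deduplicator:
    def __init__(self):
        self.seen = set()   # sequence numbers already delivered

    def receive(self, seq, payload):
        """Deliver payload only for the first copy of each sequence number."""
        if seq in self.seen:
            return None      # redundant copy from another path: drop
        self.seen.add(seq)
        return payload       # first arrival: deliver immediately
```

In a simulation where one path loses a packet, the copy arriving over the second path is delivered as soon as it arrives, which is the "0 ms repair" property: no retransmission round-trip is ever needed. A production design would also bound the `seen` set with a sliding window.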
Computer Aided Verification
This open access two-volume set LNCS 11561 and 11562 constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers presented, together with 13 tool papers and 2 case studies, were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections: Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems; runtime techniques; dynamical, hybrid, and reactive systems; Part II: logics, decision procedures, and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency.
Principles of Security and Trust: 7th International Conference, POST 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings
authentication; computer science; computer software selection and evaluation; cryptography; data privacy; formal logic; formal methods; formal specification; internet; privacy; program compilers; programming languages; security analysis; security systems; semantics; separation logic; software engineering; specifications; verification; world wide web
A Prescription for Partial Synchrony
Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals.
Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M_*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size
and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively or through algorithmic
constructions. Consequently, empirical deployment of the many M_*-based algorithms risks anomalous behavior.
As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is fairness: a time base measured
by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M_* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M_* models on a significant subset of the empirical systems. The proposed techniques use failure detectors — system
services that provide hints about process crashes — as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M_* models are now proved correct in such empirical systems as well.
Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework, based on Timed I/O Automata, to specify empirical systems, partially synchronous systems, and the algorithms that execute within those systems. (2) We show that the crash-tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models needed to implement popular failure detectors.
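The failure detectors mentioned above are services that provide (possibly wrong) hints about process crashes. A minimal sketch of a heartbeat-based, eventually accurate detector follows; the class and method names are illustrative, not from the dissertation. The key idea is that the timeout is increased after every false suspicion, so in a system with real (if unknown) bounds, wrong suspicions eventually cease:

```python
# Sketch: heartbeat-based failure detector with an adaptive timeout.
class FailureDetector:
    def __init__(self, initial_timeout):
        self.timeout = initial_timeout
        self.last_heartbeat = {}   # process -> time of last heartbeat
        self.suspected = set()

    def heartbeat(self, process, now):
        self.last_heartbeat[process] = now
        if process in self.suspected:
            self.suspected.discard(process)  # suspicion was wrong:
            self.timeout *= 2                # back off the timeout

    def check(self, now):
        """Suspect every process silent for longer than the timeout."""
        for p, t in self.last_heartbeat.items():
            if now - t > self.timeout:
                self.suspected.add(p)
        return set(self.suspected)
```

Used as an intermediary, such a service lets an algorithm react to crash hints without the algorithm itself assuming any real-time bound, which matches the fairness-based viewpoint the abstract argues for.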
An Exploratory Analysis Of A Time Synchronization Protocol For UAS
This dissertation provides a numerical analysis of a Receiver Only Synchronization (ROS) protocol proposed for use by Unmanned Aircraft Systems (UAS) in Beyond Visual Line of Sight (BVLOS) operations. The use of ROS protocols could reinforce current technologies that enable transmission over 5G cell networks, decreasing latency issues and enabling the incorporation of an increased number of UAS into the network without loss of accuracy. A mean squared error (MSE)-based accuracy of the clock offset and clock skew estimations was obtained using the number of iterations and the number of observations as independent parameters. Although the model converged after only four iterations, the number of observations needed was considerably large, no fewer than about 250. The noise, introduced into the system through the first residual, the correlation parameter and the disturbance terms, was assumed to be autocorrelated. Previous studies suggested that correlated noise might be typical in multipath scenarios, or in the case of damaged antennas. Four noise distributions were considered: Gaussian, exponential, gamma and Weibull. Each is adapted to different noise sources in the OSI model. Dispersion of results in the first case, the only one with zero mean, was checked against the Cramér-Rao Bound (CRB) limit. Results confirmed that the proposed scheme was fully efficient. However, results with the other three cases were less promising, demonstrating that only zero-mean distributions could deliver good results. This fact would limit the proposed scheme's application in multipath scenarios, where echoes of previous signals may reach the receiver at delayed times. In the second part, a wake/sleep scheme was imposed on the model, concluding that for wake/sleep ratios below 92/08 the results were not accurate at the p = .05 level.
The study also evaluated the impact of noise levels in the time domain and showed that, above -2 dB in time, a substantial contribution of error terms significantly disturbed the initial estimations. The tests were performed in Matlab®. Based on the results, three avenues for future work, confirming the assumptions made, were proposed. Some final reflections on the use of 5G in aviation brought the present dissertation to a close.
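The joint offset/skew estimation at the heart of such a receiver-only scheme can be illustrated with a simple least-squares fit. This is a hedged sketch, not the dissertation's iterative estimator: the receiver records the sender's timestamps t and its own arrival times r, models r = offset + (1 + skew) * t + noise, and fits a line.

```python
# Sketch: least-squares estimation of clock offset and skew from
# (send_time, receive_time) pairs, assuming zero-mean noise.
def estimate_offset_skew(send_times, recv_times):
    n = len(send_times)
    mean_t = sum(send_times) / n
    mean_r = sum(recv_times) / n
    cov = sum((t - mean_t) * (r - mean_r)
              for t, r in zip(send_times, recv_times))
    var = sum((t - mean_t) ** 2 for t in send_times)
    slope = cov / var                  # estimates 1 + skew
    offset = mean_r - slope * mean_t   # line intercept
    return offset, slope - 1.0
```

Consistent with the abstract's finding, an estimator of this shape is unbiased only under zero-mean noise; with exponential, gamma or Weibull noise the nonzero mean is absorbed into the offset estimate, degrading accuracy.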
Towards secure & robust PNT for automated systems
This dissertation makes four contributions in support of secure and robust position, navigation, and timing (PNT) for automated systems. The first two relate to PNT security while the latter two address robust positioning for automated ground vehicles.
The first contribution is a fundamental theory for provably-secure clock synchronization between two agents in a distributed automated system. All one-way synchronization protocols, such as those based on the Global Positioning System (GPS) and other Global Navigation Satellite Systems (GNSS), are shown to be vulnerable to man-in-the-middle delay attacks. This contribution is the first to identify the necessary and sufficient conditions for provably secure clock synchronization.
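The vulnerability of one-way protocols can be made concrete with the classic two-way exchange, shown here as an illustrative sketch rather than the dissertation's construction, and under the simplifying assumption that the server timestamps and replies instantly: a request sent at t1, timestamped remotely at t2, and answered at t3 confines the true offset to a window of half the round-trip time, so an attacker's added delay widens a measurable bound instead of silently shifting the clock, whereas a one-way message offers no such check.

```python
# Sketch: bounding clock offset with a two-way exchange (NTP-style,
# assuming a negligible server processing time).
def offset_bounds(t1, t2, t3):
    """Return (estimate, uncertainty): true offset lies in
    estimate +/- uncertainty, where uncertainty is half the RTT."""
    rtt = t3 - t1
    estimate = t2 - (t1 + t3) / 2.0
    return estimate, rtt / 2.0
```

A delay attack on either direction inflates `rtt`, and with it the uncertainty the client can verify; with a one-way timestamp there is no round trip, hence no bound, which is the intuition behind the impossibility result for one-way synchronization.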
The second contribution, also related to PNT security, is a three-year study of the world-wide GPS interference landscape based on data from a dual-frequency GNSS receiver operating continuously on the International Space Station (ISS). This work is the first publicly-reported space-based survey of GNSS interference, and unveils previously-unreported GNSS interference activity.
The third contribution is a novel ground vehicle positioning technique that is robust to GNSS signal blockage, poor lighting conditions, and adverse weather events such as heavy rain and dense fog. The technique relies on sensors that are commonly available on automated vehicles and are insensitive to lighting and inclement weather: automotive radar, low-cost inertial measurement units (IMUs), and GNSS. Remarkably, it is shown that, given a prior radar map, the proposed technique operating on data from off-the-shelf all-weather automotive sensors can maintain sub-50-cm horizontal position accuracy during 60 min of GNSS-denied driving in downtown Austin, TX.
This dissertation’s final contribution is an analysis and demonstration of the feasibility of crowd-sourced digital mapping for automated vehicles. Localization techniques, such as the one described in the previous contribution, rely on such digital maps for accuracy and robustness. A key enabler for large-scale up-to-date maps is enlisting the help of the very consumer vehicles that need the map to build and update it. A method for fusing multi-session vision data into a unified digital map is developed. The asymptotic limit of such a map’s globally-referenced position accuracy is explored for the case in which the mapping agents rely on low-cost GNSS receivers performing standard code-phase-based navigation. Experimental validation along a semi-urban route shows that low-cost consumer vehicles incrementally tighten the accuracy of the jointly-optimized digital map over time, enough to support sub-lane-level positioning in a global frame of reference.
Electrical and Computer Engineering