7,672 research outputs found
Model-Based Proactive Read-Validation in Transaction Processing Systems
Concurrency control protocols based on read-validation schemes allow transactions which are doomed to abort to still run until a subsequent validation check reveals them as invalid. These late aborts do not favor the reduction of wasted computation and can penalize performance. To counteract this problem, we present an analytical model that predicts the abort probability of transactions handled via read-validation schemes. Our goal is to determine what are the suited points-along a transaction lifetime-to carry out a validation check. This may lead to early aborting doomed transactions, thus saving CPU time. We show how to exploit the abort probability predictions returned by the model in combination with a threshold-based scheme to trigger read-validations. We also show how this approach can definitely improve performance-leading up to 14 % better turnaround-as demonstrated by some experiments carried out with a port of the TPC-C benchmark to Software Transactional Memory
Exploiting replication in distributed systems
Techniques are examined for replicating data and execution in directly distributed systems: systems in which multiple processes interact directly with one another while continuously respecting constraints on their joint behavior. Directly distributed systems are often required to solve difficult problems, ranging from management of replicated data to dynamic reconfiguration in response to failures. It is shown that these problems reduce to more primitive, order-based consistency problems, which can be solved using primitives such as the reliable broadcast protocols. Moreover, given a system that implements reliable broadcast primitives, a flexible set of high-level tools can be provided for building a wide variety of directly distributed application programs
Maintaining consistency in distributed systems
In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs, and is controlled using mutual exclusion constructs, such as semaphores and monitors. Where data is persistent and/or sets of operation are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems - often, within the same application. This leads us to propose an integrated approach that permits applications that use virtual synchrony with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability
Archiving the Relaxed Consistency Web
The historical, cultural, and intellectual importance of archiving the web
has been widely recognized. Today, all countries with high Internet penetration
rate have established high-profile archiving initiatives to crawl and archive
the fast-disappearing web content for long-term use. As web technologies
evolve, established web archiving techniques face challenges. This paper
focuses on the potential impact of the relaxed consistency web design on
crawler driven web archiving. Relaxed consistent websites may disseminate,
albeit ephemerally, inaccurate and even contradictory information. If captured
and preserved in the web archives as historical records, such information will
degrade the overall archival quality. To assess the extent of such quality
degradation, we build a simplified feed-following application and simulate its
operation with synthetic workloads. The results indicate that a non-trivial
portion of a relaxed consistency web archive may contain observable
inconsistency, and the inconsistency window may extend significantly longer
than that observed at the data store. We discuss the nature of such quality
degradation and propose a few possible remedies.Comment: 10 pages, 6 figures, CIKM 201
Multi-value distributed key-value stores
Tese de Doutoramento em InformaticsMany large scale distributed data stores rely on optimistic replication to
scale and remain highly available in the face of network partitions. Managing
data without strong coordination results in eventually consistent
data stores that allow for concurrent data updates. To allow writing applications
in the absence of linearizability or transactions, the seminal
Dynamo data store proposed a multi-value API in which a get returns
the set of concurrent written values. In this scenario, it is important to
be able to accurately and efficiently identify updates executed concurrently.
Logical clocks are often used to track data causality, necessary
to distinguish concurrent from causally related writes on the same key.
However, in traditional mechanisms there is a non-negligible metadata
overhead per key, which also keeps growing with time, proportional to
the node churn rate. Another challenge is deleting keys while respecting
causality: while the values can be deleted, per-key metadata cannot
be permanently removed in current data stores.
These systems often use anti-entropy mechanisms (like Merkle Trees)
to detect and repair divergent data versions across nodes. However,
in practice hash-based data structures are not suitable to a store using
consistent hashing and create too many false positives.
Also, highly available systems usually provide eventual consistency,
which is the weakest form of consistency. This results in a programming
model difficult to use and to reason about. It has been proved that
causal consistency is the strongest consistency model achievable if we
want highly available services. It provides better programming semantics
such as sessions guarantees. However, classical causal consistency
is a memory model that that is problematic for concurrent updates, in
the absence of concurrency control primitives. Used in eventually consistent
data stores, it leads to arbitrating between concurrent updates
which leads to data loss. We propose three novel techniques in this thesis. The first is Dotted
Version Vectors: a solution that combines a new logical clock mechanism
and a request handling workflow that together support the traditional
Dynamo key-value store API while capturing causality in an
accurate and scalable way, avoiding false conflicts. It maintains concise
information per version, linear only on the number of replicas, and includes
a container data structure that allows sets of concurrent versions
to be merged efficiently, with time complexity linear on the number of
replicas plus versions.
The second is DottedDB: a Dynamo-like key-value store, which uses
a novel node-wide logical clock framework, overcoming three fundamental
limitations of the state of the art: (1) minimize the metadata per
key necessary to track causality, avoiding its growth even in the face
of node churn; (2) correctly and durably delete keys, with no need for
tombstones; (3) offer a lightweight anti-entropy mechanism to converge
replicated data, avoiding the need for Merkle Trees.
The third and final contribution is Causal Multi-Value Consistency: a
novel consistency model that respects the causality of client operations
while properly supporting concurrent updates without arbitration, by
having the same Dynamo-like multi-value nature. In addition, we extend
this model to provide the same semantics with read and write
transactions. For both models, we define an efficient implementation
on top of a distributed key-value store.Várias bases de dados de larga escala usam técnicas de replicação otimista
para escalar e permanecer altamente disponíveis face a falhas e partições
na rede. Gerir os dados sem coordenação forte entre os nós
do servidor e o cliente resulta em bases de dados "inevitavelmente coerentes"
que permitem escritas de dados concorrentes. Para permitir
que aplicações escrevam na base de dados na ausência de transações
e mecanismos de coerência forte, a influente base de dados Dynamo
propôs uma interface multi-valor, que permite a uma leitura devolver
um conjunto de valores escritos concorrentemente para a mesma chave.
Neste cenário, é importante identificar com exatidão e eficiência quais
as escritas efetuadas numa chave de forma potencialmente concorrente.
Relógios lógicos são normalmente usados para gerir a causalidade das
chaves, de forma a detetar escritas causalmente concorrentes na mesma
chave. No entanto, mecanismos tradicionais adicionam metadados cujo
tamanho cresce proporcionalmente com a entrada e saída de nós no
servidor. Outro desafio é a remoção de chaves do sistema, respeitando
a causalidade e ao mesmo tempo não deixando metadados permanentes
no servidor.
Estes sistemas de dados utilizam também mecanismos de anti-entropia
(tais como Merkle Trees) para detetar e reparar dados replicados em diferentes
nós que divirjam. No entanto, na prática estas estruturas de dados
baseadas em hashes não são adequados para sistemas que usem hashing
consistente para a partição de dados e resultam em muitos falsos positivos.
Outro aspeto destes sistemas é o facto de normalmente apenas suportarem
coerência inevitável, que é a garantia mais fraca em termos
de coerência de dados. Isto resulta num modelo de programação difícil
de usar e compreender. Foi provado que coerência causal é a forma
mais forte de coerência de dados que se consegue fornecer, de forma a que se consiga também ser altamente disponível face a falhas. Este
modelo fornece uma semântica mais interessante ao cliente do sistema,
nomeadamente as garantias de sessão. No entanto, a coerência causal
tradicional é definida sobre um modelo de memória não apropriado
para escritas concorrentes não controladas. Isto leva a que se arbitre
um vencedor quando escritas acontecem concorrentemente, levando a
perda de dados.
Propomos nesta tese três novas técnicas. A primeira chama-se Dotted
Version Vectors: uma solução que combina um novo mecanismo de
relógios lógicos com uma interação entre o cliente e o servidor, que permitem
fornecer uma interface multi-valor ao cliente similar ao Dynamo
de forma eficiente e escalável, sem falsos conflitos. O novo relógio lógico
mantém informação precisa por versão de uma chave, de tamanho linear
no número de réplicas da chave no sistema. Permite também que
versão diferentes sejam corretamente e eficientemente reunidas.
A segunda contribuição chama-se DottedDB: uma base de dados similar
ao Dynamo, mas que implementa um novo mecanismo de relógios
lógicos ao nível dos nós, que resolve três limitações fundamentais do estado
da arte: (1) minimiza os metadados necessários manter por chave
para gerir a causalidade, evitando o seu crescimento com a entrada e
saída de nós; (2) permite remover chaves de forma permanente, sem
a necessidade de manter metadados indefinidamente no servidor; (3)
um novo protocolo de anti-entropia para reparar dados replicados, de
modo a que todas as réplicas na base de dados convirjam, sem que seja
necessário operações dispendiosas como as usadas com Merkle Trees.
A terceira e última contribuição é Coerência Causal Multi-Valor: um
novo modelo de coerência de dados que respeita a causalidade das operações
efetuadas pelos clientes e que também suporta operações concorrentes,
sem que seja necessário arbitrar um vencedor entre as escritas,
seguindo o espírito da interface multi-valor do Dynamo. Adicionalmente,
estendemos este modelo para fornecer transações de escritas ou
leituras, respeitando a mesma semântica da causalidade. Para ambos
os modelos, definimos uma implementação eficiente em cima de uma
base de dados distribuída.Fundação para a Ciência e Tecnologia (FCT) - with the research grant SFRH/BD/86735/201
- …