209 research outputs found
An Indulgent Uniform Total Order Algorithm with Optimistic Delivery
A total order algorithm is a fundamental building block in the construction of distributed fault-tolerant applications. Unfortunately, the implementation of such a primitive can be expensive both in terms of communication steps and of number of messages exchanged. This problem is exacerbated in large-scale systems, where the performance of the algorithm may be limited by the presence of high-latency links. Typically, the most efficient total order algorithms do not provide uniform delivery and assume the availability of a perfect failure detector. Such algorithms may provide inconsistent results if the system assumptions do not hold. On the other hand, algorithms that assume an unreliable failure detector always provide consistent results but exhibit higher costs. This paper presents a new algorithm that combines the advantages of both approaches. On good periods, when the system is stable and processes are not suspected, the algorithm operates as if a perfect failure detector is assumed. Yet, the algorithm is indulgent, since it never violates consistency, even in runs where processes are suspecte
Strong Consistency for Shared Objects in Pervasive Grids
International audienceRecent advances in communication technology en- able the emergence of a new generation of applications that integrates mobile devices with classical high performance systems as part of a common computing environment. In such environ- ments, keeping the coherence of shared data (distributed objects, for example) represents a real challenge as communications are strongly influenced by the performance and the reliability of mobile devices (laptops, PDAs and cellular telephones) and wireless networks (WiFi, Bluetooth). Indeed, data incoherence may arise due to message losses or node volatility, which blocks the algorithms used to synchronize these data. In this paper, we analyze the main challenges concerning the manipulation of shared distributed objects in a pervasive environment. We demonstrate how a membership service can be enhanced to tolerate temporary disconnections and message losses without blocking, while reducing the number of exchanged message
Towards a generic group communication service
View synchronous group communication is a mature technology that greatly eases the development of reliable distributed applications by enforcing precise message delivery semantics, especially in face of faults. It is therefore found at the core of multiple widely deployed and used middleware products. Although the implementation of a group communication system is a complex task, application developers may benefit from the fact that multiple group communication toolkits are currently available and supported. Unfortunately, each communication toolkit has a different interface, that differs from every other interface in subtile syntactic and semantic aspects. This hinders the design, implementation and maintenance of applications using group communication and forces developers to commit beforehand to a single toolkit, thus imposing a significant hurdle to portability. In this paper we propose jGCS, a generic group communication service for Java, that specifies an interface as well as minimum semantics that allow application portability. This interface accommodates existing group communication services, enabling implementation independence. Furthermore, it provides support for the latest state-of-art mechanisms that have been proposed to improve the performance of group-based applications. To support our claims, we present and experimentally evaluate implementations of jGCS for several major group communication systems, namely, Appia, Spread/FlushSpread and JGroups, and describe the port of a large middleware product to jGCS.This work was partially supported by the IST project GORDA (FP6-IST2-004758
A General Characterization of Indulgence (Invited Paper)
An indulgent algorithm is a distributed algorithm that, besides tolerating process failures, also tolerates arbitrarily long periods of instability, with an unbounded number of timing and scheduling failures. In particular, no process can take any irrevocable action based on the operational status, correct or failed, of other processes. This paper presents an intuitive and general characterization of indulgence. The characterization can be viewed as a simple application of Murphy's law to partial runs of a distributed algorithm, in a computing model that encompasses various communication and resilience schemes. We use our characterization to establish several results about the inherent power and limitations of indulgent algorithms
The Complexity of Reliable and Secure Distributed Transactions
The use of transactions in distributed systems dates back to the 70's. The last decade has also seen the proliferation of transactional systems. In the existing transactional systems, many protocols employ a centralized approach in executing a distributed transaction where one single process coordinates the participants of a transaction. The centralized approach is usually straightforward and efficient in the failure-free setting, yet the coordinator then turns to be a single point of failure, undermining reliability/security in the failure-prone setting, or even be a performance bottleneck in practice.
In this dissertation, we explore the complexity of decentralized solutions for reliable and secure distributed transactions, which do not use a distinguished coordinator or use the coordinator as little as possible. We show that for some problems in reliable distributed transactions, there are decentralized solutions that perform as efficiently as the classical centralized one, while for some others, we determine the complexity limitations by proving lower and upper bounds to have a better understanding of the state-of-the-art solutions.
We first study the complexity on two aspects of reliable transactions: atomicity and consistency. More specifically, we do a systematic study on the time and message complexity of non-blocking atomic commit of a distributed transaction, and investigate intrinsic limitations of causally consistent transactions. Our study of distributed transaction commit focuses on the complexity of the most frequent executions in practice, i.e., failure-free, and willing to commit. Through our systematic study, we close many open questions like the complexity of synchronous non-blocking atomic commit. We also present an effective protocol which solves what we call indulgent atomic commit that tolerates practical distributed database systems which are synchronous "most of the time", and can perform as efficiently as the two-phase commit protocol widely used in distributed database systems.
Our investigation of causal transactions focuses on the limitations of read-only transactions, which are considered the most frequent in practice. We consider "fast" read-only transactions where operations are executed within one round-trip message exchange between a client seeking an object and the server storing it (in which no process can be a coordinator). We show two impossibility results regarding "fast" read-only transactions. By our impossibility results, when read-only transactions are "fast", they have to be "visible", i.e., they induce inherent updates on the servers. We also present a "fast" read-only transaction protocol that is "visible" as an upper bound on the complexity of inherent updates.
We then study the complexity of secure transactions in the model of secure multiparty computation: even in the face of malicious parties, no party obtains the computation result unless all other parties obtain the same result. As it is impossible to achieve without any trusted party, we focus on optimism where if all parties are honest, they can obtain the computation result without resorting to a trusted third party, and the complexity of every optimistic execution where all parties are honest. We prove a tight lower bound on the message complexity by relating the number of messages to the length of the permutation sequence in combinatorics, a necessary pattern for messages in every optimistic execution
On Statistically Estimated Optimistic Delivery inWide-Area Total Order Protocols
Total order broadcast protocols have been successfully applied as the basis for the construction of many fault-tolerant distributed systems. Unfortunately, the implementation of such a primitive can be expensive both in terms of communication steps and of number of messages exchanged. To alleviate this problem, optimistic total order protocols have been proposed. This paper addresses the problem of offering optimistic total order in geographically wide-area systems. We present a protocol that outperforms previous work, by minimizing the average latency of the optimistic notificatio
Practical database replication
Tese de doutoramento em InformáticaSoftware-based replication is a cost-effective approach for fault-tolerance when combined with
commodity hardware. In particular, shared-nothing database clusters built upon commodity machines
and synchronized through eager software-based replication protocols have been driven by
the distributed systems community in the last decade.
The efforts on eager database replication, however, stem from the late 1970s with initial
proposals designed by the database community. From that time, we have the distributed locking
and atomic commitment protocols. Briefly speaking, before updating a data item, all copies
are locked through a distributed lock, and upon commit, an atomic commitment protocol is
responsible for guaranteeing that the transaction’s changes are written to a non-volatile storage
at all replicas before committing it. Both these processes contributed to a poor performance.
The distributed systems community improved these processes by reducing the number of interactions
among replicas through the use of group communication and by relaxing the durability
requirements imposed by the atomic commitment protocol. The approach requires at most two
interactions among replicas and disseminates updates without necessarily applying them before
committing a transaction. This relies on a high number of machines to reduce the likelihood of
failures and ensure data resilience. Clearly, the availability of commodity machines and their
increasing processing power makes this feasible.
Proving the feasibility of this approach requires us to build several prototypes and evaluate
them with different workloads and scenarios. Although simulation environments are a good starting
point, mainly those that allow us to combine real (e.g., replication protocols, group communication)
and simulated-code (e.g., database, network), full-fledged implementations should be
developed and tested. Unfortunately, database vendors usually do not provide native support for
the development of third-party replication protocols, thus forcing protocol developers to either
change the database engines, when the source code is available, or construct in the middleware
server wrappers that intercept client requests otherwise. The former solution is hard to maintain
as new database releases are constantly being produced, whereas the latter represents a strenuous
development effort as it requires us to rebuild several database features at the middleware.
Unfortunately, the group-based replication protocols, optimistic or conservative, that had
been proposed so far have drawbacks that present a major hurdle to their practicability. The
optimistic protocols make it difficult to commit transactions in the presence of hot-spots, whereas
the conservative protocols have a poor performance due to concurrency issues.
In this thesis, we propose using a generic architecture and programming interface, titled
GAPI, to facilitate the development of different replication strategies. The idea consists of providing key extensions to multiple DBMSs (Database Management Systems), thus enabling a
replication strategy to be developed once and tested on several databases that have such extensions,
i.e., those that are replication-friendly. To tackle the aforementioned problems in groupbased
replication protocols, we propose using a novel protocol, titled AKARA. AKARA guarantees
fairness, and thus all transactions have a chance to commit, and ensures great performance
while exploiting parallelism as provided by local database engines. Finally, we outline a simple
but comprehensive set of components to build group-based replication protocols and discuss key
points in its design and implementation.A replicação baseada em software é uma abordagem que fornece um bom custo benefício para
tolerância a falhas quando combinada com hardware commodity. Em particular, os clusters de
base de dados “shared-nothing” construídos com hardware commodity e sincronizados através de
protocolos “eager” têm sido impulsionados pela comunidade de sistemas distribuídos na última
década.
Os primeiros esforços na utilização dos protocolos “eager”, decorrem da década de 70 do
século XX com as propostas da comunidade de base de dados. Dessa época, temos os protocolos
de bloqueio distribuído e de terminação atómica (i.e. “two-phase commit”). De forma sucinta,
antes de actualizar um item de dados, todas as cópias são bloqueadas através de um protocolo
de bloqueio distribuído e, no momento de efetivar uma transacção, um protocolo de terminação
atómica é responsável por garantir que as alterações da transacção são gravadas em todas as
réplicas num sistema de armazenamento não-volátil. No entanto, ambos os processos contribuem
para um mau desempenho do sistema.
A comunidade de sistemas distribuídos melhorou esses processos, reduzindo o número de
interacções entre réplicas, através do uso da comunicação em grupo e minimizando a rigidez
os requisitos de durabilidade impostos pelo protocolo de terminação atómica. Essa abordagem
requer no máximo duas interacções entre as réplicas e dissemina actualizações sem necessariamente
aplicá-las antes de efectivar uma transacção. Para funcionar, a solução depende de um
elevado número de máquinas para reduzirem a probabilidade de falhas e garantir a resiliência de
dados. Claramente, a disponibilidade de hardware commodity e o seu poder de processamento
crescente tornam essa abordagem possível.
Comprovar a viabilidade desta abordagem obriga-nos a construir vários protótipos e a avaliálos
com diferentes cargas de trabalho e cenários. Embora os ambientes de simulação sejam um
bom ponto de partida, principalmente aqueles que nos permitem combinar o código real (por
exemplo, protocolos de replicação, a comunicação em grupo) e o simulado (por exemplo, base
de dados, rede), implementações reais devem ser desenvolvidas e testadas. Infelizmente, os
fornecedores de base de dados, geralmente, não possuem suporte nativo para o desenvolvimento
de protocolos de replicação de terceiros, forçando os desenvolvedores de protocolo a mudar o
motor de base de dados, quando o código fonte está disponível, ou a construir no middleware
abordagens que interceptam as solicitações do cliente. A primeira solução é difícil de manter já
que novas “releases” das bases de dados estão constantemente a serem produzidas, enquanto a
segunda representa um desenvolvimento árduo, pois obriga-nos a reconstruir vários recursos de
uma base de dados no middleware. Infelizmente, os protocolos de replicação baseados em comunicação em grupo, optimistas ou
conservadores, que foram propostos até agora apresentam inconvenientes que são um grande obstáculo
à sua utilização. Com os protocolos optimistas é difícil efectivar transacções na presença
de “hot-spots”, enquanto que os protocolos conservadores têm um fraco desempenho devido a
problemas de concorrência.
Nesta tese, propomos utilizar uma arquitetura genérica e uma interface de programação, intitulada
GAPI, para facilitar o desenvolvimento de diferentes estratégias de replicação. A ideia
consiste em fornecer extensões chaves para múltiplos SGBDs (Database Management Systems),
permitindo assim que uma estratégia de replicação possa ser desenvolvida uma única vez e testada
em várias bases de dados que possuam tais extensões, ou seja, aquelas que são “replicationfriendly”.
Para resolver os problemas acima referidos nos protocolos de replicação baseados
em comunicação em grupo, propomos utilizar um novo protocolo, intitulado AKARA. AKARA
garante a equidade, portanto, todas as operações têm uma oportunidade de serem efectivadas,
e garante um excelente desempenho ao tirar partido do paralelismo fornecido pelos motores
de base de dados. Finalmente, propomos um conjunto simples, mas abrangente de componentes
para construir protocolos de replicação baseados em comunicação em grupo e discutimos pontoschave
na sua concepção e implementação
- …