11 research outputs found
Distributed eventual leader election in the crash-recovery and general omission failure models.
102 p.Distributed applications are present in many aspects of everyday life. Banking, healthcare or transportation are examples of such applications. These applications are built on top of distributed systems. Roughly speaking, a distributed system is composed of a set of processes that collaborate among them to achieve a common goal. When building such systems, designers have to cope with several issues, such as different synchrony assumptions and failure occurrence. Distributed systems must ensure that the delivered service is trustworthy.Agreement problems compose a fundamental class of problems in distributed systems. All agreement problems follow the same pattern: all processes must agree on some common decision. Most of the agreement problems can be considered as a particular instance of the Consensus problem. Hence, they can be solved by reduction to consensus. However, a fundamental impossibility result, namely (FLP), states that in an asynchronous distributed system it is impossible to achieve consensus deterministically when at least one process may fail. A way to circumvent this obstacle is by using unreliable failure detectors. A failure detector allows to encapsulate synchrony assumptions of the system, providing (possibly incorrect) information about process failures. A particular failure detector, called Omega, has been shown to be the weakest failure detector for solving consensus with a majority of correct processes. Informally, Omega lies on providing an eventual leader election mechanism
Fit-Broker: delivering a reliable service for event dissemination
Tese de mestrado em Segurança da Informação, apresentada à Universidade de Lisboa, através da Faculdade de Ciências, 2013Os serviços de nuvem (Cloud) estão a assumir um papel cada vez mais importante no mundo de fornecimento de serviços. Estes serviços variam desde a oferta de simples ferramentas de trabalho até a disponibilização de infraestruturas remotas de computação. Como tal, a correcta monitorização das infraestruturas de nuvem assume um papel vital de forma a garantir disponibilidade e o cumprimento de acordos de nível de serviço. Existem alguns estudos recentes que mostram que este tipo de infraestruturas não se encontra preparada para enfrentar atuais e futuros problemas de segurança que podem ocorrer. Parte deste problema advém do facto de as ferramentas de monitorização serem centralizadas e de apenas suportarem alguns tipos de falhas. De forma a tornar os sistemas de monitorização mais resilientes, esta dissertação propõe uma solução para aumentar a confiabilidade no transporte de informação entre os seus vários pontos. Trata-se de uma framework adaptável e resiliente de disseminação de eventos baseada no paradigma de publicador-subscritor. Esta oferece múltiplos níveis de resiliência e qualidades de serviço que podem ser combinados para oferecer uma qualidade de serviço e de proteção adequada às necessidades de cada sistema. Este documento descreve a arquitectura da framework bem como todo seu funcionamento interno e interfaces oferecidas. Este documento descreve ainda um conjunto de testes realizados de forma a avaliar a performance da framework em vários cenários distintos.Cloud services are assuming a greater role in the world of service providing. These services can range from the simple working tool to a complete remote computing infrastructure. As such, the correct monitoring of this type of infrastructures represents a key requirement to ensure availability and the fulfilment of the service level agreements. Recent studies show that these infrastructures are not prepared to face some current and future security issues. Part of these problems resides in the fact that current monitoring tools are centralized and are only prepared to deal with some types of faults. In order to increase the resilience of monitoring systems, this dissertation proposes a framework capable of increasing the reliability of the transport of information between their many peers. It is a adaptable and resilient framework for event dissemination based on the publisher-subscriber paradigm. The framework offers multiple levels of resilience and quality of services that can be combined to meet the necessities of quality of service and protection of each system.
This document describes the architecture, internal mechanism and interfaces of the framework. Also, we describe a series of tests that where used to evaluate the performance of the framework in different scenarios
Topology Aware Leader Election Algorithm for MANET
National audienceThis article presents an eventual leader election algorithm for mobile dynamic networks. Each node builds knowledge of the communication graph of connected nodes, by broadcasting changes in their neighborhood. This knowledge provides the current topology of the network, used to compute the closeness centrality as the choice of the leader. Experiments were realized on PeerSim simulator, comparing our algorithm with static and dynamic flooding algorithms, on different network topologies and mobility patterns. Our algorithm improves leader stability up to 24%, sends half less messages and aims to an 8% shorter leader path
Preliminary Specification of Services and Protocols
This document describes the preliminary specification of services and protocols for the Crutial Architecture. The Crutial Architecture definition, first addressed in Crutial Project Technical Report D4 (January 2007), intends to reply to a grand challenge of computer science and control engineering: how to achieve resilience of critical information infrastructures, in particular in the electrical sector. The definitions herein elaborate on the major architectural options and components established in the Preliminary Architecture Specification (D4), with special relevance to the Crutial middleware building blocks, and are based on the fault, synchrony and topological models defined in the same document. The document, in general lines, describes the Runtime Support Services and APIs, and the Middleware Services and APIs. Then, it delves into the protocols, describing: Runtime Support Protocols, and Middleware Services Protocols. The Runtime Support Services and APIs chapter features as a main component, the Proactive-Reactive Recovery Service, whose aim is to guarantee perpetual execution of any components it protects. The Middleware Services and APIs chapter describes our approach to intrusion-tolerant middleware. The middleware comprises several layers. The Multipoint Network layer is the lowest layer of CRUTIAL's middleware, and features an abstraction of basic communication services, such as provided by standard protocols, like IP, IPsec, UDP, TCP and SSL/TLS. The Communication Support Services feature two important building blocks: the Randomized Intrusion-Tolerant Services (RITAS), and the Overlay Protection Layer (OPL) against DoS attacks. The Activity Support Services currently defined comprise the CIS Protection service, and the Access Control and Authorization service. Protection as described in this report is implemented by mechanisms and protocols residing on a device called Crutial Information Switch (CIS). The Access Control and Authorization service is implemented through PolyOrBAC, which defines the rules for information exchange and collaboration between sub-modules of the architecture, corresponding in fact to different facilities of the CII's organizations.The Monitoring and Failure Detection layer contains a preliminary definition of the middleware services devoted to monitoring and failure detection activities. The remaining chapters describe the protocols implementing the above-mentioned services: Runtime Support Protocols, and Middleware Services Protocol
Topology Aware Leader Election Algorithm for Dynamic Networks
International audienceThis paper proposes an algorithm that eventually elects a leader for each connected component of a dynamic network where nodes can move or fail by crash. A node only communicates with nodes in its transmission range and locally keeps a global view, denoted topological knowledge, of the communication graph of the network and its dynamic evolution. Every change in the topology or in nodes membership is detected by one or more nodes and propagated over the network, updating thus the topological knowledge of the nodes. As the choice of the leader has an impact on the performance of applications that use an eventual leader election service, our algorithm, thanks to nodes topological knowledge, exploits the closeness centrality as the criterion for electing a leader. Experiments were conducted on top of PeerSim simulator, comparing our algorithm to a representative flooding algorithm. Performance results show that our algorithm outperforms the flooding one when considering leader choice stability, number of messages, and average distance to the leader
Byzantine fault-tolerant agreement protocols for wireless Ad hoc networks
Tese de doutoramento, Informática (Ciências da Computação), Universidade de Lisboa, Faculdade de Ciências, 2010.The thesis investigates the problem of fault- and intrusion-tolerant consensus
in resource-constrained wireless ad hoc networks. This is a fundamental
problem in distributed computing because it abstracts the need
to coordinate activities among various nodes. It has been shown to be a
building block for several other important distributed computing problems
like state-machine replication and atomic broadcast.
The thesis begins by making a thorough performance assessment of existing
intrusion-tolerant consensus protocols, which shows that the performance
bottlenecks of current solutions are in part related to their system
modeling assumptions. Based on these results, the communication failure
model is identified as a model that simultaneously captures the reality
of wireless ad hoc networks and allows the design of efficient protocols.
Unfortunately, the model is subject to an impossibility result stating that
there is no deterministic algorithm that allows n nodes to reach agreement
if more than n2 omission transmission failures can occur in a communication
step. This result is valid even under strict timing assumptions (i.e.,
a synchronous system).
The thesis applies randomization techniques in increasingly weaker variants
of this model, until an efficient intrusion-tolerant consensus protocol
is achieved. The first variant simplifies the problem by restricting the
number of nodes that may be at the source of a transmission failure at
each communication step. An algorithm is designed that tolerates f dynamic
nodes at the source of faulty transmissions in a system with a total
of n 3f + 1 nodes.
The second variant imposes no restrictions on the pattern of transmission
failures. The proposed algorithm effectively circumvents the Santoro-
Widmayer impossibility result for the first time. It allows k out of n nodes
to decide despite dn
2 e(nk)+k2 omission failures per communication
step. This algorithm also has the interesting property of guaranteeing
safety during arbitrary periods of unrestricted message loss.
The final variant shares the same properties of the previous one, but relaxes
the model in the sense that the system is asynchronous and that a
static subset of nodes may be malicious. The obtained algorithm, called
Turquois, admits f < n
3 malicious nodes, and ensures progress in communication
steps where dnf
2 e(n k f) + k 2. The algorithm is
subject to a comparative performance evaluation against other intrusiontolerant
protocols. The results show that, as the system scales, Turquois
outperforms the other protocols by more than an order of magnitude.Esta tese investiga o problema do consenso tolerante a faltas acidentais
e maliciosas em redes ad hoc sem fios. Trata-se de um problema fundamental
que captura a essência da coordenação em actividades envolvendo
vários nós de um sistema, sendo um bloco construtor de outros importantes
problemas dos sistemas distribuídos como a replicação de máquina
de estados ou a difusão atómica.
A tese começa por efectuar uma avaliação de desempenho a protocolos
tolerantes a intrusões já existentes na literatura. Os resultados mostram
que as limitações de desempenho das soluções existentes estão em parte
relacionadas com o seu modelo de sistema. Baseado nestes resultados, é
identificado o modelo de falhas de comunicação como um modelo que simultaneamente
permite capturar o ambiente das redes ad hoc sem fios e
projectar protocolos eficientes. Todavia, o modelo é restrito por um resultado
de impossibilidade que afirma não existir algoritmo algum que permita
a n nós chegaram a acordo num sistema que admita mais do que n2
transmissões omissas num dado passo de comunicação. Este resultado é
válido mesmo sob fortes hipóteses temporais (i.e., em sistemas síncronos)
A tese aplica técnicas de aleatoriedade em variantes progressivamente
mais fracas do modelo até ser alcançado um protocolo eficiente e tolerante
a intrusões. A primeira variante do modelo, de forma a simplificar
o problema, restringe o número de nós que estão na origem de transmissões
faltosas. É apresentado um algoritmo que tolera f nós dinâmicos na
origem de transmissões faltosas em sistemas com um total de n 3f + 1
nós.
A segunda variante do modelo não impõe quaisquer restrições no padrão
de transmissões faltosas. É apresentado um algoritmo que contorna efectivamente
o resultado de impossibilidade Santoro-Widmayer pela primeira
vez e que permite a k de n nós efectuarem progresso nos passos de comunicação
em que o número de transmissões omissas seja dn
2 e(n
k) + k 2. O algoritmo possui ainda a interessante propriedade de tolerar
períodos arbitrários em que o número de transmissões omissas seja
superior a .
A última variante do modelo partilha das mesmas características da variante
anterior, mas com pressupostos mais fracos sobre o sistema. Em particular,
assume-se que o sistema é assíncrono e que um subconjunto estático
dos nós pode ser malicioso. O algoritmo apresentado, denominado
Turquois, admite f < n
3 nós maliciosos e assegura progresso nos passos
de comunicação em que dnf
2 e(n k f) + k 2. O algoritmo é
sujeito a uma análise de desempenho comparativa com outros protocolos
na literatura. Os resultados demonstram que, à medida que o número de
nós no sistema aumenta, o desempenho do protocolo Turquois ultrapassa
os restantes em mais do que uma ordem de magnitude.FC