Dependability in Aggregation by Averaging
Aggregation is an important building block of modern distributed applications, allowing the determination of meaningful properties (e.g. network size, total storage capacity, average load, majorities, etc.) that are used to direct the execution of the system. However, the majority of existing aggregation algorithms exhibit relevant dependability issues when their use in real application environments is considered. In this paper, we reveal some dependability issues of aggregation algorithms based on iterative averaging techniques, and give some directions to solve them. This class of algorithms is considered robust (when compared to common tree-based approaches), being independent of the routing topology used and providing an aggregation result at all nodes. However, their robustness is strongly challenged, and their correctness often compromised, when the assumptions about their working environment are changed to more realistic ones. The correctness of this class of algorithms relies on the maintenance of a fundamental invariant, commonly designated "mass conservation". We argue that this invariant is often broken in practical settings, and that additional mechanisms and modifications are required to maintain it, incurring some degradation of the algorithms' performance. In particular, we discuss the behavior of three representative algorithms (the Push-Sum Protocol, the Push-Pull Gossip protocol, and Distributed Random Grouping) under asynchronous and faulty environments (with message loss and node crashes). More specifically, we propose and evaluate two new versions of the Push-Pull Gossip protocol, which solve its message interleaving problem (evidenced even in a synchronous operation mode).
Comment: 14 pages. Presented in Inforum 200
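The mass-conservation invariant discussed above can be made concrete with the Push-Sum Protocol itself. The sketch below is a minimal synchronous simulation on a complete graph (the topology and round model are simplifying assumptions, not the paper's exact setting): each node halves its (sum, weight) pair, keeps one half and pushes the other to a random peer, so the global totals of sums and weights stay constant, unless a message is lost.

```python
import random

def push_sum(values, rounds=50, loss_rate=0.0):
    """Synchronous Push-Sum over a complete graph.

    Each node keeps a (sum, weight) pair, initially (value, 1).  Every
    round it halves both components, keeps one half and pushes the other
    to a uniformly random node.  The invariant is that the global totals
    of sums and of weights never change ("mass conservation") -- unless a
    pushed message is lost, in which case its mass silently vanishes and
    the estimates become biased.
    """
    n = len(values)
    s = list(values)   # sum components
    w = [1.0] * n      # weight components
    for _ in range(rounds):
        inbox = [[] for _ in range(n)]
        for i in range(n):
            half_s, half_w = s[i] / 2.0, w[i] / 2.0
            s[i], w[i] = half_s, half_w              # keep one half locally
            if random.random() >= loss_rate:         # the other half is pushed
                inbox[random.randrange(n)].append((half_s, half_w))
        for i in range(n):
            for ds, dw in inbox[i]:
                s[i] += ds
                w[i] += dw
    return [s[i] / w[i] for i in range(n)]   # each node's average estimate
```

With `loss_rate=0` every node's ratio converges to the true average; with any positive loss rate, mass leaks and the common estimate drifts away from it, which is exactly the dependability issue the paper examines.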
Spectra: Robust Estimation of Distribution Functions in Networks
Distributed aggregation allows the derivation of a given global aggregate
property from many individual local values in nodes of an interconnected
network system. Simple aggregates such as minima/maxima, counts, sums and
averages have been thoroughly studied in the past and are important tools for
distributed algorithms and network coordination. Nonetheless, such aggregates may not be comprehensive enough to characterize biased data distributions or data with outliers, making the case for richer estimates of the values on the network. This work presents Spectra, a distributed algorithm for the estimation of distribution functions over large-scale networks. The estimate is available at all nodes, and the technique exhibits important properties, namely robustness under high levels of message loss, fast convergence, and fine precision in the estimate. It can
also dynamically cope with changes of the sampled local property, not requiring
algorithm restarts, and is highly resilient to node churn. The proposed
approach is experimentally evaluated and contrasted with a competing state-of-the-art distribution aggregation technique.
Comment: Full version of the paper published at the 12th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), Stockholm (Sweden), June 201
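Spectra's internals are not spelled out in the abstract, but the underlying reduction from distribution-function estimation to averaging can be illustrated: for any threshold t, the network-wide average of the indicator values 1{v_i <= t} is exactly the empirical CDF F(t). The sketch below shows only this reduction, with a plain centralised mean standing in for whatever distributed averaging primitive is used.

```python
def empirical_cdf_by_averaging(values, thresholds):
    """Estimate the global CDF at given thresholds by averaging indicators.

    For each threshold t, every node holds the indicator 1 if its local
    value is <= t, else 0; the network-wide average of these indicators
    equals the empirical CDF F(t).  Any distributed averaging primitive
    (push-sum, flow-based averaging, ...) could compute that average; a
    centralised mean stands in for it here to show the reduction itself.
    """
    n = len(values)
    return [sum(1 for v in values if v <= t) / n for t in thresholds]
```

For example, `empirical_cdf_by_averaging([1, 2, 3, 4], [0, 2, 4])` yields `[0.0, 0.5, 1.0]`: no value is at most 0, half are at most 2, and all are at most 4.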
Agreement in epidemic data aggregation
Computing and spreading global information in large-scale distributed systems pose significant challenges when scalability, parallelism, resilience and consistency are demanded. Epidemic protocols are a robust and scalable computing and communication paradigm that can be effectively used for information dissemination and data aggregation in a fully decentralised context, where each network node requires the local computation of a global synopsis function. Theoretical analyses of epidemic protocols for synchronous and static network models provide guarantees on the convergence to a global target and on the consistency among the network nodes. However, practical applications in real-world networks may require the explicit detection of both local convergence and global agreement (consensus). This work introduces the Epidemic Consensus Protocol (ECP) for the determination of consensus on the convergence of a decentralised data aggregation task. ECP adopts a heuristic method to locally detect convergence of the aggregation task, and stochastic phase transitions to detect global agreement and reach consensus. The performance of ECP has been investigated by means of simulations and compared to a tree-based Three-Phase Commit protocol (3PC). Although, as expected, ECP exhibits total communication costs greater than those of the optimal tree-based protocol, it is shown to have better performance and scalability properties; ECP achieves faster convergence to consensus for large system sizes and inherits the intrinsic decentralisation, fault-tolerance and robustness properties of epidemic protocols.
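The abstract does not spell out ECP's heuristic for local convergence detection. A common scheme, shown here purely as an illustrative assumption rather than ECP's actual rule, is to declare local convergence once the node's estimate has changed by less than some epsilon for k consecutive rounds:

```python
class ConvergenceDetector:
    """Heuristic local convergence detector (a common scheme; the exact
    rule ECP uses is an assumption here): declare convergence once the
    local estimate has changed by less than `eps` for `k` consecutive
    rounds of the aggregation."""

    def __init__(self, eps=1e-3, k=5):
        self.eps, self.k = eps, k
        self.prev = None          # estimate seen in the previous round
        self.stable_rounds = 0    # consecutive rounds with a tiny change

    def update(self, estimate):
        """Feed this round's estimate; return True once locally converged."""
        if self.prev is not None and abs(estimate - self.prev) < self.eps:
            self.stable_rounds += 1
        else:
            self.stable_rounds = 0
        self.prev = estimate
        return self.stable_rounds >= self.k
```

Such a detector is only probabilistic, which is precisely why a global agreement phase on top of it, as ECP provides, is needed before the system can act on the result.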
Asynchronous epidemic algorithms for consistency in large-scale systems
Achieving and detecting a globally consistent state is essential to many services in large- and extreme-scale distributed systems, especially when the desired consistent state is critical for service operation. Centralised and deterministic approaches for synchronisation and distributed consistency are neither scalable nor fault-tolerant. Alternatively, epidemic-based paradigms are decentralised computations based on randomised communications. They are scalable, resilient, fault-tolerant, and converge to the desired target in logarithmic time with respect to system size. Thus, many distributed services have adopted epidemic protocols to achieve consensus and a consistent state, mainly due to scalability concerns. The convergence of epidemic protocols is stochastically guaranteed; however, the detection of convergence is probabilistic and non-explicit. In real-world environments, systems are unreliable and epidemic protocols may fail to converge to the desired state. Thus, convergence by itself does not ensure a system-wide consistent state under dynamic conditions.
The research work presented in this thesis introduces the Phase Transition Algorithm (PTA) to achieve a distributed consistent state based on the explicit detection of convergence. Each phase in PTA is a decentralised decision-making process that implements epidemic data aggregation, in which the detection of convergence implies achieving a global agreement. The phases in PTA can be cascaded to achieve higher certainty as desired. Following the PTA, two epidemic protocols, namely PTP and ECP, are proposed to acquire consensus, i.e. to achieve consistency in data dissemination and data aggregation. The protocols are examined through simulations, and experimental results have validated the protocols' ability to achieve and explicitly detect consensus among system nodes.
The research work has also studied epidemic data aggregation under node churn and network failures, in which the analysis has identified three phases of the aggregation process. The investigations have shown a different impact of node churn on each phase. The phase that is critical for the aggregation process has been studied further, which led to the proposal of new robust data aggregation protocols, REAP and REAP+. Each protocol has a different decentralised replication method, and both implement distributed failure detection and instantaneous mass restoration mechanisms. Simulations have validated the protocols, and results have shown the protocols' ability to converge, detect convergence, and produce competitive accuracy under various levels of node churn.
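The concrete replication and restoration mechanisms of REAP and REAP+ are not described above. The sketch below illustrates only the general principle of "mass restoration", using a hypothetical buddy-replication scheme (the buddy assignment and the `restore_mass` helper are assumptions for illustration): each node's outstanding (sum, weight) mass is replicated at a buddy node, and when a crash is detected the buddy re-injects the replica so the global totals that averaging depends on are preserved.

```python
def restore_mass(state, replicas, crashed):
    """Re-inject a crashed node's replicated mass at its buddy.

    `state` maps node id -> (sum, weight) mass currently held;
    `replicas` maps node id -> (buddy id, replicated (sum, weight)).
    This is a hypothetical illustration of the mass-restoration idea,
    not the actual REAP/REAP+ mechanism.
    """
    buddy, (rep_s, rep_w) = replicas[crashed]
    s, w = state[buddy]
    state[buddy] = (s + rep_s, w + rep_w)   # re-inject the lost mass
    del state[crashed]                      # the crashed node leaves the system
    return state
```

After restoration the total sum and total weight over the surviving nodes are unchanged, so the averaging process can continue to converge to a meaningful value despite the crash.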
Furthermore, distributed consistency in continuous systems is addressed in the research. The work has proposed a novel continuous epidemic protocol with an adaptive restart mechanism. The protocol restarts either upon the detection of system convergence or upon the detection of divergence. The protocol also introduces a seed selection method for peak data distribution in decentralised approaches, a task that previously required single-point initialisation and a leader-election step. The simulations validated the performance of the algorithm under static and dynamic conditions and confirmed that convergence and divergence detection accuracy can be tuned as desired.
Finally, the research work shows that combining and integrating the proposed protocols enables extreme-scale distributed systems to achieve and detect globally consistent states even under realistic and dynamic conditions.
Robust distributed data aggregation
Doctoral thesis
MAP-i Doctoral Programme in Informatics
Distributed aggregation algorithms are an important building block of modern large-scale systems, as they allow the determination of meaningful system-wide properties (e.g., network size, total storage capacity, average load, or majorities) which are required
to direct the execution of distributed applications. In the last decade, several
algorithms have been proposed to address the distributed computation of aggregation
functions (e.g., COUNT, SUM, AVERAGE, and MAX/MIN), exhibiting different properties
in terms of accuracy, speed, and communication tradeoffs. However, existing approaches exhibit many issues when challenged in faulty and dynamic environments, lacking fault tolerance and support for churn.
This study details a novel distributed aggregation approach, named Flow Updating, which is fault-tolerant and able to operate on dynamic networks. The algorithm is based on manipulating flows (inspired by the concept from graph theory), which are updated using idempotent messages, providing it with unique robustness capabilities. Experimental results showed that Flow Updating outperforms previous averaging algorithms in terms of time and message complexity and, unlike them, self-adapts to churn and to changes of the initial input values without requiring any periodic restart, supporting node crashes and high levels of message loss.
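A minimal synchronous simulation of the idea described above can be sketched as follows. This is an interpretation of the textual description, not the thesis's exact pseudocode: each node keeps a flow per incident edge, estimates the average as its input value minus its outgoing flows, and exchanges messages that carry the current flow value (state, not an increment), which is what makes them idempotent.

```python
import random

def flow_updating(values, nbrs, rounds=100, loss_rate=0.0, seed=0):
    """Synchronous simulation of the Flow Updating idea (an
    interpretation of the textual description above).

    Node i keeps a flow F[i][j] toward each neighbour j and estimates
    the global average as values[i] - sum(F[i].values()).  Each round it
    averages its own estimate with the estimates last heard from its
    neighbours, adjusts each flow so that neighbour would land on that
    average, and sends the new flow value together with the average.
    Because a message carries state rather than a delta, losing or
    re-delivering it only delays convergence instead of corrupting it.
    """
    rng = random.Random(seed)
    F = {i: {j: 0.0 for j in nbrs[i]} for i in nbrs}  # flows toward neighbours
    E = {i: {j: 0.0 for j in nbrs[i]} for i in nbrs}  # last estimate heard from j
    for _ in range(rounds):
        msgs = []
        for i in nbrs:                                 # compute phase
            e_i = values[i] - sum(F[i].values())
            avg = (e_i + sum(E[i].values())) / (len(nbrs[i]) + 1)
            for j in nbrs[i]:
                F[i][j] += avg - E[i][j]   # move j's (heard) estimate to avg
                msgs.append((i, j, F[i][j], avg))
        for (i, j, f, e) in msgs:                      # delivery phase
            if rng.random() >= loss_rate:
                F[j][i] = -f               # overwrite: flows are antisymmetric
                E[j][i] = e
    return {i: values[i] - sum(F[i].values()) for i in nbrs}
```

Since every node's estimate is its input minus its outgoing flows, and converged flows are antisymmetric across each edge, the flow contributions cancel globally and the estimates can only agree on the true average; dropped messages leave the stored flows stale rather than destroying mass.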
In addition to this main contribution, others can also be found in this research work, namely: a definition of the aggregation problem is proposed; existing distributed aggregation algorithms are surveyed and classified into a comprehensive taxonomy; and a novel algorithm is introduced, based on Flow Updating, to estimate the Cumulative Distribution Function (CDF) of a global system attribute.
It is expected that this work will constitute a relevant contribution to the area of distributed computing, in particular to the robust distributed computation of aggregation
functions in dynamic networks.