Robust epidemic aggregation under churn
In large-scale distributed systems data aggregation is a fundamental task that provides a global synopsis
over a distributed set of data values. Epidemic protocols are based on a randomised communication paradigm
inspired by biological systems and have been proposed to provide decentralised, scalable and fault-tolerant
solutions to the data aggregation problem. However, in epidemic aggregation, node failures and churn have
a detrimental effect on the accuracy of the local estimates of the global aggregation target. In this paper, a
novel approach, the Robust Epidemic Aggregation Protocol (REAP), is proposed to provide robustness in
the presence of churn by detecting three distinct phases in the aggregation process. An analysis of the impact
of each phase on the estimation accuracy is provided. In particular, a novel mechanism is introduced to
improve the phase that is most critical for the protocol's accuracy. REAP is validated by means of simulations
and is shown to achieve convergence with a good level of accuracy for a reasonable range of node churn
rates.
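The abstract summarises REAP without giving protocol details. As general background, epidemic aggregation protocols of this family are typically built on pairwise mass averaging. The following is a minimal, synchronous sketch under ideal conditions; the uniform peer sampling, round count and function name are illustrative assumptions, not REAP itself:

```python
import random

def epidemic_average(values, rounds=100, seed=0):
    """Push-pull averaging gossip: every node holds (sum, weight) mass and
    repeatedly averages it with a uniformly random peer. Total mass is
    conserved, so each local estimate sum/weight converges to the global
    average of the initial values."""
    rng = random.Random(seed)
    n = len(values)
    mass = [(v, 1.0) for v in values]          # per-node (sum, weight)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)               # ideal uniform peer sampling
            s = (mass[i][0] + mass[j][0]) / 2
            w = (mass[i][1] + mass[j][1]) / 2
            mass[i] = mass[j] = (s, w)         # pairwise mass averaging
    return [s / w for s, w in mass]
```

Node churn breaks the mass conservation this scheme relies on, which is precisely the failure mode the abstract describes REAP as addressing.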
Asynchronous epidemic algorithms for consistency in large-scale systems
Achieving and detecting a globally consistent state is essential to many services in large
and extreme-scale distributed systems, especially when the desired consistent state is critical
to service operation. Centralised and deterministic approaches to synchronisation and
distributed consistency are neither scalable nor fault-tolerant. Alternatively, epidemic-based
paradigms are decentralised computations based on randomised communications. They are
scalable, resilient, fault-tolerant, and converge to the desired target in logarithmic time with
respect to the system size. Thus, many distributed services have adopted epidemic protocols
to achieve consensus and a consistent state, mainly due to scalability concerns. The
convergence of epidemic protocols is stochastically guaranteed. However, the detection of
convergence is probabilistic and non-explicit. In a real-world environment, systems are
unreliable, and epidemic protocols may fail to converge to the desired state. Thus, achieving
convergence by itself does not ensure a system-wide consistent state under dynamic
conditions.
The research work presented in this thesis introduces the Phase Transition Algorithm
(PTA) to achieve a distributed consistent state based on the explicit detection of convergence.
Each phase in PTA is a decentralised decision-making process that implements epidemic data
aggregation, in which the detection of convergence implies achieving a global agreement. The
phases in PTA can be cascaded to achieve higher certainty as desired. Following the PTA,
two epidemic protocols, namely PTP and ECP, are proposed to achieve consensus, i.e. for
consistency in data dissemination and data aggregation. The protocols are examined
through simulations, and the experimental results have validated the protocols' ability to achieve
and explicitly detect consensus among system nodes.
The research work has also studied epidemic data aggregation under node churn and
network failures, in which the analysis has identified three phases of the aggregation process.
The investigations have shown that node churn has a different impact on each phase. The phase
that is critical for the aggregation process has been studied further, which led to the proposal of
new robust data aggregation protocols, REAP and REAP+. Each protocol has a different
decentralised replication method, and both implement distributed failure detection and
instantaneous mass restoration mechanisms. Simulations have validated the protocols, and
the results have shown the protocols' ability to converge, detect convergence, and produce
competitive accuracy under various levels of node churn.
Furthermore, distributed consistency in continuous systems is addressed in the research.
The work has proposed a novel continuous epidemic protocol with an adaptive restart
mechanism. The protocol restarts either upon the detection of system convergence or upon
the detection of divergence. The protocol also introduces a seed selection method for
peak data distribution in decentralised approaches, a challenge that previously required
single-point initialisation and a leader-election step. The simulations validated the performance
of the algorithm under static and dynamic conditions and confirmed that the convergence and
divergence detection accuracy can be tuned as desired.
Finally, the research work shows that combining and integrating the proposed protocols
enables extreme-scale distributed systems to achieve and detect globally consistent states even
under realistic and dynamic conditions.
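The thesis abstract does not spell out the detection rule. A common, minimal way to make convergence detection explicit at a node, sketched here under assumed thresholds (`epsilon`, `delta` and `window` are illustrative parameters, not the thesis's), is to count consecutive rounds in which the local estimate barely changes:

```python
class ConvergenceDetector:
    """Per-node state machine: declares convergence after `window` consecutive
    rounds in which the estimate changed by less than `epsilon`, and divergence
    whenever the estimate jumps by more than `delta`."""

    def __init__(self, epsilon=1e-4, delta=1.0, window=5):
        self.epsilon, self.delta, self.window = epsilon, delta, window
        self.prev = None
        self.stable = 0          # consecutive quiet rounds observed

    def update(self, estimate):
        status = "running"
        if self.prev is not None:
            change = abs(estimate - self.prev)
            if change > self.delta:
                self.stable = 0
                status = "diverged"
            elif change < self.epsilon:
                self.stable += 1
                if self.stable >= self.window:
                    status = "converged"
            else:
                self.stable = 0
        self.prev = estimate
        return status
```

Because the decision is local and explicit, detectors like this can be cascaded, in the spirit of the phases described above, to trade extra rounds for higher certainty.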
Spectra: Robust Estimation of Distribution Functions in Networks
Distributed aggregation allows the derivation of a given global aggregate
property from many individual local values in nodes of an interconnected
network system. Simple aggregates such as minima/maxima, counts, sums and
averages have been thoroughly studied in the past and are important tools for
distributed algorithms and network coordination. Nonetheless, such
aggregates may not be expressive enough to characterise skewed data
distributions or data containing outliers, making the case for richer
estimates of the values in the network. This work presents Spectra, a
distributed algorithm for the estimation of distribution functions over
large-scale networks. The estimate is available at all nodes, and the technique
exhibits important properties, namely: robustness to high levels of
message loss, fast convergence and fine precision in the estimate. It can
also dynamically cope with changes of the sampled local property, not requiring
algorithm restarts, and is highly resilient to node churn. The proposed
approach is experimentally evaluated and contrasted to a competing state-of-the-art
distribution aggregation technique.
Comment: Full version of the paper published at the 12th IFIP International
Conference on Distributed Applications and Interoperable Systems (DAIS),
Stockholm (Sweden), June 201
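Spectra's own estimator is not reproduced in the abstract. The sketch below only illustrates why distribution functions fit the averaging framework at all: the CDF value at any cut point is the network-wide average of 0/1 indicators, so any robust averaging primitive can estimate it. The cut points and function names are illustrative assumptions:

```python
def cdf_indicators(value, cutpoints):
    """Per-node state: entry k is 1.0 iff the local value is <= cutpoints[k].
    The network-wide average of these vectors is the empirical CDF at the cuts."""
    return [1.0 if value <= c else 0.0 for c in cutpoints]

def empirical_cdf(values, cutpoints):
    """Centralised reference result: what a converged distributed averaging
    of the indicator vectors would return at every node."""
    vectors = [cdf_indicators(v, cutpoints) for v in values]
    return [sum(vec[k] for vec in vectors) / len(values)
            for k in range(len(cutpoints))]
```

In a deployment, `empirical_cdf` would be replaced by the network's averaging protocol running once per cut point (or once over the whole vector).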
An adaptive restart mechanism for continuous epidemic systems
Software services based on large-scale distributed systems demand continuous and decentralised solutions for achieving system consistency and providing operational monitoring. Epidemic data aggregation algorithms provide decentralised, scalable and fault-tolerant solutions that can be used for system-wide tasks such as global state determination, monitoring and consensus. Existing continuous epidemic algorithms either periodically restart at fixed epochs or apply changes in the system state instantly, producing less accurate approximations. This work introduces an innovative mechanism without fixed epochs that monitors the system state and restarts upon the detection of system convergence or divergence. The mechanism yields correct aggregation with an approximation error as small as desired. The proposed solution is validated and analysed by means of simulations under static and dynamic network conditions.
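A rough sketch of how such an epoch-free restart rule can be driven purely by a node's observed estimate stream (the thresholds and event format are illustrative assumptions, not the protocol's actual mechanism):

```python
def adaptive_restart_events(stream, epsilon=1e-3, window=3, delta=5.0):
    """Scan a node's estimate stream and emit a restart event when the
    estimate has been stable for `window` steps (convergence detected) or
    jumps by more than `delta` (divergence detected), instead of restarting
    at fixed epochs."""
    events, prev, stable = [], None, 0
    for t, est in enumerate(stream):
        if prev is not None:
            change = abs(est - prev)
            if change > delta:
                events.append((t, "restart: divergence"))
                stable = 0
            elif change < epsilon:
                stable += 1
                if stable >= window:
                    events.append((t, "restart: convergence"))
                    stable = 0          # the new epoch starts afresh
            else:
                stable = 0
        prev = est
    return events
```

Because the epoch length adapts to the observed dynamics, a quiet system restarts promptly after converging while a disturbed system restarts as soon as divergence is visible, rather than waiting for a fixed timer.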
Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault-tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault-tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
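The key observation behind a decentralised K-Means is that each iteration only needs global per-cluster sums and counts, which are exactly the kind of aggregates epidemic protocols compute. The sketch below shows one such iteration with the epidemic aggregation step stood in by direct summation; the data layout and function name are illustrative, not the paper's EpidemicK-Means code:

```python
def epidemic_kmeans_round(local_points, centroids):
    """One decentralised K-Means iteration. Each node computes per-cluster
    partial sums and counts over its private points; in the real protocol
    these partials would be combined by epidemic aggregation so that every
    node derives the same new centroids. Aggregation is stood in here by
    direct summation over all nodes."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for node_points in local_points:           # each node's private data
        for p in node_points:
            # assign the point to the nearest current centroid
            c = min(range(k), key=lambda j: sum((p[d] - centroids[j][d]) ** 2
                                                for d in range(dim)))
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
    # every node computes identical new centroids from the global aggregates
    return [[sums[j][d] / counts[j] if counts[j] else centroids[j][d]
             for d in range(dim)] for j in range(k)]
```

Since the update depends only on the aggregates, all nodes that see (approximately) the same aggregated sums and counts move their centroids (approximately) in lockstep, which is what lets the decentralised result track the ideal centralised one.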
Exploiting the Synergy Between Gossiping and Structured Overlays
In this position paper we argue for exploiting the synergy between gossip-based algorithms and structured overlay networks (SONs). These two strands of research have both aimed at building fault-tolerant, dynamic, self-managing, and large-scale distributed systems. Despite the common goals, the two areas have, however, been relatively isolated. We focus on three problem domains where there is an untapped potential of using gossiping combined with SONs. We argue for applying gossip-based membership to ring-based SONs---such as Chord and Bamboo---to make them handle partition mergers and loopy networks. We argue that small-world SONs---such as Accordion and Mercury---are specifically well suited for gossip-based membership management. The benefits would be better graph-theoretic properties. Finally, we argue that gossip-based algorithms could use the overlay constructed by SONs. For example, many unreliable broadcast algorithms for SONs could be augmented with anti-entropy protocols. Similarly, gossip-based aggregation could be used in SONs for network size estimation and load-balancing purposes.
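One of the synergies named above, gossip-based aggregation for network size estimation, is simple enough to sketch: a single initiator holds value 1, everyone else 0, and averaging gossip drives all values towards 1/n. The exchange count and initiator choice here are illustrative assumptions:

```python
import random

def estimate_network_size(n, pair_exchanges=5000, seed=0):
    """Averaging-gossip size estimation: one node starts with 1.0, the rest
    with 0.0; after enough pairwise averaging steps every value approaches
    1/n, so each node can report 1/value as the network size."""
    rng = random.Random(seed)
    x = [0.0] * n
    x[0] = 1.0                              # single initiator holds unit mass
    for _ in range(pair_exchanges):
        i, j = rng.randrange(n), rng.randrange(n)
        x[i] = x[j] = (x[i] + x[j]) / 2     # one pairwise averaging step
    return [round(1.0 / v) for v in x]
```

In a SON, the overlay links would supply the peers for each exchange instead of the uniform sampling assumed here.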
Robust and efficient membership management in large-scale dynamic networks
Epidemic protocols are a bio-inspired communication and computation paradigm for large-scale networked systems based on randomised communication. These protocols rely on a membership service to build decentralised and random overlay topologies. In large-scale, dynamic network environments, node churn and failures may have a detrimental effect on the structure of the overlay topologies, with a negative impact on the efficiency and the accuracy of applications. Most importantly, there exists the risk of a permanent loss of global connectivity that would prevent the correct convergence of applications. This work investigates to what extent a dynamic network environment may negatively affect the performance of epidemic membership protocols. A novel Enhanced Expander Membership Protocol (EMP+) based on the expansion properties of graphs is presented. The proposed protocol is evaluated against other membership protocols, and the comparative analysis shows that EMP+ can support faster application convergence and is the first membership protocol to provide robustness against global network connectivity problems.
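EMP+'s expander-based construction is not detailed in the abstract. For context, the baseline operation in gossip membership services of this kind is a partial-view shuffle in the style of Cyclon; the view sizes and function name here are illustrative, not EMP+:

```python
import random

def shuffle_views(view_a, view_b, exchange_size, rng):
    """One membership gossip exchange: two nodes swap random subsets of
    their partial views. Repeated shuffles keep every view small and random,
    which is what sustains a connected overlay for epidemic protocols."""
    sent_a = rng.sample(view_a, min(len(view_a), exchange_size))
    sent_b = rng.sample(view_b, min(len(view_b), exchange_size))
    new_a = [p for p in view_a if p not in sent_a] + sent_b
    new_b = [p for p in view_b if p not in sent_b] + sent_a
    return new_a, new_b
```

The exchange conserves the set of known peers while randomising who knows whom; the connectivity risk the abstract highlights arises when churn removes peers faster than shuffles can re-mix the views.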
Flow updating: fault-tolerant aggregation for dynamic networks
Document submitted for peer review. To be published in the Journal of Parallel and Distributed Computing. ISSN 0743-7315.
Data aggregation is a fundamental building block of modern distributed systems. Averaging-based approaches, commonly designated gossip-based, are an important class of aggregation algorithms as they allow all nodes to produce a result, converge to any required accuracy, and work independently of the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper describes and evaluates a fault-tolerant distributed aggregation technique, Flow Updating, which overcomes the problems of previous averaging approaches and is able to operate on faulty dynamic networks. Experimental results show that this novel approach outperforms previous averaging algorithms; it self-adapts to churn and input value changes without requiring any periodic restart, supports node crashes and high levels of message loss, and works in asynchronous networks. Realistic concerns, such as the use of unreliable failure detectors and asynchrony, have been taken into account in evaluating Flow Updating, targeting its application to realistic environments.
This work was partially funded by FCT PhD grant SFRH/BD/33232/2007 and by project Norte-01-0124-FEDER-000058, co-financed by the North Portugal Regional Operational Program (ON.2 O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF).
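To see what distinguishes the flow-based state, the sketch below simulates a simplified synchronous variant of the averaging rule (reliable delivery and a static topology are assumed; the published protocol's message handling is more refined). Estimates are always derived from flows, so a lost or stale flow update can delay convergence but cannot destroy the conserved total:

```python
def flow_updating(values, neighbors, rounds=60):
    """Simplified synchronous Flow Updating: each node keeps one flow per
    neighbour and estimates its value as input minus total outgoing flow.
    Every round it adopts the negation of the neighbour's last flow and
    adjusts it so the neighbour's estimate moves to the neighbourhood
    average."""
    flows = {i: {j: 0.0 for j in neighbors[i]} for i in values}
    for _ in range(rounds):
        est = {i: values[i] - sum(flows[i].values()) for i in values}
        new = {}
        for i in values:
            nbrs = neighbors[i]
            avg = (est[i] + sum(est[j] for j in nbrs)) / (len(nbrs) + 1)
            # raising the flow towards j raises j's estimate correspondingly
            new[i] = {j: -flows[j][i] + (avg - est[j]) for j in nbrs}
        flows = new
    return {i: values[i] - sum(flows[i].values()) for i in values}
```

Because the exchanged state is the flow (idempotent) rather than an estimate delta, re-sending or dropping a message merely repeats or delays an adjustment instead of duplicating or losing mass, which is the property the abstract credits for fault tolerance.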