Robust epidemic aggregation under churn
In large-scale distributed systems data aggregation is a fundamental task that provides a global synopsis
over a distributed set of data values. Epidemic protocols are based on a randomised communication paradigm
inspired by biological systems and have been proposed to provide decentralised, scalable and fault-tolerant
solutions to the data aggregation problem. However, in epidemic aggregation, node failures and churn have
a detrimental effect on the accuracy of the local estimates of the global aggregation target. In this paper, a
novel approach, the Robust Epidemic Aggregation Protocol (REAP), is proposed to provide robustness in
the presence of churn by detecting three distinct phases in the aggregation process. An analysis of the impact
of each phase on the estimation accuracy is provided. In particular, a novel mechanism is introduced to
improve the phase that is most critical for the protocol's accuracy. REAP is validated by means of simulations
and is shown to achieve convergence with a good level of accuracy for a reasonable range of node churn
rates.
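The abstract summarises REAP without giving protocol details. As general background, epidemic aggregation protocols of this family are typically built on pairwise mass averaging. The following is a minimal, synchronous sketch under ideal conditions; the uniform peer sampling, round count and function name are illustrative assumptions, not REAP itself:

```python
import random

def epidemic_average(values, rounds=100, seed=0):
    """Push-pull averaging gossip: every node holds (sum, weight) mass and
    repeatedly averages it with a uniformly random peer. Total mass is
    conserved, so each local estimate sum/weight converges to the global
    average of the initial values."""
    rng = random.Random(seed)
    n = len(values)
    mass = [(v, 1.0) for v in values]          # per-node (sum, weight)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)               # ideal uniform peer sampling
            s = (mass[i][0] + mass[j][0]) / 2
            w = (mass[i][1] + mass[j][1]) / 2
            mass[i] = mass[j] = (s, w)         # pairwise mass averaging
    return [s / w for s, w in mass]
```

Node churn breaks the mass conservation this scheme relies on, which is precisely the failure mode the abstract describes REAP as addressing.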
Asynchronous epidemic algorithms for consistency in large-scale systems
Achieving and detecting a globally consistent state is essential to many services in large
and extreme-scale distributed systems, especially when the desired consistent state is critical
to service operation. Centralised and deterministic approaches to synchronisation and
distributed consistency are neither scalable nor fault-tolerant. Alternatively, epidemic-based
paradigms are decentralised computations based on randomised communications. They are
scalable, resilient, fault-tolerant, and converge to the desired target in logarithmic time with
respect to the system size. Thus, many distributed services have adopted epidemic protocols
to achieve consensus and a consistent state, mainly due to scalability concerns. The
convergence of epidemic protocols is stochastically guaranteed. However, the detection of
convergence is probabilistic and non-explicit. In a real-world environment, systems are
unreliable, and epidemic protocols may fail to converge to the desired state. Thus, achieving
convergence by itself does not ensure a system-wide consistent state under dynamic
conditions.
The research work presented in this thesis introduces the Phase Transition Algorithm
(PTA) to achieve a distributed consistent state based on the explicit detection of convergence.
Each phase in PTA is a decentralised decision-making process that implements epidemic data
aggregation, in which the detection of convergence implies achieving a global agreement. The
phases in PTA can be cascaded to achieve higher certainty as desired. Following the PTA,
two epidemic protocols, namely PTP and ECP, are proposed to achieve consensus, i.e. for
consistency in data dissemination and data aggregation. The protocols are examined
through simulations, and the experimental results have validated the protocols' ability to achieve
and explicitly detect consensus among system nodes.
The research work has also studied epidemic data aggregation under node churn and
network failures, in which the analysis has identified three phases of the aggregation process.
The investigations have shown that node churn has a different impact on each phase. The phase
that is critical for the aggregation process has been studied further, which led to the proposal of
new robust data aggregation protocols, REAP and REAP+. Each protocol has a different
decentralised replication method, and both implement distributed failure detection and
instantaneous mass restoration mechanisms. Simulations have validated the protocols, and
the results have shown the protocols' ability to converge, detect convergence, and produce
competitive accuracy under various levels of node churn.
Furthermore, distributed consistency in continuous systems is addressed in the research.
The work has proposed a novel continuous epidemic protocol with an adaptive restart
mechanism. The protocol restarts either upon the detection of system convergence or upon
the detection of divergence. The protocol also introduces a seed selection method for
peak data distribution in decentralised approaches, a challenge that previously required
single-point initialisation and a leader-election step. The simulations validated the performance
of the algorithm under static and dynamic conditions and confirmed that the convergence and
divergence detection accuracy can be tuned as desired.
Finally, the research work shows that combining and integrating the proposed protocols
enables extreme-scale distributed systems to achieve and detect globally consistent states even
under realistic and dynamic conditions.
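The thesis abstract does not spell out the detection rule. A common, minimal way to make convergence detection explicit at a node, sketched here under assumed thresholds (`epsilon`, `delta` and `window` are illustrative parameters, not the thesis's), is to count consecutive rounds in which the local estimate barely changes:

```python
class ConvergenceDetector:
    """Per-node state machine: declares convergence after `window` consecutive
    rounds in which the estimate changed by less than `epsilon`, and divergence
    whenever the estimate jumps by more than `delta`."""

    def __init__(self, epsilon=1e-4, delta=1.0, window=5):
        self.epsilon, self.delta, self.window = epsilon, delta, window
        self.prev = None
        self.stable = 0          # consecutive quiet rounds observed

    def update(self, estimate):
        status = "running"
        if self.prev is not None:
            change = abs(estimate - self.prev)
            if change > self.delta:
                self.stable = 0
                status = "diverged"
            elif change < self.epsilon:
                self.stable += 1
                if self.stable >= self.window:
                    status = "converged"
            else:
                self.stable = 0
        self.prev = estimate
        return status
```

Because the decision is local and explicit, detectors like this can be cascaded, in the spirit of the phases described above, to trade extra rounds for higher certainty.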
Spectra: Robust Estimation of Distribution Functions in Networks
Distributed aggregation allows the derivation of a given global aggregate
property from many individual local values in nodes of an interconnected
network system. Simple aggregates such as minima/maxima, counts, sums and
averages have been thoroughly studied in the past and are important tools for
distributed algorithms and network coordination. Nonetheless, such
aggregates may not be expressive enough to characterise skewed data
distributions or data containing outliers, making the case for richer
estimates of the values in the network. This work presents Spectra, a
distributed algorithm for the estimation of distribution functions over
large-scale networks. The estimate is available at all nodes, and the technique
exhibits important properties, namely: robustness to high levels of
message loss, fast convergence and fine precision in the estimate. It can
also dynamically cope with changes of the sampled local property, not requiring
algorithm restarts, and is highly resilient to node churn. The proposed
approach is experimentally evaluated and contrasted to a competing state-of-the-art
distribution aggregation technique.
Comment: Full version of the paper published at the 12th IFIP International
Conference on Distributed Applications and Interoperable Systems (DAIS),
Stockholm (Sweden), June 201
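Spectra's own estimator is not reproduced in the abstract. The sketch below only illustrates why distribution functions fit the averaging framework at all: the CDF value at any cut point is the network-wide average of 0/1 indicators, so any robust averaging primitive can estimate it. The cut points and function names are illustrative assumptions:

```python
def cdf_indicators(value, cutpoints):
    """Per-node state: entry k is 1.0 iff the local value is <= cutpoints[k].
    The network-wide average of these vectors is the empirical CDF at the cuts."""
    return [1.0 if value <= c else 0.0 for c in cutpoints]

def empirical_cdf(values, cutpoints):
    """Centralised reference result: what a converged distributed averaging
    of the indicator vectors would return at every node."""
    vectors = [cdf_indicators(v, cutpoints) for v in values]
    return [sum(vec[k] for vec in vectors) / len(values)
            for k in range(len(cutpoints))]
```

In a deployment, `empirical_cdf` would be replaced by the network's averaging protocol running once per cut point (or once over the whole vector).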
An adaptive restart mechanism for continuous epidemic systems
Software services based on large-scale distributed systems demand continuous and decentralised solutions for achieving system consistency and providing operational monitoring. Epidemic data aggregation algorithms provide decentralised, scalable and fault-tolerant solutions that can be used for system-wide tasks such as global state determination, monitoring and consensus. Existing continuous epidemic algorithms either periodically restart at fixed epochs or apply changes in the system state instantly, producing less accurate approximations. This work introduces an innovative mechanism without fixed epochs that monitors the system state and restarts upon the detection of system convergence or divergence. The mechanism yields correct aggregation with an approximation error as small as desired. The proposed solution is validated and analysed by means of simulations under static and dynamic network conditions.
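A rough sketch of how such an epoch-free restart rule can be driven purely by a node's observed estimate stream (the thresholds and event format are illustrative assumptions, not the protocol's actual mechanism):

```python
def adaptive_restart_events(stream, epsilon=1e-3, window=3, delta=5.0):
    """Scan a node's estimate stream and emit a restart event when the
    estimate has been stable for `window` steps (convergence detected) or
    jumps by more than `delta` (divergence detected), instead of restarting
    at fixed epochs."""
    events, prev, stable = [], None, 0
    for t, est in enumerate(stream):
        if prev is not None:
            change = abs(est - prev)
            if change > delta:
                events.append((t, "restart: divergence"))
                stable = 0
            elif change < epsilon:
                stable += 1
                if stable >= window:
                    events.append((t, "restart: convergence"))
                    stable = 0          # the new epoch starts afresh
            else:
                stable = 0
        prev = est
    return events
```

Because the epoch length adapts to the observed dynamics, a quiet system restarts promptly after converging while a disturbed system restarts as soon as divergence is visible, rather than waiting for a fixed timer.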
Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault-tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault-tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
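The key observation behind a decentralised K-Means is that each iteration only needs global per-cluster sums and counts, which are exactly the kind of aggregates epidemic protocols compute. The sketch below shows one such iteration with the epidemic aggregation step stood in by direct summation; the data layout and function name are illustrative, not the paper's EpidemicK-Means code:

```python
def epidemic_kmeans_round(local_points, centroids):
    """One decentralised K-Means iteration. Each node computes per-cluster
    partial sums and counts over its private points; in the real protocol
    these partials would be combined by epidemic aggregation so that every
    node derives the same new centroids. Aggregation is stood in here by
    direct summation over all nodes."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for node_points in local_points:           # each node's private data
        for p in node_points:
            # assign the point to the nearest current centroid
            c = min(range(k), key=lambda j: sum((p[d] - centroids[j][d]) ** 2
                                                for d in range(dim)))
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
    # every node computes identical new centroids from the global aggregates
    return [[sums[j][d] / counts[j] if counts[j] else centroids[j][d]
             for d in range(dim)] for j in range(k)]
```

Since the update depends only on the aggregates, all nodes that see (approximately) the same aggregated sums and counts move their centroids (approximately) in lockstep, which is what lets the decentralised result track the ideal centralised one.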
Exploiting the Synergy Between Gossiping and Structured Overlays
In this position paper we argue for exploiting the synergy between gossip-based algorithms and structured overlay networks (SONs). These two strands of research have both aimed at building fault-tolerant, dynamic, self-managing, and large-scale distributed systems. Despite the common goals, the two areas have, however, been relatively isolated. We focus on three problem domains where there is an untapped potential of using gossiping combined with SONs. We argue for applying gossip-based membership to ring-based SONs---such as Chord and Bamboo---to make them handle partition mergers and loopy networks. We argue that small-world SONs---such as Accordion and Mercury---are specifically well suited for gossip-based membership management. The benefits would be better graph-theoretic properties. Finally, we argue that gossip-based algorithms could use the overlay constructed by SONs. For example, many unreliable broadcast algorithms for SONs could be augmented with anti-entropy protocols. Similarly, gossip-based aggregation could be used in SONs for network size estimation and load-balancing purposes.
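One of the synergies named above, gossip-based aggregation for network size estimation, is simple enough to sketch: a single initiator holds value 1, everyone else 0, and averaging gossip drives all values towards 1/n. The exchange count and initiator choice here are illustrative assumptions:

```python
import random

def estimate_network_size(n, pair_exchanges=5000, seed=0):
    """Averaging-gossip size estimation: one node starts with 1.0, the rest
    with 0.0; after enough pairwise averaging steps every value approaches
    1/n, so each node can report 1/value as the network size."""
    rng = random.Random(seed)
    x = [0.0] * n
    x[0] = 1.0                              # single initiator holds unit mass
    for _ in range(pair_exchanges):
        i, j = rng.randrange(n), rng.randrange(n)
        x[i] = x[j] = (x[i] + x[j]) / 2     # one pairwise averaging step
    return [round(1.0 / v) for v in x]
```

In a SON, the overlay links would supply the peers for each exchange instead of the uniform sampling assumed here.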
Robust and efficient membership management in large-scale dynamic networks
Epidemic protocols are a bio-inspired communication and computation paradigm for large-scale networked systems based on randomised communication. These protocols rely on a membership service to build decentralised and random overlay topologies. In large-scale, dynamic network environments, node churn and failures may have a detrimental effect on the structure of the overlay topologies, with a negative impact on the efficiency and the accuracy of applications. Most importantly, there exists the risk of a permanent loss of global connectivity that would prevent the correct convergence of applications. This work investigates to what extent a dynamic network environment may negatively affect the performance of epidemic membership protocols. A novel Enhanced Expander Membership Protocol (EMP+) based on the expansion properties of graphs is presented. The proposed protocol is evaluated against other membership protocols, and the comparative analysis shows that EMP+ can support faster application convergence and is the first membership protocol to provide robustness against global network connectivity problems.
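EMP+'s expander-based construction is not detailed in the abstract. For context, the baseline operation in gossip membership services of this kind is a partial-view shuffle in the style of Cyclon; the view sizes and function name here are illustrative, not EMP+:

```python
import random

def shuffle_views(view_a, view_b, exchange_size, rng):
    """One membership gossip exchange: two nodes swap random subsets of
    their partial views. Repeated shuffles keep every view small and random,
    which is what sustains a connected overlay for epidemic protocols."""
    sent_a = rng.sample(view_a, min(len(view_a), exchange_size))
    sent_b = rng.sample(view_b, min(len(view_b), exchange_size))
    new_a = [p for p in view_a if p not in sent_a] + sent_b
    new_b = [p for p in view_b if p not in sent_b] + sent_a
    return new_a, new_b
```

The exchange conserves the set of known peers while randomising who knows whom; the connectivity risk the abstract highlights arises when churn removes peers faster than shuffles can re-mix the views.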
Flow updating: fault-tolerant aggregation for dynamic networks
Document submitted for peer review. To be published in the Journal of Parallel and Distributed Computing. ISSN 0743-7315.
Data aggregation is a fundamental building block of modern distributed systems. Averaging-based approaches, commonly designated gossip-based, are an important class of aggregation algorithms as they allow all nodes to produce a result, converge to any required accuracy, and work independently of the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper describes and evaluates a fault-tolerant distributed aggregation technique, Flow Updating, which overcomes the problems of previous averaging approaches and is able to operate on faulty dynamic networks. Experimental results show that this novel approach outperforms previous averaging algorithms; it self-adapts to churn and input value changes without requiring any periodic restart, supports node crashes and high levels of message loss, and works in asynchronous networks. Realistic concerns, such as the use of unreliable failure detectors and asynchrony, have been taken into account in evaluating Flow Updating, targeting its application to realistic environments.
This work was partially funded by FCT PhD grant SFRH/BD/33232/2007 and by project Norte-01-0124-FEDER-000058, co-financed by the North Portugal Regional Operational Program (ON.2 O Novo Norte), under the National Strategic Reference Framework (NSRF), through the European Regional Development Fund (ERDF).
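To see what distinguishes the flow-based state, the sketch below simulates a simplified synchronous variant of the averaging rule (reliable delivery and a static topology are assumed; the published protocol's message handling is more refined). Estimates are always derived from flows, so a lost or stale flow update can delay convergence but cannot destroy the conserved total:

```python
def flow_updating(values, neighbors, rounds=60):
    """Simplified synchronous Flow Updating: each node keeps one flow per
    neighbour and estimates its value as input minus total outgoing flow.
    Every round it adopts the negation of the neighbour's last flow and
    adjusts it so the neighbour's estimate moves to the neighbourhood
    average."""
    flows = {i: {j: 0.0 for j in neighbors[i]} for i in values}
    for _ in range(rounds):
        est = {i: values[i] - sum(flows[i].values()) for i in values}
        new = {}
        for i in values:
            nbrs = neighbors[i]
            avg = (est[i] + sum(est[j] for j in nbrs)) / (len(nbrs) + 1)
            # raising the flow towards j raises j's estimate correspondingly
            new[i] = {j: -flows[j][i] + (avg - est[j]) for j in nbrs}
        flows = new
    return {i: values[i] - sum(flows[i].values()) for i in values}
```

Because the exchanged state is the flow (idempotent) rather than an estimate delta, re-sending or dropping a message merely repeats or delays an adjustment instead of duplicating or losing mass, which is the property the abstract credits for fault tolerance.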