
    Agreement in epidemic data aggregation

    Computing and spreading global information in large-scale distributed systems pose significant challenges when scalability, parallelism, resilience and consistency are demanded. Epidemic protocols are a robust and scalable computing and communication paradigm that can be effectively used for information dissemination and data aggregation in a fully decentralised context where each network node requires the local computation of a global synopsis function. Theoretical analysis of epidemic protocols for synchronous and static network models provides guarantees on the convergence to a global target and on the consistency among the network nodes. However, practical applications in real-world networks may require the explicit detection of both local convergence and global agreement (consensus). This work introduces the Epidemic Consensus Protocol (ECP) for the determination of consensus on the convergence of a decentralised data aggregation task. ECP adopts a heuristic method to locally detect convergence of the aggregation task and stochastic phase transitions to detect global agreement and reach consensus. The performance of ECP has been investigated by means of simulations and compared to a tree-based Three-Phase Commit protocol (3PC). Although, as expected, ECP exhibits total communication costs greater than those of the optimal tree-based protocol, it is shown to have better performance and scalability properties: ECP can achieve faster convergence to consensus for large system sizes and inherits the intrinsic decentralisation, fault-tolerance and robustness properties of epidemic protocols.
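
    The round structure that ECP builds on can be illustrated with a small simulation. Below is a minimal Python sketch, assuming a synchronous push-sum aggregation and a simple local-convergence heuristic of the kind the abstract describes; the threshold eps, the patience window and all names are illustrative assumptions, not ECP's actual parameters.

        import random

        def push_sum_round(values, weights):
            # One push-sum round: every node keeps half of its (value, weight)
            # pair and pushes the other half to one uniformly random peer. The
            # sums of values and weights are conserved, and value/weight
            # converges to the global average at every node.
            n = len(values)
            new_v = [v / 2 for v in values]
            new_w = [w / 2 for w in weights]
            for i in range(n):
                j = random.randrange(n)
                new_v[j] += values[i] / 2
                new_w[j] += weights[i] / 2
            return new_v, new_w

        def run_until_locally_converged(inputs, eps=1e-6, patience=10):
            # Heuristic local detection: a node declares convergence once its
            # own estimate has changed by less than eps for `patience`
            # consecutive rounds (illustrative rule, not ECP's).
            values, weights = list(inputs), [1.0] * len(inputs)
            prev = list(values)
            stable = [0] * len(inputs)
            rounds = 0
            while not all(s >= patience for s in stable):
                values, weights = push_sum_round(values, weights)
                est = [v / w for v, w in zip(values, weights)]
                stable = [s + 1 if abs(e - p) < eps else 0
                          for s, e, p in zip(stable, est, prev)]
                prev = est
                rounds += 1
            return est, rounds

        random.seed(42)
        est, rounds = run_until_locally_converged([random.random() for _ in range(100)])
        print(f"all nodes locally converged after {rounds} rounds; estimate ~ {est[0]:.6f}")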

    Dependability in Aggregation by Averaging

    Aggregation is an important building block of modern distributed applications, allowing the determination of meaningful properties (e.g. network size, total storage capacity, average load, majorities, etc.) that are used to direct the execution of the system. However, the majority of the existing aggregation algorithms exhibit relevant dependability issues when their use in real application environments is considered. In this paper, we reveal some dependability issues of aggregation algorithms based on iterative averaging techniques and give some directions to solve them. This class of algorithms is considered robust (when compared to common tree-based approaches), being independent of the routing topology used and providing an aggregation result at all nodes. However, their robustness is strongly challenged and their correctness often compromised when the assumptions of their working environment are changed to more realistic ones. The correctness of this class of algorithms relies on the maintenance of a fundamental invariant, commonly designated as "mass conservation". We argue that this main invariant is often broken in practical settings, and that additional mechanisms and modifications are required to maintain it, incurring some degradation of the algorithms' performance. In particular, we discuss the behavior of three representative algorithms (the Push-Sum Protocol, the Push-Pull Gossip protocol and Distributed Random Grouping) under asynchronous and faulty (with message loss and node crashes) environments. More specifically, we propose and evaluate two new versions of the Push-Pull Gossip protocol, which solve its message interleaving problem (evidenced even in a synchronous operation mode).
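
    The mass-conservation failure is easy to reproduce. The following Python sketch (an illustration under simplifying assumptions, not the paper's code) runs push-pull averaging in which the reply message is occasionally lost: the receiver has already applied the exchange while the sender has not, so the global sum silently drifts.

        import random

        def push_pull_round(values, loss_prob):
            # Each node i averages with one random peer j. The exchange is
            # meant to be atomic: both sides adopt (v_i + v_j) / 2, keeping
            # the global sum ("mass") constant. If the reply to i is lost,
            # only j has updated, and the invariant
            # sum(values) == sum(initial values) is broken.
            n = len(values)
            for i in random.sample(range(n), n):
                j = random.randrange(n)
                if j == i:
                    continue
                avg = (values[i] + values[j]) / 2
                values[j] = avg                      # j applies the exchange
                if random.random() >= loss_prob:
                    values[i] = avg                  # reply delivered: i applies it too

        random.seed(7)
        values = [float(i) for i in range(100)]
        true_avg = sum(values) / len(values)
        for _ in range(50):
            push_pull_round(values, loss_prob=0.05)
        print(f"true average    : {true_avg:.3f}")
        print(f"node 0 estimate : {values[0]:.3f}")
        print(f"mass drift      : {sum(values) / len(values) - true_avg:+.3f}")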

    Asynchronous epidemic algorithms for consistency in large-scale systems

    Achieving and detecting a globally consistent state is essential to many services in large and extreme-scale distributed systems, especially when the desired consistent state is critical for service operation. Centralised and deterministic approaches to synchronisation and distributed consistency are neither scalable nor fault-tolerant. Alternatively, epidemic-based paradigms are decentralised computations based on randomised communications. They are scalable, resilient, fault-tolerant, and converge to the desired target in logarithmic time with respect to system size. Thus, many distributed services have adopted epidemic protocols to achieve consensus and a consistent state, mainly due to scalability concerns. The convergence of epidemic protocols is stochastically guaranteed; however, the detection of convergence is probabilistic and non-explicit. In a real-world environment, systems are unreliable, and epidemic protocols may fail to converge to the desired state. Thus, achieving convergence by itself does not ensure a system-wide consistent state under dynamic conditions. The research work presented in this thesis introduces the Phase Transition Algorithm (PTA) to achieve a distributed consistent state based on the explicit detection of convergence. Each phase in PTA is a decentralised decision-making process that implements epidemic data aggregation, in which the detection of convergence implies achieving a global agreement. The phases in PTA can be cascaded to achieve higher certainty as desired. Following the PTA, two epidemic protocols, namely PTP and ECP, are proposed to achieve consensus, i.e. consistency in data dissemination and data aggregation. The protocols are examined through simulations, and experimental results have validated the protocols' ability to achieve and explicitly detect consensus among system nodes. The research work has also studied epidemic data aggregation under node churn and network failures, in which the analysis has identified three phases of the aggregation process. The investigations have shown a different impact of node churn on each phase. The phase that is critical for the aggregation process has been studied further, leading to the proposal of new robust data aggregation protocols, REAP and REAP+. Each protocol has a different decentralised replication method, and both implement distributed failure detection and instantaneous mass restoration mechanisms. Simulations have validated the protocols, and results have shown their ability to converge, detect convergence, and produce competitive accuracy under various levels of node churn. Furthermore, distributed consistency in continuous systems is addressed in the research. The work proposes a novel continuous epidemic protocol with an adaptive restart mechanism: the protocol restarts either upon the detection of system convergence or upon the detection of divergence. The protocol also introduces a seed-selection method for peak data distribution in decentralised approaches, a challenge that previously required single-point initialisation and a leader-election step. The simulations validated the performance of the algorithm under static and dynamic conditions and confirmed that convergence and divergence detection accuracy can be tuned as desired.
    Finally, the research work shows that combining and integrating the proposed protocols enables extreme-scale distributed systems to achieve and detect globally consistent states even under realistic and dynamic conditions.
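
    As a rough illustration of the cascaded-phase idea, the toy Python simulation below treats each phase as one epidemic aggregation of agreement flags: a phase "converges" once every estimate is stable, and completing K phases in a row raises the confidence that agreement is global. The flag encoding, the thresholds and K are assumptions made for illustration, not the PTA specification.

        import random

        def gossip_average(values, eps=1e-9, patience=5):
            # Push-pull averaging until the global spread stays below eps for
            # `patience` rounds (a stand-in for per-node convergence detection).
            n, stable = len(values), 0
            while stable < patience:
                for i in range(n):
                    j = random.randrange(n)
                    values[i] = values[j] = (values[i] + values[j]) / 2
                stable = stable + 1 if max(values) - min(values) < eps else 0
            return values

        random.seed(2)
        n, K = 64, 3                      # K cascaded phases (illustrative)
        flags = [1.0] * n                 # phase 0: every node reports agreement
        for phase in range(1, K + 1):
            estimates = gossip_average(list(flags))
            # Each node's flag for the next phase: did the aggregate indicate
            # that (nearly) all nodes agreed in the previous phase?
            flags = [1.0 if e > 0.999 else 0.0 for e in estimates]
            print(f"phase {phase}: agreement estimate = {estimates[0]:.6f}")
        print("consensus declared" if all(flags) else "no consensus")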

    Agreement in epidemic information dissemination

    Consensus is one of the fundamental problems in multi-agent systems and distributed computing, in which agents or processing nodes are required to reach global agreement on some data value, decision, action, or synchronisation. In the absence of centralised coordination, achieving global consensus is challenging, especially in dynamic and large-scale distributed systems with faulty processes. This paper presents a fully decentralised phase transition protocol to achieve global consensus on the convergence of an underlying information dissemination process. The proposed approach is based on Epidemic protocols, which are a randomised communication and computation paradigm and provide excellent scalability and fault-tolerance properties. The experimental analysis is based on simulations of a large-scale information dissemination process, and the results show that global agreement can be achieved without deterministic and global communication patterns, such as those based on centralised coordination.
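
    The logarithmic convergence that makes epidemic dissemination attractive is easy to observe in a toy push-gossip (SI-model) simulation; the Python sketch below is illustrative and is not the protocol evaluated in the paper.

        import math
        import random

        def disseminate(n, seed_node=0):
            # Push gossip: each round, every informed node forwards the rumour
            # to one uniformly random peer. The number of rounds until everyone
            # is informed grows roughly logarithmically in n.
            informed = {seed_node}
            rounds = 0
            while len(informed) < n:
                for _ in range(len(informed)):   # count snapshot at round start
                    informed.add(random.randrange(n))
                rounds += 1
            return rounds

        random.seed(3)
        for n in (100, 1_000, 10_000):
            print(f"n = {n:>6}: fully informed after {disseminate(n)} rounds"
                  f" (log2 n = {math.log2(n):.1f})")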

    Rapid and Round-free Multi-pair Asynchronous Push-Pull Aggregation

    As various distributed algorithms and services demand global information about large-scale networks, protocols that aggregate data over networks are essential, and the quality of the aggregation determines the quality of those distributed algorithms and services. Although a variety of aggregation protocols have been proposed, gossip-based iterative aggregations have outstanding advantages, especially in accuracy, result distribution, topology independence, and resilience to network churn. However, most iterative aggregations, especially push-pull style aggregations, suffer from two synchronization constraints: synchronized rounds and synchronized communication. Namely, iterative protocols generally need prior configuration to synchronize rounds over all nodes, and messages must be exchanged synchronously to ensure accurate estimates in push-pull or push-sum protocols. This paper proposes multi-pair asynchronous push-pull aggregation (MAPPA), which liberates push-pull aggregation from these synchronization constraints and pursues a way to accelerate aggregation. MAPPA considerably reduces aggregation times and shows an improvement in fault tolerance. Thanks to the topology independence inherited from gossip mechanisms, and to its rapidity, MAPPA is resilient to network churn and thus suitable for dynamic networks.
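
    A round-free execution can be sketched with an event-driven simulation: nodes wake at independent random times and perform one atomic pairwise exchange, with no global rounds and no synchronised messaging. The Python sketch below illustrates this general idea under simplifying assumptions (atomic exchanges, a fully connected overlay); it is not the MAPPA protocol itself.

        import heapq
        import random

        random.seed(4)
        n = 100
        values = [float(i) for i in range(n)]      # inputs; true average is 49.5
        events = [(random.expovariate(1.0), i) for i in range(n)]
        heapq.heapify(events)                      # nodes wake at random times

        while events:
            t, i = heapq.heappop(events)
            if t > 20.0:                           # simulated-time horizon
                break
            j = random.randrange(n)
            if j != i:
                # Atomic pairwise exchange: both sides commit the same average,
                # so the global sum is conserved without any notion of a round.
                values[i] = values[j] = (values[i] + values[j]) / 2
            heapq.heappush(events, (t + random.expovariate(1.0), i))

        print(f"true average : {sum(range(n)) / n:.3f}")
        print(f"estimates in : [{min(values):.3f}, {max(values):.3f}]")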

    Robust distributed data aggregation

    Doctoral thesis, MAP-i Doctoral Programme in Informatics. Distributed aggregation algorithms are an important building block of modern large-scale systems, as they allow the determination of meaningful system-wide properties (e.g., network size, total storage capacity, average load, or majorities) which are required to direct the execution of distributed applications. In the last decade, several algorithms have been proposed to address the distributed computation of aggregation functions (e.g., COUNT, SUM, AVERAGE, and MAX/MIN), exhibiting different properties in terms of accuracy, speed and communication tradeoffs. However, existing approaches exhibit many issues when challenged in faulty and dynamic environments, lacking in terms of fault tolerance and support for churn. This study details a novel distributed aggregation approach, named Flow Updating, which is fault-tolerant and able to operate on dynamic networks. The algorithm is based on manipulating flows (inspired by the concept from graph theory), which are updated using idempotent messages, providing it with unique robustness capabilities. Experimental results showed that Flow Updating outperforms previous averaging algorithms in terms of time and message complexity, and unlike them it self-adapts to churn and to changes of the initial input values without requiring any periodic restart, supporting node crashes and high levels of message loss. In addition to this main contribution, others can also be found in this research work, namely: a definition of the aggregation problem is proposed; existing distributed aggregation algorithms are surveyed and classified into a comprehensive taxonomy; and a novel algorithm is introduced, based on Flow Updating, to estimate the Cumulative Distribution Function (CDF) of a global system attribute. It is expected that this work will constitute a relevant contribution to the area of distributed computing, in particular to the robust distributed computation of aggregation functions in dynamic networks.
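
    The flow idea can be sketched compactly. In the toy Python version below (ring topology, simplified update rule; an illustration of the concept rather than the thesis algorithm), each node keeps one flow variable per neighbour and derives its estimate as its input minus its outgoing flows; messages carry absolute flow values, so they are idempotent and a lost or duplicated message never destroys mass.

        import random

        random.seed(5)
        n = 32
        inputs = [random.random() for _ in range(n)]
        neighbors = {i: ((i - 1) % n, (i + 1) % n) for i in range(n)}  # ring
        flows = {(i, j): 0.0 for i in range(n) for j in neighbors[i]}  # flow i -> j

        def estimate(i):
            # A node's estimate is its input minus the flow it sends out; with
            # antisymmetric flows (f_ij == -f_ji) the estimates always sum to
            # the sum of the inputs: the mass-conservation invariant.
            return inputs[i] - sum(flows[(i, j)] for j in neighbors[i])

        for _ in range(50_000):
            i = random.randrange(n)
            j = random.choice(neighbors[i])
            # i sends its current flow and estimate to j (absolute state, hence
            # idempotent); j mirrors the flow, then adjusts it so that its own
            # estimate meets i's estimate halfway.
            f_ij, e_i = flows[(i, j)], estimate(i)
            flows[(j, i)] = -f_ij
            e_j = estimate(j)
            flows[(j, i)] += e_j - (e_i + e_j) / 2

        true_avg = sum(inputs) / n
        spread = (max(estimate(i) for i in range(n))
                  - min(estimate(i) for i in range(n)))
        print(f"true average: {true_avg:.6f}, estimate spread: {spread:.2e}")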

    Ordering, timeliness and reliability for publish/subscribe systems over WAN

    In the last few years, the increasing use of the Internet and the geo-political, sociological and financial changes induced by globalization are paving the way for a connected world where information is always available at the right place and the right time. As such, applications previously deployed in "closed" environments are now federating into geographically distributed systems connected through a Wide Area Network (WAN). By this evolution, in the near future no system will be isolated: every system will be composed of interconnected systems, i.e., it will be a System of Systems (SoS). Examples of SoS are Large-scale Complex Critical Infrastructures (LCCIs), such as power grids, transport infrastructures (airports and seaports), financial infrastructures, and next-generation intelligence platforms, to cite a few. In these systems, multiple sources of information generate a high volume of events that need to be delivered to all intended destinations while respecting several Quality of Service (QoS) constraints imposed by the critical nature of LCCIs. As such, particular attention is devoted to the middleware solution used to disseminate information in the SoS. Due to its inherent scalability, provided by space, time and synchronization decoupling, the publish/subscribe paradigm is becoming attractive for the implementation of a middleware service for LCCIs. However, scalability is not the only requirement exhibited by SoS. Several services need to control a broader set of QoS requirements, such as timeliness, ordering and reliability. Unfortunately, current middleware solutions do not address the QoS constraints required by SoS. Current publish/subscribe middleware solutions for the WAN environment offer only best-effort event dissemination, with no additional control on QoS. Just a few implementations try to address some isolated QoS policy, making them unsuitable for a SoS scenario. The contribution of this thesis is to devise a QoS layer that can be placed on top of a generic publish/subscribe middleware and that enriches its service by addressing (i) ordering, (ii) reliability and (iii) timeliness in event dissemination in SoS over WAN. Specifically, we first analyze several real case studies, highlighting their QoS requirements in terms of ordering, reliability and timeliness, and compare these requirements with both current research prototypes and commercial systems. Then, we fill the gap by proposing novel algorithms to address those requirements. The proposed protocols can also be combined in order to provide the QoS level required by a particular application. In this way, QoS issues do not need to be addressed at the application level, leaving applications to implement just their native functionalities.
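
    The kind of QoS layer the thesis describes can be sketched as a thin wrapper over a best-effort publish/subscribe service. The Python sketch below is hypothetical (class and parameter names are assumptions, not the thesis API): per-publisher sequence numbers give FIFO ordering, head-of-line gap detection gives a hook for reliability (retransmission requests), and a delivery deadline gives a timeliness bound.

        import heapq
        import time

        class QosSubscriber:
            def __init__(self, deadline_s=1.0):
                self.deadline_s = deadline_s
                self.next_seq = 0    # next expected per-publisher sequence number
                self.pending = []    # out-of-order events held back: (seq, event)

            def on_event(self, seq, event, sent_at):
                if time.time() - sent_at > self.deadline_s:
                    return                          # timeliness: too late, drop
                heapq.heappush(self.pending, (seq, event))
                # Ordering: release events only in contiguous sequence order.
                while self.pending and self.pending[0][0] == self.next_seq:
                    _, ready = heapq.heappop(self.pending)
                    self.next_seq += 1
                    self.deliver(ready)
                # Reliability hook: a gap at the head of `pending` signals a
                # loss that a real layer would repair via retransmission.

            def deliver(self, event):
                print("delivered in order:", event)

        sub = QosSubscriber()
        now = time.time()
        for seq, ev in [(1, "b"), (0, "a"), (2, "c")]:   # arrives out of order
            sub.on_event(seq, ev, sent_at=now)           # prints a, b, c in order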

    Citizen science and Lepidoptera biodiversity change in Great Britain

    A considerable body of scientific evidence shows that the world is currently suffering a biodiversity crisis driven by anthropogenic factors such as land-use change, environmental pollution and climate change. Our knowledge of this crisis is incomplete, however, particularly when it comes to the most diverse multi-cellular organisms on the planet, the insects. Although there is evidence of decline in the abundance, distribution and biomass of many insect species, recent attempts to extrapolate these to global scales and encourage a policy response have been met with scepticism. More data are required, together with reliable methods to integrate and interpret them. In parallel, evidence-based conservation initiatives are urgently needed to address the biodiversity crisis. Citizen science has great promise for gathering much-needed data on insect trends and for engaging the public in biodiversity conservation. Citizen science has undergone a rapid rise in popularity over the past two decades, increasing the capacity for cost-effective, spatially extensive biodiversity monitoring, while also raising awareness and commitment to nature conservation among participating members of the public. However, citizen science approaches can also present challenges, such as reductions in data quality, constraints in sampling strategies and in the onward reuse of data. In this thesis, citizen science monitoring of Great Britain's (GB) moths and butterflies is examined as a case study, assessing some of the benefits and limitations of increased participation and demonstrating applications of citizen science data in determining species trends, drivers of change and estimates of extinction risk. Overall moth abundance has decreased in GB, probably mainly as a result of habitat degradation, while climate change has enabled the range expansion of some species (Chapter 2). Much remains to be learnt about other potential drivers of change, such as chemical pollution and artificial light at night (Chapter 2). I demonstrated the efficacy of citizen science by calculating GB distribution trends for 673 moth species for the first time, finding that 260 species had undergone statistically significant long-term declines compared with 160 that had increased significantly (Chapter 3). The geographical patterns of change were consistent with expected responses to land-use, nutrient enrichment and climatic change (Chapter 3). I also utilised citizen-science-derived monitoring data for 485 Lepidoptera species to investigate the impact of insect population variability on the assessment of Red List extinction risk using 10-year trends as specified by the International Union for Conservation of Nature procedure (Chapter 5). I concluded that for these taxa, strict use of 10-year trends produces Red List classifications that are unacceptably biased by the start year (Chapter 5). In Chapter 4, I showed that mass-participation citizen science data obtained using a simple sampling protocol produced comparable estimates of butterfly species abundance to data collected through standardized monitoring undertaken by experienced volunteers. Resulting increases in participation, along with the associated benefits of public engagement and awareness raising, need not have a detrimental impact on the ability to detect abundance trends in common butterfly species. However, citizen science participation may affect the onward use of data, unless this is considered at the outset.
    I found that despite support in principle for open access to distribution records of butterflies and moths, most citizen scientists were much more cautious in practice, preferring to limit the spatial resolution of records, particularly of threatened species, and restrict commercial reuse of data (Chapter 6). Overall, these results demonstrate the potential for citizen science, involving both expert volunteer naturalists and inexperienced members of the public, to address the global biodiversity knowledge gap through generating meaningful trend estimates for insect species and elucidating the drivers of change.