Dependability in Aggregation by Averaging
Aggregation is an important building block of modern distributed
applications, allowing the determination of meaningful properties (e.g. network
size, total storage capacity, average load, majorities, etc.) that are used to
direct the execution of the system. However, the majority of existing
aggregation algorithms exhibit significant dependability issues when their use
in real application environments is considered. In this paper, we expose some
dependability issues of aggregation algorithms based on iterative averaging
techniques, and give some directions to solve them. This class of algorithms is
considered robust (when compared to common tree-based approaches), being
independent from the used routing topology and providing an aggregation result
at all nodes. However, their robustness is strongly challenged and their
correctness often compromised, when changing the assumptions of their working
environment to more realistic ones. The correctness of this class of algorithms
relies on the maintenance of a fundamental invariant, commonly designated as
"mass conservation". We argue that this main invariant is often broken in
practical settings, and that additional mechanisms and modifications are
required to maintain it, incurring some degradation of the algorithms'
performance. In particular, we discuss the behavior of three representative
algorithms, the Push-Sum protocol, the Push-Pull Gossip protocol, and
Distributed Random Grouping, under asynchronous and faulty environments (with
message loss and node crashes). More specifically, we propose and evaluate two
new versions of the Push-Pull Gossip protocol, which solve its message
interleaving problem (evidenced even in a synchronous operation mode).
Comment: 14 pages. Presented in Inforum 200
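To make the iterative-averaging idea and the "mass conservation" invariant concrete, here is a minimal synchronous Push-Sum sketch (the complete graph and lossless channels are assumptions of this example, not of the protocol):

```python
import random

def push_sum(values, rounds=200, seed=1):
    """Synchronous Push-Sum on a complete graph (illustrative sketch).

    Each node i keeps a pair (s_i, w_i), initially (value_i, 1). Every
    round it keeps half of the pair and sends the other half to a
    uniformly random node. Estimates s_i / w_i converge to the global
    average. The invariant sum(s) == sum(values) ("mass conservation")
    holds only while no message is lost, which is exactly the
    dependability issue discussed above.
    """
    rng = random.Random(seed)
    n = len(values)
    s = list(values)
    w = [1.0] * n
    for _ in range(rounds):
        inbox_s = [0.0] * n
        inbox_w = [0.0] * n
        for i in range(n):
            half_s, half_w = s[i] / 2.0, w[i] / 2.0
            j = rng.randrange(n)          # random target (may be i itself)
            inbox_s[i] += half_s; inbox_w[i] += half_w
            inbox_s[j] += half_s; inbox_w[j] += half_w
        s, w = inbox_s, inbox_w
    return [si / wi for si, wi in zip(s, w)]
```

Dropping a single `(half_s, half_w)` message in this loop permanently removes "mass" and biases every node's estimate, which is why the additional mechanisms discussed in the paper are needed.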
Spectra: Robust Estimation of Distribution Functions in Networks
Distributed aggregation allows the derivation of a given global aggregate
property from many individual local values in nodes of an interconnected
network system. Simple aggregates such as minima/maxima, counts, sums and
averages have been thoroughly studied in the past and are important tools for
distributed algorithms and network coordination. Nonetheless, such aggregates
may not be comprehensive enough to characterize biased data distributions or
data containing outliers, making the case for richer estimates of the values on
the network. This work presents Spectra, a distributed algorithm for the
estimation of distribution functions over large-scale networks. The estimate is
available at all nodes, and the technique exhibits important properties, namely:
robustness to high levels of message loss, fast convergence, and fine precision
in the estimate. It can
also dynamically cope with changes of the sampled local property, not requiring
algorithm restarts, and is highly resilient to node churn. The proposed
approach is experimentally evaluated and contrasted with a competing
state-of-the-art distribution aggregation technique.
Comment: Full version of the paper published at the 12th IFIP International
Conference on Distributed Applications and Interoperable Systems (DAIS),
Stockholm (Sweden), June 201
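The connection between simple averages and distribution functions can be illustrated as follows (this is not the actual Spectra algorithm, only the underlying observation; the averaging step is done centrally here, where a distributed system would use gossip averaging per threshold):

```python
def empirical_cdf_by_averaging(local_values, thresholds):
    """Illustration: a distribution function reduces to averages,
    since F(t) is the mean over nodes of the indicator [value_i <= t].
    Any robust distributed averaging primitive can therefore be reused
    to estimate F at a set of thresholds."""
    n = len(local_values)
    return [sum(v <= t for v in local_values) / n for t in thresholds]
```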
Approaches to Conflict-free Replicated Data Types
Conflict-free Replicated Data Types (CRDTs) allow optimistic replication in a
principled way. Different replicas can proceed independently, being available
even under network partitions, and always converging deterministically:
replicas that have received the same updates will have equivalent state, even
if received in different orders. After a historical tour of the evolution from
sequential data types to CRDTs, we present in detail the two main approaches to
CRDTs, operation-based and state-based, including two important variations, the
pure operation-based and the delta-state based. Intended as a tutorial for
prospective CRDT researchers and designers, it provides solid coverage of the
essential concepts, clarifying some frequently occurring misconceptions, and
also presents some novel insights gained from considerable experience in
designing both specific CRDTs and approaches to CRDTs.
Comment: 36 page
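As a concrete taste of the state-based approach, here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs:

```python
class GCounter:
    """State-based grow-only counter CRDT. Each replica increments only
    its own entry; merge takes the pointwise maximum. Merge is
    commutative, associative, and idempotent, so replicas that have
    received the same updates converge to equivalent state, regardless
    of delivery order or duplication."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}          # replica id -> increments at that replica

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # pointwise maximum: a join in the lattice of counter vectors
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)
```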
A Case for Partitioned Bloom Filters
In a partitioned Bloom filter the bit vector is split into disjoint equally
sized parts, one per hash function. Contrary to hardware designs, where
they prevail, software implementations mostly adopt standard Bloom filters,
considering partitioned filters slightly worse, due to the slightly larger
false positive rate (FPR). In this paper, by performing an in-depth analysis,
first we show that the FPR advantage of standard Bloom filters is smaller than
thought; more importantly, by studying the per-element FPR, we show that
standard Bloom filters have weak spots in the domain: elements which will be
tested as false positives much more frequently than expected. This is relevant
in scenarios where an element is tested against many filters, e.g., in packet
forwarding. Moreover, standard Bloom filters are prone to exhibit extremely
weak spots if naive double hashing is used, something occurring in several,
even mainstream, libraries. Partitioned Bloom filters exhibit a uniform
distribution of the FPR over the domain and are robust to the naive use of
double hashing, having no weak spots. Finally, by surveying several usages
other than testing set membership, we point out the many advantages of having
disjoint parts: they can be individually sampled, extracted, added or retired,
leading to superior designs for, e.g., SIMD usage, size reduction, test of set
disjointness, or duplicate detection in streams. Partitioned Bloom filters are
better, and should replace the standard form, both in general purpose libraries
and as the base for novel designs.
Comment: 21 page
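A minimal sketch of the partitioned form, with k disjoint parts and one hash function per part (using SHA-256-derived indices for simplicity; real implementations would use faster hashes):

```python
import hashlib

class PartitionedBloomFilter:
    """Partitioned Bloom filter sketch: k disjoint, equally sized parts,
    one per hash function, instead of one shared bit vector. Each part
    can be sampled, extracted, or retired independently."""

    def __init__(self, part_bits=64, k=4):
        self.part_bits = part_bits
        self.k = k
        self.parts = [0] * k      # each part is an int used as a bit vector

    def _positions(self, item):
        # one independent position per part, derived from SHA-256
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield i, int.from_bytes(h[:8], "big") % self.part_bits

    def add(self, item):
        for i, pos in self._positions(item):
            self.parts[i] |= 1 << pos

    def __contains__(self, item):
        return all(self.parts[i] >> pos & 1 for i, pos in self._positions(item))
```

Because every element sets exactly one bit per part, the per-element FPR is uniform over the domain, which is the robustness property argued for above.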
Fast Distributed Computation of Distances in Networks
This paper presents a distributed algorithm to simultaneously compute the
diameter, radius and node eccentricity in all nodes of a synchronous network.
Such topological information may be useful as input to configure other
algorithms. Previous approaches have been modular, progressing in sequential
phases using building blocks such as BFS tree construction, thus incurring
longer executions than strictly required. We present an algorithm that, by
timely propagation of available estimations, achieves a faster convergence to
the correct values. We show local criteria for detecting convergence in each
node. The algorithm avoids the creation of BFS trees and simply manipulates
sets of node ids and hop counts. For the worst scenario of variable start
times, each node i with eccentricity ecc(i) can compute: the node eccentricity
in diam(G)+ecc(i)+2 rounds; the diameter in 2*diam(G)+ecc(i)+2 rounds; and the
radius in diam(G)+ecc(i)+2*radius(G) rounds.
Comment: 12 page
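The core idea of propagating node ids with hop counts, rather than building BFS trees, can be shown with a small synchronous simulation (this illustrates the underlying principle only, not the paper's optimized algorithm and its round bounds):

```python
def eccentricities(adj):
    """Synchronous simulation of hop-count propagation. Every round,
    each node merges its neighbours' maps of (node id -> distance),
    adding one hop, until no distance improves. Eccentricity, diameter,
    and radius then follow directly from the converged maps.
    adj: dict node -> list of neighbours (undirected, connected graph)."""
    dist = {u: {u: 0} for u in adj}
    changed = True
    while changed:
        changed = False
        new = {u: dict(d) for u, d in dist.items()}
        for u in adj:
            for v in adj[u]:
                for target, d in dist[v].items():
                    if new[u].get(target, float("inf")) > d + 1:
                        new[u][target] = d + 1   # a shorter path via v
                        changed = True
        dist = new
    ecc = {u: max(dist[u].values()) for u in adj}
    return ecc, max(ecc.values()), min(ecc.values())
```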
ID generation in mobile environments
This work focuses on the ID generation problem in mobile environments. We discuss the suitability of traditional mechanisms and techniques to generate IDs in mobile environments. Based on the "Birthday Problem", we deduce some formulas to evaluate the ID trust, which is directly related to the number of entities in the system. The estimation of the system size reveals itself to be the main problem of our approach. To deal with it, we develop a recursive scheme that needs to be evaluated. Alternatively, we also design an aggregation algorithm to estimate the system size, whose results are currently being analyzed.
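The birthday-problem reasoning behind ID trust can be sketched with the standard closed-form approximation (the abstract's own formulas may differ in detail; this is the usual textbook form):

```python
import math

def collision_probability(id_space, n_entities):
    """Birthday-problem estimate of the probability that at least two
    of n randomly generated IDs collide in a space of `id_space`
    values, via the approximation 1 - exp(-n(n-1) / (2m)). The "ID
    trust" is the complement of this value, and it degrades as the
    number of entities in the system grows."""
    return 1.0 - math.exp(-n_entities * (n_entities - 1) / (2.0 * id_space))
```

This is why estimating the system size matters: the same ID space that is trustworthy for a thousand entities may be unacceptable for a million.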
Prediction tools for student learning assessment in professional schools
Professional schools need access to technologies and
tools that allow the monitoring of a student's evolution in a course,
while acquiring a given skill. Furthermore, they need to be able
to predict the performance of the students in a course before
they actually sign up, to either provide them with the extra
skills required to succeed, or to adapt the course to the
students' level of knowledge.
Based on a knowledge base of student features, the Student
Model, a Student Prediction System must be able to produce
estimates on whether a student will succeed in a particular
course. This tool must rely on a formal methodology for
problem solving to estimate a measure of the quality-of-information
that branches out from students' profiles, before
trying to guess their likelihood of success.
Indeed, this paper presents an approach to the design of a Student
Prediction System, which is, in fact, a reasoner, in the sense
that, presented with a new problem description (a student
outline), it produces a solved problem, i.e., a diagnostic of
the student's potential for success.
Towards efficient time-stamping for autonomous versioning
We sketch a decentralized versioning scheme that handles the detection of concurrent updates among an arbitrary number of replicas, overcoming the limitations that centralized knowledge of that number imposes on mobile computing.
Fault-tolerant aggregation for dynamic networks
Data aggregation is a fundamental building block of modern distributed systems. Averaging-based approaches, commonly designated gossip-based, are an important class of aggregation algorithms, as they allow all nodes to produce a result, converge to any required accuracy, and work independently of the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper extends our own technique, Flow Updating, which is immune to message loss, to operate in dynamic networks, improving its fault-tolerance characteristics. Experimental results show that the novel version of Flow Updating vastly outperforms previous averaging algorithms; it self-adapts to churn without requiring any periodic restart, tolerating node crashes and high levels of message loss.
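A simplified synchronous sketch of the flow idea (not the full dynamic-network protocol of the paper): each node keeps a flow towards every neighbour instead of exchanging "mass", so input values are never destroyed and estimates can always be recomputed from them:

```python
def flow_updating(values, adj, rounds=50):
    """Simplified synchronous simulation of Flow Updating.
    values: dict node -> local input value
    adj:    dict node -> list of neighbours (undirected, connected)
    Node i's estimate is values[i] minus its outgoing flows. Each round,
    nodes adopt the symmetric flow (f_ij := -f_ji) from the neighbour's
    last message, then steer flows so the whole neighbourhood moves to
    the local average of estimates. Losing a message only delays
    convergence; it cannot destroy mass, since mass is never transferred."""
    flows = {i: {j: 0.0 for j in adj[i]} for i in adj}
    est = dict(values)
    for _ in range(rounds):
        # deliver last round's messages: symmetric flows and estimates
        recv_f = {i: {j: -flows[j][i] for j in adj[i]} for i in adj}
        recv_e = {i: {j: est[j] for j in adj[i]} for i in adj}
        for i in adj:
            flows[i] = recv_f[i]
        for i in adj:
            e_i = values[i] - sum(flows[i].values())
            a = (e_i + sum(recv_e[i].values())) / (len(adj[i]) + 1)
            for j in adj[i]:
                # adjust the flow so neighbour j is led towards `a`
                flows[i][j] += a - recv_e[i][j]
            est[i] = a
    return est
```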