Dependability in Aggregation by Averaging
Aggregation is an important building block of modern distributed
applications, allowing the determination of meaningful properties (e.g. network
size, total storage capacity, average load, majorities, etc.) that are used to
direct the execution of the system. However, the majority of existing
aggregation algorithms exhibit significant dependability issues when their use
in real application environments is considered. In this paper, we expose some
dependability issues of aggregation algorithms based on iterative averaging
techniques, and give some directions to solve them. This class of algorithms is
considered robust (when compared to common tree-based approaches), being
independent from the used routing topology and providing an aggregation result
at all nodes. However, their robustness is strongly challenged and their
correctness often compromised, when changing the assumptions of their working
environment to more realistic ones. The correctness of this class of algorithms
relies on the maintenance of a fundamental invariant, commonly designated as
"mass conservation". We argue that this main invariant is often broken in
practical settings, and that additional mechanisms and modifications are
required to maintain it, incurring some degradation of the algorithms'
performance. In particular, we discuss the behavior of three representative
algorithms, the Push-Sum protocol, the Push-Pull Gossip protocol, and
Distributed Random Grouping, under asynchronous and faulty environments (with
message loss and node crashes). More specifically, we propose and evaluate two
new versions of the Push-Pull Gossip protocol, which solve its message
interleaving problem (evidenced even in a synchronous operation mode).
Comment: 14 pages. Presented in Inforum 200
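To make the iterative-averaging idea and the "mass conservation" invariant concrete, here is a minimal synchronous Push-Sum sketch (the complete graph and lossless channels are assumptions of this example, not of the protocol):

```python
import random

def push_sum(values, rounds=200, seed=1):
    """Synchronous Push-Sum on a complete graph (illustrative sketch).

    Each node i keeps a pair (s_i, w_i), initially (value_i, 1). Every
    round it keeps half of the pair and sends the other half to a
    uniformly random node. Estimates s_i / w_i converge to the global
    average. The invariant sum(s) == sum(values) ("mass conservation")
    holds only while no message is lost, which is exactly the
    dependability issue discussed above.
    """
    rng = random.Random(seed)
    n = len(values)
    s = list(values)
    w = [1.0] * n
    for _ in range(rounds):
        inbox_s = [0.0] * n
        inbox_w = [0.0] * n
        for i in range(n):
            half_s, half_w = s[i] / 2.0, w[i] / 2.0
            j = rng.randrange(n)          # random target (may be i itself)
            inbox_s[i] += half_s; inbox_w[i] += half_w
            inbox_s[j] += half_s; inbox_w[j] += half_w
        s, w = inbox_s, inbox_w
    return [si / wi for si, wi in zip(s, w)]
```

Dropping a single `(half_s, half_w)` message in this loop permanently removes "mass" and biases every node's estimate, which is why the additional mechanisms discussed in the paper are needed.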
Spectra: Robust Estimation of Distribution Functions in Networks
Distributed aggregation allows the derivation of a given global aggregate
property from many individual local values in nodes of an interconnected
network system. Simple aggregates such as minima/maxima, counts, sums and
averages have been thoroughly studied in the past and are important tools for
distributed algorithms and network coordination. Nonetheless, such aggregates
may not be comprehensive enough to characterize biased data distributions or
data containing outliers, making the case for richer estimates of the values on
the network. This work presents Spectra, a distributed algorithm for the
estimation of distribution functions over large-scale networks. The estimate is
available at all nodes, and the technique exhibits important properties, namely:
robustness to high levels of message loss, fast convergence, and fine precision
in the estimate. It can
also dynamically cope with changes of the sampled local property, not requiring
algorithm restarts, and is highly resilient to node churn. The proposed
approach is experimentally evaluated and contrasted with a competing
state-of-the-art distribution aggregation technique.
Comment: Full version of the paper published at the 12th IFIP International
Conference on Distributed Applications and Interoperable Systems (DAIS),
Stockholm (Sweden), June 201
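The connection between simple averages and distribution functions can be illustrated as follows (this is not the actual Spectra algorithm, only the underlying observation; the averaging step is done centrally here, where a distributed system would use gossip averaging per threshold):

```python
def empirical_cdf_by_averaging(local_values, thresholds):
    """Illustration: a distribution function reduces to averages,
    since F(t) is the mean over nodes of the indicator [value_i <= t].
    Any robust distributed averaging primitive can therefore be reused
    to estimate F at a set of thresholds."""
    n = len(local_values)
    return [sum(v <= t for v in local_values) / n for t in thresholds]
```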
Approaches to Conflict-free Replicated Data Types
Conflict-free Replicated Data Types (CRDTs) allow optimistic replication in a
principled way. Different replicas can proceed independently, being available
even under network partitions, and always converging deterministically:
replicas that have received the same updates will have equivalent state, even
if received in different orders. After a historical tour of the evolution from
sequential data types to CRDTs, we present in detail the two main approaches to
CRDTs, operation-based and state-based, including two important variations, the
pure operation-based and the delta-state based. Intended as a tutorial for
prospective CRDT researchers and designers, it provides solid coverage of the
essential concepts, clarifying some frequently occurring misconceptions, and
also presents some novel insights gained from considerable experience in
designing both specific CRDTs and approaches to CRDTs.
Comment: 36 page
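As a concrete taste of the state-based approach, here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs:

```python
class GCounter:
    """State-based grow-only counter CRDT. Each replica increments only
    its own entry; merge takes the pointwise maximum. Merge is
    commutative, associative, and idempotent, so replicas that have
    received the same updates converge to equivalent state, regardless
    of delivery order or duplication."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}          # replica id -> increments at that replica

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # pointwise maximum: a join in the lattice of counter vectors
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)
```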
A Case for Partitioned Bloom Filters
In a partitioned Bloom filter the bit vector is split into disjoint equally
sized parts, one per hash function. Contrary to hardware designs, where
they prevail, software implementations mostly adopt standard Bloom filters,
considering partitioned filters slightly worse, due to the slightly larger
false positive rate (FPR). In this paper, by performing an in-depth analysis,
first we show that the FPR advantage of standard Bloom filters is smaller than
thought; more importantly, by studying the per-element FPR, we show that
standard Bloom filters have weak spots in the domain: elements which will be
tested as false positives much more frequently than expected. This is relevant
in scenarios where an element is tested against many filters, e.g., in packet
forwarding. Moreover, standard Bloom filters are prone to exhibit extremely
weak spots if naive double hashing is used, something occurring in several,
even mainstream, libraries. Partitioned Bloom filters exhibit a uniform
distribution of the FPR over the domain and are robust to the naive use of
double hashing, having no weak spots. Finally, by surveying several usages
other than testing set membership, we point out the many advantages of having
disjoint parts: they can be individually sampled, extracted, added or retired,
leading to superior designs for, e.g., SIMD usage, size reduction, test of set
disjointness, or duplicate detection in streams. Partitioned Bloom filters are
better, and should replace the standard form, both in general purpose libraries
and as the base for novel designs.
Comment: 21 page
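A minimal sketch of the partitioned form, with k disjoint parts and one hash function per part (using SHA-256-derived indices for simplicity; real implementations would use faster hashes):

```python
import hashlib

class PartitionedBloomFilter:
    """Partitioned Bloom filter sketch: k disjoint, equally sized parts,
    one per hash function, instead of one shared bit vector. Each part
    can be sampled, extracted, or retired independently."""

    def __init__(self, part_bits=64, k=4):
        self.part_bits = part_bits
        self.k = k
        self.parts = [0] * k      # each part is an int used as a bit vector

    def _positions(self, item):
        # one independent position per part, derived from SHA-256
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield i, int.from_bytes(h[:8], "big") % self.part_bits

    def add(self, item):
        for i, pos in self._positions(item):
            self.parts[i] |= 1 << pos

    def __contains__(self, item):
        return all(self.parts[i] >> pos & 1 for i, pos in self._positions(item))
```

Because every element sets exactly one bit per part, the per-element FPR is uniform over the domain, which is the robustness property argued for above.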
Fast Distributed Computation of Distances in Networks
This paper presents a distributed algorithm to simultaneously compute the
diameter, radius and node eccentricity in all nodes of a synchronous network.
Such topological information may be useful as input to configure other
algorithms. Previous approaches have been modular, progressing in sequential
phases using building blocks such as BFS tree construction, thus incurring
longer executions than strictly required. We present an algorithm that, by
timely propagation of available estimations, achieves a faster convergence to
the correct values. We show local criteria for detecting convergence in each
node. The algorithm avoids the creation of BFS trees and simply manipulates
sets of node ids and hop counts. For the worst scenario of variable start
times, each node i with eccentricity ecc(i) can compute: the node eccentricity
in diam(G)+ecc(i)+2 rounds; the diameter in 2*diam(G)+ecc(i)+2 rounds; and the
radius in diam(G)+ecc(i)+2*radius(G) rounds.
Comment: 12 page
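The core idea of propagating node ids with hop counts, rather than building BFS trees, can be shown with a small synchronous simulation (this illustrates the underlying principle only, not the paper's optimized algorithm and its round bounds):

```python
def eccentricities(adj):
    """Synchronous simulation of hop-count propagation. Every round,
    each node merges its neighbours' maps of (node id -> distance),
    adding one hop, until no distance improves. Eccentricity, diameter,
    and radius then follow directly from the converged maps.
    adj: dict node -> list of neighbours (undirected, connected graph)."""
    dist = {u: {u: 0} for u in adj}
    changed = True
    while changed:
        changed = False
        new = {u: dict(d) for u, d in dist.items()}
        for u in adj:
            for v in adj[u]:
                for target, d in dist[v].items():
                    if new[u].get(target, float("inf")) > d + 1:
                        new[u][target] = d + 1   # a shorter path via v
                        changed = True
        dist = new
    ecc = {u: max(dist[u].values()) for u in adj}
    return ecc, max(ecc.values()), min(ecc.values())
```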
ID generation in mobile environments
This work focuses on the ID generation problem in mobile environments. We discuss the suitability of traditional mechanisms and techniques to generate IDs in mobile environments. Based on the "Birthday Problem", we deduce some formulas to evaluate the ID trust, which is directly related to the number of entities in the system. The estimation of the system size reveals itself to be the main problem of our approach. To deal with it, we develop a recursive scheme that needs to be evaluated. Alternatively, we also design an aggregation algorithm to estimate the system size, whose results are currently being analyzed.
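The birthday-problem reasoning behind ID trust can be sketched with the standard closed-form approximation (the abstract's own formulas may differ in detail; this is the usual textbook form):

```python
import math

def collision_probability(id_space, n_entities):
    """Birthday-problem estimate of the probability that at least two
    of n randomly generated IDs collide in a space of `id_space`
    values, via the approximation 1 - exp(-n(n-1) / (2m)). The "ID
    trust" is the complement of this value, and it degrades as the
    number of entities in the system grows."""
    return 1.0 - math.exp(-n_entities * (n_entities - 1) / (2.0 * id_space))
```

This is why estimating the system size matters: the same ID space that is trustworthy for a thousand entities may be unacceptable for a million.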
Prediction tools for student learning assessment in professional schools
Professional schools need access to technologies and
tools that allow the monitoring of a student's evolution in a course,
while acquiring a given skill. Furthermore, they need to be able
to predict the performance of the students in a course before
they actually sign up, to either provide them with the extra
skills required to succeed, or to adapt the course to the
students' level of knowledge.
Based on a knowledge base of student features, the Student
Model, a Student Prediction System must be able to produce
estimates on whether a student will succeed in a particular
course. This tool must rely on a formal methodology for
problem solving to estimate a measure of the quality-of-information
that branches out from students' profiles, before
trying to guess their likelihood of success.
Indeed, this paper presents an approach to the design of a Student
Prediction System, which is, in fact, a reasoner, in the sense
that, presented with a new problem description (a student
outline), it produces a solved problem, i.e., a diagnostic of
the student's potential for success.
Towards efficient time-stamping for autonomous versioning
We sketch a decentralized versioning scheme that handles the detection of concurrent updates among an arbitrary number of replicas, overcoming the limitations that centralized knowledge of that number imposes on mobile computing.
Fault-tolerant aggregation for dynamic networks
Data aggregation is a fundamental building block of modern distributed systems. Averaging-based approaches, commonly designated gossip-based, are an important class of aggregation algorithms, as they allow all nodes to produce a result, converge to any required accuracy, and work independently of the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper extends our own technique, Flow Updating, which is immune to message loss, to operate in dynamic networks, improving its fault-tolerance characteristics. Experimental results show that the novel version of Flow Updating vastly outperforms previous averaging algorithms; it self-adapts to churn without requiring any periodic restart, tolerating node crashes and high levels of message loss.
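A simplified synchronous sketch of the flow idea (not the full dynamic-network protocol of the paper): each node keeps a flow towards every neighbour instead of exchanging "mass", so input values are never destroyed and estimates can always be recomputed from them:

```python
def flow_updating(values, adj, rounds=50):
    """Simplified synchronous simulation of Flow Updating.
    values: dict node -> local input value
    adj:    dict node -> list of neighbours (undirected, connected)
    Node i's estimate is values[i] minus its outgoing flows. Each round,
    nodes adopt the symmetric flow (f_ij := -f_ji) from the neighbour's
    last message, then steer flows so the whole neighbourhood moves to
    the local average of estimates. Losing a message only delays
    convergence; it cannot destroy mass, since mass is never transferred."""
    flows = {i: {j: 0.0 for j in adj[i]} for i in adj}
    est = dict(values)
    for _ in range(rounds):
        # deliver last round's messages: symmetric flows and estimates
        recv_f = {i: {j: -flows[j][i] for j in adj[i]} for i in adj}
        recv_e = {i: {j: est[j] for j in adj[i]} for i in adj}
        for i in adj:
            flows[i] = recv_f[i]
        for i in adj:
            e_i = values[i] - sum(flows[i].values())
            a = (e_i + sum(recv_e[i].values())) / (len(adj[i]) + 1)
            for j in adj[i]:
                # adjust the flow so neighbour j is led towards `a`
                flows[i][j] += a - recv_e[i][j]
            est[i] = a
    return est
```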