Mining frequent items in unstructured P2P networks
Large scale decentralized systems, such as P2P, sensor or IoT device networks
are becoming increasingly common, and require robust protocols to address the
challenges posed by the distribution of data and the large number of peers
belonging to the network. In this paper, we deal with the problem of mining
frequent items in unstructured P2P networks. This problem, of practical
importance, has many useful applications. We design P2PSS, a fully
decentralized, gossip-based protocol for frequent items discovery, leveraging
the Space-Saving algorithm. We formally prove its correctness and a theoretical
error bound. Extensive experimental results show that P2PSS provides
very good accuracy and scalability, even in highly dynamic P2P
networks subject to churn. To the best of our knowledge, this is the first
gossip-based distributed algorithm providing strong theoretical guarantees for
both the Approximate Frequent Items Problem in Unstructured P2P Networks and
the frequency estimation of discovered frequent items.
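
As an illustration of the sequential building block named above, the following is a minimal sketch of the standard Space-Saving summary for a single stream. It is not the P2PSS protocol itself: the function name `space_saving`, the parameter `k` (number of counters) and the dictionary representation are assumptions made for this example only.

```python
def space_saving(stream, k):
    """Build a Space-Saving summary of `stream` using at most k counters.

    Hypothetical helper for illustration; not taken from the P2PSS paper.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Monitored item: increment its counter.
            counters[item] += 1
        elif len(counters) < k:
            # Free slot available: start monitoring the item.
            counters[item] = 1
        else:
            # Summary is full: evict an item with the minimum count and
            # let the newcomer inherit that count plus one.
            victim = min(counters, key=counters.get)
            min_count = counters.pop(victim)
            counters[item] = min_count + 1
    return counters


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
    summary = space_saving(stream, k=4)
    # Items whose frequency exceeds len(stream) / k are guaranteed to be monitored.
    print(sorted(summary.items(), key=lambda kv: -kv[1]))
```

Each counter never underestimates the true frequency of its item, and the counters always sum to the stream length, so an item occurring more than n/k times in a stream of n items cannot be missing from the summary.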
Distributed mining of time-faded heavy hitters
We present P2PTFHH (Peer-to-Peer Time-Faded Heavy Hitters) which,
to the best of our knowledge, is the first distributed algorithm for mining
time-faded heavy hitters on unstructured P2P networks. P2PTFHH is
based on the FDCMSS (Forward Decay Count-Min Space-Saving) sequential
algorithm, and efficiently exploits an averaging gossip protocol by merging, in
each interaction, the underlying data structures of the peers involved. We formally
prove the convergence and correctness properties of our distributed algorithm
and show that it is fast and simple to implement. Extensive experimental
results confirm that P2PTFHH retains the extreme accuracy and error
bound provided by FDCMSS whilst showing excellent scalability. Our
contributions are three-fold: (i) we prove that the averaging gossip protocol
can be used jointly with our augmented sketch data structure for mining
time-faded heavy hitters; (ii) we prove the error bounds on frequency
estimation; (iii) we show experimentally that P2PTFHH is extremely
accurate and fast, allowing near real-time processing of large datasets.
Comment: arXiv admin note: text overlap with arXiv:1806.0658
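
The averaging gossip idea mentioned in the abstract can be sketched as follows. This is a simplified illustration, not P2PTFHH: each peer holds a plain vector of numeric counters standing in for sketch cells, the overlay is assumed to be fully connected, and the names `gossip_average`, `rounds` and `seed` are hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Pairwise averaging gossip over a fully connected overlay.

    `values` is a list with one counter vector per peer. In every round each
    peer contacts a random other peer and both replace their vectors with the
    cell-wise average. Illustrative assumption: the real protocol merges
    richer sketch structures, not plain vectors.
    """
    rng = random.Random(seed)
    state = [list(v) for v in values]
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            # Pick a partner j != i uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            avg = [(a + b) / 2 for a, b in zip(state[i], state[j])]
            state[i] = avg
            state[j] = list(avg)
    return state


if __name__ == "__main__":
    peers = [[float(i), 2.0 * i] for i in range(16)]
    final = gossip_average(peers, rounds=50)
    print(final[0])  # every peer ends up close to the global cell-wise mean
```

Every exchange preserves the cell-wise sum over all peers, so the local vectors can only converge to the global average; a peer that also knows or estimates the number of peers can then scale the average back to a global total.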