
Towards Data Mining in Large and Fully Distributed Peer-To-Peer Overlay Networks

Abstract

The Internet, which is becoming an increasingly dynamic and extremely heterogeneous network, has recently become a platform for huge, fully distributed peer-to-peer overlay networks. These networks contain millions of nodes and typically serve information dissemination and file sharing. This paper targets the problem of analyzing data scattered over such a huge and dynamic set of nodes, where each node may store very little data but the total amount of data is immense because of the large number of nodes. We present distributed algorithms for efficiently calculating basic statistics of the data using the recently introduced newscast model of computation, and we demonstrate how to implement basic data mining algorithms on top of these techniques. We argue that the suggested techniques are efficient, robust and scalable, and that they preserve the privacy of the data.
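The abstract does not spell out the algorithms themselves. The following is a minimal sketch of the general idea behind gossip-based aggregation of basic statistics. It assumes that uniformly random peer selection stands in for the peer sampling provided by the newscast protocol, and the function names `gossip_average` and `gossip_mean_and_variance` are hypothetical. In each round, every node contacts a random peer and both replace their local estimates with the pair's mean. Because pairwise averaging preserves the sum of the estimates, every estimate converges to the global mean. Running the same procedure on squared values also yields the variance.

```python
import random


def gossip_average(values, rounds=50, seed=0):
    """Hypothetical pairwise gossip averaging; a simplification that assumes
    uniform random peer selection instead of the newscast membership protocol."""
    rng = random.Random(seed)
    est = list(values)          # each node's local estimate starts at its own value
    n = len(est)
    for _ in range(rounds):     # one round: every node initiates one exchange
        for i in range(n):
            j = rng.randrange(n)    # pick a random peer
            if j == i:
                continue
            m = (est[i] + est[j]) / 2.0
            est[i] = est[j] = m     # both sides adopt the pair's mean (sum is preserved)
    return est


def gossip_mean_and_variance(values, rounds=50, seed=0):
    """Per-node (mean, variance) estimates from two averaging runs:
    one over x and one over x**2, using variance = E[x^2] - E[x]^2."""
    means = gossip_average(values, rounds, seed)
    squares = gossip_average([v * v for v in values], rounds, seed)
    return [(m, s - m * m) for m, s in zip(means, squares)]


if __name__ == "__main__":
    stats = gossip_mean_and_variance(list(range(100)))
    print(stats[0])  # every node ends with roughly the same (mean, variance)
```

No node ever transmits its raw value beyond the first exchange. This is one intuition for why such schemes can help with privacy, although the sketch makes no formal privacy guarantee.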
