
A distributed scenario can be of two types: (1) homogeneous, where only a fraction of each feature is observed at every site, or (2) heterogeneous, where only some of the features are observed at each site. In either scenario, centralizing all the data to build a global model is not an appropriate solution because of the high cost of centralization and the storage requirement at the central node. Distributed algorithms are therefore required to solve most data mining problems in peer-to-peer (P2P) networks. In general, a distributed algorithm in this setting should (1) not require global synchronization, (2) be communication efficient, and (3) be resilient to moderate changes in the network topology. We propose a gossip-based probabilistic approximate algorithm. The algorithm relies on properties of random walks on the network to provide estimates of various statistics of the data stored in the network, and it converges to its result exponentially fast. The most important quality of gossip-based probabilistic approximate algorithms is that they provide probabilistic guarantees on the accuracy of the result.

Keywords: Distributed data mining, inner product, peer-to-peer network
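
As an illustration of the gossip idea, the following minimal Python sketch estimates an inner product whose terms are spread across peers. It is not the algorithm of the paper: it substitutes plain randomized pairwise averaging for the random-walk construction, and it assumes a fixed ring topology, that each peer holds one coordinate of each vector, and that every peer knows the network size. The names `gossip_average` and `ring` are hypothetical helpers introduced only for this example.

```python
import random


def gossip_average(values, neighbors, rounds, seed=0):
    """Randomized pairwise gossip (an assumed stand-in, not the paper's method).

    In each round one random peer contacts one random neighbor and both
    replace their estimates with the pair's mean. The global sum is
    preserved, so every estimate drifts toward the network-wide average.
    """
    rng = random.Random(seed)
    est = list(values)
    n = len(est)
    for _ in range(rounds):
        i = rng.randrange(n)           # peer that wakes up
        j = rng.choice(neighbors[i])   # neighbor it gossips with
        est[i] = est[j] = (est[i] + est[j]) / 2.0
    return est


def ring(n):
    """Assumed topology: peer i is linked to peers i-1 and i+1 (mod n)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


# Peer i holds x[i] and y[i]; the inner product is the sum of local products.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 0.0, 1.0, 3.0, 1.0, 2.0, 0.0, 1.0]
local_products = [a * b for a, b in zip(x, y)]
exact = sum(local_products)

# Each peer rescales its converged average by n (assumed known to all peers).
averages = gossip_average(local_products, ring(len(x)), rounds=2000)
estimates = [len(x) * a for a in averages]

print("exact inner product:", exact)
print("largest peer error:", max(abs(e - exact) for e in estimates))
```

No peer ever sees the full vectors and no global synchronization step is used; each exchange involves only two neighbors, which is the property the three requirements above ask for.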

Year: 2014

OAI identifier:
oai:CiteSeerX.psu:10.1.1.414.1078

Provided by:
CiteSeerX
