8 research outputs found
Gossip Learning with Linear Models on Fully Distributed Data
Machine learning over fully distributed data poses an important problem in
peer-to-peer (P2P) applications. In this model we have one data record at each
network node, but without the possibility to move raw data due to privacy
considerations. For example, user profiles, ratings, history, or sensor
readings can represent this case. This problem is difficult, because there is
no possibility to learn local models, the system model offers almost no
guarantees for reliability, yet the communication cost needs to be kept low.
Here we propose gossip learning, a generic approach that is based on multiple
models taking random walks over the network in parallel, while applying an
online learning algorithm to improve themselves, and getting combined via
ensemble learning methods. We present an instantiation of this approach for the
case of classification with linear models. Our main contribution is an ensemble
learning method which---through the continuous combination of the models in the
network---implements a virtual weighted voting mechanism over an exponential
number of models at practically no extra cost as compared to independent random
walks. We prove the convergence of the method theoretically, and perform
extensive experiments on benchmark datasets. Our experimental analysis
demonstrates the performance and robustness of the proposed approach.Comment: The paper was published in the journal Concurrency and Computation:
Practice and Experience
http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291532-0634 (DOI:
http://dx.doi.org/10.1002/cpe.2858). The modifications are based on the
suggestions from the reviewer
S.: Automatic document organization in a p2p environment
Abstract. This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble based meta methods. We consider this problem in the context of distributed Web exploration applications like focused crawling. Typical applications are user-specific classification of retrieved Web contents into personalized topic hierarchies as well as automatic refinements of such taxonomies using unsupervised machine learning methods (e.g. clustering). Our approach is to combine models from multiple peers and to construct the advanced decision model that takes the generalization performance of multiple ‘local ’ peer models into account. In addition, meta algorithms can be applied in a restrictive manner, i.e. by leaving out some ‘uncertain ’ documents. The results of our systematic evaluation show the viability of the proposed approach.