Social choice in distributed classification tasks: dealing with vertically partitioned data

Abstract

In many situations, a conventional, centralized classification task cannot be performed because the data is not available in a central facility. In such cases, we are dealing with distributed data mining problems, in which local models must be built individually and later combined into a global consensus model. In this paper, we are particularly interested in distributed classification tasks with vertically partitioned data, i.e., when features are distributed among several sources. This restriction creates a challenging scenario, given that developing an accurate model usually requires access to all the features that are relevant for classification. To deal with this situation, we propose an agent-based classification system in which each agent's preference ordering over instances, based on the probability that an instance belongs to the target class, is aggregated by means of social choice functions. We employ this method to classify microRNA target genes, an important bioinformatics problem, showing that the predictions derived from the social choice tend to outperform local models. This performance gain is accompanied by other interesting advantages: the combination methods proposed herein are extremely simple, do not require the transfer of large volumes of data, do not assume an offline training process or parameter setup, and preserve data privacy.
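To make the aggregation step concrete, the following is a minimal sketch of one possible social choice function, the Borda count, applied to per-agent rankings. The abstract does not say which social choice functions the paper uses, so the choice of Borda, the function name `borda_aggregate`, and the gene identifiers and scores are all assumptions made purely for illustration.

```python
from typing import Dict, List


def borda_aggregate(agent_scores: List[Dict[str, float]]) -> List[str]:
    """Combine per-agent orderings with a Borda count (illustrative choice only)."""
    instances = sorted(agent_scores[0])
    points = {inst: 0 for inst in instances}
    for scores in agent_scores:
        # Each agent orders the instances from least to most likely to belong
        # to the target class, using only its own local model's scores.
        # Tied scores keep alphabetical order because sorted() is stable.
        ranked = sorted(instances, key=lambda inst: scores[inst])
        for position, inst in enumerate(ranked):
            # The least preferred instance earns 0 points, the most preferred n - 1.
            points[inst] += position
    # Consensus ordering: highest Borda total first, ties broken by name.
    return sorted(instances, key=lambda inst: (-points[inst], inst))


# Hypothetical local scores from three agents, each seeing different features.
agent_a = {"g1": 0.9, "g2": 0.4, "g3": 0.1}
agent_b = {"g1": 0.6, "g2": 0.7, "g3": 0.2}
agent_c = {"g1": 0.8, "g2": 0.3, "g3": 0.5}

consensus = borda_aggregate([agent_a, agent_b, agent_c])
print(consensus)  # ['g1', 'g2', 'g3']
```

In this sketch only the positions in each agent's ordering contribute to the consensus, not the underlying features, which is in the spirit of the low data transfer and privacy properties claimed above.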

Last time updated on 30/10/2017

This paper was published in CiteSeerX.
