This paper introduces a distributed data mining approach, based on a
supervised learning classifier system, that is suited to grid computing
environments. Several methods for merging the data mining models
generated at different distributed sites are explored.
Centralized Data Mining (CDM) is the conventional way of mining
distributed data. In CDM, data stored at distributed locations must be
collected into a central repository before the data mining algorithm
is executed. The CDM method is reliable; however, it is expensive,
since its computational, communication and implementation costs are
high.
Alternatively, the Distributed Data Mining (DDM) approach is
economical, but it has limitations in how local models are combined. In
DDM, the data mining algorithm is executed at each of the sites to
induce a local model. The induced local models are then collected and
combined to form a global data mining model. In
this work, six different tactics are used to construct the global
model in DDM: Generalized Classifier Method (GCM); Specific
Classifier Method (SCM); Weighed Classifier Method (WCM);
Majority Voting Method (MVM); Model Sampling Method
(MSM); and Centralized Training Method (CTM). Preliminary
experimental tests were conducted with two synthetic data sets
(eleven multiplexer and monks3) and one real-world data set
(intensive care medicine). The initial results show that the
DDM methods perform competitively with the CDM methods.

Fundação para a Ciência e a Tecnologia (FCT)
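The DDM workflow described in the abstract (induce a local model at each site, then combine the local models into a global one) can be contrasted with CDM on the eleven-multiplexer problem. This is a minimal sketch under stated assumptions, not the paper's implementation: the multiplexer follows the usual convention (3 address bits select one of 8 data bits); `NearestNeighbour` is a hypothetical 1-nearest-neighbour stand-in for the supervised learning classifier system; and `majority_vote` is a plain majority vote that only approximates what MVM may do. The other five tactics and the intensive care data are not modelled.

```python
import random
from collections import Counter


def mux11(bits):
    """Eleven-multiplexer: bits 0-2 form an address that selects one of data bits 3-10."""
    address = bits[0] * 4 + bits[1] * 2 + bits[2]
    return bits[3 + address]


def make_dataset(n, rng):
    """Draw n random 11-bit inputs and label each with the multiplexer function."""
    data = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(11))
        data.append((x, mux11(x)))
    return data


class NearestNeighbour:
    """Hypothetical stand-in learner: 1-nearest neighbour under Hamming distance.
    It is NOT a learning classifier system; it only plays the role of a local model."""

    def __init__(self, examples):
        self.examples = list(examples)

    def predict(self, x):
        # Return the label of the closest stored example (first one on ties).
        _, label = min(
            self.examples,
            key=lambda ex: sum(a != b for a, b in zip(ex[0], x)),
        )
        return label


def majority_vote(models, x):
    """Combine local models by a simple majority vote over their predicted classes."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test examples whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


def run(n_sites=3, n_train=600, n_test=200, seed=0):
    rng = random.Random(seed)
    train = make_dataset(n_train, rng)
    test = make_dataset(n_test, rng)
    # Partition the training data into disjoint shards, one per site.
    shards = [train[i::n_sites] for i in range(n_sites)]
    # DDM: induce one local model per site, then combine them by majority vote.
    local_models = [NearestNeighbour(shard) for shard in shards]
    ddm_acc = accuracy(lambda x: majority_vote(local_models, x), test)
    # CDM: pool every site's data in a central repository and train one model.
    central_model = NearestNeighbour(train)
    cdm_acc = accuracy(central_model.predict, test)
    return ddm_acc, cdm_acc


if __name__ == "__main__":
    ddm, cdm = run()
    print(f"DDM (majority vote) accuracy: {ddm:.3f}")
    print(f"CDM (centralized) accuracy:   {cdm:.3f}")
```

Running the script prints one accuracy for the combined DDM model and one for the centralized CDM model. The figures depend on the stand-in learner, the shard sizes and the random seed, so they should not be read as the paper's results. An odd number of sites is used so that a binary majority vote cannot tie.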