In this paper, a new approach for centralised and distributed learning from spatially heterogeneous databases is proposed. The centralised algorithm consists of spatial clustering followed by local regression, which learns the relationships between driving attributes and the target variable inside each region identified through clustering. For distributed learning, similar regions in multiple databases are first discovered by applying a spatial clustering algorithm independently at each site and then identifying corresponding clusters across the participating sites. Local regression models are built on the identified clusters and transferred among the sites, where the models responsible for corresponding regions are combined. Extensive experiments on spatial data sets with missing and irrelevant attributes, and with different levels of noise, showed higher prediction accuracy for both the centralised and the distributed method than for global models. In addition, the experiments indicate that both methods are computationally more efficient than the global approach, because smaller data sets are used for learning. Furthermore, the accuracy of the distributed method was comparable to that of the centralised approach, which makes it a viable alternative to moving all data to a central location.
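The centralised procedure described above (spatial clustering, then one regression model per discovered region) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: k-means stands in for the unspecified spatial clustering algorithm, each local model is a one-attribute least-squares line, the data are synthetic and noise-free, and all function names (`kmeans`, `fit_line`, `fit_local_models`, `make_data`) are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two (x, y) locations."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=20):
    """Plain k-means on spatial coordinates (a stand-in for the paper's clustering)."""
    # Deterministic farthest-point initialisation of the k centres.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centre, then recompute the centres.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - slope * mx, slope

def fit_local_models(coords, xs, ys, k):
    """Cluster the locations, then fit one regression model per cluster."""
    labels = kmeans(coords, k)
    models = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    return labels, models

def make_data(n_per_region=50, seed=0):
    """Synthetic data: two spatial regions, each with its own linear relationship."""
    rng = random.Random(seed)
    coords, xs, ys = [], [], []
    for offset, (a, b) in ((0.0, (1.0, 2.0)), (10.0, (5.0, -3.0))):
        for _ in range(n_per_region):
            coords.append((offset + rng.random(), offset + rng.random()))
            x = rng.uniform(0.0, 5.0)
            xs.append(x)
            ys.append(a + b * x)
    return coords, xs, ys

coords, xs, ys = make_data()
labels, models = fit_local_models(coords, xs, ys, k=2)

# Mean squared error of the per-region models versus a single global model.
local_mse = sum((models[lab][0] + models[lab][1] * x - y) ** 2
                for lab, x, y in zip(labels, xs, ys)) / len(xs)
g_a, g_b = fit_line(xs, ys)
global_mse = sum((g_a + g_b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"local MSE = {local_mse:.3g}, global MSE = {global_mse:.3g}")
```

On this toy data the per-region models recover each region's relationship, whereas a single global line has to average two opposing trends. In a distributed variant, each site would run `fit_local_models` on its own data and exchange only the fitted `(intercept, slope)` pairs for matching clusters rather than the raw records.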