Dynamic load balancing in parallel KD-tree k-means

Di Fatta, Giuseppe; Pettinger, David

oai:centaur.reading.ac.uk:6127

Authors: Giuseppe Di Fatta, David Pettinger
Publication date: 30 June 2010
Publisher: IEEE
Abstract

The k-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Techniques for improving the efficiency of k-Means have been explored largely in two main directions. First, the amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested: two are based on a static partitioning of the data set, and a third incorporates a dynamic load balancing policy.
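
To make the idea of saving distance computations with a KD-Tree concrete, here is a minimal sequential sketch of one k-Means iteration in which each tree node caches its bounding box, point count and vector sum, so that a whole subtree can be assigned to a centre at once when every other candidate is ruled out. It is an illustration only: the names Node, is_farther, filter_node and kmeans_step are hypothetical, the bounding-box pruning test is a commonly used one and is assumed here rather than taken from the paper, and no parallel or load balancing logic is shown.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """KD-tree node caching bounding box, point count and vector sum."""

    def __init__(self, pts, leaf_size=8, depth=0):
        dim = len(pts[0])
        self.lo = [min(p[d] for p in pts) for d in range(dim)]
        self.hi = [max(p[d] for p in pts) for d in range(dim)]
        self.count = len(pts)
        self.total = [sum(p[d] for p in pts) for d in range(dim)]
        self.points, self.children = pts, []
        if len(pts) > leaf_size:
            axis = depth % dim  # cycle through the split dimensions
            pts = sorted(pts, key=lambda p: p[axis])
            mid = len(pts) // 2
            self.children = [Node(pts[:mid], leaf_size, depth + 1),
                             Node(pts[mid:], leaf_size, depth + 1)]


def is_farther(z, best, node):
    """True if no point of the node's box is closer to z than to best."""
    # Pick the box corner that lies farthest in the direction best -> z.
    corner = [hi if zc > bc else lo
              for zc, bc, lo, hi in zip(z, best, node.lo, node.hi)]
    return sqdist(z, corner) >= sqdist(best, corner)


def filter_node(node, centers, cand, sums, counts):
    """Assign the points under node, pruning candidate centres on the way."""
    mid = [(lo + hi) / 2 for lo, hi in zip(node.lo, node.hi)]
    best = min(cand, key=lambda c: sqdist(centers[c], mid))
    cand = [c for c in cand
            if c == best or not is_farther(centers[c], centers[best], node)]
    if len(cand) == 1:
        # Every point in this subtree belongs to best: use cached aggregates.
        counts[best] += node.count
        for d, t in enumerate(node.total):
            sums[best][d] += t
    elif not node.children:
        # Leaf: compare each point with the surviving candidates only.
        for p in node.points:
            c = min(cand, key=lambda i: sqdist(centers[i], p))
            counts[c] += 1
            for d, x in enumerate(p):
                sums[c][d] += x
    else:
        for child in node.children:
            filter_node(child, centers, cand, sums, counts)


def kmeans_step(root, centers):
    """One k-Means iteration; returns (new_centers, cluster_sizes)."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    filter_node(root, centers, list(range(k)), sums, counts)
    new_centers = [[s / counts[c] for s in sums[c]] if counts[c]
                   else list(centers[c]) for c in range(k)]
    return new_centers, counts


# Usage: build the tree once, then iterate on the centres.
rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(1000)]
centers = [[rng.random(), rng.random()] for _ in range(4)]
root = Node(points)
for _ in range(5):
    centers, sizes = kmeans_step(root, centers)
```

The amount of work per subtree depends on how early the candidate list collapses to a single centre, which varies across the data space and between iterations. This is one way to see why a fixed, equal-sized split of the data among processing nodes may not yield an equal split of the computation.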

Central Archive at the University of Reading

Last updated on 01/07/2012

This paper was published in Central Archive at the University of Reading.
