Parallel data mining-case study

Abstract

The continuing rapid growth of data and knowledge in scientific domains has spurred huge interest in distributed and parallel data and text mining. This paper reports an investigation into porting a large-scale data mining application to a supercomputing environment. We explore some of the issues that may arise in porting and working with the C++/MPI implementation of the ensemble kNN application on supercomputers, and we evaluate the behaviour of MFS on several large datasets. The aim of the study is to identify how the performance of the ensemble application depends on the nature of the algorithm used, on the characteristics of the datasets, and on the analysis to be performed. These findings can then be used to select the most appropriate algorithm for a given analysis and dataset, and to indicate the optimum number of processors to use.
