Clustering categorical data based on the relational analysis approach and MapReduce

Authors: Said Chah Slaoui, Yasmine Lamari
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2017
Abstract: Traditional clustering methods cannot cope with the exploding volume of data that the world currently faces. In response, research on parallel clustering methods has intensified. Although a variety of parallel programming models exist, the MapReduce paradigm is considered the most prominent model for large-scale data processing problems, including clustering. This paper introduces a new parallel design, based on the MapReduce programming model, of a recently proposed heuristic for hard clustering. In this heuristic, clustering is performed by efficiently partitioning large categorical data sets according to the relational analysis approach. The proposed design, called PMR-Transitive, is a single-scan, parameter-free heuristic that determines the number of clusters automatically. Experimental results on real-life and synthetic data sets demonstrate that PMR-Transitive produces good quality results
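
The abstract does not spell out the heuristic itself. As a rough illustration of what a single-scan, parameter-free clustering of categorical records can look like, the sketch below assumes a Condorcet-style criterion often associated with relational analysis: two records are "linked" positively when they agree on more than half of their attributes. It is sequential Python, not MapReduce, and it is not PMR-Transitive; the function names `agreement` and `single_scan_clusters` and the sample records are hypothetical.

```python
from typing import List, Sequence


def agreement(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attributes on which two categorical records take the same value."""
    return sum(x == y for x, y in zip(a, b))


def single_scan_clusters(rows: Sequence[Sequence[str]]) -> List[List[int]]:
    """Assign each record, in one pass, to the cluster with the largest positive gain.

    Assumed criterion (not taken from the paper): the gain of adding record i
    to a cluster is the sum over its members j of (agreement(i, j) - m / 2),
    where m is the number of attributes. If no cluster gives a positive gain,
    the record opens a new cluster, so the number of clusters is not a parameter.
    """
    m = len(rows[0]) if rows else 0
    clusters: List[List[int]] = []
    for i, row in enumerate(rows):
        best_gain, best = 0.0, None
        for k, members in enumerate(clusters):
            gain = sum(agreement(row, rows[j]) - m / 2 for j in members)
            if gain > best_gain:
                best_gain, best = gain, k
        if best is None:
            clusters.append([i])  # no positive link: start a new cluster
        else:
            clusters[best].append(i)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy records with three categorical attributes each.
    records = [
        ("red", "small", "round"),
        ("red", "small", "square"),
        ("blue", "large", "flat"),
        ("blue", "large", "round"),
    ]
    print(single_scan_clusters(records))
```

In a MapReduce setting, one plausible split (again an assumption, not the paper's design) would have mappers run such a pass over data partitions and reducers merge the partial clusters.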

Directory of Open Access Journals