Performance evaluation of a distributed clustering approach for spatial datasets
The analysis of big data requires powerful, scalable, and accurate data analytics techniques, which traditional data mining and machine learning, taken as a whole, do not provide. New data analytics frameworks are therefore needed to deal with big data challenges such as the volume, velocity, veracity, and variety of the data. Distributed data mining is a promising approach for big data sets: such data are usually produced at distributed locations, and processing them at their local sites significantly reduces response times and communication costs. In this paper, we study the performance of a distributed clustering approach called Dynamic Distributed Clustering (DDC). DDC generates clusters remotely, at the sites where the data reside, and then aggregates them using an efficient aggregation algorithm. The technique is designed for spatial datasets.
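The two-phase idea described above (cluster locally at each site, send only compact cluster summaries, merge them at an aggregator) can be illustrated with a minimal Python sketch. It is not DDC's actual algorithm: the abstract specifies neither the local clustering method nor the aggregation rule, so the single-linkage clustering with distance threshold `eps`, the bounding-box summaries, and the names `local_clusters`, `boxes_touch` and `aggregate` are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class UnionFind:
    """Disjoint-set structure used to group items that are linked."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        self.parent[self.find(i)] = self.find(j)

    def groups(self) -> List[List[int]]:
        out: Dict[int, List[int]] = {}
        for i in range(len(self.parent)):
            out.setdefault(self.find(i), []).append(i)
        return list(out.values())


def local_clusters(points: List[Point], eps: float) -> List[Box]:
    """Phase 1 (hypothetical, runs at each site): single-linkage clustering
    with threshold eps; returns one bounding box per cluster, not the points."""
    uf = UnionFind(len(points))
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            uf.union(i, j)
    boxes = []
    for group in uf.groups():
        xs = [points[k][0] for k in group]
        ys = [points[k][1] for k in group]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def boxes_touch(a: Box, b: Box, eps: float) -> bool:
    """True if the boxes overlap or are within eps of each other on both axes."""
    return (a[0] <= b[2] + eps and b[0] <= a[2] + eps and
            a[1] <= b[3] + eps and b[1] <= a[3] + eps)


def aggregate(boxes: List[Box], eps: float) -> List[Box]:
    """Phase 2 (hypothetical, runs at the aggregator): merge touching summaries
    in a single pass and return one bounding box per global cluster."""
    uf = UnionFind(len(boxes))
    for i, j in combinations(range(len(boxes)), 2):
        if boxes_touch(boxes[i], boxes[j], eps):
            uf.union(i, j)
    return [(min(boxes[k][0] for k in g), min(boxes[k][1] for k in g),
             max(boxes[k][2] for k in g), max(boxes[k][3] for k in g))
            for g in uf.groups()]


if __name__ == "__main__":
    site_a = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0)]
    site_b = [(1.0, 0.4), (1.3, 0.1)]
    eps = 0.8
    summaries = local_clusters(site_a, eps) + local_clusters(site_b, eps)
    print(sorted(aggregate(summaries, eps)))
    # prints [(0.0, 0.0, 1.3, 0.4), (10.0, 10.0, 10.0, 10.0)]
```

In this sketch only bounding boxes cross the network, which is what keeps communication small; the trade-off is that boxes are a coarse summary, and because merging is a single pass, boxes that would only touch after an earlier merge are not re-merged.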
We evaluated DDC under two types of communication (synchronous and asynchronous) and with various load distributions. The experimental results show that the approach achieves super-linear speed-up, scales up very well, and can take advantage of recent programming models such as MapReduce, since its results are not affected by the type of communication.
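The speed-up claim can be read with the standard definitions below, where T(1) is the run time on one processing node and T(p) the run time on p nodes. The timings in the worked example are illustrative only and are not results from the paper.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}

\text{super-linear speed-up:}\quad S(p) > p \iff E(p) > 1

\text{example: } T(1) = 120\,\mathrm{s},\; T(4) = 24\,\mathrm{s}
\;\Rightarrow\; S(4) = \frac{120}{24} = 5 > 4,\quad E(4) = \frac{5}{4} = 1.25
```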
AusDM 2017: 15th Australasian Data Mining Conference, Melbourne, VIC, Australia, 19-20 August 2017
Science Foundation Ireland, Insight Research Centre