Hybrid Cloud-Based Privacy Preserving Clustering as Service for Enterprise Big Data
Clustering as a service is offered by many cloud service providers. It helps enterprises discover hidden patterns and extract knowledge from the big data they generate. Although it brings considerable value to enterprises, it also exposes the data to various security and privacy threats. Privacy-preserving clustering has been proposed as a solution to this problem. However, privacy-preserving clustering as an outsourced service model places too much overhead on the querying user, lacks adaptivity to incremental data, and requires frequent interaction between the service provider and the querying user. It also does not let the querying user personalize the clustering. This work, “Locality Sensitive Hashing for Transformed Dataset (LSHTD)”, proposes a hybrid cloud-based clustering-as-a-service model for streaming data that addresses the problems of existing models such as privacy preserving k-means clustering outsourcing under multiple keys (PPCOM) and secure nearest neighbor clustering (SNNC). The solution combines a hybrid cloud with the LSHTD clustering algorithm in an outsourced service model. In experiments, the proposed solution was found to reduce computation cost by 23% and communication cost by 6%, and to provide better clustering accuracy, with an adjusted Rand index (ARI) more than 4.59% higher than that of existing works.
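The abstract does not describe how LSHTD hashes the transformed data, so the following is only a generic sketch of one well-known LSH family, random-hyperplane (SimHash) hashing. It shows how locality-sensitive hashing can group nearby points into the same bucket without computing all pairwise distances. The function names and parameters (`make_hyperplanes`, `n_bits`, `seed`) are hypothetical and are not part of LSHTD.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in R^dim (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))


def lsh_signature(x, planes):
    """One bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    bits = (planes @ x) >= 0
    return tuple(int(b) for b in bits)


def bucket_points(points, planes):
    """Group point indices by signature; points sharing a bucket are candidate neighbors."""
    buckets = {}
    for i, p in enumerate(points):
        buckets.setdefault(lsh_signature(p, planes), []).append(i)
    return buckets


# Usage: identical points always land in the same bucket.
planes = make_hyperplanes(dim=3, n_bits=8, seed=42)
pts = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.5, 0.2]])
print(bucket_points(pts, planes))
```

Each bit records only which side of a hyperplane a point falls on, so points separated by a small angle tend to share signatures, and scaling a point by a positive constant never changes its signature. Using more bits makes buckets finer at the cost of occasionally splitting true neighbors.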
Data Stream Clustering: A Review
The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting interest despite many challenges. Clustering is one of the most suitable methods for real-time data stream processing, because it can be applied with little prior information about the data and does not need labeled instances. However, data stream clustering differs from traditional clustering in many aspects and raises several challenging issues. Here, we provide information on the concepts and common characteristics of data streams, such as concept drift, data structures for data streams, time window models and outlier detection. We comprehensively review recent data stream clustering algorithms and analyze them in terms of the base clustering technique, computational complexity and clustering accuracy. A comparison of these algorithms is given. We indicate popular data stream repositories and datasets, as well as stream processing tools and platforms. Open problems in data stream clustering are also discussed.
Comment: Has been accepted for publication in Artificial Intelligence Review
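Time window models and stream data structures are named but not detailed in the abstract. As a hedged illustration, the sketch below shows one common combination: a damped (fading) window applied to a micro-cluster summary that stores a weight, a linear sum and a squared sum, so that old points lose influence by a factor of 2^(-λ·Δt). The class name `FadingMicroCluster`, the `lam` decay parameter and the base-2 decay are assumptions for illustration, not details taken from the review.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FadingMicroCluster:
    """Hypothetical micro-cluster summary under a damped (fading) time window."""
    dim: int
    lam: float                      # decay rate; lam = 0 disables fading
    weight: float = 0.0             # decayed count of absorbed points
    last_t: float = 0.0             # timestamp of the last update
    ls: List[float] = field(default_factory=list)  # decayed linear sum per dimension
    ss: List[float] = field(default_factory=list)  # decayed squared sum per dimension

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim
        if not self.ss:
            self.ss = [0.0] * self.dim

    def _fade(self, t: float) -> None:
        # Scale every statistic by 2^(-lam * elapsed time) since the last update.
        f = 2.0 ** (-self.lam * (t - self.last_t))
        self.weight *= f
        self.ls = [v * f for v in self.ls]
        self.ss = [v * f for v in self.ss]
        self.last_t = t

    def insert(self, x: List[float], t: float) -> None:
        # Fade old contributions first, then add the new point with weight 1.
        self._fade(t)
        self.weight += 1.0
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def center(self) -> List[float]:
        # Weighted mean of the points absorbed so far.
        return [v / self.weight for v in self.ls]


# Usage: with lam = 1.0, the point seen one time unit earlier counts half as much.
mc = FadingMicroCluster(dim=2, lam=1.0)
mc.insert([1.0, 2.0], t=0.0)
mc.insert([3.0, 4.0], t=1.0)
print(mc.weight, mc.center())
```

Because the summary is additive, two such micro-clusters can be merged by fading both to the same time and adding their statistics, and the squared sum allows a radius to be derived. This is why summaries of this kind suit one-pass stream clustering.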