12 research outputs found

    Fully online clustering of evolving data streams into arbitrarily shaped clusters

    In recent times there has been an increase in data availability in continuous data streams, and clustering of this data has many advantages in data analysis. It is often the case that these data streams are not stationary, but evolve over time, and also that the clusters are not regular shapes but form arbitrary shapes in the data space. Previous techniques for clustering such data streams are either hybrid online / offline methods, windowed offline methods, or find only hyper-elliptical clusters. In this paper we present a fully online technique for clustering evolving data streams into arbitrarily shaped clusters. It is a two-stage technique that is accurate, robust to noise, computationally and memory efficient, with a low time penalty as the number of data dimensions increases. The first stage of the technique produces micro-clusters and the second stage combines these micro-clusters into macro-clusters. Dimensional stability and high speed are achieved by keeping the calculations both simple and minimal using hyper-spherical micro-clusters. By maintaining a graph structure, where the micro-clusters are the nodes and the edges join pairs of intersecting micro-clusters, we minimise the calculations required for macro-cluster maintenance. The micro-clusters themselves are described in such a way that there is no calculation required for the core and shell regions and no separate definition of outer micro-clusters necessary. We demonstrate the ability of the proposed technique to join and separate macro-clusters as they evolve in a fully online manner. There are no other fully online techniques that the authors are aware of, and so we compare the technique with popular online / offline hybrid alternatives for accuracy, purity and speed. The technique is then applied to real atmospheric science data streams and used to discover short term, long term and seasonal drift and the effects on anomaly detection.
As well as having favourable computational characteristics, the technique can add analytic value over hyper-elliptical methods by characterising the cluster hyper-shape using Euclidean or fractal shape factors. Because the technique records macro-clusters as graphs, further analytic value accrues from characterising the order, degree, and completeness of the cluster-graphs as they evolve over time.
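The graph idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single fixed radius for every micro-cluster, never moves or ages a centre, and ignores the core and shell regions. The class name `MicroClusterGraph` and its methods are hypothetical names chosen for illustration. It shows only how hyper-spherical micro-clusters can act as graph nodes, how edges can link intersecting pairs, and how macro-clusters then fall out as connected components.

```python
import math
from collections import defaultdict


class MicroClusterGraph:
    """Illustrative only: fixed-radius hyper-spherical micro-clusters as graph nodes."""

    def __init__(self, radius):
        self.radius = radius            # assumption: one shared radius
        self.centres = []               # centre of each micro-cluster
        self.edges = defaultdict(set)   # node index -> indices of intersecting nodes

    def add_point(self, point):
        """Assign a point to a micro-cluster, creating one if none contains it."""
        best, best_dist = None, float("inf")
        for i, centre in enumerate(self.centres):
            dist = math.dist(point, centre)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None and best_dist <= self.radius:
            return best                 # point lies inside an existing hyper-sphere

        # No micro-cluster contains the point: open a new one centred on it.
        new = len(self.centres)
        self.centres.append(tuple(point))
        for i in range(new):
            # Two spheres of equal radius intersect when their centres are
            # no more than two radii apart.
            if math.dist(point, self.centres[i]) <= 2 * self.radius:
                self.edges[new].add(i)
                self.edges[i].add(new)
        return new

    def macro_clusters(self):
        """Return macro-clusters as connected components of the graph."""
        seen, components = set(), []
        for start in range(len(self.centres)):
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], []
            while stack:
                node = stack.pop()
                component.append(node)
                for neighbour in self.edges[node]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
            components.append(sorted(component))
        return components


graph = MicroClusterGraph(radius=1.0)
for p in [(0, 0), (1.5, 0), (3, 0), (10, 10), (0.2, 0)]:
    graph.add_point(p)
print(graph.macro_clusters())
```

In this toy run the first three micro-clusters form a chain of intersecting spheres and end up in one macro-cluster, which is how a chain of small spheres can trace out an arbitrarily shaped cluster.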

    An approach based on tunicate swarm algorithm to solve partitional clustering problem

    The tunicate swarm algorithm (TSA) is a newly proposed population-based swarm optimizer for solving global optimization problems. TSA uses the best solution in the population in order to improve the intensification and diversification of the tunicates. This increases the possibility of finding a better position for the search agents. The aim of clustering algorithms is to distribute the data instances into groups according to the similar and dissimilar features of the instances. Therefore, with a proper clustering algorithm the dataset will be separated into groups, and it is expected that the similarity between groups will be minimal. In this work, firstly, an approach based on TSA is proposed for solving the partitional clustering problem. Then, the TSA is applied to ten different clustering problems taken from the UCI Machine Learning Repository, and its clustering performance is compared with that of three well-known clustering algorithms: fuzzy c-means, k-means and k-medoids. The experimental results and comparisons show that the TSA-based approach is a highly competitive and robust optimizer for solving partitional clustering problems.
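To make the link between a swarm optimizer and partitional clustering concrete, the sketch below shows one common way to score a candidate solution. It rests on assumptions the abstract does not state: that each search agent encodes `k` centroids as one flat vector, and that the fitness is the sum of squared Euclidean distances from each instance to its nearest centroid. The function names `decode` and `sse_fitness` are hypothetical, and the TSA position-update rules themselves are not reproduced here.

```python
def decode(agent, k, n_features):
    """Split a flat position vector into k centroids of n_features values each."""
    return [agent[i * n_features:(i + 1) * n_features] for i in range(k)]


def sse_fitness(agent, data, k):
    """Return (sum of squared errors, labels) for one candidate solution.

    Lower is better: an optimizer would move agents to minimise this value.
    """
    centroids = decode(agent, k, len(data[0]))
    total, labels = 0.0, []
    for row in data:
        # Squared Euclidean distance from this instance to every centroid.
        sq_dists = [
            sum((x - c) ** 2 for x, c in zip(row, centroid))
            for centroid in centroids
        ]
        nearest = min(range(k), key=sq_dists.__getitem__)
        labels.append(nearest)
        total += sq_dists[nearest]
    return total, labels


data = [[0, 0], [0, 2], [10, 0], [10, 2]]
agent = [0, 1, 10, 1]          # two centroids: (0, 1) and (10, 1)
print(sse_fitness(agent, data, k=2))
```

Any population-based optimizer can use a function of this shape as its objective, which is why the same formulation allows a comparison against k-means-style baselines.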

    An Improved Differential Evolution Algorithm for Data Stream Clustering

    A few algorithms have been implemented by researchers for clustering data streams. Most of these algorithms require the number of clusters (K) to be fixed by the user based on the input data and kept constant throughout the clustering process. Choosing K is therefore a difficulty in stream clustering. In this paper, we propose an efficient approach for data stream clustering that adopts an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is a fast, powerful and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We compare our proposed method with a Genetic Algorithm and identify it as a proficient optimization algorithm. The proposed technique achieves an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30% and an F-measure of 88.60%.
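The reported figures can be cross-checked with a short calculation. The sketch assumes the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not say which variant was used, and `f_measure` is a hypothetical helper name.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
f1 = f_measure(0.8696, 0.9030)
print(f"{f1 * 100:.2f}%")
```

Under that assumption the result rounds to 88.60%, which agrees with the F-measure quoted in the abstract.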

    Data stream mining techniques: a review

    A plethora of infinite data is generated from the Internet and other information sources. Analyzing this massive data in real time and extracting valuable knowledge using different mining application platforms has been an area of interest for both research and industry. However, data stream mining faces challenges that make it different from traditional data mining. Recently, many studies have addressed the concerns of massive data mining problems and proposed several techniques that produce impressive results. In this paper, we review real-time clustering and classification mining techniques for data streams. We analyze the characteristics of data stream mining and discuss the challenges and research issues of data stream mining. Finally, we present some of the platforms for data stream mining.

    Non-intrusive load monitoring techniques for the disaggregation of ON/OFF appliances

    Nowadays, Non-Intrusive Load Monitoring techniques are sufficiently accurate to provide valuable insights to end-users and improve their electricity behaviours. Indeed, previous works show that commonly used appliances (fridge, dishwasher, washing machine) can be easily disaggregated thanks to their abundance of electrical features. Nevertheless, there are still many ON/OFF devices (e.g. heaters, kettles, air conditioners, hair dryers) that present very poor power signatures, preventing their disaggregation with traditional algorithms. In this work, we propose a new online clustering method exploiting both operational features (peak power, duration) and external features (time of use, day of week, weekday/weekend) in order to recognize ON/OFF devices. The proposed algorithm is intended to support an existing disaggregation algorithm that is already able to classify at least 80% of the total energy consumption of the house. Thanks to our approach, we improved the performance of our existing disaggregation algorithm from 80% to 87% of the total energy consumption in the monitored houses. In particular, we found that 85% of the clusters were identified by only using operational features, while external features allowed us to identify the remaining 15% of the clusters. The algorithm needs to collect on average fewer than 40 operations to find a cluster, which demonstrates its applicability in the real world.
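The two feature groups named in this abstract can be pictured as a small feature record per appliance operation. The sketch below is an assumption about representation only: the `Operation` class, the `features` function, the field names and the units are hypothetical, and the clustering step itself is not shown because the abstract does not describe it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Operation:
    """One ON/OFF cycle of an unknown appliance (hypothetical structure)."""
    start: datetime
    end: datetime
    peak_power_w: float


def features(op):
    """Build the operational and external features listed in the abstract."""
    weekday = op.start.weekday()            # Monday = 0 ... Sunday = 6
    return {
        # Operational features
        "peak_power_w": op.peak_power_w,
        "duration_s": (op.end - op.start).total_seconds(),
        # External features
        "hour_of_use": op.start.hour,
        "day_of_week": weekday,
        "is_weekend": weekday >= 5,
    }


# A three-minute, 2 kW operation on a Saturday morning (illustrative values).
kettle_like = Operation(datetime(2024, 1, 6, 7, 30), datetime(2024, 1, 6, 7, 33), 2000.0)
print(features(kettle_like))
```

Keeping the operational and external features separate in the record mirrors the abstract's finding that most clusters were identified from operational features alone, with external features used for the remainder.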