7 research outputs found

    Automatically Selecting Parameters for Graph-Based Clustering

    Data streams present a number of challenges caused by changes in stream concepts over time. In this thesis we present a novel method for detecting concept drift within data streams by analysing geometric features of the clustering algorithm RepStream. Further, we present novel methods for automatically adjusting critical input parameters over time and for generating self-organising nearest-neighbour graphs, improving robustness and decreasing the need for domain-specific knowledge in the face of stream evolution.

    Representative Points and Cluster Attributes Based Incremental Sequence Clustering Algorithm

    To improve the execution time and clustering quality of sequence clustering algorithms on large-scale dynamic datasets, a novel algorithm, RPCAISC (Representative Points and Cluster Attributes Based Incremental Sequence Clustering), is presented. In this paper, a density factor is defined. A primary representative point whose density factor is less than the prescribed threshold is deleted directly, and new representative points can be reselected from the non-representative points. Moreover, the representative points of each cluster are modeled using the K-nearest neighbor method. A definition of the relevant degree (RD) between clusters is also proposed. The RD is computed by comprehensively considering the correlations of objects within a cluster and between different clusters, and it is then used to determine whether two clusters need to be merged. Additionally, the cluster attributes of the initial clustering are retained during this process. By calculating the matching degree between the incremental sequence and the existing cluster attributes, dynamic sequence clustering can be achieved. Theoretical analysis and experimental results show that RPCAISC achieves a higher rate of correct clustering results and better execution efficiency.

    Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms

    Analyzing data streams has received considerable attention over the past decades due to the widespread usage of sensors, social media and other streaming data sources. A core research area in this field is stream clustering, which aims to recognize patterns in an unordered, infinite and evolving stream of observations. Clustering can be a crucial support in decision making, since it aims for an optimized aggregated representation of a continuous data stream over time and allows patterns to be identified in large and high-dimensional data. A multitude of algorithms and approaches have been developed that are able to find and maintain clusters over time in the challenging streaming scenario. This survey explores, summarizes and categorizes a total of 51 stream clustering algorithms and identifies core research threads over the past decades. In particular, it identifies categories of algorithms based on distance thresholds, density grids and statistical models, as well as algorithms for high-dimensional data. Furthermore, it discusses application scenarios, available software and how to configure stream clustering algorithms. This survey is considerably more extensive than comparable studies, is more up to date, and highlights how concepts are interrelated and have been developed over time.

    Data Stream Mining: an Evolutionary Approach

    This work presents a data stream clustering algorithm called ESCALIER. The algorithm is an extension of the evolutionary clustering algorithm ECSAGO (Evolutionary Clustering with Self Adaptive Genetic Operators). ESCALIER takes advantage of the evolutionary process proposed by ECSAGO to find clusters in the data streams, which are defined by the sliding window technique. To maintain and forget the clusters detected as the data evolve, ESCALIER includes a memory mechanism inspired by artificial immune network theory. To test the performance of the algorithm, experiments were carried out using synthetic data simulating a data stream environment and a real dataset.

    Data type proofs using Edinburgh LCF

    Streaming Data Clustering in MOA using the Leader Algorithm

    This master's thesis presents a novel stream clustering algorithm called StreamLeader. It offers a way to deliver clustering without resorting to conventional clustering algorithms, as most other algorithms do. In our tests, it outperforms its state-of-the-art rivals in most cases.

    Using distribution analysis for parameter selection in repstream
