2 research outputs found

    Automatically Selecting Parameters for Graph-Based Clustering

    Data streams present a number of challenges, caused by changes in stream concepts over time. In this thesis we present a novel method for detecting concept drift within data streams by analysing geometric features of the clustering algorithm RepStream. Further, we present novel methods for automatically adjusting critical input parameters over time and for generating self-organising nearest-neighbour graphs, improving robustness and decreasing the need for domain-specific knowledge in the face of stream evolution.

    Graph-based clustering with DRepStream

    © 2017 ACM. Finding and setting input parameters for clustering algorithms is challenging due to the unsupervised nature of clustering. The accuracy of clustering algorithms can be affected greatly by setting parameters appropriately for the dataset; however, without ground truth labels and external validation it can be impossible to know when the parameters are set well. In this paper we propose the DRepStream algorithm, which extends the RepStream algorithm. DRepStream uses a graph-based approach and, unlike its predecessor, does not require the primary K parameter used in K-nearest neighbour graphs. Our algorithm automatically computes the number of outgoing edges for each vertex in the graph using a computed metric known as the anomalous edge score. We evaluate the performance of our algorithm against previous stream clustering algorithms on real-world benchmark datasets.
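
For context, the fixed-K construction that the abstract says DRepStream avoids can be sketched as follows. This is not the DRepStream or RepStream algorithm, and it does not compute the anomalous edge score, whose definition is not given in the abstract. It is only a minimal illustration of a directed K-nearest-neighbour graph, assuming Euclidean distance and small in-memory data; the function name `knn_graph` and the sample points are hypothetical.

```python
import math


def knn_graph(points, k):
    """Build a directed K-nearest-neighbour graph.

    Every vertex receives exactly k outgoing edges, to its k closest
    other points under Euclidean distance (an assumption of this sketch).
    """
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Hypothetical data: a tight group of three points and a tight pair far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(points, k=2)
```

With `k=2`, each point in the isolated pair is forced to add a second edge that reaches across to the distant group, because K is the same for every vertex. This is the kind of sensitivity to K that motivates choosing the number of outgoing edges per vertex automatically.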