
7 research outputs found

Automatically Selecting Parameters for Graph-Based Clustering

Author: Callister Ross
Publication venue: Curtin University
Publication date: 01/01/2020

Data streams present a number of challenges caused by changes in stream concepts over time. In this thesis we present a novel method for detecting concept drift within data streams by analysing geometric features of the clustering algorithm RepStream. Further, we present novel methods for automatically adjusting critical input parameters over time and for generating self-organising nearest-neighbour graphs, improving robustness and decreasing the need for domain-specific knowledge in the face of stream evolution.

espace@Curtin

Representative Points and Cluster Attributes Based Incremental Sequence Clustering Algorithm

Authors: Ren Jiadong, Wu Di
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 09/02/2018

To improve the execution time and clustering quality of sequence clustering algorithms on large-scale dynamic datasets, a novel algorithm, RPCAISC (Representative Points and Cluster Attributes Based Incremental Sequence Clustering), is presented. In this paper, a density factor is defined. A primary representative point whose density factor is less than the prescribed threshold is deleted directly, and new representative points can be reselected from the non-representative points. Moreover, the representative points of each cluster are modeled using the K-nearest neighbor method. A definition of the relevant degree (RD) between clusters is also proposed. The RD is computed by comprehensively considering the correlations of objects within a cluster and between different clusters, and it is then used to determine whether two clusters need to be merged. Additionally, the cluster attributes of the initial clustering are retained in this process. By calculating the matching degree between an incremental sequence and the existing cluster attributes, dynamic sequence clustering can be achieved. Theoretical analysis and experimental results show that RPCAISC achieves a better clustering accuracy and execution efficiency.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms

Authors: Carnein Matthias, Trautmann Heike
Publication venue: AIS Electronic Library (AISeL)
Publication date: 14/06/2019

Analyzing data streams has received considerable attention over the past decades due to the widespread usage of sensors, social media and other streaming data sources. A core research area in this field is stream clustering, which aims to recognize patterns in an unordered, infinite and evolving stream of observations. Clustering can be a crucial support in decision making, since it aims for an optimized aggregated representation of a continuous data stream over time and allows patterns to be identified in large and high-dimensional data. A multitude of algorithms and approaches has been developed that are able to find and maintain clusters over time in the challenging streaming scenario. This survey explores, summarizes and categorizes a total of 51 stream clustering algorithms and identifies core research threads over the past decades. In particular, it identifies categories of algorithms based on distance thresholds, density grids and statistical models, as well as algorithms for high-dimensional data. Furthermore, it discusses application scenarios, available software and how to configure stream clustering algorithms. This survey is considerably more extensive and more up-to-date than comparable studies, and it highlights how concepts are interrelated and have been developed over time.

AIS Electronic Library (AISeL)

Data Stream Mining: an Evolutionary Approach

Author: Veloza Suan Angélica
Publication date: 01/01/2013

This work presents a data stream clustering algorithm called ESCALIER. The algorithm is an extension of the evolutionary clustering algorithm ECSAGO (Evolutionary Clustering with Self Adaptive Genetic Operators). ESCALIER uses the evolutionary process proposed by ECSAGO to find clusters in data streams, which are defined by the sliding window technique. To maintain and forget the detected clusters as the data evolves, ESCALIER includes a memory mechanism inspired by artificial immune network theory. To test the effectiveness of the algorithm, experiments were carried out using synthetic data simulating a data stream environment, and a real dataset.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Nacional De Colombia - Repositorio Institucional UN

Data type proofs using Edinburgh LCF

Author: Monahan Brian Quentin
Publication venue: The University of Edinburgh
Publication date: 01/01/1984

Edinburgh Research Archive

Streaming Data Clustering in MOA using the Leader Algorithm

Author: Andrés Merino Jaime
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2015

This master's thesis presents a novel stream clustering algorithm called StreamLeader. It delivers clustering without resorting to conventional clustering algorithms, as most other algorithms do. In our tests, it outperforms its state-of-the-art rivals in most cases.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Using distribution analysis for parameter selection in RepStream

Publication venue: American Institute of Mathematical Sciences (AIMS)
Publication date: 01/01/2019

Crossref