7 research outputs found

    ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities

    Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm that performs well on datasets whose clusters have a constant density of data points. One of the significant attributes of this algorithm is noise cancellation. However, DBSCAN performs worse on clusters with different densities. Therefore, in this paper, an adaptive DBSCAN is proposed that can perform well when identifying clusters with varying densities. Comment: To be published in the 4th IEEE International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT 2018).

    Density propagation based adaptive multi-density clustering algorithm

    This research was supported by the Science & Technology Development Foundation of Jilin Province (Grant Nos. 20160101259JC, 20180201045GX), the National Natural Science Foundation of China (Grant No. 61772227) and the Natural Science Foundation of Xinjiang Province (Grant No. 2015211C127). This research is also supported by the Engineering and Physical Sciences Research Council (EPSRC) funded project on New Industrial Systems: Manufacturing Immortality (EP/R020957/1). Peer reviewed.

    User behaviour identification based on location data

    Over the years there has been an almost exponential increase in the use of new technologies in various sectors. The main objective of these technologies is to improve or facilitate our daily life. This study focuses on one of these technologies, used within a theme that has been widely discussed over the last few years: the use of personal data to identify certain types of behavior. More specifically, this study uses the GPS data stored in the Google accounts of nine volunteers to identify the places they frequent most, also known as Points of Interest (POI). The same data are also used to identify the trajectories each volunteer covers most often. The results were validated by sending each of the nine participants their maps with POI and trajectories. It was thus possible to conclude that the best way to identify POI is to use daily clusters using DBSCAN. In the case of trajectories, the Snap-to-Road method gave the best results. The study answered the initial problem: a method was found that successfully identifies most of the POI and also some trajectories. Based on this work, there is an opportunity to improve some of the algorithms and processes that currently have limitations, and so to develop more effective solutions.

    The mechanical and algorithmic design of in-field robotic leaf sampling device

    Leaf sample analysis is a significant tool for acquiring actual nutrition information about crops. With it, farmers can adjust fertilization programs to prevent nutritional problems and improve crop yield. The traditional way of leaf sampling is manual: researchers go to the field and use paper hole punchers with a catch-tube to collect leaf samples. Summer temperatures are high, and some crops, like corn, are difficult for researchers to walk through, so manual leaf sampling is not a good option. In this thesis, an automatic method of leaf sampling is presented to address this difficulty. The contributions of this thesis are the following: (1) build the end effector of a leaf sampling device to punch and store leaf samples separately, (2) train a neural network to detect the leaves with high horizontal level, (3) combine point cloud data from the depth camera and vision data from the camera via sensor fusion to get the leaf rolling angle and grasp point. The method in this thesis can produce a consistent leaf rolling angle estimate quantitatively and qualitatively on multiple corn leaves, especially on leaves with multiple different angles.

    Big Data Mining to Construct Truck Tours

    Cross-border shipping of goods among different distributors is an essential part of transportation across Canada and the U.S. These two countries are heavily dependent on border crossing locations to facilitate international trade with each other. This research considers the identification of the international tours that accomplish the shipping of goods. A truck tour is a round trip in which a truck starts its journey from its firm or an industry, stops for different purposes that include taking a rest, refueling, and transferring goods to multiple locations, and returns to its initial firm location. In this thesis, we present a three-step method for mining GPS truck data to identify all possible truck tours belonging to different carriers. In the first step, a clustering technique is applied to the stop locations to discover the firm for each carrier. A modified DBSCAN algorithm is proposed to achieve this task by automatically determining the two input parameters based on the data points provided. Various statistical measures, such as the count of unique trucks and the count of truck visits, are applied to the resulting clusters to identify the firms of the respective carriers. In the second step, we tackle the problem of classifying the stop locations into two types: primary stops, where goods are transferred, and secondary stops like rest stations, where vehicle and driver needs are met. This problem is solved using a trade indicator called the Specialization Index. Moreover, several sets of features are explored to build the classification model that classifies the type of stop locations. In the third step, having identified the firm, primary and secondary locations, an automated path finder is developed to identify the truck tours starting from each firm. The results of the specialization index and the feature-based classification in identifying stop events are compared with the entropy index from previous work.
Experimental results show that the proposed set of cluster features significantly adds classification power to our model, giving 98.79% accuracy, which in turn helps in discovering accurate tours.
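The abstract does not say how the modified DBSCAN derives its two parameters. As an illustration only, the sketch below uses a common heuristic that is not taken from the thesis: fix the neighbour count `k` by hand, sort each point's distance to its k-th nearest neighbour, and take the value just before the largest jump as the radius. The function names and the sample coordinates are hypothetical.

```python
from math import dist  # Python 3.8+


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def eps_before_largest_jump(kd):
    """Heuristic (an assumption, not the thesis's rule): take the k-distance
    just before the largest jump in the sorted curve as the radius."""
    jumps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return kd[i]


# Hypothetical stop locations: two dense groups and one isolated stop.
stops = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (50, 50)]
eps = eps_before_largest_jump(k_distances(stops, k=2))
```

On real GPS stops the sorted curve is rarely this clean, so the chosen radius would normally be checked by inspecting the curve or the resulting clusters.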

    Modeling Vessel Behaviours By Clustering AIS Data Using Optimized DBSCAN

    Today, maritime transportation carries a substantial share of international trade. Sustainable development of marine transportation requires systematic modeling and surveillance for maritime situational awareness. In this research thesis, we present an enhanced density-based spatial clustering (DBSCAN) method to model vessel behaviors. The proposed methodology enhances DBSCAN clustering performance by integrating the Mahalanobis distance metric, which considers the correlations of the points representing the locations of the vessels. The clustering method is applied to historical Automatic Identification System (AIS) data and generates an action recommendation tool and a model for detecting vessel trajectory anomalies. Two case studies present outcomes from the openly available Gulf of Mexico AIS data, and from Saint Lawrence Seaway and Great Lakes AIS licensed data acquired from ORBCOMM (a maritime AIS data provider). This research proposes a framework for modeling AIS data, an algorithm for generating a clustering model of the vessels' trajectories, and a model for detecting vessel trajectory anomalies such as unexpected stops, deviations from regulated routes, or inconsistent speed. This work's findings demonstrate the applicability and scalability of the proposed method for modeling more water regions, contributing to situational awareness, vessel collision prevention, safe navigation, route planning, and detection of vessel behavior anomalies for auto-vessels development.
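To make the role of the Mahalanobis distance concrete, here is a minimal two-dimensional sketch. It assumes the covariance is estimated from the whole sample of positions, which the abstract does not specify; the function names and the sample track are hypothetical. The distance is $\sqrt{(p-q)^\top \Sigma^{-1} (p-q)}$, so displacements along the direction in which the points are correlated count for less than displacements across it.

```python
from math import sqrt


def covariance_2d(points):
    """Sample covariance of 2-D points, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def inverse_2x2(sxx, sxy, syy):
    """Inverse of the symmetric matrix [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return syy / det, -sxy / det, sxx / det


def mahalanobis(p, q, inv):
    """Mahalanobis distance between p and q given the inverse covariance."""
    a, b, c = inv
    dx, dy = p[0] - q[0], p[1] - q[1]
    return sqrt(a * dx * dx + 2 * b * dx * dy + c * dy * dy)


# Hypothetical positions lying roughly along a diagonal lane.
track = [(0, 0), (1, 1.2), (2, 1.9), (3, 3.1), (4, 4.0)]
inv = inverse_2x2(*covariance_2d(track))
along = mahalanobis((0, 0), (1, 1), inv)    # displacement along the lane
across = mahalanobis((0, 0), (1, -1), inv)  # displacement across the lane
```

Both displacements have the same Euclidean length, but `across` comes out larger than `along`, which is the property a density-based neighbourhood test can exploit for elongated traffic lanes.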

    Density-based algorithms for active and anytime clustering

    Data-intensive applications in fields like biology, medicine, and neuroscience require effective and efficient data mining technologies. Advanced data acquisition methods produce data of constantly increasing volume and complexity. As a consequence, the need for new data mining technologies to deal with complex data has emerged during the last decades. In this thesis, we focus on the data mining task of clustering, in which objects are separated into different groups (clusters) such that objects inside a cluster are more similar to each other than to objects in different clusters. In particular, we consider density-based clustering algorithms and their applications in biomedicine. The core idea of the density-based clustering algorithm DBSCAN is that each object within a cluster must have a certain number of other objects inside its neighborhood. Compared with other clustering algorithms, DBSCAN has many attractive benefits, e.g., it can detect clusters with arbitrary shape and is robust to outliers. Thus, DBSCAN has attracted a lot of research interest during the last decades, with many extensions and applications. In the first part of this thesis, we aim at developing new algorithms based on the DBSCAN paradigm to deal with the new challenges of complex data, particularly expensive distance measures and incomplete availability of the distance matrix. Like many other clustering algorithms, DBSCAN suffers from poor performance when facing expensive distance measures for complex data. To tackle this problem, we propose a new algorithm based on the DBSCAN paradigm, called Anytime Density-based Clustering (A-DBSCAN), that works in an anytime scheme: in contrast to the original batch scheme of DBSCAN, A-DBSCAN first produces a quick approximation of the clustering result and then continuously refines the result as the run proceeds.
Experts can interrupt the algorithm, examine the results, and choose between (1) stopping the algorithm whenever they are satisfied with the result, to save runtime, and (2) continuing the algorithm to achieve better results. This kind of anytime scheme has proven in the literature to be a very useful technique when dealing with time-consuming problems. We also introduce an extended version of A-DBSCAN called A-DBSCAN-XS, which is more efficient and effective than A-DBSCAN when dealing with expensive distance measures. Since DBSCAN relies on the cardinality of the neighborhood of objects, it requires the full distance matrix to run. For complex data, these distances are usually expensive, time consuming or even impossible to acquire due to high cost, high time complexity, noisy and missing data, etc. Motivated by these potential difficulties of acquiring the distances among objects, we propose another approach for DBSCAN, called Active Density-based Clustering (Act-DBSCAN). Given a budget limitation B, Act-DBSCAN is only allowed to use up to B pairwise distances, ideally to produce the same result as if it had the entire distance matrix at hand. The general idea of Act-DBSCAN is that it actively selects the most promising pairs of objects to calculate the distances between them and tries to approximate the desired clustering result as closely as possible with each distance calculation. This scheme provides an efficient way to reduce the total cost needed to perform the clustering. Thus it limits the potential weakness of DBSCAN when dealing with the distance sparseness problem of complex data. As a fundamental data clustering algorithm, density-based clustering has many applications in diverse fields. In the second part of this thesis, we focus on an application of density-based clustering in neuroscience: the segmentation of the white matter fiber tracts in the human brain acquired from Diffusion Tensor Imaging (DTI).
We propose a model to evaluate the similarity between two fibers as a combination of structural similarity and connectivity-related similarity of fiber tracts. Various distance measure techniques from fields like time-sequence mining are adapted to calculate the structural similarity of fibers. Density-based clustering is used as the segmentation algorithm. We show how A-DBSCAN and A-DBSCAN-XS are used as novel solutions for the segmentation of massive fiber datasets and provide unique features to assist experts during the fiber segmentation process.
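The core idea stated in the abstract, that each object within a cluster must have a certain number of other objects inside its neighborhood, can be sketched as a minimal batch DBSCAN. This is the textbook formulation, not the anytime or active variants proposed in the thesis; the names `eps`, `min_pts` and `region_query` are conventional choices rather than identifiers from the source, and here `min_pts` counts the point itself.

```python
from math import dist  # Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: expand
                queue.extend(j_neighbors)
    return labels
```

Every call to `region_query` scans all points, so this version evaluates the distance function on the order of n² times. That is the cost the abstract refers to when distance measures are expensive or the distance matrix is only partly available.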