13 research outputs found

    Faster DBScan and HDBScan in Low-Dimensional Euclidean Spaces

    We present a new algorithm for the widely used density-based clustering method DBScan. Our algorithm computes the DBScan clustering in O(n log n) time in R^2, irrespective of the scale parameter eps, but assuming the second parameter MinPts is set to a fixed constant, as is the case in practice. We also present an O(n log n) randomized algorithm for HDBScan in the plane---HDBScan is a hierarchical version of DBScan introduced recently---and we show how to compute an approximate version of HDBScan in near-linear time in any fixed dimension.
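
    Several of the abstracts in this listing turn on DBScan's two parameters, the radius eps and the density threshold MinPts. For orientation, here is a minimal sketch of the textbook algorithm for 2-D points, not the paper's O(n log n) method; the naive neighborhood query below is the O(n^2) bottleneck that the surveyed results replace with geometric data structures.

```python
from collections import deque

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on a list of 2-D points; returns a cluster
    label per point, with -1 marking noise."""
    def neighbors(i):
        # Naive O(n) range query: all points within eps of points[i].
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1           # provisionally noise
            continue
        labels[i] = cid              # i is a core point: start a cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cid      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            if len(neighbors(j)) >= min_pts:
                queue.extend(neighbors(j))   # expand only from core points
        cid += 1
    return labels
```

    Two well-separated groups plus a far-away outlier come back as clusters 0 and 1 and a noise label, respectively.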

    AUTOMATIC EEG CLASSIFICATION USING DENSITY BASED ALGORITHMS DBSCAN AND DENCLUE

    Electroencephalography (EEG) is a commonly used method in neurological practice. Automatic classifiers (algorithms) highlight signal sections with interesting activity and assist an expert with record scoring. The K-means algorithm is one of the most commonly used methods for EEG inspection. In this paper, we propose a method based on the density-oriented algorithms DBSCAN and DENCLUE, which, unlike K-means, can separate nested clusters. All three algorithms were validated on a testing dataset and then adapted for the classification of real EEG records. A 24-dimensional EEG feature space was classified into 5 classes (physiological, epileptic, EOG, electrode, and EMG artefact). The modified DBSCAN and DENCLUE create more than two homogeneous classes of the epileptic EEG data. The results offer an opportunity for EEG scoring in clinical practice. A major advantage of the proposed algorithms is the high homogeneity of the epileptic class.

    A Self-Adjusting Approach to Identify Hotspots

    Hotspot identification or detection has been widely used in many fields; however, traditional grid-based approaches may incur problems when dealing with point databases. This article expands on three types of mismatch problems in grid-based approaches and suggests that a point-based approach may be more suitable. Inspired by the DBSCAN algorithm, a self-adjusting approach is then proposed for hotspot detection which overcomes the parameter sensitivity shared by most clustering approaches. Finally, data on the commercial points of interest of a city are used for demonstration.

    A Fast Clustering Algorithm based on pruning unnecessary distance computations in DBSCAN for High-Dimensional Data

    Clustering is an important technique for dealing with the large-scale data explosively created on the Internet. Most such data are high-dimensional and noisy, which brings great challenges to retrieval, classification, and understanding. No existing approach is “optimal” for large-scale data. For example, DBSCAN requires O(n^2) time, Fast-DBSCAN only works well in 2 dimensions, and ρ-Approximate DBSCAN runs in O(n) expected time, which requires the dimension D to be a relatively small constant for the linear running time to hold. However, we prove theoretically and experimentally that ρ-Approximate DBSCAN degenerates to an O(n^2) algorithm in very high dimension, where 2^D >> n. In this paper, we propose a novel local neighborhood searching technique and apply it to improve DBSCAN, yielding NQ-DBSCAN, such that a large number of unnecessary distance computations can be effectively avoided. Theoretical analysis and experimental results show that NQ-DBSCAN runs in O(n log n) time on average with the help of an indexing technique, and in O(n) in the best case if proper parameters are used, which makes it suitable for many real-time applications.
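
    The abstract does not spell out NQ-DBSCAN's specific searching technique, but the general idea of skipping unnecessary distance computations can be illustrated with a standard triangle-inequality bound: for a fixed reference point r, if |d(p, r) - d(q, r)| > eps then d(p, q) > eps, so the exact distance need not be computed. The sketch below is this generic trick, not the paper's method; the function and parameter names are illustrative.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def prune_query(points, i, eps, ref=(0.0, 0.0)):
    """Find the eps-neighbors of points[i], skipping the full distance
    computation whenever a triangle-inequality lower bound rules a
    candidate out. Returns (neighbor indices, number of skips)."""
    # Distances to the reference; in practice these are precomputed once
    # and reused across all queries.
    d_ref = [dist(p, ref) for p in points]
    out, skipped = [], 0
    for j in range(len(points)):
        if abs(d_ref[i] - d_ref[j]) > eps:   # lower bound on dist(points[i], points[j])
            skipped += 1
            continue
        if dist(points[i], points[j]) <= eps:
            out.append(j)
    return out, skipped
```

    With one reference point the bound is loose, but it already avoids exact distance evaluations for points far from the query along the reference direction.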

    GriT-DBSCAN: A Spatial Clustering Algorithm for Very Large Databases

    DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, a bottleneck of the algorithm is its worst-case run-time complexity of O(n^2). To address this limitation, we propose a new grid-based algorithm for exact DBSCAN in Euclidean space called GriT-DBSCAN, which is based on the following two techniques. First, we introduce a grid tree to organize the non-empty grids for the purpose of efficient non-empty neighboring grid queries. Second, by utilising the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a certain threshold. We theoretically prove that the complexity of GriT-DBSCAN is linear in the data set size. In addition, we obtain two variants of GriT-DBSCAN by incorporating heuristics, or by combining the second technique with an existing algorithm. Experiments are conducted on both synthetic and real-world data sets to evaluate the efficiency of GriT-DBSCAN and its variants. The results of our analyses show that our algorithms outperform existing algorithms.
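
    GriT-DBSCAN's grid tree cannot be reconstructed from the abstract alone, but the standard grid idea it builds on can: partition the plane into square cells of side eps/sqrt(2), so that any two points sharing a cell are automatically within eps, and the eps-neighbors of a point can only lie in a small fixed block of nearby cells. A 2-D sketch of that baseline, with function names of my own choosing, not the paper's:

```python
import math
from collections import defaultdict

def build_grid(points, eps):
    """Hash 2-D points into square cells of side eps / sqrt(2).

    With this cell size the diagonal of a cell equals eps, so any two
    points in the same cell are within eps of each other."""
    side = eps / math.sqrt(2)
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / side), math.floor(y / side))].append(i)
    return grid

def eps_neighbors(grid, points, i, eps):
    """Exact eps-neighbors of points[i], scanning only nearby cells.

    A point within eps can sit at most 2 cells away along each axis,
    so a 5x5 block of cells is a (slightly loose) superset to check."""
    side = eps / math.sqrt(2)
    x, y = points[i]
    cx, cy = math.floor(x / side), math.floor(y / side)
    out = []
    for dx in range(-2, 3):
        for dy in range(-2, 3):
            for j in grid.get((cx + dx, cy + dy), ()):
                if (points[j][0] - x) ** 2 + (points[j][1] - y) ** 2 <= eps ** 2:
                    out.append(j)
    return out
```

    The payoff is that each query touches only the points in a constant number of cells rather than the whole data set; the paper's grid tree additionally organizes the non-empty cells so that neighboring-cell lookups stay efficient in higher dimensions.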

    Big data clustering with varied density based on MapReduce

    The DBSCAN algorithm is a prevalent density-based clustering method, the most important feature of which is its ability to detect clusters of arbitrary shape as well as noise data. Nevertheless, the algorithm faces a number of challenges, including a failure to find clusters of varied densities. On the other hand, with the rapid development of the information age, a great deal of data is produced every day, so much that a single machine alone cannot process it; hence, new technologies are required to store and extract information from this volume of data. A volume of data beyond the capabilities of existing software is called big data. In this paper, we introduce a new algorithm for clustering big data with varied density using a Hadoop platform running MapReduce. The main idea of this research is the use of local density to find each point's density. This strategy avoids connecting clusters of varying densities. The proposed algorithm is implemented and compared with other MapReduce-based algorithms and shows the best varying-density clustering capability and scalability.

    Theoretically-Efficient and Practical Parallel DBSCAN

    The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case, making them inefficient for large datasets. This paper bridges the gap between the theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with hyper-threading show that we outperform existing parallel DBSCAN implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.

    The mechanical and algorithmic design of in-field robotic leaf sampling device

    Leaf sample analysis is a significant tool for acquiring actual nutrition information about crops. With this information, farmers can adjust fertilization programs to prevent nutritional problems and improve crop yield. The traditional way of leaf sampling is manual: researchers go to the field and use paper hole punchers with a catch-tube to collect leaf samples. Summer temperatures are hot, and some crops, such as corn, are difficult for researchers to walk through; therefore, manual leaf sampling is not a good option. In this thesis, an automatic method of leaf sampling is presented to address these difficulties. The contributions of this thesis are the following: (1) build the end effector of a leaf sampling device to punch and store leaf samples separately, (2) train a neural network to detect leaves at a high horizontal level, (3) combine point cloud data from the depth camera and vision data from the camera via sensor fusion to obtain the leaf rolling angle and grasp point. The method in this thesis produces consistent leaf rolling angle estimates, quantitatively and qualitatively, on multiple corn leaves, especially on leaves with multiple different angles.