5 research outputs found

    Theoretically-Efficient and Practical Parallel DBSCAN

    The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case, making them inefficient for large datasets. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with hyper-threading show that we outperform existing parallel DBSCAN implementations by up to several orders of magnitude, and achieve speedups by up to 33x over the best sequential algorithms.

    IoT and Machine Learning Based Anomaly Detection in WSN for a Smart Greenhouse

    Agriculture is the most crucial sector which raises the economy of every country; several techniques have been developed to control and monitor the environment in which a particular crop is growing. Farmers need efficient support in monitoring the temperature, the humidity, the water supply, etc. However, the measurements provided by a wireless sensor network within a smart greenhouse are an essential aspect to take into consideration when evaluating the performance of the sensor nodes used for controlling and monitoring the climatic conditions (temperature, humidity, water supply, etc.). Therefore, this paper proposes a machine learning-based anomaly detection approach that uses the DBSCAN clustering algorithm to determine whether an unusual event has been found in the data. This approach allows farmers to ensure the reliability of the network. In this paper, we present a description of the DBSCAN algorithm and use an existing dataset that incorporates information about rose cultivation. We introduced some noise into this dataset, and we used MATLAB and Python to analyse and predict with DBSCAN whether the introduced data is noise or not. In terms of precision, the algorithm achieves 100% for two chosen features of the dataset and 75.4% for five features of the dataset.
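
The idea of flagging sensor readings that DBSCAN labels as noise can be illustrated with a minimal sketch. This is not the authors' code: the function names, the (temperature, humidity) readings, and the `eps` and `min_pts` values are all hypothetical, and the brute-force neighbour search takes quadratic work, so it is only suitable for small inputs.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN; returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Hypothetical (temperature in degrees C, humidity in %) readings; the last one is injected noise.
readings = [(21.0, 60.0), (21.2, 60.5), (20.9, 59.8),
            (21.1, 60.2), (21.3, 59.9), (35.0, 20.0)]
labels = dbscan(readings, eps=1.0, min_pts=3)
anomalies = [p for p, label in zip(readings, labels) if label == NOISE]
print(labels)     # [0, 0, 0, 0, 0, -1]
print(anomalies)  # [(35.0, 20.0)]
```

In practice the features would normally be scaled before clustering, since `eps` is a single distance threshold applied across all dimensions.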

    ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms

    Connected components is a fundamental kernel in graph applications due to its usefulness in measuring how well-connected a graph is, as well as its use as a subroutine in many other graph algorithms. The fastest existing parallel multicore algorithms for connectivity are based on some form of edge sampling and/or linking and compressing trees. However, many combinations of these design choices have been left unexplored. In this paper, we design the ConnectIt framework, which provides different sampling strategies as well as various tree linking and compression schemes. ConnectIt enables us to obtain several hundred new variants of connectivity algorithms, most of which extend to computing spanning forest. In addition to static graphs, we also extend ConnectIt to support mixes of insertions and connectivity queries in the concurrent setting. We present an experimental evaluation of ConnectIt on a 72-core machine, which we believe is the most comprehensive evaluation of parallel connectivity algorithms to date. Compared to a collection of state-of-the-art static multicore algorithms, we obtain an average speedup of 37.4x (2.36x average speedup over the fastest existing implementation for each graph). Using ConnectIt, we are able to compute connectivity on the largest publicly-available graph (with over 3.5 billion vertices and 128 billion edges) in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the fastest existing connectivity result for this graph, in any computational setting. For our incremental algorithms, we show that our algorithms can ingest graph updates at up to several billion edges per second. Finally, to guide the user in selecting the best variants in ConnectIt for different situations, we provide a detailed analysis of the different strategies in terms of their work and locality.
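
The "linking and compressing trees" approach mentioned above can be sketched with a sequential union-find. This is an illustration only and not ConnectIt's API: the function names, the rule of linking the larger root id under the smaller one, and the example graph are assumptions, and the sketch does not attempt the sampling strategies or the concurrent setting the abstract describes.

```python
def find(parent, x):
    """Return the root of x's tree, compressing the path along the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every vertex on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def connected_components(n, edges):
    """Label each of n vertices with the smallest vertex id in its component."""
    parent = list(range(n))  # every vertex starts as its own tree
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            # Link step (assumed rule): hang the larger root id under the smaller.
            parent[max(ru, rv)] = min(ru, rv)
    return [find(parent, v) for v in range(n)]


# Hypothetical 6-vertex graph with components {0, 1, 2}, {3, 4} and {5}.
component_labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(component_labels)  # [0, 0, 0, 3, 3, 5]
```

Because edges are processed one at a time against the same `parent` array, the same structure also handles an interleaved sequence of edge insertions and connectivity queries: two vertices are connected exactly when `find` returns the same root for both.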