Theoretically-Efficient and Practical Parallel DBSCAN
The DBSCAN method for spatial clustering has received significant attention
due to its applicability in a variety of data analysis tasks. There are fast
sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work
for two dimensions, sub-quadratic work for three or more dimensions, and can be
computed approximately in linear work for any constant number of dimensions.
However, existing parallel DBSCAN algorithms require quadratic work in the
worst case, making them inefficient for large datasets. This paper bridges the
gap between theory and practice of parallel DBSCAN by presenting new parallel
algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the
work bounds of their sequential counterparts, and are highly parallel
(polylogarithmic depth). We present implementations of our algorithms along
with optimizations that improve their practical performance. We perform a
comprehensive experimental evaluation of our algorithms on a variety of
datasets and parameter settings. Our experiments on a 36-core machine with
hyper-threading show that we outperform existing parallel DBSCAN
implementations by up to several orders of magnitude, and achieve speedups of
up to 33x over the best sequential algorithms.
IoT and Machine Learning Based Anomaly Detection in WSN for a Smart Greenhouse
Agriculture is a crucial sector of every country's economy, and several
techniques have been developed to control and monitor the environment in
which a particular crop is growing. Farmers need efficient support in
monitoring the temperature, the humidity, the water supply, etc. However,
the measurements provided by a wireless sensor network
within a smart greenhouse are an essential aspect to take into consideration
when it comes to evaluating the performance of sensor nodes used for
controlling and monitoring the climatic condition (temperature, humidity,
water supply, etc.). Therefore, this paper proposes a machine learning-based
anomaly detection approach with the help of the DBSCAN algorithm of
clustering to determine whether an unusual event has been found in the data.
This approach allows farmers to ensure the reliability of the network. In this
paper, we present a description of the DBSCAN algorithm and use an existing
dataset that incorporates information about rose cultivation. We introduced
some noise into this dataset and used MATLAB and Python to analyse and
predict, with DBSCAN, whether the introduced data is noise or not. In terms of
precision, the algorithm achieves 100% for two chosen features of the dataset
and 75.4% for five features of the dataset.
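The anomaly-detection idea in this abstract (points that DBSCAN labels as noise are treated as unusual sensor events) can be illustrated with a minimal sketch. This is not the authors' implementation: the `dbscan` function, the sample `readings`, and the `eps` and `min_pts` values are all hypothetical, and the neighbor search is brute force, so it takes quadratic work.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # Point i is a core point: grow a new cluster from it.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is also a core point
        cluster += 1
    return labels


# Hypothetical (temperature in deg C, relative humidity in %) readings;
# the last one plays the role of injected noise.
readings = [
    (22.0, 60.0), (22.3, 61.0), (21.8, 59.5),
    (22.1, 60.4), (22.4, 60.2), (35.0, 20.0),
]
labels = dbscan(readings, eps=1.5, min_pts=3)
anomalies = [r for r, lab in zip(readings, labels) if lab == NOISE]
print(anomalies)
```

In practice the features would usually be scaled to comparable ranges before clustering, since `eps` is a single distance threshold applied across all dimensions.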
ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms
Connected components is a fundamental kernel in graph applications due to its
usefulness in measuring how well-connected a graph is, as well as its use as a
subroutine in many other graph algorithms. The fastest existing parallel
multicore algorithms for connectivity are based on some form of edge sampling
and/or linking and compressing trees. However, many combinations of these
design choices have been left unexplored. In this paper, we design the
ConnectIt framework, which provides different sampling strategies as well as
various tree linking and compression schemes. ConnectIt enables us to obtain
several hundred new variants of connectivity algorithms, most of which extend
to computing spanning forest. In addition to static graphs, we also extend
ConnectIt to support mixes of insertions and connectivity queries in the
concurrent setting.
We present an experimental evaluation of ConnectIt on a 72-core machine,
which we believe is the most comprehensive evaluation of parallel connectivity
algorithms to date. Compared to a collection of state-of-the-art static
multicore algorithms, we obtain an average speedup of 37.4x (2.36x average
speedup over the fastest existing implementation for each graph). Using
ConnectIt, we are able to compute connectivity on the largest
publicly-available graph (with over 3.5 billion vertices and 128 billion edges)
in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the
fastest existing connectivity result for this graph, in any computational
setting. For our incremental algorithms, we show that our algorithms can ingest
graph updates at up to several billion edges per second. Finally, to guide the
user in selecting the best variants in ConnectIt for different situations, we
provide a detailed analysis of the different strategies in terms of their work
and locality.
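The "linking and compressing trees" approach mentioned in this abstract is commonly realized with a union-find structure. The following is a minimal sequential sketch, not ConnectIt's code or API: the `UnionFind` class and the `spanning_forest` helper are hypothetical names, path halving and link-by-smaller-id are just one of many possible combinations, and no concurrency or edge sampling is attempted.

```python
class UnionFind:
    """Sequential union-find over vertices 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))  # every vertex starts as its own root

    def find(self, x):
        # Path halving: point each visited vertex at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Link the trees of a and b; return True if they were separate."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Link the root with the larger id under the smaller one.
        if ra < rb:
            self.parent[rb] = ra
        else:
            self.parent[ra] = rb
        return True

    def connected(self, a, b):
        return self.find(a) == self.find(b)


def spanning_forest(n, edges):
    """Return the union-find state and the edges that joined two components."""
    uf = UnionFind(n)
    forest = []
    for u, v in edges:
        if uf.union(u, v):
            forest.append((u, v))
    return uf, forest


# Static phase: connectivity and a spanning forest of a small graph.
uf, forest = spanning_forest(6, [(0, 1), (1, 2), (0, 2), (3, 4)])
print(forest, uf.connected(0, 2), uf.connected(2, 3))

# Incremental phase: mix an edge insertion with a connectivity query.
uf.union(2, 3)
print(uf.connected(0, 4))
```

The same structure serves both settings described in the abstract: edges that successfully link two trees form a spanning forest, and later insertions are just further `union` calls interleaved with `connected` queries.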