12,774 research outputs found
OutlierDetection.jl: A modular outlier detection ecosystem for the Julia programming language
OutlierDetection.jl is an open-source ecosystem for outlier detection in
Julia. It provides a range of high-performance outlier detection algorithms
implemented directly in Julia. In contrast to previous packages, our ecosystem
enables the development of highly scalable outlier detection algorithms using a
high-level programming language. Additionally, it provides a standardized, yet
flexible, interface for future outlier detection algorithms and allows for
model composition unseen in previous packages. Best practices such as unit
testing, continuous integration, and code coverage reporting are enforced
across the ecosystem. The most recent version of OutlierDetection.jl is
available at https://github.com/OutlierDetectionJL/OutlierDetection.jl.
Comment: 5 pages, 5 figures
In-Network Outlier Detection in Wireless Sensor Networks
To address the problem of unsupervised outlier detection in wireless sensor
networks, we develop an approach that (1) is flexible with respect to the
outlier definition, (2) computes the result in-network to reduce both bandwidth
and energy usage, (3) uses only single-hop communication, thus permitting very
simple node failure detection and message reliability assurance mechanisms
(e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data.
We examine performance using simulation with real sensor data streams. Our
results demonstrate that our approach is accurate and imposes a reasonable
communication load and level of power consumption.
Comment: Extended version of a paper appearing in the Int'l Conference on
Distributed Computing Systems 200
Efficient Distributed Outlier Detection in Data Streams
Anomaly detection is one of the major data mining tasks in modern applications. An element that deviates significantly from the "usual" behavior is marked as an outlier. Such an element either corresponds to noise or requires more careful examination because it may be important. In addition, many clustering algorithms are very sensitive to outliers. In either case, outliers must be identified and explored further, so efficient outlier mining techniques are required. In this paper, we focus on distributed density-based outlier detection over multi-dimensional data streams. In particular, we focus on the approximation method for computing the Local Correlation Integral (LOCI) of multi-dimensional points. Each object p is assigned an outlier score, score(p), so one can select the top-k elements of the dataset with the highest outlier scores. Our proposal has been implemented in Apache Spark using Scala, and experiments have been conducted on a physical cluster running Apache Hadoop 2.7 and Apache Spark 2.4.0. Performance evaluation results demonstrate that the proposed algorithm is efficient and scalable, and can therefore be used to mine outliers in large distributed datasets.
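The scoring-and-ranking idea in this abstract can be illustrated with a minimal single-machine sketch. It is not the paper's distributed approximation method: it computes the exact LOCI deviation factor (MDEF) by brute force at one fixed radius, and the function names, the radius `r`, and the ratio `alpha` are assumptions chosen for illustration.

```python
import heapq
import math


def count_within(points, center, radius):
    """Number of points within `radius` of `center` (the center itself counts)."""
    return sum(1 for q in points if math.dist(center, q) <= radius)


def mdef_score(points, p, r, alpha=0.5):
    """MDEF of p at a single radius r: 1 - n(p, alpha*r) / n_hat(p, r, alpha).

    Assumption: one fixed radius; full LOCI examines a range of radii.
    """
    # Sampling neighborhood: every point within r of p (always includes p).
    neighbors = [q for q in points if math.dist(p, q) <= r]
    # Counting-neighborhood size n(q, alpha*r) for each sampled neighbor.
    counts = [count_within(points, q, alpha * r) for q in neighbors]
    n_hat = sum(counts) / len(counts)
    return 1.0 - count_within(points, p, alpha * r) / n_hat


def top_k_outliers(points, k, r, alpha=0.5):
    """Return the k (score, point) pairs with the highest MDEF scores."""
    scored = [(mdef_score(points, p, r, alpha), p) for p in points]
    return heapq.nlargest(k, scored)
```

A point whose local neighborhood is much sparser than those of its neighbors receives a score close to 1, while points inside a uniform cluster score near 0 or below. The brute-force distance computations are quadratic in the number of points, which is the cost a distributed approximation aims to avoid.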
- …