
    OutlierDetection.jl: A modular outlier detection ecosystem for the Julia programming language

    OutlierDetection.jl is an open-source ecosystem for outlier detection in Julia. It provides a range of high-performance outlier detection algorithms implemented directly in Julia. In contrast to previous packages, our ecosystem enables the development of highly scalable outlier detection algorithms in a high-level programming language. It also provides a standardized yet flexible interface for future outlier detection algorithms and allows for model composition unseen in previous packages. Best practices such as unit testing, continuous integration, and code coverage reporting are enforced across the ecosystem. The most recent version of OutlierDetection.jl is available at https://github.com/OutlierDetectionJL/OutlierDetection.jl. Comment: 5 pages, 5 figures

    In-Network Outlier Detection in Wireless Sensor Networks

    To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy usage, (3) uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance using simulation with real sensor data streams. Our results demonstrate that our approach is accurate and imposes a reasonable communication load and level of power consumption. Comment: Extended version of a paper appearing in the Int'l Conference on Distributed Computing Systems 200

    Efficient Distributed Outlier Detection in Data Streams

    Anomaly detection is one of the major data mining tasks in modern applications. An element that deviates significantly from the "usual" behavior is marked as an outlier. Such an element either corresponds to noise or requires more careful examination because it may be important. Many clustering algorithms are also very sensitive to outliers. In either case, outliers must be identified and explored further, so efficient outlier mining techniques are required. In this paper, we focus on distributed density-based outlier detection over multi-dimensional data streams. In particular, we focus on the approximation method for computing the Local Correlation Integral (LOCI) of multi-dimensional points. Each object p is assigned an outlier score, score(p), so one can select the top-k elements of the dataset with the highest outlier scores. Our proposal has been implemented in Apache Spark using Scala, and experiments have been conducted on a physical cluster running Apache Hadoop 2.7 and Apache Spark 2.4.0. Performance evaluation results demonstrate that the proposed algorithm is efficient and scalable, and that it can therefore be used to mine outliers in large distributed datasets.
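
    To make the score-then-rank idea concrete, here is a minimal single-machine sketch in Python. It is not the paper's distributed approximation in Spark: it computes the exact, brute-force multi-granularity deviation factor (MDEF) that LOCI is built on, at a single radius, and then picks the top-k points. The function names, the radius r, and the ratio alpha are illustrative assumptions rather than values from the abstract.

```python
import heapq
import math


def mdef_scores(points, r, alpha=0.5):
    """Brute-force, single-radius MDEF score per point (O(n^2)).

    MDEF(p) = 1 - n(p, alpha*r) / avg over q within r of p of n(q, alpha*r).
    Assumption: one fixed radius r; full LOCI examines a range of radii.
    """
    # n(q, alpha*r): number of points within alpha*r of q (q itself included)
    local_counts = [
        sum(1 for q in points if math.dist(p, q) <= alpha * r) for p in points
    ]
    scores = []
    for idx, p in enumerate(points):
        # Sampling neighbourhood: indices of all points within r of p
        neighbours = [i for i, q in enumerate(points) if math.dist(p, q) <= r]
        avg_count = sum(local_counts[i] for i in neighbours) / len(neighbours)
        scores.append(1.0 - local_counts[idx] / avg_count)
    return scores


def top_k_outliers(points, scores, k):
    """Return the k (score, point) pairs with the highest scores."""
    return heapq.nlargest(k, zip(scores, points))


# Hypothetical toy data: a tight cluster plus one isolated point
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (1.0, 1.0)]
scores = mdef_scores(points, r=2.0, alpha=0.5)
print(top_k_outliers(points, scores, k=1))
```

    On this toy data the isolated point receives the largest score, because its own neighbourhood is much sparser than the average neighbourhood of the points around it. A streaming or distributed variant would need to replace the quadratic neighbour counting with an approximate structure.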