36 research outputs found

    SNAPSKETCH: Graph Representation Approach for Anomaly Detection in Graph Stream

    SNAPSKETCH, a novel unsupervised graph representation approach for anomaly detection in a graph stream, is proposed. It first performs a fixed-length random walk from each node in a network and constructs n-shingles from each walk path. The top discriminative n-shingles, identified using a frequency measure, are projected into a dimensional projection vector chosen uniformly at random. Finally, the network is sketched into a low-dimensional sketch vector using a simplified hashing of the projection vector and the cost of shingles. Anomaly detection is then performed on the learned sketch vector using the state-of-the-art anomaly detection approach called RRCF [1]. SNAPSKETCH has several advantages: fully unsupervised learning, constant memory usage, entire-graph embedding, and real-time anomaly detection.
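    The first two steps described above (fixed-length random walks and n-shingle construction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the adjacency-dictionary graph format, and the use of raw counts as the "frequency measure" are all assumptions, and the projection and hashing steps are omitted.

```python
import random
from collections import Counter


def random_walk(adj, start, length, rng):
    """Walk up to `length` nodes from `start`, picking a uniform random neighbor each step."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def shingles(walk, n):
    """Return every run of n consecutive nodes in the walk as a tuple."""
    return [tuple(walk[i:i + n]) for i in range(len(walk) - n + 1)]


def top_shingles(adj, walk_length, n, k, seed=0):
    """Count shingles over one walk per node and keep the k most frequent.

    Assumption: plain frequency stands in for the paper's discriminative measure.
    """
    rng = random.Random(seed)
    counts = Counter()
    for node in sorted(adj):
        counts.update(shingles(random_walk(adj, node, walk_length, rng), n))
    return counts.most_common(k)


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle.
    graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    print(top_shingles(graph, walk_length=5, n=3, k=3))
```

    A walk of 5 nodes yields 5 - 3 + 1 = 3 shingles of size 3, so the toy graph above contributes 9 shingle occurrences in total.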

    Moving Metric Detection and Alerting System at eBay

    At eBay, there are thousands of product health metrics for different domain teams to monitor. We built a two-phase alerting system that notifies users with actionable alerts based on anomaly detection and alert retrieval. In the first phase, we developed an efficient anomaly detection algorithm, called Moving Metric Detector (MMD), to identify potential alerts among metrics using distribution-agnostic criteria. In the second phase, alert retrieval, we built additional logic that uses feedback to select valid actionable alerts with a point-wise ranking model and business rules. Compared with other trend and seasonality decomposition methods, our decomposer is faster and better at detecting anomalies in unsupervised cases. Our two-phase approach dramatically improves alert precision and avoids alert spamming in eBay production. Comment: This work was presented orally at the AAAI-20 Workshop on Cloud Intelligence, 202

    SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis

    In this paper, we propose a novel approach, called SENATUS, for joint traffic anomaly detection and root-cause analysis. Inspired by the concept of a senate, the proposed approach is divided into three stages: election, voting and decision. In the election stage, a small number of senator flows are chosen to approximately represent the total (usually huge) set of traffic flows. In the voting stage, anomaly detection is applied to the senator flows and the detected anomalies are correlated to identify the most likely anomalous time bins. Finally, in the decision stage, a machine learning technique is applied to the senator flows of each anomalous time bin to find the root cause of the anomalies. We evaluate SENATUS using traffic traces collected from the Pan European network, GEANT, and compare it against another approach that detects anomalies using lossless compression of traffic histograms. We show the effectiveness of SENATUS in diagnosing anomaly types: network scans and DoS/DDoS attacks.

    Guarantee structural anomaly detection in streaming data using the RRCF model: selection of detector parameters and its stabilization under additive noise conditions

    A method for stabilizing structural anomaly detection under additive noise is proposed, together with an algorithm for formally selecting the parameters of the decision rule in a structural anomaly detector based on the Robust Random Cut Forest (RRCF) method. To stabilize detection under the influence of additive noise, the data stream is pre-processed by one of the digital filtering methods before it is fed to the input of the RRCF detector. In this case, the decision rule for anomaly detection is strictly formalized and transparently interpreted. The selection of parameters for the RRCF-based anomaly detector stabilized by pre-filtering the input data stream is also formalized. Choosing the RRCF detector parameters within the proposed scheme guarantees a predetermined upper bound on the false alarm probability when deciding that a structural anomaly is present. This property is rigorously proved and stated as a theorem. The performance of the stabilized RRCF detector is investigated numerically. The results confirm the performance of the proposed approach, provided that the detection threshold is selected in the way proposed in this paper. An example of practical application of the proposed method is presented. The approach is promising for detecting structural anomalies under additive observation noise in situations where it is important to guarantee an upper bound on the false alarm probability. In particular, it can find application in monitoring the technological regimes of liquid pumping in pipeline systems or in systems for detecting pre-failure states of technological equipment.
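    The pre-filtering idea can be illustrated with a very simple digital filter. The abstract does not say which filter is used, so the trailing moving average below is an assumption chosen for brevity; the function name and window parameter are hypothetical, and the RRCF detector and the threshold selection rule are not shown.

```python
from collections import deque


def moving_average(stream, window):
    """Yield the mean of the last `window` samples (fewer while the buffer fills).

    Assumption: a trailing moving average stands in for the unspecified
    digital filter applied before the detector.
    """
    buf = deque(maxlen=window)
    total = 0.0
    for x in stream:
        if len(buf) == window:
            total -= buf[0]  # the oldest sample is about to be evicted
        buf.append(x)
        total += x
        yield total / len(buf)


if __name__ == "__main__":
    noisy = [1, 2, 3, 4]
    print(list(moving_average(noisy, window=2)))
```

    The smoothed values, rather than the raw samples, would then be passed to the anomaly detector.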

    itsdm: Isolation forest-based presence-only species distribution modelling and explanation in R

    Multiple statistical algorithms have been used for species distribution modelling (SDM). Due to shortcomings in species occurrence datasets, presence-only methods (such as MaxEnt) have become increasingly widely used. However, sampling bias remains a challenging issue, particularly for density-based approaches. The Isolation Forest (iForest) algorithm is a presence-only method that is less sensitive to sampling patterns and over-fitting because it fits the model by describing the unsuitable conditions instead of the suitable ones. Here, we present the itsdm package for species distribution modelling with iForest, which provides a workflow wrapper for the algorithms in the iForest family and convenient tools for model diagnostics and post-modelling analysis. itsdm allows users to fit and evaluate an iForest SDM using presence-only occurrence data. It also helps users understand relationships between species and their living environment using Shapley values, a suggested technique in explainable artificial intelligence (xAI). Additionally, itsdm can produce spatial response maps that indicate how species respond to environmental variables across space and can detect areas potentially affected by a changing environment. We demonstrated the usage of the itsdm package and compared iForest with other mainstream SDMs using virtual species. The results showed that iForest is an advantageous presence-only SDM when the actual distribution range is unclear. © 2023 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society.