SNAPSKETCH: Graph Representation Approach for Anomaly Detection in Graph Stream
A novel unsupervised graph representation approach for anomaly detection in a graph stream, called SNAPSKETCH, is proposed. It first performs a fixed-length random walk from each node in a network and constructs n-shingles from each walk path. The top discriminative n-shingles, identified using a frequency measure, are projected using a projection vector chosen uniformly at random. Finally, the network is sketched into a low-dimensional sketch vector using a simplified hashing of the projection vector and the cost of the shingles. The learned sketch vectors are then passed to a state-of-the-art anomaly detector, Robust Random Cut Forest (RRCF) [1]. SNAPSKETCH has several advantages: fully unsupervised learning, constant memory space usage, entire-graph embedding, and real-time anomaly detection.
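The pipeline above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the walk length, the shingle size `n`, the cutoff `top_k` and the sketch size `d` are hypothetical; raw shingle frequency stands in for the paper's shingle cost; and the "simplified hashing" is approximated by a SimHash-style sign of random ±1 projections.

```python
import random
from collections import Counter


def random_walk(adj, start, length, rng):
    """Fixed-length random walk from `start`; stops early at a dead end."""
    walk = [start]
    for _ in range(length - 1):
        neighbours = adj.get(walk[-1], [])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def shingles(walk, n):
    """Contiguous n-node windows (n-shingles) along a walk path."""
    return [tuple(walk[i:i + n]) for i in range(len(walk) - n + 1)]


def sketch_graph(adj, walk_len=5, n=2, top_k=10, d=16, seed=0):
    """Hypothetical SNAPSKETCH-style sketch: returns a list of d bits."""
    rng = random.Random(seed)
    counts = Counter()
    for node in sorted(adj):  # one walk per node, fixed order for reproducibility
        counts.update(shingles(random_walk(adj, node, walk_len, rng), n))

    acc = [0.0] * d
    for shingle, freq in counts.most_common(top_k):  # top shingles by frequency
        # Each shingle gets its own random +/-1 projection vector, seeded by the
        # shingle itself so it is identical across successive graph snapshots.
        proj_rng = random.Random(f"{seed}:{shingle!r}")
        for j in range(d):
            acc[j] += freq * proj_rng.choice((-1, 1))
    # Simplified hash: keep only the sign of each weighted projection.
    return [1 if v >= 0 else 0 for v in acc]
```

For example, `sketch_graph({0: [1, 2], 1: [2], 2: [0], 3: [0]})` returns a list of 16 bits. Seeding each shingle's projection vector from the shingle itself keeps the mapping stable between snapshots, so every incoming graph yields a vector of the same fixed size that can be fed, one per snapshot, to a streaming detector such as RRCF.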
Moving Metric Detection and Alerting System at eBay
At eBay, there are thousands of product health metrics for different domain teams to monitor. We built a two-phase alerting system that notifies users with actionable alerts based on anomaly detection and alert retrieval. In the first phase, we developed an efficient anomaly detection algorithm, called Moving Metric Detector (MMD), to identify potential alerts among metrics using distribution-agnostic criteria. In the second, alert retrieval phase, we built additional logic with feedback to select valid, actionable alerts using a point-wise ranking model and business rules. Compared with other trend and seasonality decomposition methods, our decomposer is faster and better at detecting anomalies in unsupervised cases. Our two-phase approach dramatically improves alert precision and avoids alert spamming in eBay production.
Comment: This work was presented orally at the AAAI-20 Workshop on Cloud Intelligence, 202
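The abstract does not describe how MMD works internally, so the following is only a hedged stand-in for the first (detection) phase: a moving-window detector with Tukey-style interquartile-range fences, which makes no parametric assumption about a metric's distribution. The function name and the `window` and `k` parameters are hypothetical, and neither the decomposer nor the ranking phase is modelled.

```python
import statistics


def moving_iqr_alerts(series, window=20, k=3.0):
    """Hypothetical moving-window detector with IQR fences.

    Returns the indices t whose value falls outside the fences computed from
    the previous `window` points only (no look-ahead), so it can run on a stream.
    """
    alerts = []
    for t in range(window, len(series)):
        q1, _, q3 = statistics.quantiles(series[t - window:t], n=4)
        iqr = q3 - q1
        lo, hi = q1 - k * iqr, q3 + k * iqr
        if not lo <= series[t] <= hi:
            alerts.append(t)
    return alerts
```

For instance, on a metric that alternates between 10 and 11 and spikes once to 50 at index 30, `moving_iqr_alerts` flags only index 30. Because the fences depend only on past points, the check can be evaluated point by point as new metric values arrive.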
SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis
In this paper, we propose a novel approach, called SENATUS, for joint traffic anomaly detection and root-cause analysis. Inspired by the concept of a senate, the proposed approach is divided into three stages: election, voting and decision. In the election stage, a small number of senator flows are chosen to approximately represent the total (usually huge) set of traffic flows. In the voting stage, anomaly detection is applied to the senator flows, and the detected anomalies are correlated to identify the most likely anomalous time bins. Finally, in the decision stage, a machine learning technique is applied to the senator flows of each anomalous time bin to find the root cause of the anomalies. We evaluate SENATUS using traffic traces collected from the pan-European network GEANT, and compare it against another approach that detects anomalies using lossless compression of traffic histograms. We show the effectiveness of SENATUS in diagnosing anomaly types: network scans and DoS/DDoS attacks.
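The election and voting stages can be sketched in Python as below. The abstract does not say how senator flows are elected or which per-flow detector is used, so both are assumptions here: senators are the `k` flows with the largest total volume, each is screened with a simple z-score rule, and a time bin counts as anomalous when at least `quorum` senators flag it. All names are hypothetical, and the machine-learning decision stage is not shown.

```python
import statistics
from collections import Counter


def elect_senators(flows, k):
    """Assumed election rule: the k flows with the largest total volume."""
    return sorted(flows, key=lambda f: sum(flows[f]), reverse=True)[:k]


def flag_bins(series, z=3.0):
    """Assumed per-flow detector: bins more than z std devs from the mean."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:  # a flat flow casts no votes
        return set()
    return {t for t, x in enumerate(series) if abs(x - mu) > z * sd}


def vote(flows, k=3, quorum=2, z=3.0):
    """Time bins flagged by at least `quorum` of the elected senator flows."""
    votes = Counter()
    for f in elect_senators(flows, k):
        votes.update(flag_bins(flows[f], z))
    return sorted(t for t, n in votes.items() if n >= quorum)
```

With four flows of 20 time bins, where the two largest flows both spike at bin 10 and the others stay flat, `vote(flows, k=3, quorum=2)` returns `[10]`; raising `quorum` to 3 suppresses the alarm because the third senator never deviates. Requiring agreement between several senators is what turns per-flow detections into a correlated, per-time-bin decision.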
Guarantee structural anomaly detection in streaming data using the RRCF model: selection of detector parameters and its stabilization under additive noise conditions
A method for stabilizing structural anomaly detection under additive noise, together with an algorithm for the formal selection of the decision-rule parameters of a structural anomaly detector based on the Robust Random Cut Forest (RRCF) method, is proposed. To stabilize structural anomaly detection under additive noise, the approach feeds the RRCF detector a data stream that has been pre-processed by one of several digital filtering methods. The resulting decision rule for anomaly detection is strictly formalized and transparently interpretable. The selection of parameters for the RRCF-based detector stabilized by pre-filtering of the input data stream is formalized. Choosing the RRCF detector parameters within the proposed scheme guarantees a predetermined upper bound on the false alarm probability when deciding that a structural anomaly has occurred. This property is formalized as a theorem and rigorously proved. The performance of the stabilized RRCF detector is investigated numerically, and the results confirm the effectiveness of the proposed approach, provided that the detection threshold is selected as proposed in this paper. An example of practical application of the method is presented. The approach is promising for detecting structural anomalies under additive observation noise when it is important to guarantee an upper bound on the probability of false alarm. In particular, it can be applied to monitoring the operating regimes of liquid pumping in pipeline systems or to detecting pre-failure states of process equipment.
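The abstract gives neither the paper's filters nor its threshold formula, so the following Python sketch only illustrates the general idea under stated assumptions. A causal moving average of width `w` serves as the pre-filter (for i.i.d. additive noise it divides the noise variance by `w`), and the threshold comes from Chebyshev's inequality, P(|S - mu| >= k*sigma) <= 1/k^2: choosing k = 1/sqrt(alpha) caps the false alarm probability at alpha for any score distribution with mean mu and standard deviation sigma. As further simplifications, the score is the filtered value itself rather than an RRCF displacement score, and mu and sigma are estimated from an anomaly-free calibration segment, so the bound holds only approximately. Function names are hypothetical.

```python
import math
import statistics


def moving_average(xs, w):
    """Causal moving-average pre-filter; output i averages xs[i : i + w]."""
    return [sum(xs[i:i + w]) / w for i in range(len(xs) - w + 1)]


def chebyshev_band(calibration, alpha):
    """Band mu +/- k*sigma with k = 1/sqrt(alpha).

    Chebyshev: P(|S - mu| >= k*sigma) <= 1/k**2 = alpha for any distribution
    with mean mu and std sigma. Here mu and sigma are estimated from
    anomaly-free calibration data, so the bound is only approximate.
    """
    mu = statistics.fmean(calibration)
    sigma = statistics.pstdev(calibration)
    k = 1.0 / math.sqrt(alpha)
    return mu - k * sigma, mu + k * sigma


def detect(stream, calibration, w=5, alpha=0.01):
    """Original indices (window ends) where the filtered stream leaves the band."""
    lo, hi = chebyshev_band(moving_average(calibration, w), alpha)
    filtered = moving_average(stream, w)
    return [i + w - 1 for i, y in enumerate(filtered) if not lo <= y <= hi]
```

Because Chebyshev's bound holds for every distribution, the resulting band is wide (k = 10 for alpha = 0.01); a tighter threshold with the same guarantee requires stronger assumptions about the score distribution.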
itsdm: Isolation forest-based presence-only species distribution modelling and explanation in R
Multiple statistical algorithms have been used for species distribution modelling (SDM). Owing to shortcomings in species occurrence datasets, presence-only methods (such as MaxEnt) have become increasingly widely used. However, sampling bias remains a challenging issue, particularly for density-based approaches. The Isolation Forest (iForest) algorithm is a presence-only method that is less sensitive to sampling patterns and over-fitting because it fits the model by describing unsuitable rather than suitable conditions. Here, we present the itsdm package for species distribution modelling with iForest, which provides a workflow wrapper for the algorithms in the iForest family and convenient tools for model diagnostics and post-modelling analysis. itsdm allows users to fit and evaluate an iForest SDM using presence-only occurrence data. It also helps users understand relationships between species and their living environment using Shapley values, a technique from explainable artificial intelligence (xAI). Additionally, itsdm can make spatial response maps that indicate how species respond to environmental variables across space and detect areas potentially affected by a changing environment. We demonstrate the usage of the itsdm package and compare iForest with other mainstream SDMs using virtual species. The results indicate that iForest is an advantageous presence-only SDM when the actual distribution range is unclear. © 2023 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society.
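itsdm itself is an R package; the following Python sketch only illustrates the isolation-forest idea it builds on and does not reproduce itsdm's API. Each tree recursively splits a random subsample on a random variable at a random cut point; points isolated after few splits score close to 1, while points deep inside the data cloud score lower. Scoring follows the standard iForest formula s(x) = 2^(-E[h(x)] / c(psi)), where psi is the subsample size and c(n) is the average path length of an unsuccessful binary-search-tree lookup. In an SDM reading, and only as an assumption, the forest would be fit on environmental covariates at presence records, and low scores would indicate conditions resembling those records. Class and function names are hypothetical.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) via Euler-Mascheroni
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolation tree: random variable, random cut between its min and max."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # this variable cannot separate the points
        return ("leaf", len(points))
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return ("node", dim, cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for unsplit leaf points."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, cut, left, right = tree
    return path_length(x, left if x[dim] < cut else right, depth + 1)


class IsolationForest:
    """Minimal isolation forest; needs at least 2 training points."""

    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, data):
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # iForest height limit
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; higher means easier to isolate."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))
```

For example, after fitting on 200 simulated presence records whose two covariates cluster around (20, 800), the point (20, 800) receives a lower score than (35, 2000). Note that the model only ever sees presence data: it learns what is easy to separate from the presence cloud, which matches the abstract's description of describing unsuitable rather than suitable conditions.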