SNAPSKETCH: Graph Representation Approach for Anomaly Detection in Graph Stream
A novel unsupervised graph representation approach for anomaly detection in graph streams, called SNAPSKETCH, is proposed. It first performs a fixed-length random walk from each node in a network and constructs n-shingles from each walk path. The top discriminative n-shingles, identified using a frequency measure, are projected into a projection vector chosen uniformly at random. Finally, the network is sketched into a low-dimensional sketch vector using a simplified hashing of the projection vector and the cost of the shingles. Anomaly detection is then performed on the learned sketch vectors using the state-of-the-art anomaly detection approach RRCF [1]. SNAPSKETCH has several advantages: fully unsupervised learning, constant memory usage, entire-graph embedding, and real-time anomaly detection.
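The pipeline above (random walks, n-shingles, frequency ranking, random projection, hashing into a sketch) can be illustrated with a minimal sketch. It is not the authors' implementation: the seeded +/-1 projection vectors, weighting each projection by shingle frequency as the "cost", and taking sign bits (SimHash-style) as the "simplified hashing" are all assumptions, and `sketch_graph`, `walk_len`, `top_k` and `dim` are hypothetical names.

```python
import random
from collections import Counter

def random_walk(adj, start, length, rng):
    """Fixed-length random walk from `start`; stops early at a node with no neighbours."""
    path = [start]
    node = start
    for _ in range(length - 1):
        nbrs = adj.get(node, [])
        if not nbrs:
            break
        node = rng.choice(nbrs)
        path.append(node)
    return path

def shingles(path, n):
    """All contiguous n-node windows ("n-shingles") of a walk path."""
    return [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]

def projection_vector(shingle, dim, seed):
    """Assumed: a pseudo-random +/-1 vector per shingle, derived deterministically from a seeded RNG."""
    rng = random.Random(f"{seed}:{shingle}")
    return [rng.choice((-1, 1)) for _ in range(dim)]

def sketch_graph(adj, walk_len=5, n=2, top_k=20, dim=16, seed=0):
    """Hypothetical SNAPSKETCH-style sketch of one graph snapshot as `dim` bits."""
    rng = random.Random(seed)
    counts = Counter()
    for node in sorted(adj):
        counts.update(shingles(random_walk(adj, node, walk_len, rng), n))
    acc = [0.0] * dim
    for sh, freq in counts.most_common(top_k):   # frequency as the "discriminative" measure
        vec = projection_vector(sh, dim, seed)
        for j in range(dim):
            acc[j] += freq * vec[j]             # weight each projection by the shingle's frequency
    return [1 if a > 0 else 0 for a in acc]     # sign bits form the low-dimensional sketch
```

In a stream, one would compute one such sketch per graph snapshot and feed the resulting vectors to an RRCF implementation for scoring; memory stays constant because only `top_k` shingles and a `dim`-sized accumulator are kept per snapshot.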
SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis
In this paper, we propose a novel approach, called SENATUS, for joint traffic
anomaly detection and root-cause analysis. Inspired by the concept of a
senate, the proposed approach is divided into three stages: election, voting,
and decision. In the election stage, a small number of senator flows are chosen
to approximately represent the total (usually huge) set of traffic flows. In
the voting stage, anomaly detection is applied to the senator flows, and the
detected anomalies are correlated to identify the most likely anomalous time
bins. Finally, in the decision stage, a machine learning technique is applied
to the senator flows of each anomalous time bin to find the root cause of the
anomalies. We evaluate SENATUS using traffic traces collected from the
pan-European network GEANT and compare it against an approach that detects
anomalies using lossless compression of traffic histograms. We show the
effectiveness of SENATUS in diagnosing two anomaly types: network scans and
DoS/DDoS attacks.
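The voting stage (per-flow detection, then correlation across senator flows to pick anomalous time bins) can be sketched as follows. The per-flow detector here is a plain z-score test and "correlation" is a simple vote count with a hypothetical `min_votes` threshold; both are assumptions for illustration, not the detector or correlation rule SENATUS actually uses.

```python
from statistics import mean, pstdev

def flag_anomalies(series, z=3.0):
    """Assumed per-flow detector: flag time bins more than `z` std devs from the flow's mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return set()
    return {t for t, v in enumerate(series) if abs(v - mu) / sigma > z}

def vote(senator_flows, z=3.0, min_votes=2):
    """Correlate per-flow detections: keep time bins flagged by at least `min_votes` senator flows."""
    votes = {}
    for series in senator_flows.values():
        for t in flag_anomalies(series, z):
            votes[t] = votes.get(t, 0) + 1
    return sorted(t for t, c in votes.items() if c >= min_votes)
```

For example, with three 20-bin flows where two spike at bin 12 and one at bin 5, `vote` returns `[12]`: the isolated spike in a single flow is outvoted. Note that with a population z-score a single outlier in `n` bins can reach at most `sqrt(n - 1)`, so very short series need a lower `z`.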
Precision and Recall for Range-Based Anomaly Detection
Classical anomaly detection is principally concerned with point-based anomalies, i.e., anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, i.e., anomalies that occur over a range (or period) of time.
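To make the point-based versus range-based distinction concrete, here is a simplified sketch of range-level recall and precision over inclusive `(start, end)` ranges. It assumes a score that mixes an "existence" reward (was a real range hit at all?) with an "overlap" reward (what fraction of it was covered), weighted by a hypothetical parameter `alpha`; it omits any further terms the paper's model may define and is not a reference implementation.

```python
def overlap(a, b):
    """Number of time points shared by inclusive ranges a = (start, end) and b = (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def range_score(targets, others, alpha=0.5):
    """Average over target ranges of alpha * existence + (1 - alpha) * covered fraction."""
    if not targets:
        return 0.0
    total = 0.0
    for r in targets:
        length = r[1] - r[0] + 1
        covered = sum(overlap(r, o) for o in others)  # assumes the `others` ranges are disjoint
        existence = 1.0 if covered > 0 else 0.0
        total += alpha * existence + (1 - alpha) * min(covered, length) / length
    return total / len(targets)

def range_recall(real, pred, alpha=0.5):
    """Recall: how well each real anomaly range is detected by the predicted ranges."""
    return range_score(real, pred, alpha)

def range_precision(real, pred):
    """Precision: assumption, no existence reward here (alpha = 0), only overlap."""
    return range_score(pred, real, alpha=0.0)
```

With `real = [(10, 19)]` and `pred = [(15, 24)]`, point-based recall and precision are both 5/10 = 0.5, whereas `range_recall` gives 0.75 because the real range was detected at all, while `range_precision` stays at 0.5.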
Beyond Individual Input for Deep Anomaly Detection on Tabular Data
Anomaly detection is crucial in various domains, such as finance, healthcare,
and cybersecurity. In this paper, we propose a novel deep anomaly detection
method for tabular data that leverages Non-Parametric Transformers (NPTs), a
model initially proposed for supervised tasks, to capture both feature-feature
and sample-sample dependencies. In a reconstruction-based framework, we train
the NPT model to reconstruct masked features of normal samples. We use the
model's ability to reconstruct the masked features during inference to generate
an anomaly score. To the best of our knowledge, our proposed method is the
first to combine both feature-feature and sample-sample dependencies for
anomaly detection on tabular datasets. We evaluate our method on an extensive
benchmark of tabular datasets and demonstrate that our approach outperforms
existing state-of-the-art methods in terms of both F1-score and AUROC.
Moreover, our work opens up new research directions for exploring the potential
of NPTs for other tasks on tabular data.
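The reconstruction-based scoring idea (mask a feature, reconstruct it using other samples, and turn the reconstruction error into an anomaly score) can be sketched without an NPT. As a plainly swapped-in stand-in, this version reconstructs each masked feature from the `k` nearest normal training rows on the unmasked features, which captures a crude sample-sample dependency; the NPT's attention over features and samples is not modelled, and all names are hypothetical.

```python
import math

def reconstruct_feature(x, j, train, k=3):
    """Stand-in for the NPT: predict feature j of x from the k training rows closest on the other features."""
    def dist(row):
        return math.sqrt(sum((row[i] - x[i]) ** 2 for i in range(len(x)) if i != j))
    nearest = sorted(train, key=dist)[:k]
    return sum(row[j] for row in nearest) / len(nearest)

def anomaly_score(x, train, k=3):
    """Mask each feature in turn, reconstruct it, and average the squared reconstruction errors."""
    errs = [(x[j] - reconstruct_feature(x, j, train, k)) ** 2 for j in range(len(x))]
    return sum(errs) / len(errs)
```

If the normal training rows satisfy `x[2] == x[0] + x[1]`, a test row that breaks this relation (e.g. `(2, 3, 12)`) is reconstructed poorly and receives a much higher score than a conforming row such as `(2, 3, 5)`.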
Finding Skewed Subcubes Under a Distribution
Say that we are given samples from a distribution D over an n-dimensional space. We expect or desire D to behave like a product distribution (or a k-wise independent distribution over its marginals for small k). We propose the problem of enumerating/list-decoding all large subcubes where the distribution D deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in D. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection.
In this work we focus on the special but important case where the space is the Boolean hypercube and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of a minimal skewed subcube: a subcube whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality.
On the lower-bound side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be substantially improved without a breakthrough on this problem, which is believed to be intractable. Motivated by this, we study alternate models allowing query access to D, where finding skewed subcubes might be easier.
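For intuition about the objects involved, here is a brute-force baseline (not the paper's Fourier-analytic algorithm) on the Boolean hypercube with uniform expected marginals: a subcube fixes `k` coordinates, so its expected mass is `2**-k`, and it is reported as skewed when its empirical mass is at least a hypothetical `ratio` times that. The `minimal` filter is a simplified heuristic standing in for the paper's definition: it drops a subcube whenever a larger subcube containing it (fixing a subset of its coordinates) is also skewed.

```python
from itertools import combinations, product

def subcube_mass(samples, fixed):
    """Fraction of samples lying in the subcube {x : x[i] == b for (i, b) in fixed}."""
    hits = sum(all(x[i] == b for i, b in fixed) for x in samples)
    return hits / len(samples)

def skewed_subcubes(samples, n, max_codim=2, ratio=2.0):
    """Brute force: subcubes of codimension <= max_codim whose mass is >= ratio * 2**-k."""
    found = []
    for k in range(1, max_codim + 1):
        for coords in combinations(range(n), k):
            for bits in product((0, 1), repeat=k):
                fixed = tuple(zip(coords, bits))
                if subcube_mass(samples, fixed) >= ratio * 2 ** -k:
                    found.append(fixed)
    return found

def minimal(found):
    """Heuristic: drop a subcube if a larger skewed subcube (a proper subset of its fixed coords) contains it."""
    found_set = set(found)
    keep = []
    for c in found:
        parents = [p for r in range(1, len(c)) for p in combinations(c, r)]
        if not any(p in found_set for p in parents):
            keep.append(c)
    return keep
```

If every sample satisfies `x0 == x1`, the subcubes `{x0 = 0, x1 = 0}` and `{x0 = 1, x1 = 1}` are reported, certifying the dependency between those two variables. If instead `x0 = 1` always, only `{x0 = 1}` survives `minimal`, since every codimension-2 subcube inside it inherits its skew. The enumeration costs roughly `C(n, k) * 2**k` mass evaluations per codimension, which is why an efficient algorithm needs more structure than brute force.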
- …