Deep Weakly-supervised Anomaly Detection
Anomaly detection is typically posited as an unsupervised learning task in
the literature due to the prohibitive cost and difficulty of obtaining large-scale
labeled anomaly data, but this ignores the fact that a very small number
(e.g., a few dozen) of labeled anomalies can often be made available at
small/trivial cost in many real-world anomaly detection applications. To
leverage such labeled anomaly data, we study an important anomaly detection
problem termed weakly-supervised anomaly detection, in which, in addition to a
large amount of unlabeled data, a limited number of labeled anomalies are
available during modeling. Learning with the small labeled anomaly data enables
anomaly-informed modeling, which helps identify anomalies of interest and
address the notorious high false positives in unsupervised anomaly detection.
However, the problem is especially challenging, since (i) the limited amount of
labeled anomaly data often, if not always, cannot cover all types of anomalies
and (ii) the unlabeled data is often dominated by normal instances but has
anomaly contamination. We address the problem by formulating it as a pairwise
relation prediction task. Particularly, our approach defines a two-stream
ordinal regression neural network to learn the relation of randomly sampled
instance pairs, i.e., whether the instance pair contains two labeled anomalies,
one labeled anomaly, or just unlabeled data instances. The resulting model
effectively leverages both the labeled and unlabeled data to substantially
augment the training data and learn well-generalized representations of
normality and abnormality. Comprehensive empirical results on 40 real-world
datasets show that our approach (i) significantly outperforms four
state-of-the-art methods in detecting both known and previously unseen
anomalies and (ii) is substantially more data-efficient.
Comment: Theoretical results are refined and extended. Significantly more
empirical results are added, including results on detecting previously
unknown anomalies
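The pairwise formulation described in this abstract can be illustrated with a minimal sketch of the pair-sampling step. This is not the authors' implementation: the function name `sample_pairs`, the round-robin choice of pair type, and the ordinal target values (2, 1, 0) are assumptions made for illustration, since the abstract does not specify them. The two-stream network that would consume these pairs is omitted.

```python
import random

# Hypothetical ordinal targets for the three pair types named in the abstract.
# The actual values used by the authors are not stated there.
PAIR_TARGETS = {
    "anomaly-anomaly": 2.0,
    "anomaly-unlabeled": 1.0,
    "unlabeled-unlabeled": 0.0,
}


def sample_pairs(labeled_anomalies, unlabeled, n_pairs, seed=0):
    """Return (first, second, target) triples, cycling over the three pair types."""
    rng = random.Random(seed)
    kinds = list(PAIR_TARGETS)
    pairs = []
    for i in range(n_pairs):
        kind = kinds[i % len(kinds)]
        if kind == "anomaly-anomaly":
            first = rng.choice(labeled_anomalies)
            second = rng.choice(labeled_anomalies)
        elif kind == "anomaly-unlabeled":
            first = rng.choice(labeled_anomalies)
            second = rng.choice(unlabeled)
        else:
            first = rng.choice(unlabeled)
            second = rng.choice(unlabeled)
        pairs.append((first, second, PAIR_TARGETS[kind]))
    return pairs


if __name__ == "__main__":
    anomalies = [[9.0, 8.5], [7.5, 9.2]]           # a few labeled anomalies
    unlabeled = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1]]  # mostly normal, possibly contaminated
    for first, second, target in sample_pairs(anomalies, unlabeled, n_pairs=6):
        print(first, second, target)
```

Because pairs are drawn from combinations of instances, even a handful of labeled anomalies yields many distinct training examples, which is one way to read the abstract's claim that pairing augments the training data.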
A hybrid intrusion detection system
Anomaly intrusion detection normally has high false alarm rates, and a high volume of false alarms prevents system administrators from identifying the real attacks. Machine learning methods provide an effective way to decrease the false alarm rate and improve the detection rate of anomaly intrusion detection. In this research, we propose a novel approach using kernel methods and Support Vector Machines (SVMs) to improve the accuracy of anomaly intrusion detectors. Two kernels, the STIDE kernel and the Markov Chain kernel, are developed specifically for intrusion detection applications. The experiments show that two-class SVM anomaly detectors based on the STIDE and Markov Chain kernels are more accurate than the original STIDE and Markov Chain anomaly detectors.
Generally, anomaly intrusion detection approaches build normal profiles from labeled training data. However, labeled training data for intrusion detection is expensive and not easy to obtain. We propose an anomaly detection approach, using a one-class SVM based on the STIDE and Markov Chain kernels, that does not need labeled training data. To further increase the detection rate and lower the false alarm rate, an approach that integrates specification-based intrusion detection with anomaly intrusion detection is also proposed.
This research also establishes a platform that automatically generates both misuse and anomaly intrusion detection software agents. In our method, a SIFT representing an intrusion is automatically converted to a Colored Petri Net (CPN) representing an intrusion detection template; the CPN is then compiled into code for misuse intrusion detection software agents using a compiler, and the agents are dynamically loaded and launched for misuse intrusion detection. In parallel, a model representing a normal profile is automatically generated from training data, and an anomaly intrusion detection agent that carries this model is generated and launched for anomaly intrusion detection.
By engaging both misuse and anomaly intrusion detection agents, our system can detect known attacks as well as novel unknown attacks.
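For readers unfamiliar with the baseline that this abstract compares against, the following is a minimal sketch of a plain STIDE-style detector (sequence time-delay embedding): fixed-length windows of normal traces are stored, and a test trace is scored by the fraction of its windows not seen in training. It is not the STIDE kernel or the SVM detectors proposed in the thesis, whose definitions the abstract does not give. The window length of 3 and the system-call names are assumptions chosen for illustration.

```python
def windows(trace, k):
    """All contiguous length-k windows of a trace, as hashable tuples."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_normal_db(normal_traces, k):
    """Collect every length-k window observed in the normal traces."""
    db = set()
    for trace in normal_traces:
        db.update(windows(trace, k))
    return db


def mismatch_rate(trace, db, k):
    """Fraction of the trace's windows that never occurred in normal data."""
    seen = windows(trace, k)
    if not seen:
        return 0.0  # trace shorter than the window: nothing to compare
    return sum(w not in db for w in seen) / len(seen)


if __name__ == "__main__":
    K = 3  # hypothetical window length
    normal = [["open", "read", "write", "close"]]
    db = build_normal_db(normal, K)
    print(mismatch_rate(["open", "read", "write", "close"], db, K))
    print(mismatch_rate(["open", "exec", "write", "close"], db, K))
```

A trace would typically be flagged when its mismatch rate exceeds a threshold; choosing that threshold is what trades detection rate against false alarm rate.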
Semi-supervised Time Series Anomaly Detection Model Based on LSTM Autoencoder
Time series data increasingly appears in real-world systems such as power plants and medical care. In these systems, time series anomaly detection is necessary for tasks such as predictive maintenance, intrusion detection, anti-fraud, and cloud platform monitoring and management. Generally, time series anomaly detection is regarded as an unsupervised learning problem. However, in a real scenario, in addition to a large set of unlabeled data, there is usually a small set of labeled data available, such as normal or abnormal data labeled by experts. Only a few methods use labeled data, and the existing semi-supervised algorithms are not yet suitable for time series anomaly detection. In this work, we propose a semi-supervised time series anomaly detection model based on an LSTM autoencoder. We modify the loss function of the LSTM autoencoder so that it depends on both unlabeled and labeled data, and the model learns the distributions of both by minimizing this loss. In extensive experiments on the Yahoo! Webscope S5 and NAB data sets, we compare unsupervised and semi-supervised models built on the same network framework and show that the semi-supervised model outperforms the unsupervised one.
SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs
Anomaly detection aims to distinguish abnormal instances that deviate
significantly from the majority of benign ones. As instances that appear in the
real world are naturally connected and can be represented with graphs, graph
neural networks have become increasingly popular in tackling the anomaly detection
problem. Despite the promising results, research on anomaly detection has
almost exclusively focused on static graphs while the mining of anomalous
patterns from dynamic graphs is rarely studied but has significant application
value. In addition, anomaly detection is typically tackled from semi-supervised
perspectives due to the lack of sufficient labeled data. However, most proposed
methods are limited to merely exploiting labeled data, leaving a large number
of unlabeled samples unexplored. In this work, we present semi-supervised
anomaly detection (SAD), an end-to-end framework for anomaly detection on
dynamic graphs. By a combination of a time-equipped memory bank and a
pseudo-label contrastive learning module, SAD is able to fully exploit the
potential of large unlabeled samples and uncover underlying anomalies on
evolving graph streams. Extensive experiments on four real-world datasets
demonstrate that SAD efficiently discovers anomalies from dynamic graphs and
outperforms existing advanced methods even when provided with only a small amount of
labeled data.
Comment: Accepted to IJCAI'23. Code will be available at
https://github.com/D10Andy/SA
Deep Semi-Supervised Anomaly Detection for Finding Fraud in the Futures Market
Modern financial electronic exchanges are an exciting and fast-paced
marketplace where billions of dollars change hands every day. They are also
rife with manipulation and fraud. Detecting such activity is a major
undertaking, which has historically been a job reserved exclusively for humans.
Recently, more research and resources have been focused on automating these
processes via machine learning and artificial intelligence. Fraud detection is
overwhelmingly associated with the greater field of anomaly detection, which is
usually performed via unsupervised learning techniques because of the lack of
labeled data needed for supervised learning. However, a small quantity of
labeled data does often exist. This research article aims to evaluate the
efficacy of a deep semi-supervised anomaly detection technique, called Deep
SAD, for detecting fraud in high-frequency financial data. We use exclusive
proprietary limit order book data from the TMX exchange in Montréal, with a
small set of true labeled instances of fraud, to evaluate Deep SAD against its
unsupervised predecessor. We show that incorporating a small amount of labeled
data into an unsupervised anomaly detection framework can greatly improve its
accuracy.
Comment: 8 pages, 3 figures
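To make concrete how a few labels can enter an otherwise unsupervised objective, here is a minimal sketch of a Deep SAD-style loss as it is usually stated: unlabeled embeddings are pulled toward a center, labeled normals are pulled in as well, and labeled anomalies are pushed away through an inverse-distance term. The sketch operates on precomputed embedding vectors and omits the network and weight-decay term. The function name `deep_sad_loss`, the default `eta`, and the small `eps` stabilizer are assumptions for illustration, not details taken from this article.

```python
def deep_sad_loss(unlabeled, labeled, center, eta=1.0, eps=1e-6):
    """Deep SAD-style loss on precomputed embeddings.

    unlabeled: list of embedding vectors.
    labeled:   list of (embedding, y) with y = +1 (normal) or -1 (anomaly).
    center:    the hypersphere center, same dimension as the embeddings.
    """
    def sq_dist(z):
        return sum((a - b) ** 2 for a, b in zip(z, center))

    # Unlabeled points: plain squared distance to the center.
    total = sum(sq_dist(z) for z in unlabeled)
    # Labeled points: distance raised to +1 (normal) or -1 (anomaly), so that
    # anomalies close to the center incur a large penalty.
    total += eta * sum((sq_dist(z) + eps) ** y for z, y in labeled)
    return total / (len(unlabeled) + len(labeled))


if __name__ == "__main__":
    center = [0.0, 0.0]
    unlabeled = [[1.0, 0.0], [0.0, 1.0]]
    far_anomaly = [([2.0, 0.0], -1)]
    near_anomaly = [([0.5, 0.0], -1)]
    print(deep_sad_loss(unlabeled, far_anomaly, center))
    print(deep_sad_loss(unlabeled, near_anomaly, center))
```

In this toy example the loss is lower when the labeled anomaly lies far from the center than when it lies near it, which is the behavior the labeled term is meant to encourage.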