38,429 research outputs found

    Deep Weakly-supervised Anomaly Detection

    Anomaly detection is typically posed as an unsupervised learning task in the literature because large-scale labeled anomaly data is prohibitively costly and difficult to obtain. This ignores the fact that a very small number (e.g., a few dozen) of labeled anomalies can often be made available at small or trivial cost in many real-world anomaly detection applications. To leverage such labeled anomaly data, we study an important anomaly detection problem termed weakly-supervised anomaly detection, in which, in addition to a large amount of unlabeled data, a limited number of labeled anomalies are available during modeling. Learning with the small labeled anomaly data enables anomaly-informed modeling, which helps identify anomalies of interest and address the notoriously high false-positive rate of unsupervised anomaly detection. However, the problem is especially challenging, since (i) the limited amount of labeled anomaly data often, if not always, cannot cover all types of anomalies and (ii) the unlabeled data is often dominated by normal instances but has anomaly contamination. We address the problem by formulating it as a pairwise relation prediction task. In particular, our approach defines a two-stream ordinal regression neural network to learn the relation of randomly sampled instance pairs, i.e., whether the instance pair contains two labeled anomalies, one labeled anomaly, or just unlabeled data instances. The resulting model effectively leverages both the labeled and unlabeled data to substantially augment the training data and learn well-generalized representations of normality and abnormality. Comprehensive empirical results on 40 real-world datasets show that our approach (i) significantly outperforms four state-of-the-art methods in detecting both known and previously unseen anomalies and (ii) is substantially more data-efficient. Comment: Theoretical results are refined and extended. Significantly more empirical results are added, including results on detecting previously unknown anomalies
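
The pair-sampling step described in this abstract can be illustrated with a minimal sketch. It only builds the training pairs and their ordinal targets; the two-stream network itself is not shown. The function name `sample_pairs`, the uniform choice among the three pair types, and the target values 0, 1, 2 are assumptions made for illustration, since the abstract does not specify them.

```python
import random

# Hypothetical ordinal targets; the abstract does not state the actual values.
TARGETS = {
    "unlabeled-unlabeled": 0,
    "anomaly-unlabeled": 1,
    "anomaly-anomaly": 2,
}


def sample_pairs(unlabeled, anomalies, n_pairs, seed=0):
    """Randomly sample instance pairs and attach an ordinal relation target.

    Returns a list of (left_instance, right_instance, target) tuples.
    Pair types are drawn uniformly here, which is an assumption.
    """
    rng = random.Random(seed)
    kinds = sorted(TARGETS)
    pairs = []
    for _ in range(n_pairs):
        kind = rng.choice(kinds)
        if kind == "anomaly-anomaly":
            left, right = rng.choice(anomalies), rng.choice(anomalies)
        elif kind == "anomaly-unlabeled":
            left, right = rng.choice(anomalies), rng.choice(unlabeled)
        else:
            left, right = rng.choice(unlabeled), rng.choice(unlabeled)
        pairs.append((left, right, TARGETS[kind]))
    return pairs


if __name__ == "__main__":
    unlabeled = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1)]
    anomalies = [(5.0, 5.1), (4.8, 5.3)]
    for left, right, target in sample_pairs(unlabeled, anomalies, 5):
        print(left, right, target)
```

Because pairs are drawn with replacement from the cross product of the two pools, even a handful of labeled anomalies yields many distinct training pairs, which is one way to read the abstract's claim that the formulation augments the training data.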

    A hybrid intrusion detection system

    Anomaly intrusion detection normally has high false alarm rates, and a high volume of false alarms prevents system administrators from identifying the real attacks. Machine learning methods provide an effective way to decrease the false alarm rate and improve the detection rate of anomaly intrusion detection. In this research, we propose a novel approach using kernel methods and Support Vector Machines (SVMs) to improve the accuracy of anomaly intrusion detectors. Two kernels, the STIDE kernel and the Markov Chain kernel, are developed specifically for intrusion detection applications. The experiments show that two-class SVM anomaly detectors based on the STIDE and Markov Chain kernels are more accurate than the original STIDE and Markov Chain anomaly detectors. Generally, anomaly intrusion detection approaches build normal profiles from labeled training data. However, labeled training data for intrusion detection is expensive and not easy to obtain. We propose an anomaly detection approach, using a one-class SVM based on the STIDE and Markov Chain kernels, that does not need labeled training data. To further increase the detection rate and lower the false alarm rate, an approach that integrates specification-based intrusion detection with anomaly intrusion detection is also proposed. This research also establishes a platform that automatically generates both misuse and anomaly intrusion detection software agents. In our method, a SIFT representing an intrusion is automatically converted to a Colored Petri Net (CPN) representing an intrusion detection template; the CPN is then compiled into code for misuse intrusion detection software agents using a compiler and dynamically loaded and launched for misuse intrusion detection. In parallel, a model representing a normal profile is automatically generated from training data, and an anomaly intrusion detection agent that carries this model is generated and launched for anomaly intrusion detection. By engaging both misuse and anomaly intrusion detection agents, our system can detect known attacks as well as novel unknown attacks.
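
For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of the classic STIDE (sequence time-delay embedding) detector, not of the STIDE kernel or SVM detectors proposed in the thesis. The function names, the window length `k`, and the toy system-call traces are assumptions chosen for illustration.

```python
def windows(trace, k):
    """Return every contiguous length-k window of a trace as a tuple."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_normal_db(normal_traces, k):
    """Collect the set of length-k windows seen in normal traces."""
    db = set()
    for trace in normal_traces:
        db.update(windows(trace, k))
    return db


def mismatch_rate(trace, normal_db, k):
    """Fraction of the trace's windows that never occurred in normal data."""
    ws = windows(trace, k)
    if not ws:
        return 0.0
    return sum(w not in normal_db for w in ws) / len(ws)


if __name__ == "__main__":
    normal = [
        ["open", "read", "write", "close"],
        ["open", "read", "read", "close"],
    ]
    db = build_normal_db(normal, k=3)
    print(mismatch_rate(["open", "read", "write", "close"], db, k=3))
    print(mismatch_rate(["open", "exec", "write", "close"], db, k=3))
```

A trace is flagged when its mismatch rate exceeds a chosen threshold; the normal database here is built from traces assumed to be attack-free, which is the labeling requirement the abstract's one-class approach aims to relax.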

    Semi-supervised Time Series Anomaly Detection Model Based on LSTM Autoencoder

    Time series data is increasingly common in real-world systems such as power plants and medical care. In these systems, time series anomaly detection is necessary for tasks such as predictive maintenance, intrusion detection, anti-fraud, and cloud platform monitoring and management. Generally, time series anomaly detection is regarded as an unsupervised learning problem. However, in a real scenario, in addition to a large set of unlabeled data, there is usually a small set of available labeled data, such as normal or abnormal data sets labeled by experts. Only a few methods use labeled data, and the existing semi-supervised algorithms are not yet suitable for time series anomaly detection. In this work, we propose a semi-supervised time series anomaly detection model based on an LSTM autoencoder. We modify the loss function of the LSTM autoencoder so that it depends on both unlabeled and labeled data, and the model learns the distribution of both by minimizing this loss. In a large number of experiments on the Yahoo! Webscope S5 and NAB data sets, we compare the unsupervised and semi-supervised models built on the same network framework and show that the semi-supervised model outperforms the unsupervised one.

    SAD: Semi-Supervised Anomaly Detection on Dynamic Graphs

    Anomaly detection aims to distinguish abnormal instances that deviate significantly from the majority of benign ones. As instances that appear in the real world are naturally connected and can be represented with graphs, graph neural networks have become increasingly popular for tackling the anomaly detection problem. Despite the promising results, research on anomaly detection has almost exclusively focused on static graphs, while the mining of anomalous patterns from dynamic graphs is rarely studied despite its significant application value. In addition, anomaly detection is typically tackled from semi-supervised perspectives due to the lack of sufficient labeled data. However, most proposed methods are limited to merely exploiting labeled data, leaving a large number of unlabeled samples unexplored. In this work, we present semi-supervised anomaly detection (SAD), an end-to-end framework for anomaly detection on dynamic graphs. By combining a time-equipped memory bank and a pseudo-label contrastive learning module, SAD is able to fully exploit the potential of large unlabeled samples and uncover underlying anomalies on evolving graph streams. Extensive experiments on four real-world datasets demonstrate that SAD efficiently discovers anomalies from dynamic graphs and outperforms existing advanced methods even when provided with only a little labeled data. Comment: Accepted to IJCAI'23. Code will be available at https://github.com/D10Andy/SA

    Deep Semi-Supervised Anomaly Detection for Finding Fraud in the Futures Market

    Modern financial electronic exchanges are an exciting and fast-paced marketplace where billions of dollars change hands every day. They are also rife with manipulation and fraud. Detecting such activity is a major undertaking, which has historically been a job reserved exclusively for humans. Recently, more research and resources have been focused on automating these processes via machine learning and artificial intelligence. Fraud detection is overwhelmingly associated with the greater field of anomaly detection, which is usually performed via unsupervised learning techniques because of the lack of labeled data needed for supervised learning. However, a small quantity of labeled data does often exist. This research article aims to evaluate the efficacy of a deep semi-supervised anomaly detection technique, called Deep SAD, for detecting fraud in high-frequency financial data. We use exclusive proprietary limit order book data from the TMX exchange in Montréal, with a small set of true labeled instances of fraud, to evaluate Deep SAD against its unsupervised predecessor. We show that incorporating a small amount of labeled data into an unsupervised anomaly detection framework can greatly improve its accuracy. Comment: 8 pages, 3 figures
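
To make concrete how a small labeled set can enter an otherwise unsupervised objective, here is a minimal sketch of a Deep SAD style loss computed on already-embedded points. It follows the published Deep SAD formulation as generally described (unlabeled points are pulled toward a center, labeled anomalies are pushed away through an inverse-distance term) and omits the network and weight regularization. The function name, the `eps` stabilizer, and the toy embeddings are assumptions for illustration, not details from this article.

```python
def deep_sad_loss(unlabeled, labeled, center, eta=1.0, eps=1e-6):
    """Deep SAD style objective on embedded points.

    unlabeled: list of embedding vectors.
    labeled:   list of (embedding, y) with y = +1 (normal) or -1 (anomaly).
    center:    hypersphere center c in embedding space.
    eta:       weight of the labeled term.
    """
    def sq_dist(z):
        return sum((a - b) ** 2 for a, b in zip(z, center))

    total = sum(sq_dist(z) for z in unlabeled)
    for z, y in labeled:
        # y = -1 inverts the distance, so anomalies near the center cost more.
        total += eta * (sq_dist(z) + eps) ** y
    return total / (len(unlabeled) + len(labeled))


if __name__ == "__main__":
    c = (0.0, 0.0)
    unlabeled = [(0.1, 0.0), (0.0, 0.2)]
    far_anomaly = [((3.0, 4.0), -1)]
    near_anomaly = [((0.1, 0.0), -1)]
    print(deep_sad_loss(unlabeled, far_anomaly, c))
    print(deep_sad_loss(unlabeled, near_anomaly, c))
```

With no labeled points the expression reduces to the mean squared distance to the center, which corresponds to the unsupervised one-class objective that the abstract refers to as the predecessor.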