Anomaly Detection on Graph Time Series
In this paper, we use a variational recurrent neural network to investigate the
anomaly detection problem on graph time series. Temporal correlation is
modeled by combining a recurrent neural network (RNN) with variational
inference (VI), while spatial information is captured by a graph
convolutional network. To incorporate external factors, we use a feature
extractor to augment the transition of the latent variables, which allows the
model to learn the influence of those factors. Because the objective function is
the accumulated evidence lower bound (ELBO), the model is easy to extend to an
on-line method. An experimental study on traffic flow data demonstrates the
detection capability of the proposed method.
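The accumulated ELBO is a sum of per-time-step terms, which is why an on-line variant is natural: each new observation adds one term. The Python sketch below computes that sum, assuming diagonal-Gaussian posterior, prior, and decoder distributions. The Gaussian choice and the dictionary layout are assumptions for illustration, not details from the paper; the GCN encoder, RNN, and external-feature extractor are omitted, so their outputs are passed in as plain numbers.

```python
import math
from typing import Dict, List, Sequence, Tuple

def gaussian_kl(mu_q, var_q, mu_p, var_p) -> float:
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over dimensions."""
    return sum(0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
               for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p))

def gaussian_log_lik(x, mu, var) -> float:
    """log N(x; mu, diag(var))."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))

def accumulated_elbo(steps: List[Dict[str, Sequence[float]]]) -> Tuple[float, List[float]]:
    """Sum of per-step terms ELBO_t = log p(x_t | z_t) - KL(q(z_t | ...) || p(z_t | ...)).

    Each (hypothetical) step dict holds the observation `x`, the decoder
    parameters `dec_mu`/`dec_var` evaluated at one sampled z_t, and the
    posterior (`q_*`) and prior (`p_*`) parameters of z_t.
    """
    total, per_step = 0.0, []
    for s in steps:
        elbo_t = (gaussian_log_lik(s["x"], s["dec_mu"], s["dec_var"])
                  - gaussian_kl(s["q_mu"], s["q_var"], s["p_mu"], s["p_var"]))
        per_step.append(elbo_t)
        total += elbo_t  # on-line use: simply add the newest step's term
    return total, per_step
```

A time step whose per-step ELBO is much lower than usual is poorly explained by the model; using that as an anomaly signal is one plausible reading, assumed here rather than taken from the abstract.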
How Far Should We Look Back to Achieve Effective Real-Time Time-Series Anomaly Detection?
Anomaly detection is the process of identifying unexpected events or
abnormalities in data, and it has been applied in many different areas such as
system monitoring, fraud detection, healthcare, and intrusion detection.
Providing real-time, lightweight, and proactive anomaly detection for time
series with neither human intervention nor domain knowledge could be highly
valuable since it reduces human effort and enables appropriate countermeasures
to be undertaken before a disastrous event occurs. To our knowledge, RePAD
(Real-time Proactive Anomaly Detection algorithm) is a generic approach with
all of the above-mentioned features. To achieve real-time and lightweight detection,
RePAD utilizes Long Short-Term Memory (LSTM) to detect whether or not each
upcoming data point is anomalous based on short-term historical data points.
However, it is unclear how different amounts of historical data points
affect the performance of RePAD. Therefore, in this paper, we investigate the
impact of different amounts of historical data on RePAD by introducing a set of
performance metrics covering novel detection accuracy measures, time
efficiency, readiness, resource consumption, and more. Empirical experiments
based on real-world time series datasets are conducted to evaluate RePAD in
different scenarios, and the experimental results are presented and discussed.
Comment: 12 pages, 5 figures, and 9 tables, Proceedings of the 35th
International Conference on Advanced Information Networking and Applications
(AINA 2021)
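To make the look-back amount concrete, the Python sketch below (not RePAD's code) turns a series into (history, next value) pairs for a hypothetical look-back length `b`; a predictor such as an LSTM would then be trained on these pairs. For a series of n points it yields n - b pairs, so each extra historical point per input costs one training pair and delays the first possible prediction by one step.

```python
from typing import List, Sequence, Tuple

def make_windows(series: Sequence[float], b: int) -> List[Tuple[List[float], float]]:
    """Split a series into (last b points, next point) pairs.

    `b` is a hypothetical look-back length; larger b gives the predictor
    more context per input but fewer pairs from the same series.
    """
    if b < 1:
        raise ValueError("look-back b must be at least 1")
    return [(list(series[t - b:t]), series[t]) for t in range(b, len(series))]


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    for b in (1, 3, 5):
        pairs = make_windows(series, b)
        print(f"b={b}: {len(pairs)} pairs, first = {pairs[0]}")
```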
A Novel Approach for Detecting Outliers by Using Isolation Forest with Reducing Under Fitting Issue
The effectiveness of machine learning (ML) on a particular task depends on a variety of factors, the first and foremost being the description and validity of the instance data. Extracting knowledge during the training phase is more difficult when the data contain much redundant, irrelevant, or incomplete information. It is well known that data preparation and filtering stages significantly affect the running time of ML tasks. Data cleansing is essential for improving the accuracy of any model: without sufficient data cleaning, no predictive model can achieve good accuracy. EDA, or exploratory data analysis, is the name of this procedure. In this study, we discuss outlier identification, one of the many EDA processes for obtaining complete, clean data. We use the isolation forest approach to calculate an outlier factor, from which an outlier-finding model is built. The outlier detection problem is thereby reduced to a collection of related supervised binary classification problems. We carry out in-depth experiments on various datasets and compare our outlier-finding technique with the conventional approach. Our approach yields superior results in terms of accuracy, precision, recall, and F1 score. Additionally, we successfully reduce underfitting in the machine learning algorithms.
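For readers unfamiliar with how an isolation forest produces an outlier factor, the Python sketch below implements the standard isolation-forest score s(x) = 2^(-E[h(x)] / c(psi)), where h(x) is the depth at which random axis-aligned splits isolate x and c(psi) normalises by the expected depth for psi points. It is a from-scratch illustration, not the authors' implementation; the parameter defaults are assumptions, and the downstream supervised classifiers and underfitting reduction are not shown.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n: int) -> float:
    """Average unsuccessful-search path length in a BST of n points (harmonic number approximated)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # identical values on this axis cannot be split
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }

def path_length(x, node, depth=0):
    """Depth reached by x, plus c(size) to account for points left in the leaf."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Scores near 1 suggest outliers; scores well below 0.5 suggest inliers."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c(psi)
    scores = []
    for x in data:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm) if norm > 0 else 0.5)
    return scores
```

Outliers are isolated by few splits, so their mean depth is short and their score is high; a threshold on this score is one simple way to derive binary labels for a later classification step.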
Associative classifier for uncertain data
Associative classifiers are relatively easy for people to understand and often outperform decision tree learners on many classification problems. Existing associative classifiers only work with certain data. However, data uncertainty is prevalent in many real-world applications such as sensor networks, market analysis, and medical diagnosis, and such uncertainty may render many conventional classifiers inapplicable to uncertain classification tasks. In this paper, based on the U-Apriori algorithm and the CBA algorithm, we propose an associative classifier for uncertain data, uCBA (uncertain Classification Based on Associative), which can classify both certain and uncertain data. The algorithm redefines the support, confidence, rule pruning, and classification strategy of CBA. Experimental results on 21 datasets from the UCI Repository demonstrate that the proposed algorithm performs well and remains satisfactory even on highly uncertain data.
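The support measure that U-Apriori uses for uncertain data is the expected support: for each transaction, multiply the existential probabilities of the itemset's items (treated as independent) and sum over transactions. The Python sketch below computes it; the `expected_confidence` ratio is a hypothetical simplification for illustration, since the abstract does not give uCBA's exact redefinition of confidence.

```python
from typing import Dict, Iterable, List, Set

Transaction = Dict[str, float]  # item -> existential probability in (0, 1]

def expected_support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """U-Apriori style expected support: sum over transactions of the product
    of the item probabilities (items assumed independent)."""
    items = list(itemset)
    total = 0.0
    for t in transactions:
        p = 1.0
        for item in items:
            p *= t.get(item, 0.0)  # an absent item has probability 0
        total += p
    return total

def expected_confidence(antecedent: Set[str], label: str, transactions: List[Transaction]) -> float:
    """Hypothetical rule confidence: expected support of (antecedent + class label)
    divided by expected support of the antecedent."""
    denom = expected_support(antecedent, transactions)
    if denom == 0:
        return 0.0
    return expected_support(set(antecedent) | {label}, transactions) / denom


if __name__ == "__main__":
    tx = [
        {"a": 0.9, "b": 0.5, "class=yes": 1.0},
        {"a": 0.4, "class=yes": 1.0},
        {"a": 0.5, "class=no": 1.0},
        {"b": 1.0, "class=no": 1.0},
    ]
    print(expected_support({"a"}, tx), expected_confidence({"a"}, "class=yes", tx))
```

When every probability is 1, the expected support reduces to the ordinary support count, which is consistent with the claim that the same classifier handles both certain and uncertain data.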
RePAD: Real-time Proactive Anomaly Detection for Time Series
During the past decade, many anomaly detection approaches have been
introduced in different fields such as network monitoring, fraud detection, and
intrusion detection. However, they require an understanding of data patterns and
often need a long off-line period to build a model or network for the target
data. Providing real-time and proactive anomaly detection for streaming time
series without human intervention and domain knowledge is highly valuable since
it greatly reduces human effort and enables appropriate countermeasures to be
undertaken before disastrous damage, failure, or another harmful event occurs.
However, this issue has not been well studied yet. To address it, this paper
proposes RePAD, which is a Real-time Proactive Anomaly Detection algorithm for
streaming time series based on Long Short-Term Memory (LSTM). RePAD utilizes
short-term historical data points to predict and determine whether or not the
upcoming data point is a sign that an anomaly is likely to happen in the near
future. By dynamically adjusting the detection threshold over time, RePAD is
able to tolerate minor pattern changes in time series and detect anomalies
either proactively or on time. Experiments based on two time series datasets
collected from the Numenta Anomaly Benchmark demonstrate that RePAD is able to
proactively detect anomalies and provide early warnings in real time without
human intervention and domain knowledge.
Comment: 12 pages, 8 figures, the 34th International Conference on Advanced
Information Networking and Applications (AINA 2020)
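The core loop of predicting the next point from a short history and comparing the prediction error with a threshold that is recomputed over time can be sketched in Python as follows. This is not RePAD itself: the forecast is the mean of the last `b` points instead of an LSTM, the error is the absolute relative error, and the threshold is the mean plus `k` population standard deviations of all earlier errors. All three choices, and the parameter names, are assumptions for illustration.

```python
import statistics
from typing import List, Sequence

def detect_anomalies(series: Sequence[float], b: int = 3, k: float = 3.0,
                     min_history: int = 5) -> List[int]:
    """Return indices whose prediction error exceeds a dynamic threshold.

    Assumed stand-ins: mean-of-last-b forecast (instead of an LSTM),
    absolute relative error, and threshold = mean + k * pstdev of past errors.
    """
    errors: List[float] = []
    flagged: List[int] = []
    for t in range(b, len(series)):
        forecast = sum(series[t - b:t]) / b  # stand-in for the LSTM prediction
        actual = series[t]
        error = abs(actual - forecast) / max(abs(actual), 1e-9)  # guard against division by zero
        if len(errors) >= min_history:  # wait until enough errors exist to estimate a threshold
            threshold = statistics.mean(errors) + k * statistics.pstdev(errors)
            if error > threshold:
                flagged.append(t)
        errors.append(error)  # the threshold adapts as new errors accumulate
    return flagged


if __name__ == "__main__":
    series = [10.0] * 20 + [30.0] + [10.0] * 5
    print(detect_anomalies(series))
```

On this toy series the sketch flags index 20 (the spike) and index 21, where the spike is still inside the forecast window; once the spike's large errors enter the history, the raised threshold stops further alarms, which illustrates how a dynamic threshold absorbs a change over time.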
From Physical to Cyber: Escalating Protection for Personalized Auto Insurance
Nowadays, auto insurance companies set personalized insurance rates based on
data gathered directly from their customers' cars. In this paper, we show that such
a personalized insurance mechanism -- widely adopted by many auto insurance
companies -- is vulnerable to exploitation. In particular, we demonstrate that an
adversary can leverage off-the-shelf hardware to manipulate the data fed to the
device that collects drivers' habits for insurance rate customization and
obtain a fraudulent insurance discount. In response to this type of attack, we
also propose a defense mechanism that escalates the protection for insurers'
data collection. The main idea of this mechanism is to augment the insurer's
data collection device with the ability to gather unforgeable data acquired
from the physical world, and then leverage these data to identify manipulated
data points. Our defense mechanism leverages a statistical model built on
unmanipulated data and is robust to previously unforeseen manipulation
methods. We have implemented this defense mechanism as a proof-of-concept
prototype and tested its effectiveness in the real world. Our evaluation shows
that our defense mechanism exhibits a false positive rate of 0.032 and a false
negative rate of 0.013.
Comment: Appeared in SenSys 201
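The abstract does not specify which physical signals are collected or which statistical model is used. As a hypothetical illustration only, the Python sketch below assumes each reported value comes with a physically derived reference value, fits the mean and standard deviation of (reported minus reference) residuals on clean data, flags points whose residual lies more than `z` standard deviations from that mean, and computes the false positive and false negative rates of the kind the evaluation reports. All function names and the residual model are assumptions.

```python
import statistics
from typing import List, Sequence, Tuple

def fit_residual_model(reported: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample std of (reported - reference) on data known to be unmanipulated."""
    residuals = [r - p for r, p in zip(reported, reference)]
    return statistics.mean(residuals), statistics.stdev(residuals)

def flag_manipulated(reported, reference, model, z=3.0) -> List[bool]:
    """Flag points whose residual is more than z standard deviations from the clean mean."""
    mu, sigma = model
    return [abs((r - p) - mu) > z * sigma for r, p in zip(reported, reference)]

def error_rates(flags: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """(false positive rate, false negative rate); assumes both classes occur in `truth`."""
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(t and not f for f, t in zip(flags, truth))
    negatives = sum(not t for t in truth)
    positives = sum(bool(t) for t in truth)
    return fp / negatives, fn / positives
```

Because the model is fitted only on clean data, any manipulation that pushes reported values away from the physical reference is caught regardless of how it was produced, which matches the abstract's point about robustness to unforeseen manipulation methods.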
Uncertain distance-based outlier detection with arbitrarily shaped data objects
Enabling information systems to face anomalies in the presence of uncertainty is a compelling and challenging task. In this work, we consider the problem of unsupervised outlier detection in large collections of data objects modeled by arbitrary multidimensional probability density functions. We present a novel definition of uncertain distance-based outlier under the attribute-level uncertainty model, according to which an uncertain object is an object that always exists but whose actual value is modeled by a multivariate pdf. According to this definition, an uncertain object is declared to be an outlier on the basis of the expected number of its neighbors in the dataset. To the best of our knowledge, this is the first work that considers the unsupervised outlier detection problem on data objects modeled by arbitrarily shaped multidimensional distribution functions. We present the UDBOD algorithm, which efficiently detects the outliers in an input uncertain dataset by exploiting three optimized phases: parameter estimation, candidate selection, and candidate filtering. An experimental campaign is presented, including a sensitivity analysis, a study of the effectiveness of the technique, a comparison with related algorithms (also in the presence of high-dimensional data), and a discussion of the behavior of our technique in real-case scenarios.
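By linearity of expectation, the expected number of neighbors of an uncertain object O within a radius R is the sum, over all other objects, of the probability that their distance to O is at most R. The Python sketch below estimates those probabilities by Monte Carlo sampling from each object's pdf and flags objects whose expected count falls below k. It is a naive illustration of the definition, not the UDBOD algorithm (no parameter estimation, candidate selection, or filtering phases); the names `radius`, `k`, and `gaussian_object` are hypothetical, and Gaussian objects merely stand in for arbitrarily shaped pdfs.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
UncertainObject = Callable[[random.Random], Point]  # draws one sample from the object's pdf

def gaussian_object(center: Sequence[float], sigma: float) -> UncertainObject:
    """Hypothetical uncertain object: isotropic Gaussian around `center`."""
    return lambda rng: tuple(x + rng.gauss(0.0, sigma) for x in center)

def expected_neighbors(objects: List[UncertainObject], radius: float,
                       n_samples: int = 200, seed: int = 0) -> List[float]:
    """Monte Carlo estimate of the expected number of objects within `radius` of each object."""
    rng = random.Random(seed)
    draws = [[obj(rng) for _ in range(n_samples)] for obj in objects]
    counts = []
    for i, di in enumerate(draws):
        total = 0.0
        for j, dj in enumerate(draws):
            if i != j:
                # Pr(dist(O_i, O_j) <= radius), estimated from paired independent draws
                total += sum(math.dist(a, b) <= radius for a, b in zip(di, dj)) / n_samples
        counts.append(total)
    return counts

def uncertain_outliers(objects: List[UncertainObject], radius: float, k: float, **kwargs) -> List[int]:
    """Indices of objects whose expected neighbor count is below k."""
    return [i for i, n in enumerate(expected_neighbors(objects, radius, **kwargs)) if n < k]


if __name__ == "__main__":
    centers = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2), (0.1, 0.1), (10, 10)]
    objs = [gaussian_object(c, 0.1) for c in centers]
    print(uncertain_outliers(objs, radius=1.0, k=2))
```

This brute-force version costs O(N^2 x samples) distance evaluations, which is why an algorithm for large collections needs pruning phases such as the candidate selection and filtering the abstract describes.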