3,425 research outputs found
SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis
In this paper, we propose a novel approach, called SENATUS, for joint traffic
anomaly detection and root-cause analysis. Inspired from the concept of a
senate, the key idea of the proposed approach is divided into three stages:
election, voting and decision. At the election stage, a small number of
\nop{traffic flow sets (termed as senator flows)}senator flows are chosen\nop{,
which are used} to represent approximately the total (usually huge) set of
traffic flows. In the voting stage, anomaly detection is applied on the senator
flows and the detected anomalies are correlated to identify the most possible
anomalous time bins. Finally in the decision stage, a machine learning
technique is applied to the senator flows of each anomalous time bin to find
the root cause of the anomalies. We evaluate SENATUS using traffic traces
collected from the Pan European network, GEANT, and compare against another
approach which detects anomalies using lossless compression of traffic
histograms. We show the effectiveness of SENATUS in diagnosing anomaly types:
network scans and DoS/DDoS attacks
Fake View Analytics in Online Video Services
Online video-on-demand(VoD) services invariably maintain a view count for
each video they serve, and it has become an important currency for various
stakeholders, from viewers, to content owners, advertizers, and the online
service providers themselves. There is often significant financial incentive to
use a robot (or a botnet) to artificially create fake views. How can we detect
the fake views? Can we detect them (and stop them) using online algorithms as
they occur? What is the extent of fake views with current VoD service
providers? These are the questions we study in the paper. We develop some
algorithms and show that they are quite effective for this problem.Comment: 25 pages, 15 figure
Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow
Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the FDP databases, while the outliers are excluded from the FDP databases. Moreover, a k-nearest neighbor for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow
Contamination Estimation via Convex Relaxations
Identifying anomalies and contamination in datasets is important in a wide
variety of settings. In this paper, we describe a new technique for estimating
contamination in large, discrete valued datasets. Our approach considers the
normal condition of the data to be specified by a model consisting of a set of
distributions. Our key contribution is in our approach to contamination
estimation. Specifically, we develop a technique that identifies the minimum
number of data points that must be discarded (i.e., the level of contamination)
from an empirical data set in order to match the model to within a specified
goodness-of-fit, controlled by a p-value. Appealing to results from large
deviations theory, we show a lower bound on the level of contamination is
obtained by solving a series of convex programs. Theoretical results guarantee
the bound converges at a rate of , where p is the size of
the empirical data set.Comment: To appear, ISIT 201
- …