Survey on Incremental Approaches for Network Anomaly Detection
As the communication industry has connected distant corners of the globe
using advances in network technology, intruders or attackers have also
increased attacks on networking infrastructure commensurately. System
administrators can attempt to prevent such attacks using intrusion detection
tools and systems. There are many commercially available signature-based
Intrusion Detection Systems (IDSs). However, most IDSs lack the capability to
detect novel or previously unknown attacks. A special type of IDS, called
Anomaly Detection Systems, develops models based on normal system or network
behavior, with the goal of detecting both known and unknown attacks. Anomaly
detection systems face many problems, including high false alarm rates,
the ability to work in online mode, and scalability. This paper presents a
selective survey of incremental approaches for detecting anomalies in normal
system or network traffic. The technological trends, open problems, and
challenges of anomaly detection using incremental approaches are also
discussed.
Comment: 14 pages, 1 figure, 11 tables referred journal publicatio
Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications
The von Neumann graph entropy (VNGE) facilitates measurement of information
divergence and distance between graphs in a graph sequence. It has been
successfully applied to various learning tasks driven by network-based data.
While effective, VNGE is computationally demanding as it requires the full
eigenspectrum of the graph Laplacian matrix. In this paper, we propose a new
computational framework, Fast Incremental von Neumann Graph EntRopy (FINGER),
which approaches VNGE with a performance guarantee. FINGER reduces the cubic
complexity of VNGE to linear complexity in the number of nodes and edges, and
thus enables online computation based on incremental graph changes. We also
show asymptotic equivalence of FINGER to the exact VNGE, and derive its
approximation error bounds. Based on FINGER, we propose efficient algorithms
for computing Jensen-Shannon distance between graphs. Our experimental results
on different random graph models demonstrate the computational efficiency and
the asymptotic equivalence of FINGER. In addition, we apply FINGER to two
real-world applications and one synthesized anomaly detection dataset, and
corroborate its superior performance over seven baseline graph similarity
methods.
Comment: Published at ICML 201
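The sketch below is not the FINGER algorithm; it only illustrates, under stated assumptions, why an entropy-like quantity can be updated in constant time per edge change. It assumes the usual definition of VNGE as -sum(lambda_i * ln(lambda_i)) over the eigenvalues of L_N = L / trace(L), which the abstract does not spell out, and replaces -x ln x with its first-order surrogate x(1 - x). That gives Q = 1 - trace(L_N^2), which depends only on node strengths and edge weights. The class name `QuadraticEntropyTracker` and its methods are hypothetical.

```python
from collections import defaultdict


class QuadraticEntropyTracker:
    """Maintains Q = 1 - trace(L_N^2) for an undirected weighted graph.

    L is the combinatorial Laplacian and L_N = L / trace(L).
    Q is a quadratic surrogate for the von Neumann graph entropy
    (an illustrative assumption, not the method from the paper).
    """

    def __init__(self):
        self.strength = defaultdict(float)  # s_i: total weight at node i
        self.weight = defaultdict(float)    # w_ij, keyed by ordered pair
        self.sum_s2 = 0.0                   # sum over nodes of s_i^2
        self.sum_w2 = 0.0                   # sum over edges of w_ij^2
        self.total = 0.0                    # trace(L) = sum over nodes of s_i

    def add_weight(self, i, j, dw):
        """Add dw to the weight of edge (i, j) in O(1) time."""
        if i == j:
            raise ValueError("self-loops are not supported")
        key = (i, j) if i < j else (j, i)
        old_w = self.weight[key]
        new_w = old_w + dw
        if new_w < 0:
            raise ValueError("edge weight would become negative")
        self.sum_w2 += new_w ** 2 - old_w ** 2
        self.weight[key] = new_w
        for node in (i, j):
            old_s = self.strength[node]
            new_s = old_s + dw
            self.sum_s2 += new_s ** 2 - old_s ** 2
            self.strength[node] = new_s
        self.total += 2 * dw

    def q(self):
        """Return 1 - trace(L_N^2), or 0.0 for a graph with no weight."""
        if self.total <= 0:
            return 0.0
        # trace(L^2) = sum_i s_i^2 + 2 * sum_edges w_ij^2
        return 1.0 - (self.sum_s2 + 2.0 * self.sum_w2) / self.total ** 2
```

Calling `add_weight` once per edge change and reading `q()` afterwards gives a running score for a graph sequence. How closely this surrogate tracks the exact entropy is not established by the sketch.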
On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data
The ever-growing volume and velocity of data, coupled with the decreasing attention
span of end users, underscore the critical need for real-time analytics. In this
regard, anomaly detection plays a key role as an application as well as a means
to verify data fidelity. Although the subject of anomaly detection has been
researched for over 100 years in a multitude of disciplines such as, but not
limited to, astronomy, statistics, manufacturing, econometrics, and marketing, most
of the existing techniques cannot be used as is on real-time data streams.
Further, the lack of characterization of performance -- both with respect to
real-timeliness and accuracy -- on production data sets makes model selection
very challenging. To this end, we present an in-depth analysis, geared towards
real-time streaming data, of anomaly detection techniques. Given the
requirements with respect to real-timeliness and accuracy, the analysis
presented in this paper should serve as a guide for selection of the "best"
anomaly detection technique. To the best of our knowledge, this is the first
characterization of anomaly detection techniques proposed in a very diverse set
of fields, using production data sets corresponding to a wide set of
application domains.
Comment: 12 page
A Network Intrusions Detection System based on a Quantum Bio Inspired Algorithm
Network intrusion detection systems (NIDSs) identify
malicious activities by monitoring the behavior of networks. Due to the
currently high volume of network traffic, in addition to the increased number of
attacks and their dynamic properties, NIDSs face the challenge of improving
their classification performance. Bio-Inspired Optimization Algorithms (BIOs)
are used to automatically extract the discrimination rules of normal or
abnormal behavior to improve the classification accuracy and the detection
ability of NIDSs. A quantum vaccined immune clonal algorithm with the estimation
of distribution algorithm (QVICA-with EDA) is proposed in this paper to build a
new NIDS. The proposed algorithm is used as the classification algorithm of the new
NIDS, where it is trained and tested using the KDD data set. Also, the new NIDS
is compared with another detection system based on particle swarm optimization
(PSO). Results show that the proposed algorithm achieves high
intrusion classification accuracy, with a highest obtained accuracy of 94.8%.
Sequential Outlier Detection based on Incremental Decision Trees
We introduce an online outlier detection algorithm to detect outliers in a
sequentially observed data stream. For this purpose, we use a two-stage
filtering and hedging approach. In the first stage, we construct a multi-modal
probability density function to model the normal samples. In the second stage,
given a new observation, we label it as an anomaly if the value of the
aforementioned density function is below a specified threshold at the newly
observed point. In order to construct our multi-modal density function, we use
an incremental decision tree to construct a set of subspaces of the observation
space. We train a single-component density function of the exponential family
using the observations that fall inside each subspace represented on the
tree. These single-component density functions are then adaptively combined to
produce our multi-modal density function, which is shown to achieve the
performance of the best convex combination of the density functions defined on
the subspaces. As we observe more samples, our tree grows and produces more
subspaces. As a result, our modeling power increases over time while overfitting
issues are mitigated. In order to choose our threshold level to label the
observations, we use an adaptive thresholding scheme. We show that our adaptive
threshold level achieves the performance of the optimal pre-fixed threshold
level, which knows the observation labels in hindsight. Our algorithm provides
significant performance improvements over the state of the art in our wide set
of experiments involving both synthetic and real data.
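The two stages described above (model the normal samples with a density, then flag a new observation when the density at that point falls below a threshold) can be sketched as follows. This is a deliberately simplified illustration, not the authors' algorithm: it assumes scalar observations, uses one Gaussian updated online with Welford's method instead of a tree of adaptively combined exponential-family components, and uses a fixed threshold instead of the adaptive one. The class name `OnlineGaussianDetector`, the `warmup` parameter, and the choice to update the model only with unflagged samples are all assumptions made for the example.

```python
import math


class OnlineGaussianDetector:
    """Flags a sample when its Gaussian density falls below a fixed threshold.

    Simplified stand-in: one Gaussian and a fixed threshold replace the
    tree-based multi-modal density and the adaptive threshold.
    """

    def __init__(self, threshold, warmup=5):
        if warmup < 2:
            raise ValueError("warmup must be at least 2 to estimate variance")
        self.threshold = threshold
        self.warmup = warmup
        self.n = 0       # number of samples absorbed into the model
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations (Welford)

    def _update(self, x):
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def density(self, x):
        """Gaussian density at x under the current model (needs n >= 2)."""
        var = max(self.m2 / (self.n - 1), 1e-12)  # floor avoids division by zero
        z2 = (x - self.mean) ** 2 / var
        return math.exp(-0.5 * z2) / math.sqrt(2.0 * math.pi * var)

    def observe(self, x):
        """Return True if x is flagged as an anomaly, updating the model online."""
        if self.n < self.warmup:
            self._update(x)  # during warm-up, every sample is treated as normal
            return False
        is_anomaly = self.density(x) < self.threshold
        if not is_anomaly:
            self._update(x)  # assumption: only unflagged samples update the model
        return is_anomaly
```

A stream is processed by calling `observe` on each value in arrival order. With a fixed threshold the false alarm rate depends heavily on how that threshold is chosen.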
Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms
Nowadays, advanced intrusion detection systems (IDSs) rely on a combination
of anomaly detection and signature-based methods. An IDS gathers observations,
analyzes behavioral patterns, and reports suspicious events for further
investigation. A notorious issue anomaly detection systems (ADSs) and IDSs face
is the possibility of a high false alarm rate, which even state-of-the-art systems
have not overcome. This is especially a problem with large and complex systems.
The number of non-critical alarms can easily overwhelm administrators and
increase the likelihood of ignoring future alerts. Mitigation strategies thus
aim to avoid raising 'too many' false alarms without missing potentially
dangerous situations. There are two major categories of false alarm-mitigation
strategies: (1) methods that are customized to enhance the quality of anomaly
scoring; (2) approaches acting as filtering methods in contexts that aim to
decrease false alarm rates. These methods have been widely utilized by many
scholars. Herein, we review and compare the existing techniques for false alarm
mitigation in ADSs. We also examine techniques used in
signature-based IDSs and other relevant contexts, such as commercial security
information and event management tools, that are promising for ADSs. We
conclude by highlighting promising directions for future research.
Comment: arXiv admin note: text overlap with arXiv:1802.04431,
arXiv:1503.01158 by other author
Unsupervised Place Discovery for Place-Specific Change Classifier
In this study, we address the problem of supervised change detection for
robotic map learning applications, in which the aim is to train a
place-specific change classifier (e.g., support vector machine (SVM)) to
predict changes from a robot's view image. An open question is the manner in
which to partition a robot's workspace into places (e.g., SVMs) to maximize the
overall performance of change classifiers. This is a chicken-or-egg problem: if
we have a well-trained change classifier, partitioning the robot's workspace
into places is rather easy. However, training a change classifier requires a
set of place-specific training data. In this study, we address this novel
problem, which we term unsupervised place discovery. In addition, we present a
solution powered by convolutional-feature-based visual place recognition, and
validate our approach by applying it to two place-specific change classifiers,
namely, nuisance and anomaly predictors.
Comment: 6 pages, 6 figures, 2 tables, Technical repor
Review of Smart Meter Data Analytics: Applications, Methodologies, and Challenges
The widespread popularity of smart meters enables an immense amount of
fine-grained electricity consumption data to be collected. Meanwhile, the
deregulation of the power industry, particularly on the delivery side, has
continuously been moving forward worldwide. How to employ massive smart meter
data to promote and enhance the efficiency and sustainability of the power grid
is a pressing issue. To date, substantial work has been conducted on smart
meter data analytics. To provide a comprehensive overview of the current
research and to identify challenges for future research, this paper conducts an
application-oriented review of smart meter data analytics. Following the three
stages of analytics, namely, descriptive, predictive and prescriptive
analytics, we identify the key application areas as load analysis, load
forecasting, and load management. We also review the techniques and
methodologies adopted or developed to address each application. In addition, we
discuss some research trends, such as big data issues, novel machine
learning technologies, new business models, the transition of energy systems,
and data privacy and security.
Comment: IEEE Transactions on Smart Grid, 201
CADDeLaG: Framework for distributed anomaly detection in large dense graph sequences
Random walk based distance measures for graphs such as commute-time distance
are useful in a variety of graph algorithms, such as clustering, anomaly
detection, and creating low dimensional embeddings. Since such measures hinge
on the spectral decomposition of the graph, the computation becomes a
bottleneck for large graphs and does not scale easily to graphs that cannot be
loaded in memory. Most existing graph mining libraries for large graphs either
resort to sampling or exploit the sparsity structure of such graphs for
spectral analysis. However, such methods do not work for dense graphs
constructed for studying pairwise relationships among entities in a data set.
Examples of such studies include analyzing pairwise locations in gridded
climate data for discovering long-distance climate phenomena. These graph
representations are fully connected by construction and cannot be sparsified
without loss of meaningful information. In this paper we describe CADDeLaG, a
framework for scalable computation of commute-time distance based anomaly
detection in large dense graphs without the need to load the entire graph in
memory. The framework relies on Apache Spark's memory-centric cluster-computing
infrastructure and consists of two building blocks: a decomposable algorithm
for commute time distance computation and a distributed linear system solver.
We illustrate the scalability of CADDeLaG and its dependency on various factors
using both synthetic and real-world data sets. We demonstrate the usefulness of
CADDeLaG in identifying anomalies in a climate graph sequence that have been
historically missed due to ad hoc graph sparsification, and on an election
donation data set.
Energy-based Models for Video Anomaly Detection
Automated detection of abnormalities in data has been an active research
area in recent years because of its diverse applications in practice, including
video surveillance, industrial damage detection, and network intrusion
detection. However, building an effective anomaly detection system is a
non-trivial task since it requires tackling the challenging issues of the shortage
of annotated data, the inability to define anomaly objects explicitly, and the
expensive cost of the feature engineering procedure. Unlike existing approaches,
which only partially solve these problems, we develop a unique framework to
cope with the problems above simultaneously. Instead of handling the ambiguous
definition of anomaly objects, we propose to work with regular patterns, whose
unlabeled data is abundant and usually easy to collect in practice. This allows
our system to be trained in a completely unsupervised procedure and liberates
us from the need for costly data annotation. By learning a generative model that
captures the normality distribution in data, we can isolate abnormal data points
that result in low normality scores (high abnormality scores). Moreover, by
leveraging the power of generative networks, i.e., energy-based models, we are
also able to learn the feature representation automatically rather than
relying on hand-crafted features, which have dominated anomaly detection
research for many decades. We demonstrate our proposal on the specific
application of video anomaly detection, and the experimental results indicate
that our method performs better than baselines and is comparable with
state-of-the-art methods on many benchmark video anomaly detection datasets.