77,631 research outputs found
Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, i.e.,
classification under streaming emerging new classes or SENC. The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three sub problems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three sub problems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods
Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data
Operational network data, management data such as customer care call logs and
equipment system logs, is a very important source of information for network
operators to detect problems in their networks. Unfortunately, there is lack of
efficient tools to automatically track and detect anomalous events on
operational data, causing ISP operators to rely on manual inspection of this
data. While anomaly detection has been widely studied in the context of network
data, operational data presents several new challenges, including the
volatility and sparseness of data, and the need to perform fast detection
(complicating application of schemes that require offline processing or
large/stable data sets to converge).
To address these challenges, we propose Tiresias, an automated approach to
locating anomalous events on hierarchical operational data. Tiresias leverages
the hierarchical structure of operational data to identify high-impact
aggregates (e.g., locations in the network, failure modes) likely to be
associated with anomalous events. To accommodate different kinds of operational
network data, Tiresias consists of an online detection algorithm with low time
and space complexity, while preserving high detection accuracy. We present
results from two case studies using operational data collected at a large
commercial IP network operated by a Tier-1 ISP: customer care call logs and
set-top box crash logs. By comparing with a reference set verified by the ISP's
operational group, we validate that Tiresias can achieve >94% accuracy in
locating anomalies. Tiresias also discovered several previously unknown
anomalies in the ISP's customer care cases, demonstrating its effectiveness
APHRODITE: an Anomaly-based Architecture for False Positive Reduction
We present APHRODITE, an architecture designed to reduce false positives in
network intrusion detection systems. APHRODITE works by detecting anomalies in
the output traffic, and by correlating them with the alerts raised by the NIDS
working on the input traffic. Benchmarks show a substantial reduction of false
positives and that APHRODITE is effective also after a "quick setup", i.e. in
the realistic case in which it has not been "trained" and set up optimall
Support Vector Machine for Network Intrusion and Cyber-Attack Detection
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Cyber-security threats are a growing concern in networked environments. The development of Intrusion Detection Systems (IDSs) is fundamental in order to provide extra level of security. We have developed an unsupervised anomaly-based IDS that uses statistical techniques to conduct the detection process. Despite providing many advantages, anomaly-based IDSs tend to generate a high number of false alarms. Machine Learning (ML) techniques have gained wide interest in tasks of intrusion detection. In this work, Support Vector Machine (SVM) is deemed as an ML technique that could complement the performance of our IDS, providing a second line of detection to reduce the number of false alarms, or as an alternative detection technique. We assess the performance of our IDS against one-class and two-class SVMs, using linear and non-linear forms. The results that we present show that linear two-class SVM generates highly accurate results, and the accuracy of the linear one-class SVM is very comparable, and it does not need training datasets associated with malicious data. Similarly, the results evidence that our IDS could benefit from the use of ML techniques to increase its accuracy when analysing datasets comprising of non-homogeneous features
Efficient classification using parallel and scalable compressed model and Its application on intrusion detection
In order to achieve high efficiency of classification in intrusion detection,
a compressed model is proposed in this paper which combines horizontal
compression with vertical compression. OneR is utilized as horizontal
com-pression for attribute reduction, and affinity propagation is employed as
vertical compression to select small representative exemplars from large
training data. As to be able to computationally compress the larger volume of
training data with scalability, MapReduce based parallelization approach is
then implemented and evaluated for each step of the model compression process
abovementioned, on which common but efficient classification methods can be
directly used. Experimental application study on two publicly available
datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that the
classification using the compressed model proposed can effectively speed up the
detection procedure at up to 184 times, most importantly at the cost of a
minimal accuracy difference with less than 1% on average
- …