3 research outputs found

    Proportional Voting based Semi-Unsupervised Machine Learning Intrusion Detection System

    Feature selection on the NSL-KDD data set is usually done by finding correlations among features, irrespective of the prediction target. We aim to determine the relationship between features and target goals, so that different detection goals can be supported regardless of correlation-based feature selection. The imbalanced class structure of the NSL-KDD data can be relaxed by Proportional Representation (PR). Adopting PR rejects the winner-take-all notion, in which whoever attracts a majority of the vote takes everything, and instead provides a fair, proportional share to any grouping of like-minded data. As a result, minorities and majorities both receive a fair share of power and representation in the data distribution. Particle Swarm Optimization (PSO) uses attack data for the minority, while the majority uses non-attack data, together with the targeted classes, to increase the detection rate and reduce false alarms, especially for R2L and U2R attacks, because the output target goal influences feature selection and the corresponding detection and false alarm rates. Our simulation study confirms the feasibility of the Voting Representation for protecting the minority and increasing the detection rate while reducing false alarms, in a way that favors the minority over the majority.
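    The abstract does not give the exact weighting scheme, so the following is only a minimal sketch, assuming that "proportional representation" of classes can be approximated by inverse-frequency class weights (the common "balanced" weighting n / (k * count_c), swapped in here for illustration). Each class then carries the same total weight, and a weighted vote among labeled neighbors lets a minority class such as R2L win even when it has fewer raw votes. The functions `proportional_weights` and `weighted_vote` are hypothetical names and do not reproduce the authors' PSO-based method.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def proportional_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency class weights (hypothetical stand-in for PR).

    Each class c gets n / (k * count_c), so every class contributes the
    same total weight n / k regardless of how many samples it has.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_vote(votes: Sequence[Hashable], weights: Dict[Hashable, float]) -> Hashable:
    """Return the class with the largest weighted vote, not the largest raw count."""
    score: Counter = Counter()
    for v in votes:
        score[v] += weights[v]
    return max(score, key=score.get)


if __name__ == "__main__":
    # Toy, made-up label distribution: 8 normal records, 2 R2L records.
    train_labels = ["normal"] * 8 + ["R2L"] * 2
    w = proportional_weights(train_labels)
    print(w)
    # Five neighbors vote: 3 x normal, 2 x R2L. A raw majority says "normal",
    # but the weighted vote (3 * 0.625 = 1.875 vs 2 * 2.5 = 5.0) picks "R2L".
    print(weighted_vote(["normal", "normal", "normal", "R2L", "R2L"], w))
```

    Upweighting the minority in this way trades some precision on the majority class for recall on rare attacks, which is the direction of the trade-off the abstract describes.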

    Non-intrusive anomaly detection for encrypted networks

    The use of encryption is steadily increasing, and encrypted packet payloads are becoming increasingly difficult for IDSs to analyze. This investigation uses a new non-intrusive IDS approach that detects network intrusions using K-Means clustering. The approach was able to detect many intrusions in the KDD '99 and NSL-KDD evaluation datasets used for testing, while preserving the confidentiality of the encrypted packet information.
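    A minimal sketch of the general idea, assuming that the non-intrusive features are numeric values observable without decrypting payloads (for example, header- or flow-level statistics such as packet size and duration; the abstract does not list the exact features). K-Means (Lloyd's algorithm) is fit on traffic assumed to be normal, and a new record is flagged when its distance to the nearest centroid exceeds the largest distance seen in training. The farthest-first initialization, the thresholding rule, and all function names are illustrative assumptions, not the paper's configuration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Point, centroids: Sequence[Point]) -> Tuple[int, float]:
    """Index of, and Euclidean distance to, the closest centroid."""
    dists = [math.dist(p, c) for c in centroids]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[Point]:
    """Lloyd's K-Means with a deterministic farthest-first initialization."""
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(tuple(max(points, key=lambda p: _nearest(p, centroids)[1])))
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)[0]].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:  # assignments are stable
            break
        centroids = new
    return centroids


def fit_detector(normal: Sequence[Point], k: int) -> Tuple[List[Point], float]:
    """Fit centroids on normal traffic; threshold = largest training distance."""
    centroids = kmeans(normal, k)
    threshold = max(_nearest(p, centroids)[1] for p in normal)
    return centroids, threshold


def is_anomaly(p: Point, centroids: Sequence[Point], threshold: float) -> bool:
    return _nearest(p, centroids)[1] > threshold


if __name__ == "__main__":
    # Toy 2-D feature vectors (hypothetical, e.g. scaled packet size and duration).
    normal = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    centroids, thr = fit_detector(normal, k=2)
    print(centroids, is_anomaly((50.0, 50.0), centroids, thr))
```

    Using the maximum training distance as the threshold is the simplest choice; a percentile of the training distances would tolerate some noise in the "normal" data at the cost of more alarms.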

    Supervised fault detection using unstructured server-log data to support root cause analysis

    Fault detection is one of the most important aspects of telecommunication networks. Given the growing scale and complexity of communication networks, maintenance and debugging have become extremely complicated and expensive. In complex systems, the higher failure rate caused by the large number of components has increased the importance of both fault detection and root cause analysis. Fault detection for communication networks is based on analyzing system logs from servers or other network components to determine whether there is any unusual activity. However, detecting and diagnosing problems in such large systems is challenging for humans, since the amount of information that must be processed goes far beyond what can be handled manually. There is therefore a strong demand for automatic processing of datasets to extract the data needed for detecting anomalies. In a Big Data world, using machine learning techniques to analyze log data automatically is becoming increasingly popular. Machine-learning-based fault detection does not require prior knowledge about the types of problems and does not rely on explicit programming (such as rule-based systems). Machine learning can improve its performance automatically by learning from experience. In this thesis, we investigate supervised machine learning approaches as a fast and efficient way to detect known faults from unstructured log data. Since the aim is to distinguish abnormal cases from normal ones, anomaly detection is treated as a binary classification problem. To extract numerical features from event logs, the first step in any classification, we used windowing together with bag-of-words approaches, taking into account the textual characteristics of the logs (high dimensionality and sparseness).
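    The windowing and bag-of-words step can be illustrated with a short sketch. It assumes fixed-size, count-based windows of log lines (the abstract does not say whether windows are defined by line count or by time) and a simple tokenizer that keeps only alphabetic tokens, so that timestamps and numeric IDs do not inflate the vocabulary. The names `windows` and `bag_of_words` and the sample log lines are hypothetical. Each window becomes a sparse token-count vector, which reflects the high-dimensional, sparse nature of log text described above.

```python
import re
from collections import Counter
from typing import Iterator, Sequence

# Alphabetic tokens only: drops timestamps, counters and other numeric fields.
TOKEN_RE = re.compile(r"[A-Za-z_]+")


def windows(lines: Sequence[str], size: int, step: int) -> Iterator[Sequence[str]]:
    """Yield count-based windows of `size` lines, advancing by `step` lines."""
    last_start = max(len(lines) - size, 0)
    for start in range(0, last_start + 1, step):
        yield lines[start:start + size]


def bag_of_words(window: Sequence[str]) -> Counter:
    """Sparse bag-of-words vector: lower-cased token -> count within the window."""
    counts: Counter = Counter()
    for line in window:
        counts.update(tok.lower() for tok in TOKEN_RE.findall(line))
    return counts


if __name__ == "__main__":
    log = [
        "2021-01-01 10:00:00 INFO link up",
        "2021-01-01 10:00:05 INFO link up",
        "2021-01-01 10:00:09 ERROR link down timeout",
        "2021-01-01 10:00:12 INFO link up",
    ]
    for vec in map(bag_of_words, windows(log, size=2, step=2)):
        print(dict(vec))
```

    Setting `step` smaller than `size` gives overlapping windows, which produces more training vectors but makes neighboring vectors highly correlated.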
    We focus on linear classification methods, such as the single-layer perceptron and Support Vector Machines, as promising candidates for supervised fault detection, given the textual characteristics of network-based server-log data. To develop an approach that generalizes to detecting known faults, two important factors are investigated: the size of the datasets and the duration of the faults. Based on the experimental results for these two factors, a two-layer classification is proposed to overcome the windowing and feature extraction challenges posed by long-lasting faults. The thesis proposes a novel approach for collecting feature vectors for the two layers of this classification. The first layer attempts to detect the starting line of each fault repetition as well as the fault duration. The models obtained from the first layer are used to create feature vectors for the second layer. To evaluate the learning algorithms and select the best detection model, cross-validation and F-scores are used, because traditional metrics such as accuracy and error rate are not well suited to imbalanced datasets. The experimental results show that the proposed SVM classifier provides the best performance regardless of fault duration, while factors such as the labelling rule and reduction of the feature space have no significant effect on performance. In addition, the results show that the two-layer classification system can improve fault detection performance; however, a better-suited approach for collecting feature vectors with a smaller time span needs further investigation.
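    Two ingredients named in this abstract, a linear classifier and the F-score, can be sketched together. This is a minimal single-layer perceptron (one of the two linear methods the thesis mentions; the SVM and the two-layer scheme are not reproduced here) trained on sparse token-count vectors like those produced by windowing, followed by an F1 computation. The toy data, the label convention (+1 = fault window, -1 = normal) and all function names are hypothetical. The last lines show why accuracy misleads on imbalanced data: a detector that never reports a fault reaches 95% accuracy on a 95/5 split, yet its F1 for the fault class is 0.

```python
from typing import Dict, Sequence, Tuple

SparseVec = Dict[str, float]


def score(w: SparseVec, b: float, x: SparseVec) -> float:
    """Linear score w . x + b over a sparse feature dict."""
    return b + sum(w.get(f, 0.0) * v for f, v in x.items())


def train_perceptron(data: Sequence[Tuple[SparseVec, int]], epochs: int = 20) -> Tuple[SparseVec, float]:
    """Classic perceptron rule: on a mistake, w += y * x and b += y (labels are +1/-1)."""
    w: SparseVec = {}
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            pred = 1 if score(w, b, x) > 0 else -1
            if pred != y:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + y * v
                b += y
                mistakes += 1
        if mistakes == 0:  # converged (possible only on linearly separable data)
            break
    return w, b


def predict(w: SparseVec, b: float, x: SparseVec) -> int:
    return 1 if score(w, b, x) > 0 else -1


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = harmonic mean of precision and recall for the positive (fault) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    train = [
        ({"ok": 2.0}, -1),
        ({"ok": 1.0, "login": 1.0}, -1),
        ({"error": 1.0, "timeout": 1.0}, 1),
        ({"ok": 1.0}, -1),
        ({"error": 2.0}, 1),
    ]
    w, b = train_perceptron(train)
    print(predict(w, b, {"timeout": 3.0}), predict(w, b, {"ok": 5.0}))

    # Imbalanced evaluation: 5 fault windows, 95 normal ones, detector always says "normal".
    y_true = [1] * 5 + [-1] * 95
    y_pred = [-1] * 100
    print(accuracy(y_true, y_pred), f1(y_true, y_pred))
```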