722 research outputs found

    Evolution of a Hybrid Model for an Effective Perimeter Security Device

    Clustering models, classification models, and hybrids of the two are the most widely used models for handling the diverse nature of NIDS datasets. The Dirichlet process clustering technique is a non-parametric Bayesian mixture model that uses the data distribution of the dataset to form distinct clusters. The number of clusters is not known a priori and differs across datasets. Determining the number of clusters from the distribution of the data instances can improve the performance of the model. The Naive Bayes model, a supervised classification technique, offers better computational efficiency by reducing training time. In this paper, we propose a hybrid model that exploits both the proper clustering of data instances and this computational efficiency in building a NIDS. The RIPPER algorithm is used to extract rules from the traffic description for updating the rule database. Experiments were conducted on the KDD CUP’99 and SSENet-2011 datasets to study the performance of the proposed model. A comparison of three hybrid methods with the proposed hybrid model was also carried out. The results showed that the proposed hybrid model is superior in building a robust perimeter security device.

    Bayesian Learning Frameworks for Multivariate Beta Mixture Models

    Mixture models have been widely used as a statistical learning paradigm in various unsupervised machine learning applications, where labeling a vast amount of data is impractical and costly. They have shown significant success and encouraging performance in many real-world problems from fields such as computer vision, information retrieval, and pattern recognition. One of the most widely used distributions in mixture models is the Gaussian distribution, owing to its simplicity and fitting capabilities. However, data obtained from some applications can have different properties, such as a non-Gaussian and asymmetric nature. In this thesis, we propose multivariate Beta mixture models, which offer flexibility and a variety of shapes with promising attributes. These models can be considered decent alternatives to Gaussian distributions. We explore multiple Bayesian inference approaches for multivariate Beta mixture models and propose a suitable solution to the problem of estimating parameters using the Markov Chain Monte Carlo (MCMC) technique. We exploit Gibbs sampling within Metropolis-Hastings for learning the parameters of our finite mixture model. Moreover, a fully Bayesian approach based on the birth-death MCMC technique is proposed, which simultaneously allows cluster assignment, parameter estimation, and selection of the optimal number of clusters. Finally, we develop a nonparametric Bayesian framework by extending our finite mixture model to infinity using a Dirichlet process to tackle the model selection problem. Experimental results obtained from challenging applications (e.g., intrusion detection and medical applications) confirm that our proposed frameworks can provide effective solutions compared to existing alternatives.

    Security Evaluation of Support Vector Machines in Adversarial Environments

    Support Vector Machines (SVMs) are among the most popular classification techniques adopted in security applications like malware detection, intrusion detection, and spam filtering. However, if SVMs are to be incorporated in real-world security systems, they must be able to cope with attack patterns that can either mislead the learning algorithm (poisoning), evade detection (evasion), or gain information about their internal parameters (privacy breaches). The main contributions of this chapter are twofold. First, we introduce a formal general framework for the empirical evaluation of the security of machine-learning systems. Second, according to our framework, we demonstrate the feasibility of evasion, poisoning and privacy attacks against SVMs in real-world security problems. For each attack technique, we evaluate its impact and discuss whether (and how) it can be countered through an adversary-aware design of SVMs. Our experiments are easily reproducible thanks to open-source code that we have made available, together with all the employed datasets, on a public repository.
    Comment: 47 pages, 9 figures; chapter accepted into book 'Support Vector Machine Applications'

    Featured Anomaly Detection Methods and Applications

    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become the first vital resort for protecting and securing public and personal property. Although anomaly detection methods have been under consistent development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods. The advantages and disadvantages of the strategies and methods are elaborated. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed in this work, aimed at resolving specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. To be more specific, the abstracts of the key contents in this thesis are listed as follows:
    1) Support Vector Data Description (SVDD) is investigated as a primary method to achieve accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined, and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. Theoretical analysis of the parameter utilised in the model is also presented to ensure the validity of the relaxation of the decision boundary.
    2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is considered as contextual information to be integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants, which help in distinguishing the root causes of the anomalies.
    3) In an attempt to further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed for realising one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with the extreme points, which constitute a dictionary of data representatives. Using the dictionary, CHDD is capable of representing and clustering all the normal data instances, so that anomaly detection is realised with a degree of interpretation.
    4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is consistently trained to cope with the evolving patterns is designed. Because the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.

    A Validity-Based Approach for Feature Selection in Intrusion Detection Systems

    Intrusion detection systems are tools that detect and remedy the presence of malicious activities. Intrusion detection systems face many challenges in terms of accurate analysis and evaluation. One such challenge is the involvement of many features during analysis, which leads to high data volume and ultimately excessive computational overhead. This research concerns the development of a new intrusion detection system that employs an entropy-based measure called v-measure to select significant features and reduce dimensionality. After the development of the intrusion detection system, this feature reduction technique was tested on public datasets by applying machine learning classifiers such as the Decision Tree, Random Forest, and AdaBoost algorithms. We have compared the results obtained with the selected features against other feature selection techniques for correct classification of attacks. The findings demonstrated dimension and data volume reduction while maintaining a low false positive rate, a low false negative rate, and a high detection rate.
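
    The abstract above does not say how v-measure is turned into a feature score, so the following is only a minimal sketch under one assumption: each discrete (or pre-binned) feature is treated as a clustering of the records and scored against the class labels with the standard v-measure, the harmonic mean of homogeneity and completeness. The function names, the toy feature names, and the ranking step are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats for a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length sequences of discrete values."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness, in [0, 1]."""
    h_c = entropy(classes)
    h_k = entropy(clusters)
    # Clamp at 0 to absorb tiny negative values from floating-point rounding.
    homogeneity = 1.0 if h_c == 0 else max(
        0.0, 1.0 - conditional_entropy(classes, clusters) / h_c)
    completeness = 1.0 if h_k == 0 else max(
        0.0, 1.0 - conditional_entropy(clusters, classes) / h_k)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def rank_features(columns, classes):
    """Assumed selection step: score each feature column, best first."""
    scores = {name: v_measure(classes, values)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy records; the feature names are illustrative only.
    classes = ["normal", "normal", "attack", "attack"]
    columns = {
        "noise": ["a", "b", "a", "b"],      # unrelated to the class
        "flag": ["SF", "SF", "S0", "S0"],   # separates the classes
    }
    for name, score in rank_features(columns, classes):
        print(f"{name}: {score:.3f}")
```

    Under this assumption, a feature whose values line up with the attack classes scores near 1 and an unrelated feature scores near 0, so keeping only the top-ranked features would reduce dimensionality. Continuous features would need to be binned first, which this sketch leaves out.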