3 research outputs found

    Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

    For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. 
An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs, but rather to the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control over the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is key to explaining why some classifier combinations fail to give fruitful solutions.
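The two ideas in the abstract above, a fitness computed proportionally to the instances of each class and a Pareto front over per-class classification rates, can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the use of the mean per-class rate as the balanced fitness, and the toy labels are all assumptions made for illustration. The GA itself is omitted.

```python
from typing import List, Sequence, Tuple


def per_class_rates(y_true: Sequence[int], y_pred: Sequence[int],
                    n_classes: int) -> Tuple[float, ...]:
    """Fraction of correctly classified instances, computed per class."""
    correct = [0] * n_classes
    total = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # A class with no instances gets a rate of 0.0 (an arbitrary convention).
    return tuple(c / n if n else 0.0 for c, n in zip(correct, total))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Plain accuracy: dominated by the major class on imbalanced data."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> float:
    """Assumed fitness: mean of the per-class rates, so every class
    contributes equally regardless of how many instances it has."""
    return sum(per_class_rates(y_true, y_pred, n_classes)) / n_classes


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives are maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the non-dominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy imbalanced data: 98 instances of class 0, 2 instances of class 1.
y_true = [0] * 98 + [1] * 2
always_major = [0] * 100                    # never predicts the minor class
detects_minor = [0] * 88 + [1] * 10 + [1] * 2

print(accuracy(y_true, always_major), balanced_fitness(y_true, always_major, 2))
print(accuracy(y_true, detects_minor), balanced_fitness(y_true, detects_minor, 2))

# Each tuple holds hypothetical per-class rates of one candidate classifier.
candidates = [(1.0, 0.0), (0.9, 1.0), (0.8, 0.5), (0.95, 0.5)]
print(pareto_front(candidates))
```

In this toy case the classifier that ignores the minor class wins on plain accuracy but loses on the balanced fitness, which is the bias the abstract describes. Treating each per-class rate as a separate objective leaves several non-dominated candidates, and the user chooses among those trade-offs.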

    Machine learning for network based intrusion detection : an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data

    EThOS - Electronic Theses Online Service (United Kingdom)

    An Approach To The Correlation Of Security Events Based On Machine Learning Techniques

    Organizations face the ever growing challenge of providing security within their IT infrastructures. Static approaches to security, such as perimeter defense, have proven less than effective - and, therefore, more vulnerable - in a new scenario characterized by increasingly complex systems and by the evolution and automation of cyber attacks. Moreover, dynamic detection of attacks through IDSs (Intrusion Detection Systems) presents too many false positives to be effective. This work presents an approach to collecting and normalizing, as well as fusing and classifying, security alerts. This approach involves collecting alerts from different sources and normalizing them according to a standardized structure, IDMEF (Intrusion Detection Message Exchange Format). The normalized alerts are grouped into meta-alerts (fusion, or clustering), which are later classified using machine learning techniques into attacks or false alarms. We validate and report an implementation of this approach against the DARPA Challenge and the Scan of the Month, using three different classifiers - SVMs, Bayesian Networks and Decision Trees - having achieved high levels of attack detection with few false positives. Our results also indicate that our approach outperforms other works when it comes to detecting new kinds of attacks, making it more suitable to a world of evolving attacks. © 2013 Stroeh et al.
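The fusion step described in the abstract above, grouping normalized alerts into meta-alerts before classification, can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not state the grouping criterion, so the key (source, target, classification) and the 60-second window are assumptions. The `Alert` class is a hypothetical, heavily simplified stand-in for an IDMEF alert, not the IDMEF schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    """Hypothetical, simplified normalized alert (not the full IDMEF model)."""
    source: str
    target: str
    classification: str
    timestamp: float  # seconds since some reference time


@dataclass
class MetaAlert:
    """A group of alerts assumed to belong to the same event."""
    key: Tuple[str, str, str]
    alerts: List[Alert] = field(default_factory=list)

    @property
    def last_seen(self) -> float:
        return self.alerts[-1].timestamp


def fuse(alerts: Iterable[Alert], window: float = 60.0) -> List[MetaAlert]:
    """Group alerts sharing source, target and classification, starting a
    new meta-alert when the gap to the previous matching alert exceeds
    `window` seconds (the window value is an assumption)."""
    open_groups: Dict[Tuple[str, str, str], MetaAlert] = {}
    result: List[MetaAlert] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.target, alert.classification)
        group = open_groups.get(key)
        if group is None or alert.timestamp - group.last_seen > window:
            group = MetaAlert(key)
            open_groups[key] = group
            result.append(group)
        group.alerts.append(alert)
    return result


# Toy input: addresses and classification names are made up.
raw = [
    Alert("10.0.0.5", "10.0.0.9", "portscan", 0.0),
    Alert("10.0.0.7", "10.0.0.9", "sql-injection", 10.0),
    Alert("10.0.0.5", "10.0.0.9", "portscan", 30.0),
    Alert("10.0.0.5", "10.0.0.9", "portscan", 200.0),
]
meta_alerts = fuse(raw)
for m in meta_alerts:
    print(m.key, len(m.alerts))
```

Each resulting meta-alert would then be turned into a feature vector and passed to a classifier that labels it as an attack or a false alarm. That stage is left out here because the abstract does not describe the features used.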