1,690 research outputs found

    Towards Enhancement of Machine Learning Techniques Using CSE-CIC-IDS2018 Cybersecurity Dataset

    Get PDF
    In machine learning, balanced datasets play a crucial role in the bias observed towards classification and prediction. The CSE-CIC IDS datasets published in 2017 and 2018 have both attracted considerable scholarly attention towards research in intrusion detection systems. Recent work published using this dataset indicates little attention paid to the imbalance of the dataset. The study presented in this paper sets out to explore the degree to which imbalance has been treated and provide a taxonomy of the machine learning approaches developed using these datasets. A survey of published works related to these datasets was done to deliver a combined qualitative and quantitative methodological approach for our analysis towards deriving a taxonomy. The research presented here confirms that the impact of bias due to the imbalance datasets is rarely addressed. This data supports further research and development of supervised machine learning techniques that reduce bias in classification or prediction due to these imbalance datasets. This study\u27s experiment is to train the model using the train, and test split function from sci-kit learn library on the CSE-CIC-IDS2018. The system needs to be trained by a learning algorithm to accomplish this. There are many machine learning algorithms available and presented by the literature. Among which there are three types of classification based Supervised ML techniques which are used in our study: 1) KNN, 2) Random Forest (RF) and 3) Logistic Regression (LR). This experiment also determines how each of the dataset\u27s 67 preprocessed features affects the ML model\u27s performance. Feature drop selection is performed in two ways, independent and group drop. Experimental results generate the threshold values for each classifier and performance metric values such as accuracy, precision, recall, and F1-score. Also, results are generated from the comparison of manual feature drop methods. A good amount of drop is noticed in the group for most of the classifiers

    The Effect of Dual Hyperparameter Optimization on Software Vulnerability Prediction Models

    Get PDF
    Background: Prediction of software vulnerabilities is a major concern in the field of software security. Many researchers have worked to construct various software vulnerability prediction (SVP) models. The emerging machine learning domain aids in building effective SVP models. The employment of data balancing/resampling techniques and optimal hyperparameters can upgrade their performance. Previous research studies have shown the impact of hyperparameter optimization (HPO) on machine learning algorithms and data balancing techniques. Aim: The current study aims to analyze the impact of dual hyperparameter optimization on metrics-based SVP models. Method: This paper has proposed the methodology using the python framework Optuna that optimizes the hyperparameters for both machine learners and data balancing techniques. For the experimentation purpose, we have compared six combinations of five machine learners and five resampling techniques considering default parameters and optimized hyperparameters. Results: Additionally, the Wilcoxon signed-rank test with the Bonferroni correction method was implied, and observed that dual HPO performs better than HPO on learners and HPO on data balancers. Furthermore, the paper has assessed the impact of data complexity measures and concludes that HPO does not improve the performance of those datasets that exhibit high overlap. Conclusion: The experimental analysis unveils that dual HPO is 64% effective in enhancing the productivity of SVP models

    ReCon: Revealing and Controlling PII Leaks in Mobile Network Traffic

    Get PDF
    It is well known that apps running on mobile devices extensively track and leak users' personally identifiable information (PII); however, these users have little visibility into PII leaked through the network traffic generated by their devices, and have poor control over how, when and where that traffic is sent and handled by third parties. In this paper, we present the design, implementation, and evaluation of ReCon: a cross-platform system that reveals PII leaks and gives users control over them without requiring any special privileges or custom OSes. ReCon leverages machine learning to reveal potential PII leaks by inspecting network traffic, and provides a visualization tool to empower users with the ability to control these leaks via blocking or substitution of PII. We evaluate ReCon's effectiveness with measurements from controlled experiments using leaks from the 100 most popular iOS, Android, and Windows Phone apps, and via an IRB-approved user study with 92 participants. We show that ReCon is accurate, efficient, and identifies a wider range of PII than previous approaches.Comment: Please use MobiSys version when referencing this work: http://dl.acm.org/citation.cfm?id=2906392. 18 pages, recon.meddle.mob

    An Integrated Cybersecurity Risk Management (I-CSRM) Framework for Critical Infrastructure Protection

    Get PDF
    Risk management plays a vital role in tackling cyber threats within the Cyber-Physical System (CPS) for overall system resilience. It enables identifying critical assets, vulnerabilities, and threats and determining suitable proactive control measures to tackle the risks. However, due to the increased complexity of the CPS, cyber-attacks nowadays are more sophisticated and less predictable, which makes risk management task more challenging. This research aims for an effective Cyber Security Risk Management (CSRM) practice using assets criticality, predication of risk types and evaluating the effectiveness of existing controls. We follow a number of techniques for the proposed unified approach including fuzzy set theory for the asset criticality, machine learning classifiers for the risk predication and Comprehensive Assessment Model (CAM) for evaluating the effectiveness of the existing controls. The proposed approach considers relevant CSRM concepts such as threat actor attack pattern, Tactic, Technique and Procedure (TTP), controls and assets and maps these concepts with the VERIS community dataset (VCDB) features for the purpose of risk predication. Also, the tool serves as an additional component of the proposed framework that enables asset criticality, risk and control effectiveness calculation for a continuous risk assessment. Lastly, the thesis employs a case study to validate the proposed i-CSRM framework and i-CSRMT in terms of applicability. Stakeholder feedback is collected and evaluated using critical criteria such as ease of use, relevance, and usability. The analysis results illustrate the validity and acceptability of both the framework and tool for an effective risk management practice within a real-world environment. The experimental results reveal that using the fuzzy set theory in assessing assets' criticality, supports stakeholder for an effective risk management practice. Furthermore, the results have demonstrated the machine learning classifiers’ have shown exemplary performance in predicting different risk types including denial of service, cyber espionage, and Crimeware. An accurate prediction can help organisations model uncertainty with machine learning classifiers, detect frequent cyber-attacks, affected assets, risk types, and employ the necessary corrective actions for its mitigations. Lastly, to evaluate the effectiveness of the existing controls, the CAM approach is used, and the result shows that some controls such as network intrusion, authentication, and anti-virus show high efficacy in controlling or reducing risks. Evaluating control effectiveness helps organisations to know how effective the controls are in reducing or preventing any form of risk before an attack occurs. Also, organisations can implement new controls earlier. The main advantage of using the CAM approach is that the parameters used are objective, consistent and applicable to CPS
    corecore