
    Feature Selection on Permissions, Intents and APIs for Android Malware Detection

    Malicious applications pose an enormous security threat to mobile computing devices. Currently 85% of all smartphones run Android, Google’s open-source operating system, making that platform the primary threat vector for malware attacks. Android hosts roughly 99% of known malware to date and, because of its open-source nature, is the focus of most research efforts in mobile malware detection. One of the main tools used in this effort is supervised machine learning. While a decade of work has made considerable progress in detection accuracy, each stream of research is forced to overcome the same obstacle: feature selection, i.e., determining which attributes of Android are most effective as inputs to machine learning models. This dissertation aims to address that problem by providing the community with an exhaustive analysis of the three primary types of Android features used by researchers: Permissions, Intents and API Calls. The intent of the report is not to describe a best performing feature set or a best performing machine learning model, nor to explain why certain Permissions, Intents or API Calls get selected above others, but rather to provide a holistic methodology to help guide feature selection for Android malware detection. The experiments used eleven different feature selection techniques covering filter methods, wrapper methods and embedded methods. Each feature selection technique was applied to seven different datasets, one for each of the seven possible combinations of Permissions, Intents and API Calls. Each of those seven datasets is drawn from a base set of 119k Android apps. All of the result sets were then validated against three different machine learning models (Random Forest, SVM and a Neural Net) to test applicability across algorithm types.
The experiments show that using a combination of Permissions, Intents and API Calls produced higher accuracy than using any of those alone or in any other combination, and that feature selection should be performed on the combined dataset rather than on each feature type separately with the results then combined. The data also show that, in general, a feature set size of 200 or more attributes is required for optimal results. Finally, the feature selection methods Relief, Correlation-based Feature Selection (CFS) and Recursive Feature Elimination (RFE) using a Neural Net are not satisfactory approaches for Android malware detection work. Based on the proposed methodology and experiments, this research provides insights into feature selection, a significant but often overlooked issue in Android malware detection. We believe the results reported herein are an important step toward effective feature evaluation and selection in assisting malware detection, especially for datasets with a large number of features. The methodology also has the potential to be applied to similar malware detection tasks or even in broader domains such as pattern recognition.
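The distinction between selecting features on the combined dataset and selecting per feature type before combining can be illustrated with a minimal sketch. It is not the dissertation's code: it uses a single filter method (information gain) written in plain Python, and the eight-app dataset, the feature values and the helper names `entropy`, `information_gain` and `top_k` are all invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on one feature column."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def top_k(features, labels, k):
    """Filter-style selection: rank features by information gain, keep the best k."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Synthetic toy data (assumption): 8 apps, 1 = malware, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
permissions = {
    "perm:SEND_SMS":      [1, 1, 1, 1, 0, 0, 0, 0],
    "perm:READ_CONTACTS": [1, 1, 1, 0, 0, 0, 0, 0],
}
intents = {
    "intent:BOOT_COMPLETED": [1, 1, 0, 0, 1, 1, 0, 0],
    "intent:MAIN":           [1, 1, 1, 1, 1, 1, 1, 1],
}
apis = {
    "api:getDeviceId": [1, 1, 1, 1, 1, 1, 0, 0],
    "api:loadLibrary": [1, 0, 1, 0, 1, 0, 1, 0],
}

# Strategy A: merge all feature types, then select the best 3 overall.
combined_first = top_k({**permissions, **intents, **apis}, labels, 3)

# Strategy B: select the best 1 from each type, then merge the selections.
per_type_first = [top_k(group, labels, 1)[0] for group in (permissions, intents, apis)]

print("combined, then select:", combined_first)
print("select per type, then combine:", per_type_first)
```

In this toy case the per-type strategy is forced to keep an uninformative Intent and to drop the second most informative Permission, whereas selection over the combined set ranks all features on one scale. The example only shows how the two procedures differ; it says nothing about accuracy on real data.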

    A Bayesian-Network-Based Framework for Risk Analysis and Decision Making in Cybersecurity

    Many approaches have been proposed to define, measure and manage cybersecurity risk. A common theme underpinning Cybersecurity Risk Assessment (CRA) involves modelling relationships between risk factors and the use of statistical and probabilistic inference to calculate risk. This thesis focuses on the use of Bayesian Networks (BNs) for this dual purpose. The application of BNs to CRA was once a nontrivial task, but the computational efficiency and flexibility of BN algorithms have improved such that they can now be widely applied to solve a variety of CRA problems. One such advance is in Hybrid Bayesian Networks (HBNs), which support inference in models containing both discrete and continuous variables. HBNs are now routinely used for prediction and diagnostic inference tasks and have been extended, in the form of Influence Diagrams (IDs), to support decision-making tasks. This thesis proposes an HBN-based CRA framework for comprehensive cybersecurity causal risk analysis and probabilistic calculation. We introduce causal risk analysis into cybersecurity problems and use a kill chain model to illustrate how causal analysis can guide cybersecurity risk modelling. The proposed framework is flexible and extensible in that it can incorporate other CRA models built using BNs. We illustrate this by showing how the framework can incorporate risk analysis models from both organizational and technical perspectives. For organizational risk analysis, where the focus is on defending the information assets and systems of organizations in an economically efficient way, the thesis shows how BNs can be used for modelling causal and probabilistic relationships between the variables involved and for conducting risk assessment.
For technical risk analysis, which is motivated by the perspective of cybersecurity analysts, it argues that IDs can be used to model the game between the defender and the attacker in a cybersecurity problem, calculate risks and support designing optimal cyber defenses dynamically.
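As a rough illustration of the prediction and diagnostic inference that a BN over a kill chain supports, the following sketch performs exact inference by enumeration on a three-node discrete chain. It is not the thesis's framework: the stage names, every probability and the helpers `bernoulli`, `joint` and `probability` are hypothetical, and it covers neither continuous variables (HBNs) nor decision nodes (IDs).

```python
from itertools import product

# Hypothetical kill chain: delivery -> exploit -> exfil.
# All probabilities below are invented for illustration.
P_DELIVERY = 0.3                                      # P(delivery = True)
P_EXPLOIT_GIVEN_DELIVERY = {True: 0.4, False: 0.05}   # P(exploit = True | delivery)
P_EXFIL_GIVEN_EXPLOIT = {True: 0.5, False: 0.01}      # P(exfil = True | exploit)

NAMES = ("delivery", "exploit", "exfil")

def bernoulli(p_true, value):
    """Probability of a boolean outcome given P(True)."""
    return p_true if value else 1.0 - p_true

def joint(delivery, exploit, exfil):
    """Joint probability of one full assignment, factorised along the chain."""
    return (bernoulli(P_DELIVERY, delivery)
            * bernoulli(P_EXPLOIT_GIVEN_DELIVERY[delivery], exploit)
            * bernoulli(P_EXFIL_GIVEN_EXPLOIT[exploit], exfil))

def probability(query, evidence=None):
    """P(query | evidence) by summing the joint over all consistent worlds."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(**world)
        denominator += p
        if all(world[name] == value for name, value in query.items()):
            numerator += p
    return numerator / denominator

# Prediction: how likely is exfiltration with no evidence?
risk = probability({"exfil": True})

# Diagnosis: given observed exfiltration, how likely was a successful delivery?
diagnosis = probability({"delivery": True}, {"exfil": True})

print(f"P(exfil) = {risk:.5f}")
print(f"P(delivery | exfil) = {diagnosis:.5f}")
```

Enumeration is exponential in the number of variables and is used here only because it keeps the factorisation visible; practical BN tools rely on more efficient inference algorithms.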