3 research outputs found

    Robust Mobile Malware Detection

    Get PDF
    The increasing popularity and use of smartphones and hand-held devices have made them the most popular target for malware attackers. Researchers have proposed machine learning-based models to automatically detect malware attacks on these devices. Since these models learn application behaviors solely from the extracted features, choosing an appropriate and meaningful feature set is one of the most crucial steps for designing an effective mobile malware detection system. There are four categories of features for mobile applications. Previous works have taken arbitrary combinations of these categories to design models, resulting in sub-optimal performance. This thesis systematically investigates the individual impact of these feature categories on mobile malware detection systems. Feature categories that complement each other are investigated and categories that add redundancy to the feature space (thereby degrading the performance) are analyzed. In the process, the combination of feature categories that provides the best detection results is identified. Ensuring reliability and robustness of the above-mentioned malware detection systems is of utmost importance as newer techniques to break down such systems continue to surface. Adversarial attack is one such evasive attack that can bypass a detection system by carefully morphing a malicious sample even though the sample was originally correctly identified by the same system. Self-crafted adversarial samples can be used to retrain a model to defend against such attacks. However, randomly using too many such samples, as is currently done in the literature, can further degrade detection performance. This work proposed two intelligent approaches to retrain a classifier through the intelligent selection of adversarial samples. The first approach adopts a distance-based scheme where the samples are chosen based on their distance from malware and benign cluster centers while the second selects the samples based on a probability measure derived from a kernel-based learning method. The second method achieved a 6% improvement in terms of accuracy. To ensure practical deployment of malware detection systems, it is necessary to keep the real-world data characteristics in mind. For example, the benign applications deployed in the market greatly outnumber malware applications. However, most studies have assumed a balanced data distribution. Also, techniques to handle imbalanced data in other domains cannot be applied directly to mobile malware detection since they generate synthetic samples with broken functionality, making them invalid. In this regard, this thesis introduces a novel synthetic over-sampling technique that ensures valid sample generation. This technique is subsequently combined with a dynamic cost function in the learning scheme that automatically adjusts minority class weight during model training which counters the bias towards the majority class and stabilizes the model. This hybrid method provided a 9% improvement in terms of F1-score. Aiming to design a robust malware detection system, this thesis extensively studies machine learning-based mobile malware detection in terms of best feature category combination, resilience against evasive attacks, and practical deployment of detection models. Given the increasing technological advancements in mobile and hand-held devices, this study will be very useful for designing robust cybersecurity systems to ensure safe usage of these devices.Doctor of Philosoph

    Selective adversarial learning for mobile malware

    No full text
    Machine learning models, including deep neural networks, have been shown to be vulnerable to adversarial attacks. Adversarial samples are crafted from legitimate inputs by carefully introducing small perturbation to the input so that the classifier is fooled. Adversarial retraining, which involves retraining the classifier using adversarial samples, has been shown to improve the robustness of the classifier against adversarial attacks. However, it has been also shown that retraining with too many samples can lead to performance degradation. Hence, a careful selection of the adversarial samples that are used to retrain the classifier is necessary, yet existing works select these samples in a randomized fashion. In our work, we propose two novel approaches for selecting adversarial samples: based on the distance from cluster center of malware and based on the probability derived from a kernel based learning (KBL). Our experiment results show that both of our selective mechanisms for adversarial retraining outperform the random selection technique and significantly improve the classifier performance against adversarial attacks. In particular, selection with KBL delivers above 6% improvement in detection accuracy compared to random selection. The method proposed here has greater impact in designing robust machine learning system for security applications

    Selective adversarial learning for mobile malware

    No full text
    Imam, T ORCiD: 0000-0002-8864-4155Machine learning models, including deep neural networks, have been shown to be vulnerable to adversarial attacks. Adversarial samples are crafted from legitimate inputs by carefully introducing small perturbation to the input so that the classifier is fooled. Adversarial retraining, which involves retraining the classifier using adversarial samples, has been shown to improve the robustness of the classifier against adversarial attacks. However, it has been also shown that retraining with too many samples can lead to performance degradation. Hence, a careful selection of the adversarial samples that are used to retrain the classifier is necessary, yet existing works select these samples in a randomized fashion. In our work, we propose two novel approaches for selecting adversarial samples: based on the distance from cluster center of malware and based on the probability derived from a kernel based learning (KBL). Our experiment results show that both of our selective mechanisms for adversarial retraining outperform the random selection technique and significantly improve the classifier performance against adversarial attacks. In particular, selection with KBL delivers above 6% improvement in detection accuracy compared to random selection. The method proposed here has greater impact in designing robust machine learning system for security applications
    corecore