485 research outputs found

    Malware Analysis and Detection with Explainable Machine Learning

    Malware detection is one of the areas where machine learning is successfully employed, owing to its high discriminating power and its ability to identify novel variants of malware samples. Typically, the problem formulation is tightly coupled to the use of a wide variety of features covering several characteristics of the entities to classify. This practice appears to yield considerable detection performance. However, it offers little insight into the knowledge extracted by the learning algorithm, which causes two main issues. First, detectors might learn spurious patterns, undermining their effectiveness in real environments. Second, they might be particularly vulnerable to adversarial attacks, weakening their security. These concerns call for systems that are tailored to the specific peculiarities of the attacks to be detected. Within malware detection, Android ransomware represents a challenging yet illustrative domain for assessing the relevance of this issue. Ransomware is a serious threat that locks the compromised device or encrypts its data, then forces the device owner to pay a ransom to restore the device's functionality. Attackers typically develop such apps so that normally legitimate components and functionalities perform malicious behaviour, making them harder to distinguish from genuine applications. In this sense, adopting a well-defined set of features and relying on explanations of the logic behind such detectors could improve their design process, since it could reveal truly characterising features and guide the human expert towards an understanding of the most relevant attack patterns. Given this context, the goal of the thesis is to explore strategies that may improve the design process of malware detectors.
In particular, the thesis proposes to evaluate and integrate approaches based on emerging research on Explainable Machine Learning. To this end, the work follows two pathways. The first and main one focuses on identifying the main traits that prove to be characterising and effective for Android ransomware detection. Then, explainability techniques are used to propose methods to assess the validity of the considered features. The second pathway broadens the view by exploring the relationship between explainable machine learning and adversarial attacks. In this regard, the contribution consists of pointing out metrics extracted from explainability techniques that can reveal models' robustness to adversarial attacks, together with an assessment of the practical feasibility for attackers to alter the features that affect models' output the most. Ultimately, this work highlights the need for a design process that is aware of the weaknesses of, and attacks against, machine learning-based detectors, and proposes explainability techniques as one of the tools to counteract them.
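
As a rough illustration of what a metric "extracted from explainability techniques" might look like, the sketch below computes gradient-times-input relevances for a linear scoring function and measures how much of the total relevance is concentrated in the top-k features. This is an assumption made for illustration only, not the metric defined in the thesis; the function names `attributions` and `top_k_share` and the toy weights are hypothetical. The intuition is that a detector whose decision rests on very few features may be easier to evade by altering just those features.

```python
def attributions(weights, features):
    """Gradient-times-input relevance for a linear score w . x.

    For a linear model the gradient with respect to the input is the
    weight vector itself, so each relevance is simply w_i * x_i.
    """
    return [w * x for w, x in zip(weights, features)]


def top_k_share(relevances, k):
    """Fraction of total absolute relevance held by the k largest features."""
    magnitudes = sorted((abs(r) for r in relevances), reverse=True)
    total = sum(magnitudes)
    return sum(magnitudes[:k]) / total if total else 0.0


# Hypothetical toy detectors over the same three binary features.
sample = [1.0, 1.0, 1.0]
concentrated = attributions([4.0, 0.5, 0.5], sample)  # leans on one feature
even = attributions([1.0, 1.0, 1.0], sample)          # spreads relevance

concentrated_share = top_k_share(concentrated, k=1)
even_share = top_k_share(even, k=1)
```

Under this toy measure, a higher share means the explanation is more concentrated; whether that actually correlates with lower adversarial robustness is an empirical question the thesis addresses with its own metrics.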

    A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization

    Existing Android malware detection approaches use a variety of features, such as security-sensitive APIs, system calls, control-flow structures and information flows, in conjunction with Machine Learning classifiers to achieve accurate detection. Each of these feature sets provides a unique semantic perspective (or view) of apps' behaviours, with inherent strengths and limitations. That is, some views are better suited to detecting certain attacks but may not be suitable for characterising several others. Most existing malware detection approaches use only one (or a select few) of these feature sets, which prevents them from detecting a vast majority of attacks. Addressing this limitation, we propose MKLDroid, a unified framework that systematically integrates multiple views of apps for performing comprehensive malware detection and malicious code localisation. The rationale is that, while a malware app can disguise itself in some views, disguising itself in every view while maintaining malicious intent will be much harder. MKLDroid uses a graph kernel to capture structural and contextual information from apps' dependency graphs and identify malicious code patterns in each view. Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted combination of the views which yields the best detection accuracy. Besides multi-view learning, MKLDroid's unique and salient trait is its ability to locate fine-grained malicious code portions in dependency graphs (e.g., methods/classes). Through our large-scale experiments on several datasets (incl. wild apps), we demonstrate that MKLDroid consistently outperforms three state-of-the-art techniques in terms of accuracy while maintaining comparable efficiency. In our malicious code localisation experiments on a dataset of repackaged malware, MKLDroid was able to identify all the malicious classes with 94% average recall.
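
The idea of a weighted combination of per-view kernels can be sketched as follows. This is a deliberately simplified stand-in, not MKLDroid's actual optimiser: it assumes each view has already produced a kernel (similarity) matrix, and it picks convex weights by grid search so as to maximise kernel-target alignment, a common heuristic for scoring how well a kernel matches the labels. All names (`alignment`, `combine`, `best_weights`) and the toy matrices are hypothetical.

```python
from itertools import product
from math import sqrt


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(kernel, labels):
    """Kernel-target alignment between a kernel matrix and labels in {-1, +1}."""
    target = [[yi * yj for yj in labels] for yi in labels]
    denom = sqrt(frobenius_inner(kernel, kernel) * frobenius_inner(target, target))
    return frobenius_inner(kernel, target) / denom if denom else 0.0


def combine(kernels, weights):
    """Weighted sum of per-view kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def best_weights(kernels, labels, steps=10):
    """Grid-search convex weights (summing to 1) that maximise alignment."""
    best, best_score = None, float("-inf")
    for combo in product(range(steps + 1), repeat=len(kernels)):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        score = alignment(combine(kernels, weights), labels)
        if score > best_score:
            best, best_score = weights, score
    return best, best_score


# Toy data: two malware (+1) and two benign (-1) apps, two views.
labels = [1, 1, -1, -1]
informative_view = [[yi * yj for yj in labels] for yi in labels]  # separates classes
uninformative_view = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

weights, score = best_weights([informative_view, uninformative_view], labels)
```

On this toy input the search puts all weight on the view whose kernel separates the classes. A real MKL solver would instead learn the weights jointly with the classifier, and the kernels would come from graph kernels over dependency graphs rather than hand-written matrices.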

    Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research

    This survey presents a comprehensive review of current literature on Explainable Artificial Intelligence (XAI) methods for cyber security applications. Due to the rapid development of Internet-connected systems and Artificial Intelligence in recent years, Artificial Intelligence, including Machine Learning and Deep Learning, has been widely utilized in cyber security fields such as intrusion detection, malware detection, and spam filtering. However, although Artificial Intelligence-based approaches for detecting and defending against cyber attacks and threats are more advanced and efficient than conventional signature-based and rule-based cyber security strategies, most Machine Learning-based and Deep Learning-based techniques are deployed in a “black-box” manner, meaning that security experts and customers are unable to explain how such procedures reach particular conclusions. The lack of transparency and interpretability of existing Artificial Intelligence techniques decreases human users’ confidence in the models used for defense against cyber attacks, especially now that cyber attacks are becoming increasingly diverse and complicated. Therefore, it is essential to apply XAI when building cyber security models, to create more explainable models while maintaining high accuracy and allowing human users to comprehend, trust, and manage the next generation of cyber defense mechanisms. Although there are papers reviewing Artificial Intelligence applications in cyber security, and a vast literature on applying XAI in many fields including healthcare, financial services, and criminal justice, there are, surprisingly, currently no survey articles that concentrate on XAI applications in cyber security.
Therefore, the motivation behind the survey is to bridge this research gap by presenting a detailed and up-to-date survey of XAI approaches applicable to issues in the cyber security field. Our work is the first to propose a clear roadmap for navigating the XAI literature in the context of applications in cyber security.

    Android Malware Characterization using Metadata and Machine Learning Techniques

    Android malware has emerged as a consequence of the increasing popularity of smartphones and tablets. While most previous work focuses on inherent characteristics of Android apps to detect malware, this study analyses indirect features and metadata to identify patterns in malware applications. Our experiments show that: (1) the permissions used by an application offer only moderate performance results; (2) other features publicly available at Android Markets, such as the application developer and certificate issuer, are more relevant in detecting malware; and (3) compact and efficient classifiers can be constructed for the early detection of malware applications prior to code inspection or sandboxing. Comment: 4 figures, 2 tables and 8 pages